 | Level: Intermediate Peter Bendel (peter_bendel@de.ibm.com), Intelligent Miner architect, IBM
14 Sep 2006 Data mining methodologies traditionally find rules and patterns in structured input data such as relational tables or views. However, IBM®
DB2® 9 XML columns cannot be directly used as input to DB2 data mining algorithms.
Learn how you can create a relational view over semi-structured XML columns to use XML input data for data mining.
DB2 data mining -- Intelligent Miner™ -- is now an integral part of DB2 Data Warehouse Edition (DWE).
Introduction
Data Mining can answer business questions such as:
- What market segments do my customers fall into and what are their characteristcs?
- Which customers are most likely to respond to my promotion?
- Which transactions are likely to be fraudulent?
- Which customer is at risk of leaving?
The answers to these questions bring business value, including:
- Personalized customer relationships for higher customer satisfaction and retention
- Quick detection of fraud in order to take immediate action to minimize cost
- Preventing loss of profitable customers.
This article illustrates an example of using data mining to create a customer segmentation
model on the customers of a large retail bank. Segmentation is a way to derive homogeneous groups based on common characteristics.
Those attributes in combination provide insight whereby profiles can be synthesized and segment ids assigned to each customer. The segments can be further
analyzed and programs derived that are tailored to the profile of each group, maximizing response and engagement.
We assume that we have a customer table with two types of input columns about the customers to be clustered:
- Behavioural data such as the types of accounts or the average balances of the customers in normal columns of a relational table
- Demographic data about each customer such as age, gender, and marital status in an XML column
DB2 data mining does not support the use of columns of type XML as input to the data mining algorithms.
When you use a table containing XML columns, these columns are ignored.
Learn how you can create a relational view over semi-structured XML columns to use XML input data for data mining.
Preparing the input data used in this example
DB2 data mining - formerly DB2 Intelligent Miner - is now an integral part of DB2 Data Warehouse Edition.
To run these examples:
- Install DB2 Data Warehouse Edition 9.1.1 using DB2 V9 Enterprise Edition and including Intelligent Miner V9.1.1.
To allow XML column types you need to create a new database using codeset UTF-8. Listing 1 shows how to create a database named MINEXML and enable it for data mining
using the idmenabledb command. This command creates database objects including stored procedures, user defined functions, methods, types and tables required for data mining in the DB2 catalog.
Listing 1. Creating a XML database (codeset UTF-8) and enabling it for data mining using the idmenabledb command
DB2 CREATE DB MINEXML USING CODESET UTF-8 TERRITORY US
idmenabledb MINEXML dbcfg
db2 connect to MINEXML
|
Next we create our input table BANKCUSTOMERS_XML and import some data rows from a comma delimited file.
Listing 2. Creating a table with an XML column and importing some data
CREATE TABLE "BANKCUSTOMERS_XML" (
"CLIENT_ID" CHAR(9) NOT NULL PRIMARY KEY,
"NBR_YEARS_CLI" REAL,
"SAVINGS_ACCOUNT" CHAR(3),
"HAS_LIFE_INSURANCE" CHAR(3),
"INT_CREDITCARD" CHAR(3),
"ONLINE_ACCESS" CHAR(3),
"JOINED_ACCOUNTS" CHAR(3),
"BANKCARD" CHAR(3),
"AVERAGE_BALANCE" DECIMAL(10,2),
"NO_CRED_TRANS" REAL,
"NO_DEBIT_TRANS" REAL,
"DEMOGRAPHICS" XML
) NOT LOGGED INITIALLY;
IMPORT FROM "customerxml.del" OF DEL INSERT INTO "BANKCUSTOMERS_XML"
( "CLIENT_ID",
"NBR_YEARS_CLI",
"SAVINGS_ACCOUNT",
"HAS_LIFE_INSURANCE",
"INT_CREDITCARD",
"ONLINE_ACCESS",
"JOINED_ACCOUNTS",
"BANKCARD",
"AVERAGE_BALANCE",
"NO_CRED_TRANS",
"NO_DEBIT_TRANS",
"DEMOGRAPHICS");
|
Notice that column DEMOGRAPHICS has type XML, while all the other columns contain attributes about each customer,
such as number years client, has savings account, has life insurance, in normal columns with numeric and charater type. Listing 3 retrieves a subset of the columns and five rows
to illustrate the table's contents. Figure 1 shows the result.
Listing 3. Retrieve 5 sample rows from BANKCUSTOMERS_XML
select client_id, joined_accounts, demographics from bankcustomers_xml fetch first 5 rows
only |
Figure 1. Sample content of table BANKCUSTOMERS_XML
When we open a single row's value for column DEMOGRAPHICS in an XML viewer, we see that the single column contains an XML fragment with several XML tags, each representing a customer attribute.
Figure 2. A value in the XML column "DEMOGRAPHICS"
Creating a relational view using the XMLTable function
Intelligent Miner does not support columns of type XML, so when we would use table BANKCUSTOMERS_XML as input to the clustering algorithm, this column would simply be ignored.
To include the contents in the DEMOGRAPHICS column, we can create a relational view using the XMLTable function (See Listing 4.)
We pass in column DEMOGRAPHICS to the XMLTable function and define the resulting column type for each XML tag to be extracted.
Listing 4 Creating a relational view joining the structured part and the XML data
CREATE VIEW BANKCUSTOMERS_VIEW
(
"CLIENT_ID",
"NBR_YEARS_CLI",
"SAVINGS_ACCOUNT",
"HAS_LIFE_INSURANCE",
"INT_CREDITCARD",
"ONLINE_ACCESS",
"JOINED_ACCOUNTS",
"BANKCARD",
"AVERAGE_BALANCE",
"NO_CRED_TRANS",
"NO_DEBIT_TRANS",
"AGE",
"GENDER",
"MARITAL_STATUS",
"PROFESSION"
)
AS
select B."CLIENT_ID", B."NBR_YEARS_CLI",B."SAVINGS_ACCOUNT", B."HAS_LIFE_INSURANCE",
B."INT_CREDITCARD", B."ONLINE_ACCESS", B."JOINED_ACCOUNTS",
B."BANKCARD", B."AVERAGE_BALANCE", B."NO_CRED_TRANS",
B."NO_DEBIT_TRANS",
T.AGE, T.GENDER, T.MARITAL_STATUS, T.PROFESSION
from BANKCUSTOMERS_XML B,
xmltable('$d/demographics' passing B.DEMOGRAPHICS as "d"
columns
"AGE" DOUBLE path 'age',
"GENDER" CHAR(2) path 'gender',
"MARITAL_STATUS" CHAR(12) path 'marital_status',
"PROFESSION" CHAR(24) path 'profession') AS T;
|
Note that we join the result of the XMLTable function to the original table to obtain a view that includes both the existing columns and those created
by the XMLTable function.
The SELECT statemnt in Listing 5 retrieves some sample rows from the view BANKCUSTOMERS_VIEW to illustrate its contents
and Figure 3 shows the result. Now the XML fragments have been shredded into individual relational columns for each XML tag.
Listing 5. Retrieve 5 sample rows from relational view "BANKCUSTOMERS_VIEW"
select client_id, joined_accounts,age, gender, marital_status, profession from
bankcustomers_view fetch first 5 rows only
|
Figure 3. Sample content of view BANKCUSTOMERS_VIEW
Creating a clustering model using Intelligent Miner Modeling
Now we can use the combined view as input data for our data mining stored procedure IDMMX.BuildClusModel. The simplest overload of this procedure receives two parameters:
- The name of the data mining model to be created
- The table or view to be analyzed
Listing 6 shows the procedure call statement.
Listing 6. Creating a clustering model on the BANKCUSTOMERS_VIEW
CALL IDMMX.BuildClusModel('demo.CustomerSegments','BANKCUSTOMERS_VIEW');
|
We can now explore customer segments in the data mining model using Intelligent Miner Visualizer. To invoke the Visualizer, use the DWE DesignStudio or the
imvisualizer command. Figure 4 shows the clustering visualizer.
Figure 4. The clustering model "demo.CustomerSegments" in the Intelligent Miner Visualizer
Notes:
- Although it's easy to create relational views over XML column data, you should consider their use carefully. DB2 doesn't use XML column indexes when queries are issued against such views.
- Empty XML elements like <age/> when cast to a numeric type, cause a runtime error in an XML table. Instead of using empty elements in your XML documents,
the element should be removed from the XML document.
Conclusion
Data Mining can provide valuable business insight from data in your data warehouse.
Traditionally it finds rules and patterns in structured input data like relational tables or views.
DB2 Viper introduces pure XML, a new, highly efficient way to store XML data.
In this article you have learned: that you can create a relational view over XML data in XML columns and join it to the structured part of your data, and how to use this view as an input to DB2 data mining.
Download | Description | Name | Size | Download method |
|---|
| Sample script and data | 0609bendelsample.zip | 17KB | HTTP |
|---|
Resources Learn
Get products and technologies
-
To test the data mining function, you will need to install a licensed version of DB2 Data Warehouse Edition V9.1.1.
Note: Since Intelligent Miner 9.1.1 is now an integral part of DWE there is no longer a separate Try & and Buy version of Intelligent Miner.
-
If you only want to test how to create relational view over XML columns, download a free trial version of DB2 Enterprise 9.
About the author  | 
|  | For more than ten years, Peter has held development positions in different projects in the IBM Software Group, among them information retrieval, workflow software, pervasive computing, and portal solutions. He has in-depth experience with IBM data management products and object-oriented programming. Peter works in the IBM software lab in Böblingen, Germany.
At present, Peter is the architect for IBM's data mining family Intelligent Miner. To that end, he is working to simplify the integration of mining technology into business applications and custom analytical applications.
|
Rate this page
|  |