Exforsys

Analysis Services Training

  1. MSAS - Browsing the Dependency Network
  2. MSAS - Building a Relational Decision Tree Model
  3. MSAS - Introduction to Data Mining
  4. MSAS - Applying security to a Dimension
  5. Tutorial 65: MSAS - Managing Cube Roles
  6. MSAS - Understanding Database Roles
  7. MSAS - Securing User Authentication
  8. MSAS - Introducing Analysis Services Security
  9. MSAS - Writebacks
  10. MSAS - Defining and Creating Drillthrough
  11. MSAS - Defining and Creating Auctions
  12. MSAS - Creating and Maintaining Calculated Members in Virtual Cubes
  13. MSAS - Building a Virtual Cube
  14. MSAS - Understanding Virtual Cubes
  15. MSAS - Introducing Solve Order
  16. MSAS - Implementing Calculations Using MDX Part 2
  17. MSAS - Implementing Calculations Using MDX Part 1
  18. MSAS - Merging Partitions
  19. MSAS - Introduction and Managing Partitions
  20. MSAS - Troubleshooting Cube Processing
  21. MSAS - Optimizing Cube Processing
  22. MSAS - Processing Dimensions and Cubes
  23. MSAS - Introducing Dimension and Cube Processing
  24. MSAS: Optimization Tuning Part 2
  25. MSAS: Optimization Tuning Part 1
  26. MSAS: Usage-Based Optimization
  27. MSAS: Analysis Services Aggregations
  28. MSAS: The Storage Design Wizard
  29. MSAS: Analysis Server Cube Storage
  30. MSAS: Defining Cube Properties
  31. MSAS: Introduction and Working with Measures
  32. MSAS: Introduction and Working with Cubes
  33. MSAS: Virtual Dimensions
  34. MSAS: Introducing Member Properties
  35. MSAS: Creating Custom Rollups
  36. MSAS: Creating a Time Dimension
  37. MSAS: Understanding Hierarchies
  38. MSAS: Dimension Storage Modes and Levels
  39. MSAS: Working with Levels and Hierarchies
  40. MSAS: Working with Parent-Child Dimensions
  41. MSAS : Basics of Levels
  42. MSAS : Working with Standard Dimensions
  43. MSAS : Shared vs Private Dimensions
  44. Understanding Dimension Basics
  45. MSAS : Office 2000 OLAP Components
  46. MSAS : Client Architecture
  47. MSAS : Cube Storage options
  48. MSAS : Meta data Repository
  49. MSAS : Analysis services Tools for Extended Functionality
  50. MSAS : The Wizards
  51. MSAS : The Analysis Manager and Analysis Server
  52. MSAS : The Data warehousing framework of SQL Server 2000 - Part 2
  53. MSAS : The Data warehousing framework of SQL Server 2000 - Part 1
  54. MSAS : Microsoft Data Warehousing Overview
  55. MSAS : Browsing the Cube
  56. MSAS : Designing Storage and Processing the Cube
  57. MSAS : Building the Cube Part #3
  58. MSAS : Building the Cube Part #2
  59. MSAS : Building the Cube Part #1
  60. MSAS : Setting up the Database in Analysis Server
  61. MSAS : Preparing to Create the Cube
  62. MSAS : Introducing Analysis Manager Wizards
  63. Microsoft Analysis Services Installation
  64. MSAS - Applying OLAP Cubes
  65. Understanding OLAP Models
  66. Designing the Dimensional Model and Preparing the data for OLAP
  67. Design of the data warehouse: Kimball Vs Inmon
  68. Defining OLAP Solutions and Data Warehouse design
  69. Microsoft Analysis Services Training
  70. Data Warehouse database and OLTP database
  71. Introduction to Data Warehousing

Ads


Home arrow Technical Training arrow Analysis Services Training

MSAS - Introduction to Data Mining

Page 1 of 3
Author : Exforsys Inc.     Published on: 6th May 2005
The process of probing into a set of information for descriptive and predictive purposes is called data mining. The purpose is to identify those trends and patterns which indicate the direction of effort to achieve desired outcomes. SQL Server 2000 and Analysis Services, has inbuilt powerful data mining capabilities including algorithms for Clustering and for Decision Trees.

Before actually studying the data mining capabilities of Analysis Services, let us briefly look at some terminology generally used while discussing data mining.

Ads

Understanding Terms used in Data Mining

A case is the term used for the facts being studied. The data used to study these facts are called case sets. Each data mining case has a unique identifier called a key. Descriptive pieces of information are called attributes or measures. The case may contain information about a single table or from multiple tables. If there are multiple tables from which data is derived, such a case is defined as a case with nested tables. The hierarchical attributes of a case that can be conveniently grouped are called dimensions of the case.

Clustering breaks down large chunks of data into more manageable groups by identifying similar traits. The clusters provide description of the attributes of the members in each cluster. It is often the first technique that is used in a project and the data is used as a source of future mining efforts as it highlights promising areas to investigate. Microsoft SQL 2000 Analysis Server uses a Scaleable Expectation Maximization algorithm to create clusters based on population density. The advantage of this process is that it requires only a single pass over the entire data and the algorithm creates clusters as it passes and the centers of these clusters are adjusted as more data is processed. It provides reasonable results at any point during its computation. Moreover, it works with a minimum amount of memory.

 The strengths of clustering is that it is

1. Undirected
2. Not limited to any type of analysis
3. It handles large case sets
4. Uses minimum memory.

Its weaknesses are:

1. Measurements need to be carefully chosen
2. Results may be difficult to interpret
3. Has no guaranteed value

Another technique used in data mining is the Decision Tree. The Decision tree is used to solve predictive problems. For instance it may be used to predict whether a customer will purchase a particular brand of a product at a particular store in a given geographical location. The algorithm identifies the most relevant characteristics and defines a set of rules that give the percentage of probabilities that new cases will follow the pattern identified.

The decision tree is created using a technique called the recursive partitioning. This algorithm defines the most relevant attributes and splits the population of data on the basis of the attribute. Each partition is called Node. The process is repeated for each subgroup until a good stopping point is found. This may be the point at which all nodes meet the criteria or there are no more nodes that meet the criteria. The last group of cases in the decision tree is called a Leaf node. When all the leaf nodes in a decision tree have only one value the model is said to be over fitted. An over fitted model has the danger of providing unrealistic predictions.

The decision tree model has the following strengths:

1. It provides visual results
2. It is built on understandable rules
3. It is predictive
4. It enables performance for prediction
5. Shows what is important.

On the other hand the weakness of this model is that:

1. It can get spread too thin
2. Performance of training is very expensive as the cases in the node are stored multiple times.

Ads

While doing a data mining analysis three distinct types of data sets are used. The initial set of data that is processed and saved for future use is called a training set. This data is used to ‘teach’ the model about the population of data being analyzed. The second set of data required are called test cases. This set is used to build confidence in the hypothesis. Some known attributes are omitted to help us confirm that the model correctly predicts the missing values. The third type of data set used is the evaluation set. The evaluation set focuses on the investigation. It is the final set of cases in which we process the situational data. The model is used to predict behavior based on what the mining activities have learned and is used to drive business strategy.

In the sections that follow we will work on building an OLAP Clustering Data Mining model using the FoodMart 2000 database. We will assume that FoodMart wants to study the loyalty of customers. It will study the number of customers who use FoodMart and how many of them use the FoodMart Club Card. Let us begin by categorizing customer behaviour by using clustering to define the characteristics of people who have the FoodMart Club Card.

The Data mining model can use the OLAP or a Relational Data store. The former allows the user have the advantage of predetermined aggregations and dimensional information from the source cube. The latter requires the user to input information on what describes a complete case. If there are multiple tables the table joins have to be specified.



 
This tutorial is part of a Analysis Services Training tutorial series. Read it from the beginning and learn yourself.

Analysis Services Training

  1. MSAS - Browsing the Dependency Network
  2. MSAS - Building a Relational Decision Tree Model
  3. MSAS - Introduction to Data Mining
  4. MSAS - Applying security to a Dimension
  5. Tutorial 65: MSAS - Managing Cube Roles
  6. MSAS - Understanding Database Roles
  7. MSAS - Securing User Authentication
  8. MSAS - Introducing Analysis Services Security
  9. MSAS - Writebacks
  10. MSAS - Defining and Creating Drillthrough
  11. MSAS - Defining and Creating Auctions
  12. MSAS - Creating and Maintaining Calculated Members in Virtual Cubes
  13. MSAS - Building a Virtual Cube
  14. MSAS - Understanding Virtual Cubes
  15. MSAS - Introducing Solve Order
  16. MSAS - Implementing Calculations Using MDX Part 2
  17. MSAS - Implementing Calculations Using MDX Part 1
  18. MSAS - Merging Partitions
  19. MSAS - Introduction and Managing Partitions
  20. MSAS - Troubleshooting Cube Processing
  21. MSAS - Optimizing Cube Processing
  22. MSAS - Processing Dimensions and Cubes
  23. MSAS - Introducing Dimension and Cube Processing
  24. MSAS: Optimization Tuning Part 2
  25. MSAS: Optimization Tuning Part 1
  26. MSAS: Usage-Based Optimization
  27. MSAS: Analysis Services Aggregations
  28. MSAS: The Storage Design Wizard
  29. MSAS: Analysis Server Cube Storage
  30. MSAS: Defining Cube Properties
  31. MSAS: Introduction and Working with Measures
  32. MSAS: Introduction and Working with Cubes
  33. MSAS: Virtual Dimensions
  34. MSAS: Introducing Member Properties
  35. MSAS: Creating Custom Rollups
  36. MSAS: Creating a Time Dimension
  37. MSAS: Understanding Hierarchies
  38. MSAS: Dimension Storage Modes and Levels
  39. MSAS: Working with Levels and Hierarchies
  40. MSAS: Working with Parent-Child Dimensions
  41. MSAS : Basics of Levels
  42. MSAS : Working with Standard Dimensions
  43. MSAS : Shared vs Private Dimensions
  44. Understanding Dimension Basics
  45. MSAS : Office 2000 OLAP Components
  46. MSAS : Client Architecture
  47. MSAS : Cube Storage options
  48. MSAS : Meta data Repository
  49. MSAS : Analysis services Tools for Extended Functionality
  50. MSAS : The Wizards
  51. MSAS : The Analysis Manager and Analysis Server
  52. MSAS : The Data warehousing framework of SQL Server 2000 - Part 2
  53. MSAS : The Data warehousing framework of SQL Server 2000 - Part 1
  54. MSAS : Microsoft Data Warehousing Overview
  55. MSAS : Browsing the Cube
  56. MSAS : Designing Storage and Processing the Cube
  57. MSAS : Building the Cube Part #3
  58. MSAS : Building the Cube Part #2
  59. MSAS : Building the Cube Part #1
  60. MSAS : Setting up the Database in Analysis Server
  61. MSAS : Preparing to Create the Cube
  62. MSAS : Introducing Analysis Manager Wizards
  63. Microsoft Analysis Services Installation
  64. MSAS - Applying OLAP Cubes
  65. Understanding OLAP Models
  66. Designing the Dimensional Model and Preparing the data for OLAP
  67. Design of the data warehouse: Kimball Vs Inmon
  68. Defining OLAP Solutions and Data Warehouse design
  69. Microsoft Analysis Services Training
  70. Data Warehouse database and OLTP database
  71. Introduction to Data Warehousing
 

Comments