€49.40*
Free shipping via Post / DHL
Delivery time: 2-3 weeks
The newest edition of the leading introductory book on data mining, fully updated and revised
Who will remain a loyal customer and who won't? Which messages are most effective with which segments? How can customer value be maximized? This book supplies powerful tools for extracting the answers to these and other crucial business questions from the corporate databases where they lie buried. In the years since the first edition of this book, data mining has grown to become an indispensable tool of modern business. In this latest edition, Linoff and Berry have made extensive updates and revisions to every chapter and added several new ones. The book retains the focus of earlier editions: showing marketing analysts, business managers, and data mining specialists how to harness data mining methods and techniques to solve important business problems. While never sacrificing accuracy for the sake of simplicity, Linoff and Berry present even complex topics in clear, concise English with minimal use of technical jargon or mathematical formulas. Technical topics are illustrated with case studies and practical real-world examples drawn from the authors' experiences, and every chapter contains valuable tips for practitioners. Among the techniques newly covered, or covered in greater depth, are linear and logistic regression models, incremental response (uplift) modeling, naïve Bayesian models, table lookup models, similarity models, radial basis function networks, expectation maximization (EM) clustering, and swarm intelligence. New chapters are devoted to data preparation, derived variables, principal components and other variable reduction techniques, and text mining.
After establishing the business context with an overview of data mining applications, and introducing aspects of data mining methodology common to all data mining projects, the book covers each important data mining technique in detail.
This third edition of Data Mining Techniques covers such topics as:
How to create stable, long-lasting predictive models
Data preparation and variable selection
Modeling specific targets with directed techniques such as regression, decision trees, neural networks, and memory-based reasoning
Finding patterns with undirected techniques such as clustering, association rules, and link analysis
Modeling business time-to-event problems such as time to next purchase and expected remaining lifetime
Mining unstructured text
The companion website provides data that can be used to test out the various data mining techniques in the book.
GORDON S. LINOFF and MICHAEL J. A. BERRY are the founders of Data Miners, Inc., a consultancy specializing in data mining. They have jointly authored two of the leading data mining titles in the field, Data Mining Techniques and Mastering Data Mining (both from Wiley). They each have decades of experience applying data mining techniques to business problems in marketing and customer relationship management.
Introduction
Chapter 1 What Is Data Mining and Why Do It?
What Is Data Mining?
Data Mining Is a Business Process
Large Amounts of Data
Meaningful Patterns and Rules
Data Mining and Customer Relationship Management
Why Now?
Data Is Being Produced
Data Is Being Warehoused
Computing Power Is Affordable
Interest in Customer Relationship Management Is Strong
Commercial Data Mining Software Products Have Become Available
Skills for the Data Miner
The Virtuous Cycle of Data Mining
A Case Study in Business Data Mining
Identifying BofA's Business Challenge
Applying Data Mining
Acting on the Results
Measuring the Effects of Data Mining
Steps of the Virtuous Cycle
Identify Business Opportunities
Transform Data into Information
Act on the Information
Measure the Results
Data Mining in the Context of the Virtuous Cycle
Lessons Learned
Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management
Two Customer Lifecycles
The Customer's Lifecycle
The Customer Lifecycle
Subscription Relationships versus Event-Based Relationships
Organize Business Processes Around the Customer Lifecycle
Customer Acquisition
Customer Activation
Customer Relationship Management
Winback
Data Mining Applications for Customer Acquisition
Identifying Good Prospects
Choosing a Communication Channel
Picking Appropriate Messages
A Data Mining Example: Choosing the Right Place to Advertise
Who Fits the Profile?
Measuring Fitness for Groups of Readers
Data Mining to Improve Direct Marketing Campaigns
Response Modeling
Optimizing Response for a Fixed Budget
Optimizing Campaign Profitability
Reaching the People Most Influenced by the Message
Using Current Customers to Learn About Prospects
Start Tracking Customers Before They Become "Customers"
Gather Information from New Customers
Acquisition-Time Variables Can Predict Future Outcomes
Data Mining Applications for Customer Relationship Management
Matching Campaigns to Customers
Reducing Exposure to Credit Risk
Determining Customer Value
Cross-selling, Up-selling, and Making Recommendations
Retention
Recognizing Attrition
Why Attrition Matters
Different Kinds of Attrition
Different Kinds of Attrition Model
Beyond the Customer Lifecycle
Lessons Learned
Chapter 3 The Data Mining Process
What Can Go Wrong?
Learning Things That Aren't True
Learning Things That Are True, but Not Useful
Data Mining Styles
Hypothesis Testing
Directed Data Mining
Undirected Data Mining
Goals, Tasks, and Techniques
Data Mining Business Goals
Data Mining Tasks
Data Mining Techniques
Formulating Data Mining Problems: From Goals to Tasks to Techniques
What Techniques for Which Tasks?
Is There a Target or Targets?
What Is the Target Data Like?
What Is the Input Data Like?
How Important Is Ease of Use?
How Important Is Model Explicability?
Lessons Learned
Chapter 4 Statistics 101: What You Should Know About Data
Occam's Razor
Skepticism and Simpson's Paradox
The Null Hypothesis
P-Values
Looking At and Measuring Data
Categorical Values
Numeric Variables
A Couple More Statistical Ideas
Measuring Response
Standard Error of a Proportion
Comparing Results Using Confidence Bounds
Comparing Results Using Difference of Proportions
Size of Sample
What the Confidence Interval Really Means
Size of Test and Control for an Experiment
Multiple Comparisons
The Confidence Level with Multiple Comparisons
Bonferroni's Correction
Chi-Square Test
Expected Values
Chi-Square Value
Comparison of Chi-Square to Difference of Proportions
An Example: Chi-Square for Regions and Starts
Case Study: Comparing Two Recommendation Systems with an A/B Test
First Metric: Participating Sessions
Data Mining and Statistics
Lessons Learned
Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling
Directed Data Mining Models
Defining the Model Structure and Target
Incremental Response Modeling
Model Stability
Time-Frames in the Model Set
Directed Data Mining Methodology
Step 1: Translate the Business Problem into a Data Mining Problem
How Will Results Be Used?
How Will Results Be Delivered?
The Role of Domain Experts and Information Technology
Step 2: Select Appropriate Data
What Data Is Available?
How Much Data Is Enough?
How Much History Is Required?
How Many Variables?
What Must the Data Contain?
Step 3: Get to Know the Data
Examine Distributions
Compare Values with Descriptions
Validate Assumptions
Ask Lots of Questions
Step 4: Create a Model Set
Assembling Customer Signatures
Creating a Balanced Sample
Including Multiple Timeframes
Creating a Model Set for Prediction
Creating a Model Set for Profiling
Partitioning the Model Set
Step 5: Fix Problems with the Data
Categorical Variables with Too Many Values
Numeric Variables with Skewed Distributions and Outliers
Missing Values
Values with Meanings That Change over Time
Inconsistent Data Encoding
Step 6: Transform Data to Bring Information to the Surface
Step 7: Build Models
Step 8: Assess Models
Assessing Binary Response Models and Classifiers
Assessing Binary Response Models Using Lift
Assessing Binary Response Model Scores Using Lift Charts
Assessing Binary Response Model Scores Using Profitability Models
Assessing Binary Response Models Using ROC Charts
Assessing Estimators
Assessing Estimators Using Score Rankings
Step 9: Deploy Models
Practical Issues in Deploying Models
Optimizing Models for Deployment
Step 10: Assess Results
Step 11: Begin Again
Lessons Learned
Chapter 6 Data Mining Using Classic Statistical Techniques
Similarity Models
Similarity and Distance
Example: A Similarity Model for Product Penetration
Table Lookup Models
Choosing Dimensions
Partitioning the Dimensions
From Training Data to Scores
Handling Sparse and Missing Data by Removing Dimensions
RFM: A Widely Used Lookup Model
RFM Cell Migration
RFM and the Test-and-Measure Methodology
RFM and Incremental Response Modeling
Naïve Bayesian Models
Some Ideas from Probability
The Naïve Bayesian Calculation
Comparison with Table Lookup Models
Linear Regression
The Best-fit Line
Goodness of Fit
Multiple Regression
The Equation
The Range of the Target Variable
Interpreting Coefficients of Linear Regression Equations
Capturing Local Effects with Linear Regression
Additional Considerations with Multiple Regression
Variable Selection for Multiple Regression
Logistic Regression
Modeling Binary Outcomes
The Logistic Function
Fixed Effects and Hierarchical Effects
Hierarchical Effects
Within and Between Effects
Fixed Effects
Lessons Learned
Chapter 7 Decision Trees
What Is a Decision Tree and How Is It Used?
A Typical Decision Tree
Using the Tree to Learn About Churn
Using the Tree to Learn About Data and Select Variables
Using the Tree to Produce Rankings
Using the Tree to Estimate Class Probabilities
Using the Tree to Classify Records
Using the Tree to Estimate Numeric Values
Decision Trees Are Local Models
Growing Decision Trees
Finding the Initial Split
Growing the Full Tree
Finding the Best Split
Gini (Population Diversity) as a Splitting Criterion
Entropy Reduction or Information Gain as a Splitting Criterion
Information Gain Ratio
Chi-Square Test as a Splitting Criterion
Incremental Response as a Splitting Criterion
Reduction in Variance as a Splitting Criterion for Numeric Targets
F Test
Pruning
The CART Pruning Algorithm
Pessimistic Pruning: The C5.0 Pruning Algorithm
Stability-Based Pruning
Extracting Rules from Trees
Decision Tree Variations
Multiway Splits
Splitting on More Than One Field at a Time
Creating Nonrectangular Boxes
Assessing the Quality of a Decision Tree
When...
Year of publication: | 2011 |
---|---|
Subject area: | Application software |
Genre: | Imports, Computer science |
Category: | Science & Technology |
Medium: | Paperback |
Extent: | 896 pp. |
ISBN-13: | 9780470650936 |
ISBN-10: | 0470650931 |
Language: | English |
Manufacturer number: | 14565093000 |
Binding: | Softcover |
Author: | Linoff, Gordon S.; Berry, Michael J. A. |
Edition: | 3rd edition |
Publisher: | Wiley; John Wiley & Sons |
Responsible person for the EU: | Wiley-VCH GmbH, Boschstr. 12, D-69469 Weinheim, product-safety@wiley.com |
Dimensions: | 235 x 191 x 48 mm |
By/With: | Gordon S. Linoff (et al.) |
Publication date: | 12 April 2011 |
Weight: | 1.622 kg |