Gordon S Linoff: Data Mining Techniques

Cover: 9780470650936 | Data Mining Techniques | Gordon S Linoff (et al.) | Paperback | 2011

Language: English

€55.90*

incl. VAT

Free shipping by post / DHL

Currently unavailable

This item is not written in German.
The product description and details may be given in English. If you have any difficulty understanding them, please feel free to contact us.

Description

The newest edition of the leading introductory book on data mining, fully updated and revised

Who will remain a loyal customer and who won't? Which messages are most effective with which segments? How can customer value be maximized? This book supplies powerful tools for extracting the answers to these and other crucial business questions from the corporate databases where they lie buried. In the years since the first edition of this book, data mining has grown to become an indispensable tool of modern business. In this latest edition, Linoff and Berry have made extensive updates and revisions to every chapter and added several new ones. The book retains the focus of earlier editions, showing marketing analysts, business managers, and data mining specialists how to harness data mining methods and techniques to solve important business problems. While never sacrificing accuracy for the sake of simplicity, Linoff and Berry present even complex topics in clear, concise English with minimal use of technical jargon or mathematical formulas. Technical topics are illustrated with case studies and practical real-world examples drawn from the authors' experiences, and every chapter contains valuable tips for practitioners. Among the techniques newly covered, or covered in greater depth, are linear and logistic regression models, incremental response (uplift) modeling, naïve Bayesian models, table lookup models, similarity models, radial basis function networks, expectation maximization (EM) clustering, and swarm intelligence. New chapters are devoted to data preparation, derived variables, principal components and other variable reduction techniques, and text mining.

After establishing the business context with an overview of data mining applications, and introducing aspects of data mining methodology common to all data mining projects, the book covers each important data mining technique in detail.

This third edition of Data Mining Techniques covers such topics as:

How to create stable, long-lasting predictive models
Data preparation and variable selection
Modeling specific targets with directed techniques such as regression, decision trees, neural networks, and memory-based reasoning
Finding patterns with undirected techniques such as clustering, association rules, and link analysis
Modeling business time-to-event problems such as time to next purchase and expected remaining lifetime
Mining unstructured text

The companion website provides data that can be used to test out the various data mining techniques in the book.

About the Authors

GORDON S. LINOFF and MICHAEL J. A. BERRY are the founders of Data Miners, Inc., a consultancy specializing in data mining. They have jointly authored two of the leading data mining titles in the field, Data Mining Techniques and Mastering Data Mining (both from Wiley). They each have decades of experience applying data mining techniques to business problems in marketing and customer relationship management.

Table of Contents

Introduction xxxvii

Chapter 1 What Is Data Mining and Why Do It? 1

What Is Data Mining? 2

Data Mining Is a Business Process 2

Large Amounts of Data 3

Meaningful Patterns and Rules 3

Data Mining and Customer Relationship Management 4

Why Now? 6

Data Is Being Produced 6

Data Is Being Warehoused 6

Computing Power Is Affordable 7

Interest in Customer Relationship Management Is Strong 7

Commercial Data Mining Software Products Have Become Available 8

Skills for the Data Miner 9

The Virtuous Cycle of Data Mining 9

A Case Study in Business Data Mining 11

Identifying BofA's Business Challenge 12

Applying Data Mining 12

Acting on the Results 13

Measuring the Effects of Data Mining 14

Steps of the Virtuous Cycle 15

Identify Business Opportunities 16

Transform Data into Information 17

Act on the Information 19

Measure the Results 20

Data Mining in the Context of the Virtuous Cycle 23

Lessons Learned 26

Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27

Two Customer Lifecycles 27

The Customer's Lifecycle 28

The Customer Lifecycle 28

Subscription Relationships versus Event-Based Relationships 30

Organize Business Processes Around the Customer Lifecycle 32

Customer Acquisition 33

Customer Activation 36

Customer Relationship Management 37

Winback 38

Data Mining Applications for Customer Acquisition 38

Identifying Good Prospects 39

Choosing a Communication Channel 39

Picking Appropriate Messages 40

A Data Mining Example: Choosing the Right Place to Advertise 40

Who Fits the Profile? 41

Measuring Fitness for Groups of Readers 44

Data Mining to Improve Direct Marketing Campaigns 45

Response Modeling 46

Optimizing Response for a Fixed Budget 47

Optimizing Campaign Profitability 49

Reaching the People Most Influenced by the Message 53

Using Current Customers to Learn About Prospects 54

Start Tracking Customers Before They Become "Customers" 55

Gather Information from New Customers 55

Acquisition-Time Variables Can Predict Future Outcomes 56

Data Mining Applications for Customer Relationship Management 56

Matching Campaigns to Customers 56

Reducing Exposure to Credit Risk 58

Determining Customer Value 59

Cross-selling, Up-selling, and Making Recommendations 60

Retention 60

Recognizing Attrition 60

Why Attrition Matters 61

Different Kinds of Attrition 62

Different Kinds of Attrition Model 63

Beyond the Customer Lifecycle 64

Lessons Learned 65

Chapter 3 The Data Mining Process 67

What Can Go Wrong? 68

Learning Things That Aren't True 68

Learning Things That Are True, but Not Useful 73

Data Mining Styles 74

Hypothesis Testing 75

Directed Data Mining 81

Undirected Data Mining 81

Goals, Tasks, and Techniques 82

Data Mining Business Goals 82

Data Mining Tasks 83

Data Mining Techniques 88

Formulating Data Mining Problems: From Goals to Tasks to Techniques 88

What Techniques for Which Tasks? 95

Is There a Target or Targets? 96

What Is the Target Data Like? 96

What Is the Input Data Like? 96

How Important Is Ease of Use? 97

How Important Is Model Explicability? 97

Lessons Learned 98

Chapter 4 Statistics 101: What You Should Know About Data 101

Occam's Razor 103

Skepticism and Simpson's Paradox 103

The Null Hypothesis 104

P-Values 105

Looking At and Measuring Data 106

Categorical Values 106

Numeric Variables 117

A Couple More Statistical Ideas 120

Measuring Response 120

Standard Error of a Proportion 121

Comparing Results Using Confidence Bounds 123

Comparing Results Using Difference of Proportions 124

Size of Sample 125

What the Confidence Interval Really Means 126

Size of Test and Control for an Experiment 127

Multiple Comparisons 129

The Confidence Level with Multiple Comparisons 129

Bonferroni's Correction 129

Chi-Square Test 130

Expected Values 130

Chi-Square Value 132

Comparison of Chi-Square to Difference of Proportions 134

An Example: Chi-Square for Regions and Starts 134

Case Study: Comparing Two Recommendation Systems with an A/B Test 138

First Metric: Participating Sessions 140

Data Mining and Statistics 144

Lessons Learned 148

Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151

Directed Data Mining Models 152

Defining the Model Structure and Target 152

Incremental Response Modeling 154

Model Stability 156

Time-Frames in the Model Set 157

Directed Data Mining Methodology 159

Step 1: Translate the Business Problem into a Data Mining Problem 161

How Will Results Be Used? 163

How Will Results Be Delivered? 163

The Role of Domain Experts and Information Technology 164

Step 2: Select Appropriate Data 165

What Data Is Available? 166

How Much Data Is Enough? 167

How Much History Is Required? 167

How Many Variables? 168

What Must the Data Contain? 168

Step 3: Get to Know the Data 169

Examine Distributions 169

Compare Values with Descriptions 170

Validate Assumptions 170

Ask Lots of Questions 171

Step 4: Create a Model Set 172

Assembling Customer Signatures 172

Creating a Balanced Sample 172

Including Multiple Timeframes 174

Creating a Model Set for Prediction 174

Creating a Model Set for Profiling 176

Partitioning the Model Set 176

Step 5: Fix Problems with the Data 177

Categorical Variables with Too Many Values 177

Numeric Variables with Skewed Distributions and Outliers 178

Missing Values 178

Values with Meanings That Change over Time 179

Inconsistent Data Encoding 179

Step 6: Transform Data to Bring Information to the Surface 180

Step 7: Build Models 180

Step 8: Assess Models 180

Assessing Binary Response Models and Classifiers 181

Assessing Binary Response Models Using Lift 182

Assessing Binary Response Model Scores Using Lift Charts 184

Assessing Binary Response Model Scores Using Profitability Models 185

Assessing Binary Response Models Using ROC Charts 186

Assessing Estimators 188

Assessing Estimators Using Score Rankings 189

Step 9: Deploy Models 190

Practical Issues in Deploying Models 190

Optimizing Models for Deployment 191

Step 10: Assess Results 191

Step 11: Begin Again 193

Lessons Learned 193

Chapter 6 Data Mining Using Classic Statistical Techniques 195

Similarity Models 196

Similarity and Distance 196

Example: A Similarity Model for Product Penetration 197

Table Lookup Models 203

Choosing Dimensions 204

Partitioning the Dimensions 205

From Training Data to Scores 205

Handling Sparse and Missing Data by Removing Dimensions 205

RFM: A Widely Used Lookup Model 206

RFM Cell Migration 207

RFM and the Test-and-Measure Methodology 208

RFM and Incremental Response Modeling 209

Naïve Bayesian Models 210

Some Ideas from Probability 210

The Naïve Bayesian Calculation 212

Comparison with Table Lookup Models 213

Linear Regression 213

The Best-fit Line 215

Goodness of Fit 217

Multiple Regression 220

The Equation 220

The Range of the Target Variable 221

Interpreting Coefficients of Linear Regression Equations 221

Capturing Local Effects with Linear Regression 223

Additional Considerations with Multiple Regression 224

Variable Selection for Multiple Regression 225

Logistic Regression 227

Modeling Binary Outcomes 227

The Logistic Function 229

Fixed Effects and Hierarchical Effects 231

Hierarchical Effects 232

Within and Between Effects 232

Fixed Effects 233

Lessons Learned 234

Chapter 7 Decision Trees 237

What Is a Decision Tree and How Is It Used? 238

A Typical Decision Tree 238

Using the Tree to Learn About Churn 240

Using the Tree to Learn About Data and Select Variables 241

Using the Tree to Produce Rankings 243

Using the Tree to Estimate Class Probabilities 243

Using the Tree to Classify Records 244

Using the Tree to Estimate Numeric Values 244

Decision Trees Are Local Models 245

Growing Decision Trees 247

Finding the Initial Split 248

Growing the Full Tree 251

Finding the Best Split 252

Gini (Population Diversity) as a Splitting Criterion 253

Entropy Reduction or Information Gain as a Splitting Criterion 254

Information Gain Ratio 256

Chi-Square Test as a Splitting Criterion 256

Incremental Response as a Splitting Criterion 258

Reduction in Variance as a Splitting Criterion for Numeric Targets 259

F Test 262

Pruning 262

The CART Pruning Algorithm 263

Pessimistic Pruning: The C5.0 Pruning Algorithm 267

Stability-Based Pruning 268

Extracting Rules from Trees 269

Decision Tree Variations 270

Multiway Splits 270

Splitting on More Than One Field at a Time 271

Creating Nonrectangular Boxes 271

Assessing the Quality of a Decision Tree 275

When...

Details

Year of publication:	2011
Subject area:	Application software
Genre:	Imports, Computer science
Section:	Science & Technology
Format:	Paperback
Extent:	896 pages
ISBN-13:	9780470650936
ISBN-10:	0470650931
Language:	English
Manufacturer number:	14565093000
Binding:	Paperback / softcover
Authors:	Linoff, Gordon S.; Berry, Michael J. A.
Edition:	3rd edition
Publisher:	Wiley (John Wiley & Sons)
Responsible person for the EU:	Wiley-VCH GmbH, Boschstr. 12, D-69469 Weinheim, product-safety@wiley.com
Dimensions:	235 x 191 x 48 mm
By/With:	Gordon S Linoff (et al.)
Publication date:	12 April 2011
Weight:	1.622 kg

Item ID: 107164843