PacktLib: IBM SPSS Modeler Cookbook



About the Authors

About the Reviewers


Data Understanding


Using an empty aggregate to evaluate sample size

Evaluating the need to sample from the initial data

Using CHAID stumps when interviewing an SME

Using a single cluster K-means as an alternative to anomaly detection

Using an @NULL multiple Derive to explore missing data

Creating an Outlier report to give to SMEs

Detecting potential model instability early using the Partition node and Feature Selection node

Data Preparation – Select


Using the Feature Selection node creatively to remove or decapitate perfect predictors

Running a Statistics node on anti-join to evaluate the potential missing data

Evaluating the use of sampling for speed

Removing redundant variables using correlation matrices

Selecting variables using the CHAID Modeling node

Selecting variables using the Means node

Selecting variables using single-antecedent Association Rules

Data Preparation – Clean


Binning scale variables to address missing data

Using a full data model/partial data model approach to address missing data

Imputing in-stream mean or median

Imputing missing values randomly from uniform or normal distributions

Using random imputation to match a variable's distribution

Searching for similar records using a Neural Network for inexact matching

Using neuro-fuzzy searching to find similar names

Producing longer Soundex codes

Data Preparation – Construct


Building transformations with multiple Derive nodes

Calculating and comparing conversion rates

Grouping categorical values

Transforming high skew and kurtosis variables with a multiple Derive node

Creating flag variables for aggregation

Using Association Rules for interaction detection/feature creation

Creating time-aligned cohorts

Data Preparation – Integrate and Format


Speeding up merge with caching and optimization settings

Merging a lookup table

Shuffle-down (nonstandard aggregation)

Cartesian product merge using key-less merge by key

Multiplying out using Cartesian product merge, user source, and derive dummy

Changing large numbers of variable names without scripting

Parsing nonstandard dates

Parsing and performing a conversion on a complex stream

Sequence processing

Selecting and Building a Model


Evaluating balancing with Auto Classifier

Building models with and without outliers

Using Neural Network for Feature Selection

Creating a bootstrap sample

Creating bagged logistic regression models

Using KNN to match similar cases

Using Auto Classifier to tune models

Next-Best-Offer for large datasets

Modeling – Assessment, Evaluation, Deployment, and Monitoring


How (and why) to validate as well as test

Using classification trees to explore the predictions of a Neural Network

Correcting a confusion matrix for an imbalanced target variable by incorporating priors

Using aggregate to write cluster centers to Excel for conditional formatting

Creating a classification tree financial summary using aggregate and an Excel Export node

Reformatting data for reporting with a Transpose node

Changing formatting of fields in a Table node

Combining generated filters

CLEM Scripting


Building iterative Neural Network forecasts

Quantifying variable importance with Monte Carlo simulation

Implementing champion/challenger model management

Detecting outliers with the jackknife method

Optimizing K-means cluster solutions

Automating time series forecasts

Automating HTML reports and graphs

Rolling your own modeling algorithm – Weibull analysis

Business Understanding
