PacktLib: Exploring Data with RapidMiner

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Setting the Scene

A process framework

Data volume and velocity

Data variety, formats, and meanings

Missing data

Cleaning data

Visualizing data

Resource constraints

Terminology

Accompanying material

Summary

Loading Data

Reading files

Databases

Using macros

Summary

Visualizing Data

Getting started

Statistical summaries

Relationships between attributes

Time series data

Relations between examples

Summary

Parsing and Converting Attributes

Generating attributes

Renaming attributes

Summary

Outliers

Manual inspection

Automated detection of example outliers

Summary

Missing Values

Missing or empty?

Types of missing data

Categorizing missing data

Effect of missing data

Options for handling missing data

Summary

Transforming Data

Creating new attributes

Aggregation

Using pivoting

Using de-pivoting

Windowing data

Summary

Reducing Data Size

Removing examples using sampling

Removing attributes

Summary

Resource Constraints

Measuring and estimating performance

Adding memory

Parallel processing

Restructuring processes

Summary

Debugging

Breakpoints in RapidMiner Studio

Logging data in RapidMiner Studio

RapidMiner Studio console printing

Groovy scripts

Regex tools

Using XPath effectively

Summary

Taking Stock

Exploring new techniques

Where to go next

Index