PacktLib: Practical Data Analysis

Credits

Foreword

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Getting Started

Computer science

Artificial intelligence (AI)

Machine Learning (ML)

Statistics

Mathematics

Knowledge domain

Data, information, and knowledge

The nature of data

The data analysis process

Quantitative versus qualitative data analysis

Importance of data visualization

What about big data?

Summary

Working with Data

Datasource

Data scrubbing

Data formats

Getting started with OpenRefine

Summary

Data Visualization

Data-Driven Documents (D3)

Getting started with D3.js

Interaction and animation

Summary

Text Classification

Learning and classification

Bayesian classification

E-mail subject line tester

The algorithm

Classifier accuracy

Summary

Similarity-based Image Retrieval

Image similarity search

Dynamic time warping (DTW)

Processing the image dataset

Implementing DTW

Analyzing the results

Summary

Simulation of Stock Prices

Financial time series

Random walk simulation

Monte Carlo methods

Generating random numbers

Implementation in D3.js

Summary

Predicting Gold Prices

Working with the time series data

Smoothing the time series

The data – historical gold prices

Nonlinear regression

Summary

Working with Support Vector Machines

Understanding the multivariate dataset

Dimensionality reduction

Getting started with support vector machine

Summary

Modeling Infectious Disease with Cellular Automata

Introduction to epidemiology

The epidemic models

Modeling with cellular automata

Simulation of the SIRS model in CA with D3.js

Summary

Working with Social Graphs

Structure of a graph

Social Networks Analysis

Acquiring my Facebook graph

Representing graphs with Gephi

Statistical analysis

Degree distribution

Transforming GDF to JSON

Graph visualization with D3.js

Summary

Sentiment Analysis of Twitter Data

The anatomy of Twitter data

Using OAuth to access Twitter API

Getting started with Twython

Sentiment classification

Getting started with Natural Language Toolkit (NLTK)

Summary

Data Processing and Aggregation with MongoDB

Getting started with MongoDB

Data preparation

Group

The aggregation framework

Summary

Working with MapReduce

MapReduce overview

Programming model

Using MapReduce with MongoDB

Filtering the input collection

Grouping and aggregation

Word cloud visualization of the most common positive words in tweets

Summary

Online Data Analysis with IPython and Wakari

Getting started with Wakari

Getting started with IPython Notebook

Introduction to image processing with PIL

Getting started with Pandas

Multiprocessing with IPython

Sharing your Notebook

Summary

Setting Up the Infrastructure

Index