Apache Mahout Cookbook

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Mahout is Not So Difficult!

Introduction

Installing Java and Hadoop

Setting up a Maven and NetBeans development environment

Coding a basic recommender

Using Sequence Files – When and Why?

Introduction

Creating sequence files from the command line

Generating sequence files from code

Reading sequence files from code

Integrating Mahout with an External Datasource

Introduction

Importing an external datasource into HDFS

Exporting data from HDFS to RDBMS

Creating a Sqoop job to deal with RDBMS

Importing data using Sqoop API

Implementing the Naïve Bayes classifier in Mahout

Introduction

Using the Mahout text classifier to demonstrate the basic use case

Using the Naïve Bayes classifier from code

Using Complementary Naïve Bayes from the command line

Coding the Complementary Naïve Bayes classifier

Stock Market Forecasting with Mahout

Introduction

Preparing data for logistic regression

Predicting GOOG movements using logistic regression

Using adaptive logistic regression in Java code

Using logistic regression on large-scale datasets

Using Random Forest to forecast market movements

Canopy Clustering in Mahout

Introduction

Command-line-based Canopy clustering

Command-line-based Canopy clustering with parameters

Using Canopy clustering from the Java code

Coding your own cluster distance evaluation

Spectral Clustering in Mahout

Introduction

Using EigenCuts from the command line

Using EigenCuts from Java code

Creating a similarity matrix from raw data

Using spectral clustering with image segmentation

K-means Clustering

Introduction

Using K-means clustering from Java code

Clustering traffic accidents using K-means

K-means clustering using MapReduce

Using K-means clustering from the command line

Soft Computing with Mahout

Introduction

Frequent Pattern Mining with Mahout

Creating metrics for Frequent Pattern Mining

Using Frequent Pattern Mining from Java code

Using LDA for creating topics

Implementing the Genetic Algorithm in Mahout

Introduction

Setting up Mahout for using GA

Using the genetic algorithm over graphs

Using the genetic algorithm from Java code

Index