PacktLib: Apache Mahout Cookbook


About the Author


About the Reviewers


Mahout is Not So Difficult!


Installing Java and Hadoop

Setting up a Maven and NetBeans development environment

Coding a basic recommender

Using Sequence Files – When and Why?


Creating sequence files from the command line

Generating sequence files from code

Reading sequence files from code

Integrating Mahout with an External Datasource


Importing an external datasource into HDFS

Exporting data from HDFS to RDBMS

Creating a Sqoop job to deal with RDBMS

Importing data using Sqoop API

Implementing the Naïve Bayes classifier in Mahout


Using the Mahout text classifier to demonstrate the basic use case

Using the Naïve Bayes classifier from code

Using Complementary Naïve Bayes from the command line

Coding the Complementary Naïve Bayes classifier

Stock Market Forecasting with Mahout


Preparing data for logistic regression

Predicting GOOG movements using logistic regression

Using adaptive logistic regression in Java code

Using logistic regression on large-scale datasets

Using Random Forest to forecast market movements

Canopy Clustering in Mahout


Command-line-based Canopy clustering

Command-line-based Canopy clustering with parameters

Using Canopy clustering from Java code

Coding your own cluster distance evaluation

Spectral Clustering in Mahout


Using EigenCuts from the command line

Using EigenCuts from Java code

Creating a similarity matrix from raw data

Using spectral clustering with image segmentation

K-means Clustering


Using K-means clustering from Java code

Clustering traffic accidents using K-means

K-means clustering using MapReduce

Using K-means clustering from the command line

Soft Computing with Mahout


Frequent Pattern Mining with Mahout

Creating metrics for Frequent Pattern Mining

Using Frequent Pattern Mining from Java code

Using LDA for creating topics

Implementing the Genetic Algorithm in Mahout


Setting up Mahout for using GA

Using the genetic algorithm over graphs

Using the genetic algorithm from Java code