Talend for Big Data

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Getting Started with Talend Big Data

Talend Unified Platform presentation

Knowing about the Hadoop ecosystem

Prerequisites for running examples

Downloading Talend Open Studio for Big Data

Installing TOSBD

Running TOSBD for the first time

Summary

Building Our First Big Data Job

TOSBD – the development environment

A simple HDFS writer job

Checking the result in HDFS

Summary

Formatting Data

Twitter Sentiment Analysis

Writing the tweets in HDFS

Setting our Apache Hive tables

Formatting tweets with Apache Hive

Summary

Processing Tweets with Apache Hive

Extracting hashtags

Extracting emoticons

Joining the dots

Summary

Aggregate Data with Apache Pig

Knowing about Pig

Extracting the top Twitter users

Extracting the top hashtags, emoticons, and sentiments

Summary

Back to the SQL Database

Linking HDFS and RDBMS with Sqoop

Exporting and importing data to a MySQL database

Summary

Big Data Architecture and Integration Patterns

The streaming pattern

The partitioning pattern

Summary

Installing Your Hadoop Cluster with Cloudera CDH VM

Index