PacktLib: Apache Flume: Distributed Log Collection for Hadoop

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Overview and Architecture

Flume 0.9

Flume 1.X (Flume-NG)

The problem with HDFS and streaming data/logs

Sources, channels, and sinks

Flume events

Summary

Flume Quick Start

Downloading Flume

Flume configuration file overview

Starting up with "Hello World"

Summary

Channels

Memory channel

File channel

Summary

Sinks and Sink Processors

HDFS sink

Compression codecs

Event serializers

Sink groups

Summary

Sources and Channel Selectors

The problem with using tail

The exec source

The spooling directory source

Syslog sources

Channel selectors

Summary

Interceptors, ETL, and Routing

Interceptors

Tiering data flows

Routing

Summary

Monitoring Flume

Monitoring the agent process

Monitoring performance metrics

Summary

There Is No Spoon – The Realities of Real-time Distributed Data Collection

Transport time versus log time

Time zones are evil

Capacity planning

Considerations for multiple data centers

Compliance and data expiry

Summary

Index