PacktLib: Optimizing Hadoop for MapReduce

Optimizing Hadoop for MapReduce

Credits

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Understanding Hadoop MapReduce

The MapReduce model

An overview of Hadoop MapReduce

Hadoop MapReduce internals

Factors affecting the performance of MapReduce

Summary

An Overview of the Hadoop Parameters

Investigating the Hadoop parameters

Hadoop MapReduce metrics

Performance monitoring tools

Summary

Detecting System Bottlenecks

Performance tuning

Creating a performance baseline

Identifying resource bottlenecks

Summary

Identifying Resource Weaknesses

Identifying cluster weakness

Sizing your Hadoop cluster

Configuring your cluster correctly

Summary

Enhancing Map and Reduce Tasks

Enhancing map tasks

Enhancing reduce tasks

Tuning map and reduce parameters

Summary

Optimizing MapReduce Tasks

Using Combiners

Using compression

Using appropriate Writable types

Reusing types smartly

Optimizing mappers and reducers code

Summary

Best Practices and Recommendations

Hardware tuning and OS recommendations

Hadoop best practices and recommendations

Summary

Index