PacktLib: Pig Design Patterns

Credits

Foreword

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Setting the Context for Design Patterns in Pig

Understanding design patterns

The scope of design patterns in Pig

Hadoop demystified – a quick reckoner

Pig – a quick intro

Understanding Pig through the code

Summary

Data Ingest and Egress Patterns

The context of data ingest and egress

Types of data in the enterprise

Ingest and egress patterns for multistructured data

The ingress and egress patterns for the NoSQL data

The ingress and egress patterns for structured data

The ingress and egress patterns for semi-structured data

JSON ingress and egress patterns

Summary

Data Profiling Patterns

Data profiling for Big Data

Rationale for using Pig in data profiling

The data type inference pattern

The basic statistical profiling pattern

The pattern-matching pattern

The string profiling pattern

The unstructured text profiling pattern

Summary

Data Validation and Cleansing Patterns

Data validation and cleansing for Big Data

Choosing Pig for validation and cleansing

The constraint validation and cleansing design pattern

The regex validation and cleansing design pattern

The corrupt data validation and cleansing design pattern

The unstructured text data validation and cleansing design pattern

Summary

Data Transformation Patterns

Data transformation processes

The structured-to-hierarchical transformation pattern

The data normalization pattern

The data integration pattern

The aggregation pattern

The data generalization pattern

Summary

Understanding Data Reduction Patterns

Data reduction – a quick introduction

Data reduction considerations for Big Data

Dimensionality reduction – the Principal Component Analysis design pattern

Numerosity reduction – the histogram design pattern

Numerosity reduction – sampling design pattern

Numerosity reduction – clustering design pattern

Summary

Advanced Patterns and Future Work

The clustering pattern

The topic discovery pattern

The natural language processing pattern

The classification pattern

Future trends

Summary

Index