PacktLib: Talend Open Studio Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Introduction and General Principles

Before you begin

Installing the software

Enabling tHashInput and tHashOutput

Metadata and Schemas

Introduction

Hand-cranking a built-in schema

Propagating schema changes

Creating a generic schema from the existing metadata

Cutting and pasting schema information

Dropping schemas to empty components

Creating schemas from lists

Validating Data

Introduction

Enabling and disabling reject flows

Gathering all rejects prior to killing a job

Validating against the schema

Rejecting rows using tMap

Checking a column against a list of allowed values

Checking a column against a lookup

Creating validation rules for more complex requirements

Creating binary error codes to store multiple test results

Mapping Data

Introduction

Simple mapping and tMap time savers

Creating tMap expressions

Using the ternary operator for conditional logic

Using intermediate variables in tMap

Filtering input rows

Splitting an input row into multiple outputs based on input conditions

Joining data using tMap

Hierarchical joins using tMap

Using reload at each row to process real-time / near real-time data

Using Java in Talend

Introduction

Performing one-off pieces of logic using tJava

Setting the context and globalMap variables using tJava

Adding complex logic into a flow using tJavaRow

Creating pseudo components using tJavaFlex

Creating custom functions using code routines

Importing JAR files to allow use of external Java classes

Managing Context Variables

Introduction

Creating a context group

Adding a context group to your job

Adding contexts to a context group

Using tContextLoad to load contexts

Using implicit context loading to load contexts

Turning implicit context loading on and off in a job

Setting the context file location in the operating system

Working with Databases

Introduction

Setting up a database connection

Importing the table schemas

Reading from database tables

Using context and globalMap variables in SQL queries

Printing your input query

Writing to a database table

Printing your output query

Managing database sessions

Passing a session to a child job

Selecting different fields and keys for insert, update, and delete

Capturing individual rejects and errors

Database and table management

Managing surrogate keys for parent and child tables

Rewritable lookups using an in-process database

Managing Files

Introduction

Appending records to a file

Reading rows using a regular expression

Using temporary files

Storing intermediate data in the memory using tHashMap

Reading headers and trailers using tMap

Reading headers and trailers with no identifiers

Using the information in the header and trailer

Adding a header and trailer to a file

Moving, copying, renaming, and deleting files and folders

Capturing file information

Processing multiple files at once

Processing control/validation files

Creating and writing files depending on the input data

Working with XML, Queues, and Web Services

Introduction

Using tXMLMap to read XML

Using tXMLMap to create an XML document

Reading complex hierarchical XML

Writing complex XML

Calling a SOAP web service

Calling a RESTful web service

Reading and writing to a queue

Ensuring lossless queues using sessions

Debugging, Logging, and Testing

Introduction

Find the location of compilation errors using the Problems tab

Locating execution errors from the console output

Using the Talend debug mode – row-by-row execution

Using the Java debugger to debug Talend jobs

Using tLogRow to show data in a row

Using tJavaRow to display row information

Using tJava to display status messages and variables

Printing out the context

Dumping the console output to a file from within a job

Creating simple test data using tRowGenerator

Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences

Creating random test data using lookups

Creating test data using Excel

Testing logic – the most-used pattern

Killing a job from within tJavaRow

Deploying and Scheduling Talend Code

Introduction

Creating compiled executables

Using a different context

Adding command-line context parameters

Managing job dependencies

Capturing and acting on different return codes

Returning codes from a child job without tDie

Passing parameters to a child job

Executing non-Talend objects and operating system commands

Common Mistakes and Other Useful Hints and Tips

Introduction

My tab is missing

Finding the code routine

Finding a new context variable

Reloads going missing at each row global variable

Dragging component globalMap variables

Some complex date formats

Capturing tMap rejects

Adding job name, project name, and other job specific information

Printing tMap variables

Stopping memory errors in Talend

Common Type Conversions

Management of Contexts

Index