PacktLib: Talend Open Studio Cookbook


About the Author

About the Reviewers


Introduction and General Principles

Before you begin

Installing the software

Enabling tHashInput and tHashOutput

Metadata and Schemas


Hand-cranking a built-in schema

Propagating schema changes

Creating a generic schema from the existing metadata

Cutting and pasting schema information

Dropping schemas to empty components

Creating schemas from lists

Validating Data


Enabling and disabling reject flows

Gathering all rejects prior to killing a job

Validating against the schema

Rejecting rows using tMap

Checking a column against a list of allowed values

Checking a column against a lookup

Creating validation rules for more complex requirements

Creating binary error codes to store multiple test results

Mapping Data


Simple mapping and tMap time savers

Creating tMap expressions

Using the ternary operator for conditional logic

Using intermediate variables in tMap

Filtering input rows

Splitting an input row into multiple outputs based on input conditions

Joining data using tMap

Hierarchical joins using tMap

Using reload at each row to process real-time / near real-time data

Using Java in Talend


Performing one-off pieces of logic using tJava

Setting the context and globalMap variables using tJava

Adding complex logic into a flow using tJavaRow

Creating pseudo components using tJavaFlex

Creating custom functions using code routines

Importing JAR files to allow use of external Java classes

Managing Context Variables


Creating a context group

Adding a context group to your job

Adding contexts to a context group

Using tContextLoad to load contexts

Using implicit context loading to load contexts

Turning implicit context loading on and off in a job

Setting the context file location in the operating system

Working with Databases


Setting up a database connection

Importing the table schemas

Reading from database tables

Using context and globalMap variables in SQL queries

Printing your input query

Writing to a database table

Printing your output query

Managing database sessions

Passing a session to a child job

Selecting different fields and keys for insert, update, and delete

Capturing individual rejects and errors

Database and table management

Managing surrogate keys for parent and child tables

Rewritable lookups using an in-process database

Managing Files


Appending records to a file

Reading rows using a regular expression

Using temporary files

Storing intermediate data in the memory using tHashMap

Reading headers and trailers using tMap

Reading headers and trailers with no identifiers

Using the information in the header and trailer

Adding a header and trailer to a file

Moving, copying, renaming, and deleting files and folders

Capturing file information

Processing multiple files at once

Processing control/validation files

Creating and writing files depending on the input data

Working with XML, Queues, and Web Services


Using tXMLMap to read XML

Using tXMLMap to create an XML document

Reading complex hierarchical XML

Writing complex XML

Calling a SOAP web service

Calling a RESTful web service

Reading and writing to a queue

Ensuring lossless queues using sessions

Debugging, Logging, and Testing


Find the location of compilation errors using the Problems tab

Locating execution errors from the console output

Using the Talend debug mode – row-by-row execution

Using the Java debugger to debug Talend jobs

Using tLogRow to show data in a row

Using tJavaRow to display row information

Using tJava to display status messages and variables

Printing out the context

Dumping the console output to a file from within a job

Creating simple test data using tRowGenerator

Creating complex test data using tRowGenerator, tFlowToIterate, tMap, and sequences

Creating random test data using lookups

Creating test data using Excel

Testing logic – the most-used pattern

Killing a job from within tJavaRow

Deploying and Scheduling Talend Code


Creating compiled executables

Using a different context

Adding command-line context parameters

Managing job dependencies

Capturing and acting on different return codes

Returning codes from a child job without tDie

Passing parameters to a child job

Executing non-Talend objects and operating system commands

Common Mistakes and Other Useful Hints and Tips


My tab is missing

Finding the code routine

Finding a new context variable

Reloads going missing at each row global variable

Dragging component globalMap variables

Some complex date formats

Capturing tMap rejects

Adding job name, project name, and other job specific information

Printing tMap variables

Stopping memory errors in Talend

Common Type Conversions

Management of Contexts
