(Talend) Stress Test 4: Flat to ORACLE


Interaction between Talend and Oracle via Conventional Inserts and Bulk Load. Performance comparison between the two designs and its comparison with DS.


In this article we will discuss how Talend behaves with Oracle, but not only with conventional INSERTs, but also will compare the test with a massive load. Talend is the first tool we're going to test in order to analyze the different ways of loading data. We had already interacted with Oracle from Datastage (IBM DS 8.1) Stress Test 4: Flat to ORACLE (but without BULK LOAD).

First, why using BULK LOAD? What is the advantage in relation to the conventional Insert?

BULK LOAD provides high performance for data loads. It loads data into an Oracle table from an existing text file, instructed by the control file in which metadata describes how the data will be loaded, including the table name, column data types, formatting, transformation, and so on. As you might know, BULK LOAD is typically used to load large volume of data into databases. Unlike conventional SQL insert, BULK speeds up the load significantly by building blocks of data in memory and saves these blocks directly into the extents allocated for the table being loaded. Redo log entries are not generated unless the database is in archive log mode (setting the table being loaded to nologging can also reduce redo log entries if the database is in archive log mode). Direct path loads use the field specifications to build whole Oracle blocks of data, and write the blocks directly to the Oracle data files. Direct path load bypasses the database buffer cache and accesses the SGA only for extent management and adjustments of the high-water mark.

TIMES:

In this equation, is missing the BULK LOAD of Datastage, which will be added in future posts.

ARCHITECTURE:

Environment: Infraestructure composed of 3 nodes

1) ESXi 5.0:

a) Physical Datastore 1: VM ETL Talend (10GB RAM - 2 Cores * 2 Sockets)

b) Physical Datastore 2: VM Database Server Oracle (6GB RAM - 2 Cores * 2 Sockets)

2) Monitor Performance: VM Monitor ESXi + SQL Server 2008 (with 4 GB RAM)

3) Operator ETL: ESXi Client (with 4 GB RAM)

CASE 1: CONVENTIONAL INSERTS

Objective:

- To measure elapsed time reading and writing 6 million rows, from Flat file, to Oracle Database.

- Analyze use of the resources.

ETL Tool Talend 5.0
Rows: 6.024.000 M
Columns: 37 Columns

Structure:

(Metadata)

Design & Run

Parameters:

LOG

Log:

Elapsed time (s) 168 Secs
Rows p/s (avg)

35.800 r/s avg

How to Improve

Performance

- Adjust the parameters:

- COMMIT Size

USE OF RESOURCES: VM TALEND

CPU:

CPU Monitoring, "Passive and Active state" in different executions. Last Execution: 23:03-23:06

Legend CPU:

Memory:

Memory Monitoring: Last Execution: 23:03-23:06

Legend Memory:

Network

Network Monitoring: Last Execution: 23:03-23:06

Legend Network

DataST

Legend DataStore

CASE 2: BULK LOAD

Objective:

- To measure elapsed time reading and writing 6 million rows, from Flat file, to Oracle Database with BULK LOAD - Analyze use of the resources.

ETL Tool Talend 5.0
Rows: 6.024.000 M
Columns: 37 Columns
Design & Run:
Resources:
Elapsed time (s) 50 Secs