(Pentaho) Stress Test 3: Lookups & Filters (RELOAD)

Another analysis is to measure the time taken by the "lookup" and "filter" transformation, using the same file as in the previous tests. VERSION 2

In this third Pentaho test, with the new architecture, we again observed a significant change in performance. The change comes from minor design modifications made possible by the available resources. First, the Text File Input step was replaced with the CSV Input step to achieve parallel reading. Several runs were made, varying the number of copies per operator in each. The resulting optimal design is shown in the picture below: the flow is parallelized from the input through the active operators and collected at the end into a flat file (since the number of rows to write is low).
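As an illustrative sketch only (not Pentaho internals, and with made-up example data), the idea of running several "copies" of an operator over a partitioned input can be shown with a worker pool: the rows are split into chunks and each copy applies the active step (here a stand-in for the lookup/filter) in parallel.

```python
# Sketch of "N copies per operator": split the CSV rows into chunks and
# let a pool of worker copies process them concurrently.
import csv
import io
from multiprocessing import Pool

def process_chunk(rows):
    # Stand-in for the active operators (lookup + filter):
    # count the rows that pass a simple filter condition.
    return sum(1 for r in rows if int(r["amount"]) > 50)

def parallel_count(csv_text, copies=4, chunk_size=1000):
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with Pool(copies) as pool:           # "copies" plays the role of X4 Cop
        return sum(pool.map(process_chunk, chunks))

if __name__ == "__main__":
    # Synthetic 10,000-row CSV; amounts cycle 0..99.
    data = "id,amount\n" + "\n".join(f"{i},{i % 100}" for i in range(10_000))
    print(parallel_count(data))  # → 4900 (49 passing values per 100-row cycle)
```

As in the transformation, the gain depends on the step being CPU-bound and the chunks being large enough to amortize the coordination cost.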

Among the changes implemented, we can also mention enabling the fast data dump (FData), tuning the NIO buffer size (N|O BS), storing the Lookup records in memory, etc. (The numbers shown in the image are detailed in the run log of the graphs below.)
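The effect of keeping the Lookup records in memory can be sketched as follows (assumed example data, not the actual test's lookup table): the lookup source is loaded once into a dictionary, so each streamed row is resolved with an in-memory hit instead of a per-row query.

```python
# Sketch of an in-memory lookup cache for a stream of rows.
def build_lookup_cache(pairs):
    # One pass over the lookup source; afterwards each lookup is O(1).
    return dict(pairs)

def enrich(rows, cache):
    # Stand-in for a stream-lookup step: attach the looked-up value,
    # with a default for keys missing from the cache.
    return [{**row, "name": cache.get(row["id"], "UNKNOWN")} for row in rows]

cache = build_lookup_cache([(1, "Alice"), (2, "Bob")])
print(enrich([{"id": 1}, {"id": 3}], cache))
# → [{'id': 1, 'name': 'Alice'}, {'id': 3, 'name': 'UNKNOWN'}]
```

The trade-off is memory: the whole lookup table must fit in RAM, which the 12 GB of the ETL VM makes feasible here.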



- Environment: Infrastructure composed of 3 nodes

- 1) ESXi 5.0:

1.a) Physical Datastore 1: VM ETL Clover (12 GB RAM - 2 Cores * 2 Sockets)

1.b) Physical Datastore 2: VM Database Server MySQL/Oracle (4 GB RAM - 2 Cores * 2 Sockets)

- 2) Performance Monitor: VM Monitor ESXi + SQL Server 2008 (with 4 GB RAM)

- 3) ETL Operator: ESXi Client (with 3 GB RAM)

CASE 1: CSV + Lazy Conv + X4 Cop + FData dump + N|O BS 1.500.000 + CACHE MEMORY Lookup


- Measure the elapsed time reading and writing 6 million rows, from flat file to .CSV file.

- Compare performance across the two environments.

- Analyze resource usage.

ETL Tool: Pentaho (Spoon) 4.1
Rows: 6.024.000
Columns: 37



Design & Run



Elapsed time (s): 16
Rows p/s (avg)
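From the row count and elapsed time above, the average throughput can be estimated with a quick back-of-the-envelope calculation (the exact average in the run log may differ slightly from this derived figure):

```python
# Average throughput derived from the reported totals.
rows = 6_024_000       # rows processed in this case
elapsed_s = 16         # elapsed time in seconds
print(round(rows / elapsed_s))  # → 376500 rows per second
```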

VERSION 2: Performance improvement.

How to Improve


- Adjust the parameters:

- Use the CSV Input step

- Use Lazy Conversion

- Use Fast Data Dump

- Set the NIO buffer size (N|O BS) to 1.500.000

- Run 4 copies per step (4X)

- Cache the Lookup in memory
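Of these settings, lazy conversion is the least self-explanatory. The idea behind it can be sketched as follows (an illustration of the concept, not Kettle's implementation): fields stay as raw strings after reading, and the string-to-typed conversion is only paid when a downstream step actually reads the value.

```python
# Sketch of lazy conversion: parse a field only on first access.
class LazyField:
    def __init__(self, raw):
        self.raw = raw          # raw bytes/string from the CSV
        self._value = None
        self.converted = False

    def value(self):
        if not self.converted:  # convert on first access only
            self._value = int(self.raw)
            self.converted = True
        return self._value

fields = [LazyField(s) for s in ("1", "2", "3")]
total = fields[0].value() + fields[2].value()  # the middle field is never parsed
print(total, fields[1].converted)  # → 4 False
```

With 37 columns per row, skipping conversion for columns the lookup and filter never touch is where the saving comes from.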



Important: Memory Swap: 0

CPU/Datastore: CPU usage (MHz) / datastore usage between 22:18:50 and 22:19:10

Memory: After several executions, memory consumption remains stable at 2.5 GB


CPU monitoring, "passive and active state", across different executions. Last execution: 22:18:50-22:19:10


Memory Monitoring: Last Execution: 22:18:50-22:19:10