(Clover) Stress Test 3: Lookups & Filters (RELOAD)

Another analysis to do is measure the time taken for the transformation of "lookup" and "filter" with the same file that the previous tests.


As in most of the Test, there are several cases using differents techniques, setting of variables, with the same objective, satisfy the requirement and improve the performance. Always the first case is the best, and the rest of cases, are jobs without tuning, or an incomplet set of techniques (but helped on the way).

Here we noticed a marked difference when adjusting the parameters and ran again. The rows/s grew fivefold.

LINKS

CASE 1 [183 Secs.]

CASE 2 [643 Secs.]

CASE 1: USE '-server' + JAVA MEMORY SIZE Mb (1500) -Xmx

Objective: To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file.
Rows: 6.024.000 M
Columns: 37 Columns
Resources: Virtual machine with: 2 GB RAM, Clover like main process over the virtual plataform. The resources used are anecdotal, today, Any production environment has enough processing power for current and future requirements. The objective here, is to build, to execute and to measure with the same environment (regardless of the limited resources)
Structure:
Design & Run
Elapsed time (s) 183 sec.
Rows per sec (avg)

Here, show the evolution of the passage of rows via the transformations, each 5 seconds.

TRANSFORMATION: HASH_JOIN (ExthashJoin)

SERIES 1: ROWS IN = 33.413 R/s

SERIES 2: ROWS OUT

TRANSFORMATION: FILTER (Filter_MAN)

SERIES 1: ROWS IN = 33.395 R/s

SERIES 2: ROWS OUT

% cpu

OPERATOR FILTER:

0,05 % a 0,09%

OPERATOR JOIN:

0,29 % a 0,41%

Summary of log INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 08/03/12 15:13:55
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.41 In:0 5306825 1345247 30386 7778
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5305742 1377506 30412 8021
INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING
INFO [WatchDog] - %cpu:0.09 In:0 5305742 1377506 30412 8021
INFO [WatchDog] - Out:0 39251 11247 646 192
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 646 192
INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING
INFO [WatchDog] - %cpu:0.45 Out:0 5306825 1345247 30386 7778
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 08/03/12 15:14:00
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.41 In:0 5480126 1388644 34288 8586
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5478734 1421670 34227 8738
INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING
INFO [WatchDog] - %cpu:0.09 In:0 5478734 1421670 34227 8738
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING
INFO [WatchDog] - %cpu:0.45 Out:0 5480126 1388644 34288 8586
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 08/03/12 15:14:05
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.41 In:0 5656612 1432821 34996 8760
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5655381 1466749 35027 8938
INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING
INFO [WatchDog] - %cpu:0.09 In:0 5655381 1466749 35027 8938
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING
INFO [WatchDog] - %cpu:0.45 Out:0 5656612 1432821 34996 8760
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 08/03/12 15:14:10
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.41 In:0 5831072 1476490 34582 8656
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5830864 1511532 34785 8877
INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING
INFO [WatchDog] - %cpu:0.09 In:0 5830864 1511532 34785 8877
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING
INFO [WatchDog] - %cpu:0.45 Out:0 5831072 1476490 34582 8656
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 08/03/12 15:14:15
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.41 In:0 6004860 1520000 34560 8652
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 6003459 1555587 34323 8761
INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING
INFO [WatchDog] - %cpu:0.09 In:0 6003459 1555587 34323 8761
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING
INFO [WatchDog] - %cpu:0.46 Out:0 6004860 1520000 34559 8652
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO [WatchDog] - Time: 08/03/12 15:14:16
INFO [WatchDog] - Node ID Port #Records #KB aRec/s aKB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 FINISHED_OK
INFO [WatchDog] - %cpu:0.41 In:0 6029426 1526144 32947 8339
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 6029426 1562207 32947 8536
INFO [WatchDog] - ExtFilter EXT_FILTER0 FINISHED_OK
INFO [WatchDog] - %cpu:0.09 In:0 6029426 1562207 32947 8536
INFO [WatchDog] - Out:0 39251 11247 214 61
INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 FINISHED_OK
INFO [WatchDog] - %cpu:0.03 In:0 39251 11247 214 61
INFO [WatchDog] - UniversalDataReader DATA_READER0 FINISHED_OK
INFO [WatchDog] - %cpu:0.46 Out:0 6029426 1526144 32947 8339
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 183
INFO [WatchDog] - Post-execute finalization of connection:
INFO [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@178655]:jndi[null]:url[jdbc:mysql://192.168.0.114:3306/world]:user[root] ... OK
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 FINISHED_OK 183 13349
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [WatchDog] - WatchDog thread finished - total execution time: 183 (sec)
INFO [main] - Freeing graph resources.
INFO [main] - Execution of graph successful !

CASE 2: Without '-server' & Xmx=0

Objective: To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file.
Rows: 6.024.000 M
Columns: 37 Columns
Resources: Virtual machine with: 2 GB RAM, Clover like main process over the virtual plataform. The resources used are anecdotal, today, Any production environment has enough processing power for current and future requirements. The objective here, is to build, to execute and to measure with the same environment (regardless of the limited resources)
Structure:
Design & Run
Elapsed time (s) 643 sec.
Rows per sec (avg)

Here, show the evolution of the passage of rows via the transformations, each 5 seconds.

TRANSFORMATION: HASH_JOIN (ExthashJoin)

SERIES 1: ROWS IN = 7.251 R/s

SERIES 2: ROWS OUT

TRANSFORMATION: FILTER (Filter_MAN)

SERIES 1: ROWS IN = 7.384 R/s

SERIES 2: ROWS OUT

% cpu

OPERATOR FILTER:

0,08 % a 0,10%

OPERATOR JOIN:

0,33 % a 0,41%

OPERATOR WRITER:

0,35 % a 0,41%

Summary of log INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 24/02/12 17:12:10
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.33 In:0 5981988 1514280 2460 616
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5980309 1549682 2366 603
INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING
INFO [WatchDog] - %cpu:0.08 In:0 5980309 1549682 2366 603
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING
INFO [WatchDog] - %cpu:0.35 Out:0 5981988 1514280 2460 616
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 24/02/12 17:12:16
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.33 In:0 5995706 1517712 2538 635
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 5995335 1553516 2780 709
INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING
INFO [WatchDog] - %cpu:0.08 In:0 5995335 1553516 2780 709
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING
INFO [WatchDog] - %cpu:0.35 Out:0 5995706 1517712 2538 635
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 24/02/12 17:12:21
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.33 In:0 6011583 1521681 3173 793
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 6010673 1557425 3066 781
INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING
INFO [WatchDog] - %cpu:0.08 In:0 6010673 1557425 3066 781
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING
INFO [WatchDog] - %cpu:0.35 Out:0 6011583 1521681 3173 793
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **-------------------
INFO [WatchDog] - Time: 24/02/12 17:12:26
INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING
INFO [WatchDog] - %cpu:0.33 In:0 6027138 1525572 2912 728
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 6026195 1561383 2905 740
INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING
INFO [WatchDog] - %cpu:0.08 In:0 6026195 1561383 2905 740
INFO [WatchDog] - Out:0 39251 11247 0 0
INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING
INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0
INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING
INFO [WatchDog] - %cpu:0.35 Out:0 6027138 1525572 2912 728
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - [Clover] Post-execute phase finalization: 0
INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully.
INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **---------------------
INFO [WatchDog] - Time: 24/02/12 17:12:28
INFO [WatchDog] - Node ID Port #Records #KB aRec/s aKB/s
INFO [WatchDog] - ---------------------------------------------------------------------------------
INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK
INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0
INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 FINISHED_OK
INFO [WatchDog] - %cpu:0.41 In:0 6029426 1526144 9347 2366
INFO [WatchDog] - In:1 6 0 0 0
INFO [WatchDog] - Out:0 6029426 1562207 9347 2422
INFO [WatchDog] - Filter_MAN FILTER_3Brands FINISHED_OK
INFO [WatchDog] - %cpu:0.10 In:0 6029426 1562207 9347 2422
INFO [WatchDog] - Out:0 39251 11247 60 17
INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS FINISHED_OK
INFO [WatchDog] - %cpu:0.03 In:0 39251 11247 60 17
INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT FINISHED_OK
INFO [WatchDog] - %cpu:0.41 Out:0 6029426 1526144 9347 2366
INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------
INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 645
INFO [WatchDog] - Post-execute finalization of connection:
INFO [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@883357]:jndi[null]:url[jdbc:mysql://192.168.0.114:3306/world]:user[root] ... OK
INFO [WatchDog] - -----------------------** Summary of Phases execution **---------------------
INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB)
INFO [WatchDog] - 0 FINISHED_OK 645 10961
INFO [WatchDog] - ------------------------------** End of Summary **---------------------------
INFO [WatchDog] - WatchDog thread finished - total execution time: 646 (sec)
INFO [main] - Freeing graph resources.
INFO [main] - Execution of graph successful !

In this Third Pentaho Test, with the new architecture, we again noticed a big change in Performance. This change appears when making minor modifications in design, which are possible due to the availability of resources. In first place, the File Input Stage was replaced with the CSV to achieve parallelism from Reading. Several runs were made, modifying in each the Number of copies per operator. As a result, the optimal design is shown in the picture below, parallelizing the flow from the input through the active operators, and collecting at the end in a flat file (since the number of rows to write is low).

Among the changes implemented, we can also mention the dump FData + N | O BS, the records store of the Lookup into the Memory, etc. (The numbers showed in the image will be detailed in the log run of the graphics below)

Times:

Architecture:

- Environment: Infraestructure composed of 3 nodes

- 1) ESXi 5.0:\

1.a) Physical Datastore 1: VM ETL Clover (12GB RAM - 2 Cores * 2 Sockets)

1.b) Physical Datastore 2: VM Database Server MySQL/Oracle (4GB RAM - 2 Cores * 2 Sockets)

- 2) Monitor Performance: VM Monitor ESXi + SQL Server 2008 (with 4 GB RAM)

- 3) Operator ETL: ESXi Client (with 3 GB RAM)

CASE 1: CSV + Lazy Conv + X4 Cop + FData dump + N|O BS 1.500.000 + CACHE MEMORY Lookup

Objective:

- To measure elapsed time reading and writing 6 million rows, from Flat file, to .CSV file.

- Compare performances in the 2 environments.

- Analyze use of the resources

ETL Tool Pentaho (Spoon) 4.1
Rows: 6.024.000 M
Columns: 37 Columns

Structure:

(Metadata)

Design & Run

LOG

Log:

Elapsed time (s) 16 Secs.
Rows p/s (avg)

VERSION 2: Performance improvement.

How to Improve

Performance

- Adjust the parameters:

- Use CSV -

- Use Lazy Conversion

- Use Fast Data Dump

- Set N|O BS to 1.5M

- Set 4X (Copies)

- Use Cache memory on Lookup

USE OF RESOURCES: VM PENTAHO

TOTAL

Important: Memory Swap: 0

CPU/Datastore: CPU Usage Mhz / Datastore usage between 22:18:50-22:19:10

Menmory: After several executions, the memory consumption remains stable in 2,5 GB

CPU:

CPU Monitoring, "Passive and Active state" in different executions. Last Execution: 22:18:50-22:19:10

Memory:

Memory Monitoring: Last Execution: 22:18:50-22:19:10

Network

DataST