Another analysis to do is measure the time taken for the transformation of "lookup" and "filter" with the same file that the previous tests. |
|
As in most of the Test, there are several cases using differents techniques, setting of variables, with the same objective, satisfy the requirement and improve the performance. Always the first case is the best, and the rest of cases, are jobs without tuning, or an incomplet set of techniques (but helped on the way).
Here we noticed a marked difference when adjusting the parameters and ran again. The rows/s grew fivefold.
LINKS
|
|
CASE 1: USE '-server' + JAVA MEMORY SIZE Mb (1500) -Xmx
Objective: | To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file. |
Rows: | 6.024.000 M |
Columns: | 37 Columns |
Resources: | Virtual machine with: 2 GB RAM, Clover like main process over the virtual plataform. The resources used are anecdotal, today, Any production environment has enough processing power for current and future requirements. The objective here, is to build, to execute and to measure with the same environment (regardless of the limited resources) |
Structure: | ![]() |
Design & Run | ![]() |
Elapsed time (s) | 183 sec. |
Rows per sec (avg) |
Here, show the evolution of the passage of rows via the transformations, each 5 seconds. TRANSFORMATION: HASH_JOIN (ExthashJoin) SERIES 1: ROWS IN = 33.413 R/s SERIES 2: ROWS OUT TRANSFORMATION: FILTER (Filter_MAN) SERIES 1: ROWS IN = 33.395 R/s SERIES 2: ROWS OUT |
% cpu |
OPERATOR FILTER: 0,05 % a 0,09% OPERATOR JOIN: 0,29 % a 0,41%
|
Summary of log | INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 08/03/12 15:13:55 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.41 In:0 5306825 1345247 30386 7778 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5305742 1377506 30412 8021 INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING INFO [WatchDog] - %cpu:0.09 In:0 5305742 1377506 30412 8021 INFO [WatchDog] - Out:0 39251 11247 646 192 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 646 192 INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING INFO [WatchDog] - %cpu:0.45 Out:0 5306825 1345247 30386 7778 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 08/03/12 15:14:00 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.41 In:0 5480126 1388644 34288 8586 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5478734 1421670 34227 8738 INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING INFO [WatchDog] - %cpu:0.09 In:0 5478734 1421670 34227 8738 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING INFO [WatchDog] - %cpu:0.45 Out:0 5480126 1388644 34288 8586 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 08/03/12 15:14:05 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.41 In:0 5656612 1432821 34996 8760 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5655381 1466749 35027 8938 INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING INFO [WatchDog] - %cpu:0.09 In:0 5655381 1466749 35027 8938 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING INFO [WatchDog] - %cpu:0.45 Out:0 5656612 1432821 34996 8760 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 08/03/12 15:14:10 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.41 In:0 5831072 1476490 34582 8656 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5830864 1511532 34785 8877 INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING INFO [WatchDog] - %cpu:0.09 In:0 5830864 1511532 34785 8877 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING INFO [WatchDog] - %cpu:0.45 Out:0 5831072 1476490 34582 8656 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 08/03/12 15:14:15 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.41 In:0 6004860 1520000 34560 8652 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 6003459 1555587 34323 8761 INFO [WatchDog] - ExtFilter EXT_FILTER0 RUNNING INFO [WatchDog] - %cpu:0.09 In:0 6003459 1555587 34323 8761 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - UniversalDataReader DATA_READER0 RUNNING INFO [WatchDog] - %cpu:0.46 Out:0 6004860 1520000 34559 8652 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - [Clover] Post-execute phase finalization: 0 INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully. INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **--------------------- INFO [WatchDog] - Time: 08/03/12 15:14:16 INFO [WatchDog] - Node ID Port #Records #KB aRec/s aKB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - DBInputTable DB_INPUT_TABLE0 FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 FINISHED_OK INFO [WatchDog] - %cpu:0.41 In:0 6029426 1526144 32947 8339 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 6029426 1562207 32947 8536 INFO [WatchDog] - ExtFilter EXT_FILTER0 FINISHED_OK INFO [WatchDog] - %cpu:0.09 In:0 6029426 1562207 32947 8536 INFO [WatchDog] - Out:0 39251 11247 214 61 INFO [WatchDog] - UniversalDataWriter DATA_WRITER0 FINISHED_OK INFO [WatchDog] - %cpu:0.03 In:0 39251 11247 214 61 INFO [WatchDog] - UniversalDataReader DATA_READER0 FINISHED_OK INFO [WatchDog] - %cpu:0.46 Out:0 6029426 1526144 32947 8339 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 183 INFO [WatchDog] - Post-execute finalization of connection: INFO [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@178655]:jndi[null]:url[jdbc:mysql://192.168.0.114:3306/world]:user[root] ... OK INFO [WatchDog] - -----------------------** Summary of Phases execution **--------------------- INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB) INFO [WatchDog] - 0 FINISHED_OK 183 13349 INFO [WatchDog] - ------------------------------** End of Summary **--------------------------- INFO [WatchDog] - WatchDog thread finished - total execution time: 183 (sec) INFO [main] - Freeing graph resources. INFO [main] - Execution of graph successful ! |
CASE 2: Without '-server' & Xmx=0
Objective: | To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file. |
Rows: | 6.024.000 M |
Columns: | 37 Columns |
Resources: | Virtual machine with: 2 GB RAM, Clover like main process over the virtual plataform. The resources used are anecdotal, today, Any production environment has enough processing power for current and future requirements. The objective here, is to build, to execute and to measure with the same environment (regardless of the limited resources) |
Structure: | ![]() |
Design & Run | ![]() |
Elapsed time (s) | 643 sec. |
Rows per sec (avg) |
Here, show the evolution of the passage of rows via the transformations, each 5 seconds. TRANSFORMATION: HASH_JOIN (ExthashJoin) SERIES 1: ROWS IN = 7.251 R/s SERIES 2: ROWS OUT TRANSFORMATION: FILTER (Filter_MAN) SERIES 1: ROWS IN = 7.384 R/s SERIES 2: ROWS OUT |
% cpu |
OPERATOR FILTER: 0,08 % a 0,10% OPERATOR JOIN: 0,33 % a 0,41% OPERATOR WRITER: 0,35 % a 0,41%
|
Summary of log | INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 24/02/12 17:12:10 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.33 In:0 5981988 1514280 2460 616 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5980309 1549682 2366 603 INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING INFO [WatchDog] - %cpu:0.08 In:0 5980309 1549682 2366 603 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING INFO [WatchDog] - %cpu:0.35 Out:0 5981988 1514280 2460 616 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 24/02/12 17:12:16 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.33 In:0 5995706 1517712 2538 635 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 5995335 1553516 2780 709 INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING INFO [WatchDog] - %cpu:0.08 In:0 5995335 1553516 2780 709 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING INFO [WatchDog] - %cpu:0.35 Out:0 5995706 1517712 2538 635 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 24/02/12 17:12:21 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.33 In:0 6011583 1521681 3173 793 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 6010673 1557425 3066 781 INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING INFO [WatchDog] - %cpu:0.08 In:0 6010673 1557425 3066 781 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING INFO [WatchDog] - %cpu:0.35 Out:0 6011583 1521681 3173 793 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - ---------------------** Start of tracking Log for phase [0] **------------------- INFO [WatchDog] - Time: 24/02/12 17:12:26 INFO [WatchDog] - Node ID Port #Records #KB Rec/s KB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 RUNNING INFO [WatchDog] - %cpu:0.33 In:0 6027138 1525572 2912 728 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 6026195 1561383 2905 740 INFO [WatchDog] - Filter_MAN FILTER_3Brands RUNNING INFO [WatchDog] - %cpu:0.08 In:0 6026195 1561383 2905 740 INFO [WatchDog] - Out:0 39251 11247 0 0 INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS RUNNING INFO [WatchDog] - %cpu:.. In:0 39251 11247 0 0 INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT RUNNING INFO [WatchDog] - %cpu:0.35 Out:0 6027138 1525572 2912 728 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - [Clover] Post-execute phase finalization: 0 INFO [WatchDog] - [Clover] phase: 0 post-execute finalization successfully. INFO [WatchDog] - ----------------------** Final tracking Log for phase [0] **--------------------- INFO [WatchDog] - Time: 24/02/12 17:12:28 INFO [WatchDog] - Node ID Port #Records #KB aRec/s aKB/s INFO [WatchDog] - --------------------------------------------------------------------------------- INFO [WatchDog] - LK Manufacture LK_MANUFACTURE FINISHED_OK INFO [WatchDog] - %cpu:.. Out:0 6 0 0 0 INFO [WatchDog] - ExtHashJoin EXT_HASH_JOIN0 FINISHED_OK INFO [WatchDog] - %cpu:0.41 In:0 6029426 1526144 9347 2366 INFO [WatchDog] - In:1 6 0 0 0 INFO [WatchDog] - Out:0 6029426 1562207 9347 2422 INFO [WatchDog] - Filter_MAN FILTER_3Brands FINISHED_OK INFO [WatchDog] - %cpu:0.10 In:0 6029426 1562207 9347 2422 INFO [WatchDog] - Out:0 39251 11247 60 17 INFO [WatchDog] - Manufacture_3Brands MANUFACTURE_3BRANDS FINISHED_OK INFO [WatchDog] - %cpu:0.03 In:0 39251 11247 60 17 INFO [WatchDog] - DwContentTXT DW_CONTENT_TXT FINISHED_OK INFO [WatchDog] - %cpu:0.41 Out:0 6029426 1526144 9347 2366 INFO [WatchDog] - ---------------------------------** End of Log **-------------------------------- INFO [WatchDog] - Execution of phase [0] successfully finished - elapsed time(sec): 645 INFO [WatchDog] - Post-execute finalization of connection: INFO [WatchDog] - DBConnection driver[org.jetel.connection.jdbc.driver.JdbcDriver@883357]:jndi[null]:url[jdbc:mysql://192.168.0.114:3306/world]:user[root] ... OK INFO [WatchDog] - -----------------------** Summary of Phases execution **--------------------- INFO [WatchDog] - Phase# Finished Status RunTime(sec) MemoryAllocation(KB) INFO [WatchDog] - 0 FINISHED_OK 645 10961 INFO [WatchDog] - ------------------------------** End of Summary **--------------------------- INFO [WatchDog] - WatchDog thread finished - total execution time: 646 (sec) INFO [main] - Freeing graph resources. INFO [main] - Execution of graph successful ! |
In this Third Pentaho Test, with the new architecture, we again noticed a big change in Performance. This change appears when making minor modifications in design, which are possible due to the availability of resources. In first place, the File Input Stage was replaced with the CSV to achieve parallelism from Reading. Several runs were made, modifying in each the Number of copies per operator. As a result, the optimal design is shown in the picture below, parallelizing the flow from the input through the active operators, and collecting at the end in a flat file (since the number of rows to write is low).
Among the changes implemented, we can also mention the dump FData + N | O BS, the records store of the Lookup into the Memory, etc. (The numbers showed in the image will be detailed in the log run of the graphics below)
Times:
Architecture:
- Environment: Infraestructure composed of 3 nodes
- 1) ESXi 5.0:\
1.a) Physical Datastore 1: VM ETL Clover (12GB RAM - 2 Cores * 2 Sockets)
1.b) Physical Datastore 2: VM Database Server MySQL/Oracle (4GB RAM - 2 Cores * 2 Sockets)
- 2) Monitor Performance: VM Monitor ESXi + SQL Server 2008 (with 4 GB RAM)
- 3) Operator ETL: ESXi Client (with 3 GB RAM)
CASE 1: CSV + Lazy Conv + X4 Cop + FData dump + N|O BS 1.500.000 + CACHE MEMORY Lookup
Objective: |
- To measure elapsed time reading and writing 6 million rows, from Flat file, to .CSV file. - Compare performances in the 2 environments. - Analyze use of the resources |
ETL Tool | Pentaho (Spoon) 4.1 |
Rows: | 6.024.000 M |
Columns: | 37 Columns |
Structure: (Metadata) |
|
Design & Run |
|
LOG |
Log:
|
Elapsed time (s) | 16 Secs. |
Rows p/s (avg) |
VERSION 2: Performance improvement. |
How to Improve Performance |
- Adjust the parameters: - Use CSV - - Use Lazy Conversion - Use Fast Data Dump - Set N|O BS to 1.5M - Set 4X (Copies) - Use Cache memory on Lookup |
USE OF RESOURCES: VM PENTAHO
TOTAL |
Important: Memory Swap: 0 CPU/Datastore: CPU Usage Mhz / Datastore usage between 22:18:50-22:19:10 Menmory: After several executions, the memory consumption remains stable in 2,5 GB |
CPU: |
CPU Monitoring, "Passive and Active state" in different executions. Last Execution: 22:18:50-22:19:10 |
Memory: |
Memory Monitoring: Last Execution: 22:18:50-22:19:10 |
Network |
|
DataST |
|