(Pentaho) Stress Test 3: Lookups & Filters


Another analysis to do is measure the time taken for the transformation of "lookup" and "filter" with the same file that the previous tests.


In this Test, there are 3 cases, changing the "number of copies" of each step. Parallelization aims to harness available resources to the fullest, instantiating "X" process to complete the same task. The negative side is the penalty for this instantiation, consuming more resources. In an ideal world with unlimited resources, this technique favors the execution exponentially. Here I look for to measure the several performances and techniques, based on minimum resources

Here, in this dark world, The first case, where there isn't copies, the elapsed time is the smallest.

Links

Case 1: [523 Secs.]

Case 3: [575 Secs.]

CASE 1: WITHOUT COPIES (1X)

Objective:

To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file.

"In each of the tests I point that the target is to compare all the tools, today, trying to apply the best techniques (with more experience in certain tools than others), and seeking to quantify the performance. However, this is not the last word, from comments and looking for better ways of doing the same work, each of the tests will be reanalyzed. This is why we will welcome criticism, comment, and improvements." E.G.

Rows: 6.024.000 M
Columns: 37 Columns
Resources:

- Virtual machine with: 2 GB RAM, Pentaho like main process over the virtual plataform. The resources used are anecdotal, today, Any production environment has enough processing power for current and future requirements. The objective here, is to build, to execute, and to measure with the same environment (regardless of the limited resources)

- 1 Processor

- Database out of the Virtual machine.

Structure:

(Metadata)

Improve Performance

  • Several Copies of a Step
  • Use CSV stage for optimal performance
  • Lazy Conversion
  • Use Cache memory for the 6 records of the Database
Design & Run

Elapsed time (s) 523 sec.
Rows per sec (avg)

Here, show the evolution of the passage of rows via the transformations

OPERATOR: OUTPUT_TEXT_FILE - Write

OPERATOR: JOIN LK - Write

OPERATOR: FILTER - Write

Summary of log

2012/03/02 15:57:49 - Transformation metadata - The shared object fie [null] is empty!
2012/03/02 15:57:49 - Spoon - Transformation opened.
2012/03/02 15:57:49 - Spoon - Launching transformation [Stress3_Lookups_Filters]...
2012/03/02 15:57:49 - Spoon - Started the transformation execution.
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Dispatching started for transformation [Stress3_Lookups_Filters]
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Nr of arguments detected:0
2012/03/02 15:57:49 - Stress3_Lookups_Filters - This is not a replay transformation
2012/03/02 15:57:49 - Transformation metadata - Natural sort of steps executed in 1ms (5 time previous steps calculated)
2012/03/02 15:57:49 - Stress3_Lookups_Filters - I found 5 different steps to launch.
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets...
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets for step 0 --> Filter_3_Brands
2012/03/02 15:57:49 - Stress3_Lookups_Filters - prevcopies = 1, nextcopies=1
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.0 - TXTOutput_3Brands.0]
2012/03/02 15:57:49 - Stress3_Lookups_Filters - prevcopies = 1, nextcopies=1
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.0 - Dummy.0]
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocated 2 rowsets for step 0 --> Filter_3_Brands
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets for step 1 --> TXTOutput_3Brands
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocated 2 rowsets for step 1 --> TXTOutput_3Brands
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets for step 2 --> Dummy
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocated 2 rowsets for step 2 --> Dummy
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets for step 3 --> DwContent_Input
2012/03/02 15:57:49 - Stress3_Lookups_Filters - prevcopies = 1, nextcopies=1
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Transformation allocated new rowset [DwContent_Input.0 - LK_Manufacture.0]
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocated 3 rowsets for step 3 --> DwContent_Input
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating rowsets for step 4 --> LK_Manufacture
2012/03/02 15:57:49 - Stress3_Lookups_Filters - prevcopies = 1, nextcopies=1
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Transformation allocated new rowset [LK_Manufacture.0 - Filter_3_Brands.0]
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocated 4 rowsets for step 4 --> LK_Manufacture
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Allocating Steps & StepData...
2012/03/02 15:57:49 - Stress3_Lookups_Filters - Transformation is about to allocate step [Filter_3_Brands] of type [FilterRows]
2012/03/02 15:57:49 - Filter_3_Brands.0 - distribution activated
2012/03/02 15:57:49 - Filter_3_Brands.0 - Starting allocation of buffers & new threads...
2012/03/02 15:57:49 - Filter_3_Brands.0 - Step info: nrinput=1 nroutput=2
2012/03/02 15:57:49 - Filter_3_Brands.0 - Got previous step from [Filter_3_Brands] #0 --> LK_Manufacture
2012/03/02 15:57:49 - Filter_3_Brands.0 - input rel is 1:1

..........

2012/03/02 16:05:57 - DwContent_Input.0 - linenr 5600000
2012/03/02 16:05:57 - LK_Manufacture.0 - linenr 5600000
2012/03/02 16:05:57 - Filter_3_Brands.0 - linenr 5600000
2012/03/02 16:06:00 - Dummy.0 - Linenr 5600000
2012/03/02 16:06:01 - DwContent_Input.0 - linenr 5650000
2012/03/02 16:06:01 - LK_Manufacture.0 - linenr 5650000
2012/03/02 16:06:01 - Filter_3_Brands.0 - linenr 5650000
2012/03/02 16:06:04 - Dummy.0 - Linenr 5650000
2012/03/02 16:06:05 - DwContent_Input.0 - linenr 5700000
2012/03/02 16:06:05 - LK_Manufacture.0 - linenr 5700000
2012/03/02 16:06:05 - Filter_3_Brands.0 - linenr 5700000
2012/03/02 16:06:08 - Dummy.0 - Linenr 5700000
2012/03/02 16:06:09 - DwContent_Input.0 - linenr 5750000
2012/03/02 16:06:09 - LK_Manufacture.0 - linenr 5750000
2012/03/02 16:06:09 - Filter_3_Brands.0 - linenr 5750000
2012/03/02 16:06:12 - Dummy.0 - Linenr 5750000
2012/03/02 16:06:13 - DwContent_Input.0 - linenr 5800000
2012/03/02 16:06:13 - LK_Manufacture.0 - linenr 5800000
2012/03/02 16:06:13 - Filter_3_Brands.0 - linenr 5800000
2012/03/02 16:06:16 - Dummy.0 - Linenr 5800000
2012/03/02 16:06:17 - DwContent_Input.0 - linenr 5850000
2012/03/02 16:06:17 - LK_Manufacture.0 - linenr 5850000
2012/03/02 16:06:17 - Filter_3_Brands.0 - linenr 5850000
2012/03/02 16:06:21 - Dummy.0 - Linenr 5850000
2012/03/02 16:06:21 - DwContent_Input.0 - linenr 5900000
2012/03/02 16:06:22 - LK_Manufacture.0 - linenr 5900000
2012/03/02 16:06:22 - Filter_3_Brands.0 - linenr 5900000
2012/03/02 16:06:25 - Dummy.0 - Linenr 5900000
2012/03/02 16:06:26 - DwContent_Input.0 - linenr 5950000
2012/03/02 16:06:26 - LK_Manufacture.0 - linenr 5950000
2012/03/02 16:06:26 - Filter_3_Brands.0 - linenr 5950000
2012/03/02 16:06:29 - Dummy.0 - Linenr 5950000
2012/03/02 16:06:30 - DwContent_Input.0 - linenr 6000000
2012/03/02 16:06:30 - LK_Manufacture.0 - linenr 6000000
2012/03/02 16:06:30 - Filter_3_Brands.0 - linenr 6000000
2012/03/02 16:06:32 - DwContent_Input.0 - Finished processing (I=6029427, O=0, R=0, W=6029426, U=1, E=0)
2012/03/02 16:06:32 - Filter_3_Brands.0 - Finished processing (I=0, O=0, R=6029426, W=6029426, U=0, E=0)
2012/03/02 16:06:32 - TXTOutput_3Brands.0 - Finished processing (I=0, O=39252, R=39251, W=39251, U=0, E=0)
2012/03/02 16:06:32 - MySQLExternal - Connection to database closed!
2012/03/02 16:06:32 - LK_Manufacture.0 - Finished processing (I=6, O=0, R=6029426, W=6029426, U=0, E=0)
2012/03/02 16:06:32 - Dummy.0 - Finished processing (I=0, O=0, R=5990175, W=5990175, U=0, E=0)
2012/03/02 16:06:32 - Spoon - The transformation has finished!!


CASE 3: LAUNCHING SEVERAL COPIES (4X)

Objective:

To measure elapsed time reading 6 million rows, from Flat file, join the main flow with a lookup table (MySql) and take attributes. Filter the flow and write a txt file.

For performance reasons, (such as reducing latency), launch a database lookup step three times or more. Launching the same step several times keeps the database busy on different connections, effectively lowering the latency.

Rows: 6.024.000 M
Columns: 37 Columns
Resources: Same Environment.

Structure:

(Metadata)

Design & Run

Elapsed time (s) 575 sec.
Rows per sec (avg)

Here, show the evolution of the passage of rows via the transformations

OPERATOR: OUTPUT_TEXT_FILE - Write

OPERATOR: FILTER - Write

OPERATOR: Input - Write

Summary of log

2012/03/02 16:40:42 - Transformation metadata - The shared object fie [null] is empty!
2012/03/02 16:40:42 - Spoon - Transformation opened.
2012/03/02 16:40:42 - Spoon - Launching transformation [Stress3_Lookups_Filters]...
2012/03/02 16:40:42 - Spoon - Started the transformation execution.
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Dispatching started for transformation [Stress3_Lookups_Filters]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Nr of arguments detected:0
2012/03/02 16:40:42 - Stress3_Lookups_Filters - This is not a replay transformation
2012/03/02 16:40:42 - Transformation metadata - Natural sort of steps executed in 0ms (5 time previous steps calculated)
2012/03/02 16:40:42 - Stress3_Lookups_Filters - I found 5 different steps to launch.
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets...
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets for step 0 --> Filter_3_Brands
2012/03/02 16:40:42 - Stress3_Lookups_Filters - prevcopies = 4, nextcopies=1
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.0 - TXTOutput_3Brands.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.1 - TXTOutput_3Brands.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.2 - TXTOutput_3Brands.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.3 - TXTOutput_3Brands.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - prevcopies = 4, nextcopies=1
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.0 - Dummy.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.1 - Dummy.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.2 - Dummy.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [Filter_3_Brands.3 - Dummy.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocated 8 rowsets for step 0 --> Filter_3_Brands
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets for step 1 --> TXTOutput_3Brands
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocated 8 rowsets for step 1 --> TXTOutput_3Brands
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets for step 2 --> Dummy
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocated 8 rowsets for step 2 --> Dummy
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets for step 3 --> DwContent_Input
2012/03/02 16:40:42 - Stress3_Lookups_Filters - prevcopies = 1, nextcopies=4
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [DwContent_Input.0 - LK_Manufacture.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [DwContent_Input.0 - LK_Manufacture.1]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [DwContent_Input.0 - LK_Manufacture.2]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [DwContent_Input.0 - LK_Manufacture.3]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocated 12 rowsets for step 3 --> DwContent_Input
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating rowsets for step 4 --> LK_Manufacture
2012/03/02 16:40:42 - Stress3_Lookups_Filters - prevcopies = 4, nextcopies=4
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [LK_Manufacture.0 - Filter_3_Brands.0]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [LK_Manufacture.1 - Filter_3_Brands.1]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [LK_Manufacture.2 - Filter_3_Brands.2]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation allocated new rowset [LK_Manufacture.3 - Filter_3_Brands.3]
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocated 16 rowsets for step 4 --> LK_Manufacture
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Allocating Steps & StepData...
2012/03/02 16:40:42 - Stress3_Lookups_Filters - Transformation is about to allocate step [Filter_3_Brands] of type [FilterRows]
2012/03/02 16:40:42 - Filter_3_Brands.0 - distribution activated
2012/03/02 16:40:42 - Filter_3_Brands.0 - Starting allocation of buffers & new threads...
2012/03/02 16:40:42 - Filter_3_Brands.0 - Step info: nrinput=1 nroutput=2
2012/03/02 16:40:42 - Filter_3_Brands.0 - Got previous step from [Filter_3_Brands] #0 --> LK_Manufacture
..........

2012/03/02 16:49:14 - Dummy.0 - Linenr 5350000
2012/03/02 16:49:14 - LK_Manufacture.1 - linenr 1350000
2012/03/02 16:49:14 - LK_Manufacture.3 - linenr 1350000
2012/03/02 16:49:14 - LK_Manufacture.0 - linenr 1350000
2012/03/02 16:49:14 - LK_Manufacture.2 - linenr 1350000
2012/03/02 16:49:14 - Filter_3_Brands.1 - linenr 1350000
2012/03/02 16:49:14 - Filter_3_Brands.3 - linenr 1350000
2012/03/02 16:49:14 - Filter_3_Brands.0 - linenr 1350000
2012/03/02 16:49:14 - Filter_3_Brands.2 - linenr 1350000
2012/03/02 16:49:18 - DwContent_Input.0 - linenr 5450000
2012/03/02 16:49:18 - Dummy.0 - Linenr 5400000
2012/03/02 16:49:23 - DwContent_Input.0 - linenr 5500000
2012/03/02 16:49:23 - Dummy.0 - Linenr 5450000
2012/03/02 16:49:28 - DwContent_Input.0 - linenr 5550000
2012/03/02 16:49:28 - Dummy.0 - Linenr 5500000
2012/03/02 16:49:34 - DwContent_Input.0 - linenr 5600000
2012/03/02 16:49:34 - Dummy.0 - Linenr 5550000
2012/03/02 16:49:34 - LK_Manufacture.0 - linenr 1400000
2012/03/02 16:49:34 - LK_Manufacture.1 - linenr 1400000
2012/03/02 16:49:34 - LK_Manufacture.3 - linenr 1400000
2012/03/02 16:49:34 - LK_Manufacture.2 - linenr 1400000
2012/03/02 16:49:35 - Filter_3_Brands.2 - linenr 1400000
2012/03/02 16:49:35 - Filter_3_Brands.0 - linenr 1400000
2012/03/02 16:49:35 - Filter_3_Brands.3 - linenr 1400000
2012/03/02 16:49:35 - Filter_3_Brands.1 - linenr 1400000
2012/03/02 16:49:39 - DwContent_Input.0 - linenr 5650000
2012/03/02 16:49:39 - Dummy.0 - Linenr 5600000
2012/03/02 16:49:44 - DwContent_Input.0 - linenr 5700000
2012/03/02 16:49:44 - Dummy.0 - Linenr 5650000
2012/03/02 16:49:49 - DwContent_Input.0 - linenr 5750000
2012/03/02 16:49:49 - Dummy.0 - Linenr 5700000
2012/03/02 16:49:54 - DwContent_Input.0 - linenr 5800000
2012/03/02 16:49:54 - Dummy.0 - Linenr 5750000
2012/03/02 16:49:55 - LK_Manufacture.0 - linenr 1450000
2012/03/02 16:49:55 - LK_Manufacture.1 - linenr 1450000
2012/03/02 16:49:55 - LK_Manufacture.2 - linenr 1450000
2012/03/02 16:49:55 - LK_Manufacture.3 - linenr 1450000
2012/03/02 16:49:55 - Filter_3_Brands.3 - linenr 1450000
2012/03/02 16:49:55 - Filter_3_Brands.1 - linenr 1450000
2012/03/02 16:49:55 - Filter_3_Brands.0 - linenr 1450000
2012/03/02 16:49:55 - Filter_3_Brands.2 - linenr 1450000
2012/03/02 16:50:00 - DwContent_Input.0 - linenr 5850000
2012/03/02 16:50:00 - Dummy.0 - Linenr 5800000
2012/03/02 16:50:04 - DwContent_Input.0 - linenr 5900000
2012/03/02 16:50:04 - Dummy.0 - Linenr 5850000
2012/03/02 16:50:09 - DwContent_Input.0 - linenr 5950000
2012/03/02 16:50:09 - Dummy.0 - Linenr 5900000
2012/03/02 16:50:13 - DwContent_Input.0 - linenr 6000000
2012/03/02 16:50:14 - Dummy.0 - Linenr 5950000
2012/03/02 16:50:14 - LK_Manufacture.1 - linenr 1500000
2012/03/02 16:50:14 - LK_Manufacture.3 - linenr 1500000
2012/03/02 16:50:14 - LK_Manufacture.2 - linenr 1500000
2012/03/02 16:50:14 - LK_Manufacture.0 - linenr 1500000
2012/03/02 16:50:14 - Filter_3_Brands.3 - linenr 1500000
2012/03/02 16:50:14 - Filter_3_Brands.0 - linenr 1500000
2012/03/02 16:50:14 - Filter_3_Brands.1 - linenr 1500000
2012/03/02 16:50:14 - Filter_3_Brands.2 - linenr 1500000
2012/03/02 16:50:16 - DwContent_Input.0 - Finished processing (I=6029427, O=0, R=0, W=6029426, U=1, E=0)
2012/03/02 16:50:16 - MySQLExternal - Connection to database closed!
2012/03/02 16:50:16 - MySQLExternal - Connection to database closed!
2012/03/02 16:50:16 - Filter_3_Brands.3 - Finished processing (I=0, O=0, R=1507356, W=1507356, U=0, E=0)
2012/03/02 16:50:16 - LK_Manufacture.0 - Finished processing (I=6, O=0, R=1507357, W=1507357, U=0, E=0)
2012/03/02 16:50:16 - LK_Manufacture.2 - Finished processing (I=6, O=0, R=1507356, W=1507356, U=0, E=0)
2012/03/02 16:50:16 - MySQLExternal - Connection to database closed!
2012/03/02 16:50:17 - LK_Manufacture.1 - Finished processing (I=6, O=0, R=1507357, W=1507357, U=0, E=0)
2012/03/02 16:50:17 - Filter_3_Brands.2 - Finished processing (I=0, O=0, R=1507356, W=1507356, U=0, E=0)
2012/03/02 16:50:17 - Filter_3_Brands.0 - Finished processing (I=0, O=0, R=1507357, W=1507357, U=0, E=0)
2012/03/02 16:50:17 - Filter_3_Brands.1 - Finished processing (I=0, O=0, R=1507357, W=1507357, U=0, E=0)
2012/03/02 16:50:17 - MySQLExternal - Connection to database closed!
2012/03/02 16:50:17 - LK_Manufacture.3 - Finished processing (I=6, O=0, R=1507356, W=1507356, U=0, E=0)
2012/03/02 16:50:17 - TXTOutput_3Brands.0 - Finished processing (I=0, O=39252, R=39251, W=39251, U=0, E=0)
2012/03/02 16:50:17 - Dummy.0 - Finished processing (I=0, O=0, R=5990175, W=5990175, U=0, E=0)
2012/03/02 16:50:17 - Spoon - The transformation has finished!!