(Pentaho) Stress Test 2: Flat to MySQL (RELOAD)

The Second stress test involves transfering 6 millions of records from a Flat File, to a table in MySQL. (Version 2).


Pentaho was the first tool chosen to impact on the database. In the previous versions, the time obtained was not good at all, (1 hour, 18 minutes). Playing with options for parallelism was not allowed in the first architecture due to the resource limitations. It was also necessary to keep the size of the commit size very low to avoid saturating the system.

Regarding the design of the solution, it only changes the number of copies of the Stage, the Buffer size and the commit size with respect to the previous version. And in this occasion, the time dropped to less than 15 minutes.

Another strong point of the architecture that benefits the test is that, physically, both environments are in different Datastores. In Version 1, although they were in different virtual environments, everything was running in the same physical disk.

Times:

Architecture:

- Environment: Infraestructure composed of 3 nodes

- 1) ESXi 5.0:

1.a) Physical Datastore 1: VM ETL Clover (12GB RAM - 2 Cores * 2 Sockets)

1.b) Physical Datastore 2: VM Database Server MySQL/Oracle (4GB RAM - 2 Cores * 2 Sockets)

- 2) Monitor Performance: VM Monitor ESXi + SQL Server 2008 (with 4 GB RAM)

- 3) Operator ETL: ESXi Client (with 3 GB RAM)

CASE 1: CSV + Lazy Conv + X4 Cop + FData dump + N|O BS 1.500.000

Objective:

- To measure elapsed time reading and writing 6 million rows, from Flat file, to .CSV file.

- Compare performances in the 2 environments.

- Analyze use of the resources

ETL Tool Pentaho (Spoon) 4.1
Rows: 6.024.000 M
Columns: 37 Columns

Structure:

(Metadata)

Design & Run

LOG

Log:

Elapsed time (s) 890 Secs.
Rows p/s (avg)

VERSION 2: Performance improvement.

How to Improve

Performance

- Adjust the parameters:

- Use CSV -

- Use Lazy Conversion

- Use Fast Data Dump

- Set N|O BS to 1.5M

- Set 4X (Copies)

USE OF RESOURCES: VM PENTAHO

TOTAL

Important: Memory Swap: 0

CPU/Datastore: CPU Usage Mhz / Datastore usage between 18:21-18:36

Menmory: After several executions, the memory consumption remains stable in 2,5 GB

CPU:

CPU Monitoring, "Passive and Active state" in different executions. Last Execution: 18:21-18:36

Memory:

Memory Monitoring: Last Execution: 18:21-18:36

Network

DataST