(Informatica) Stress Test 1: Flat to Flat

First stress test: transferring 6 million records from a flat file to a CSV file.


Informatica is an extremely powerful tool. I have competed in many proofs of concept (POCs), and most often it has wiped out everyone who decided to enter the battle. It is oriented toward processes with enormous volumes of data.

It is characterized by rapid solution development, large processing capacity, and many more compliments I could pay it. In my experience in the BI world, I have always been on the side of the other tools.

However, in this case we are putting a whale in a kiddie pool and expecting it to jump like a dolphin, lol. The difference from other tools is that the data repository Informatica needs (in my case an Oracle 11 instance) also consumes the few resources that were assigned.

The VM does not drag, but a single request will max out the CPU. Still, the performance is not bad; I thought it would be worse. But I know that in another environment the story will be different. I hope you have the patience to wait for the Lab (Objective 2).

Disk writes are very expensive in this small environment, so I try to keep I/O to a minimum, and parallelizing did not even cross my mind.
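To illustrate the idea of keeping I/O to a minimum, here is a minimal Python sketch of a flat-to-CSV copy that streams rows through large read and write buffers, so the disk is touched in a few big chunks instead of many small ones. This is generic Python, not how Informatica works internally; the function name `flat_to_csv`, the pipe delimiter, and the 1 MiB buffer size are all assumptions chosen for the example.

```python
import csv


def flat_to_csv(src_path, dst_path, delimiter="|", buffer_bytes=1024 * 1024):
    """Stream a delimited flat file into a CSV file and return the row count.

    delimiter and buffer_bytes are illustrative defaults (assumptions),
    not values taken from the test described in this post.
    """
    rows = 0
    # Large buffers mean fewer, bigger reads and writes against the disk.
    with open(src_path, "r", newline="", buffering=buffer_bytes) as src, \
         open(dst_path, "w", newline="", buffering=buffer_bytes) as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst)
        for record in reader:
            # One row in memory at a time: no full-file load.
            writer.writerow(record)
            rows += 1
    return rows
```

Raising or lowering `buffer_bytes` is the rough analogue of the I/O and buffer parameters tuned in the cases below, although the actual effect depends on the disk and the operating system cache.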

As in most of the tests, there are several cases using different techniques and variable settings with the same objective: satisfy the requirement and improve performance. The first case is always the best; the remaining cases are jobs without tuning, or with an incomplete set of techniques.

LINKS

CASE 1:[422 secs.]

CASE 2:[885 secs.]

CASE 1: -I/O: 200K, -BufferLn 2GB, Max % Use of Memory: 90%

Objective: To measure the elapsed time reading and writing 6 million rows, from a flat file to a .CSV file, working on local disk.
Rows: 6,024,000
Columns: 37
Resources: Virtual machine with 2 GB RAM, with Informatica as the main process on the virtual platform. The resources used are anecdotal; today, any production environment has enough processing power for current and future requirements. The objective here is to build, execute, and measure in the same environment (regardless of the limited resources).

Structure:

(Metadata)

Design & Run

Elapsed time: 422 secs.
Rows per sec (avg): 14,300 rows/sec
How to improve performance:
- Adjust the parameters: I/O, % of memory to use, buffer size

CASE 2: -I/O: 10K, -BufferLn 1GB, Max % Use of Memory: 70%

Objective: To measure the elapsed time reading and writing 6 million rows, from a flat file to a .CSV file, working on local disk.
Rows: 6,024,000
Columns: 37
Resources: Virtual machine with 2 GB RAM, with Informatica as the main process on the virtual platform. The resources used are anecdotal; today, any production environment has enough processing power for current and future requirements. The objective here is to build, execute, and measure in the same environment (regardless of the limited resources).

Structure:

Design & Run

Elapsed time: 885 secs.
Rows per sec (avg): 7,600 rows/sec
How to improve performance:
- Adjust the parameters: I/O, % of memory to use, buffer size
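
As a sanity check, the averages for both cases can be recomputed from the row count and the elapsed times given above. The sketch below is plain Python arithmetic; the function names are made up for the example. For Case 1 the quotient (about 14,275 rows/sec) agrees with the reported 14,300. For Case 2 it comes out near 6,807 rows/sec, lower than the reported 7,600, so the reported average may have been measured over a different interval than the total elapsed time.

```python
ROWS = 6_024_000  # row count stated for both cases


def rows_per_second(rows, elapsed_secs):
    """Average throughput over the whole run."""
    return rows / elapsed_secs


def speedup(slow_secs, fast_secs):
    """How many times faster the tuned run is than the untuned one."""
    return slow_secs / fast_secs


case1 = rows_per_second(ROWS, 422)  # Case 1: 422 secs
case2 = rows_per_second(ROWS, 885)  # Case 2: 885 secs

print(f"Case 1: {case1:,.0f} rows/sec")
print(f"Case 2: {case2:,.0f} rows/sec")
print(f"Case 1 is {speedup(885, 422):.2f}x faster than Case 2")
```

On these numbers, the tuned settings of Case 1 roughly halve the elapsed time of Case 2.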