How to load only the first and last row from a flat file to a target file using Rank transformation?

This is a common interview question which can really test the candidates’ knowledge of informatica transformations and logical thinking.

The caveat here is you have to do this without using any pre-session commands

So here we need to single out the first and last row from the source file. Lets take an example source file with 5 rows delimited by commas.

a,”Go To Market”
c,”Find a soap”
e,”Take it to cashier”
b,”Make Payment”
d,”Come home”

Now our task is to populate a target file with the first and last row, so that it looks like as below

a,”Go To Market”
d,”Come home”

Given this task we can break into three steps
1. Find the first row from the file
2. Find the last row from the file
3. Combine the two and load it to target table

Lets take each of the above individually

STEP1. Find the first row from the file
This can be easily done by assigning a unique number, starting from 1, to each incoming row.
Then filtering out the rest except for the row with number assigned as 1. This will be the first row

HOW TO GENERATE UNIQUE NUMBER FOR EACH ROW?
Two methods – either use sequence generator or use expression variables to generate sequence numbers. I would prefer expression transformation. Here’s how you do it in an expression

1. Use expression transformation and create an integer variable port named v_sequence and put the following expression in the expression editor:

IIF(v_sequence=0,1,v_sequence+1)

This clause takes advantage of the informatica property to assign default values of 0 to integer ports.

2. Create an output port o_sequence and assign v_sequence to it in the expression editor.

3. From this expression feed the filter_first, which will let only the row with o_sequence =1 pass through (i.e. first row) and the Rank transformation which will give us the last row.

In addition to above also create a constant output port o_constant and set its value to 1.

Connect the fields from the file and the o_sequence and o_constant fields to the rank transformation for the last row.

STEP2. Find the last row from the file
This can be easily done by using the Informatica Rank transformation. Use the unique number as assigned to the rows in step above as the port to rank on. The highest number will get the top most rank that is 1. So we can then have a filter which will let pass only the row with RANKINDEX =1.

HOW TO FIND THE LAST ROW USING RANK
1. Use the o_constant field to group by on(check Group by Property) and use the o_sequence field to rank on(check R property). Also in properties select TOP as ranking order.

2. From this rank transformation connect all file fields and RANKINDEX field to a filter_last transformation where put the filter condition as RANKINDEX=1 which will let only the last row pass through.

STEP3. Combine the two rows and load the target file
Use Union operator to combine the two flows for first and last row and load the target.

This is self explanatory. In union transformation create two input groups connect each to first and last flow respectively. Connect the output to the target.

This question tests the knowledge of following transformations
1. Expression
2. Rank
3. Filter
4. Union

This can also be done by using Aggregator transformations but I believe it is not an elegant solution as the one described above.

And that is it! You have the first and last row.

Another way using aggregator

How to load only the first and last row from a flat file to a target file using Rank transformation?

Join the Conversation

Cancel reply

Leave a comment

Share this:

Join the Conversation

Cancel reply

Leave a comment