Recently I had several questions concerning how to process files that carry a header and trailer in them. Typically those files are a product of data extract from non Microsoft products e.g. Oracle database encompassing various tables data where every row starts with an identifier. For example such a file data record could look like:
HDR,INTF_01,OUT,TEST,3/9/2011 11:23
B1,121156789,DATA TEST DATA,2011-03-09 10:00:00,Y,TEST 18 10:00:44,2011-07-18 10:00:44,Y
B2,TEST DATA,2011-03-18 10:00:44,Y
B3,LEG 1 TEST DATA,TRAN TEST,N
B4,LEG 2 TEST DATA,TRAN TEST,Y
FTR,4,TEST END,3/9/2011 11:27
A developer is normally able to break the records using a Conditional Split Transformation component by employing an expression similar to
Output1 -- SUBSTRING(Output1,1,2) == "B1"
and so on, but often a verification is required after this step to check if the number of data records read corresponds to the number specified in the trailer record of the file.
This portion sometimes stumbles some people so I decided to share what I came up with.
As an aside, I want to mention that the approach I use is slightly more portable than some others I saw because I use a separate DFT that can be copied and pasted into a new SSIS package designer surface or re-used within the same package again and it can survive several trailer/footer records (!).
See how a ready DFT can look:
The first step is to create a Flat File Connection Manager and make sure you get the row split into columns like this:
After you are done with the Flat File connection, move onto adding an aggregate which is in use to simply assign a value to a variable (here the aggregate is used to handle the possibility of multiple footers/headers):
The next step is adding a Script Transformation as destination that requires very little coding.
First, some variable setup:
and finally the code:
As you can see it is important to place your code into the appropriate routine in the script, otherwise the end result may not be as expected.
As the last step you would use the regular Script Component to compare the variable value obtained from the DFT above to a package variable value obtained say via a Row Count component to determine if the file being processed has the right number of rows.
No comments:
Post a Comment