Wednesday, March 21, 2012

Help Needed: For Creating Synchronous Transform Component

Hi
I am currently trying to write a custom transform component in C# that will take a row of data and perform a lookup via an external system.
If there is a match, it should send the data from the external system down the match output (which will have different columns from the input) and drop the row that
was read; otherwise it should send the row down the unmatched output, which will have the same columns as the input.

So I would like to write a synchronous transform, because I don't need to read all the rows from the input buffer before I start processing, and I don't want to have millions of rows
loaded in memory.

Can this be done? Also, does anyone have example code showing how to do this? I can't see how to send data down the match output buffer,
as this will hold the lookup results, which have different columns from the input data, or how to discard the input data as well.


Thanks Steve

There is a whole host of samples out now, as well as those that shipped in the box; one of them will help, I'm sure:

http://www.microsoft.com/downloads/results.aspx?pocId=&freetext=SQL%20Server%20SSIS%20Sample%20Component&DisplayLang=en

(Or call Jamie and he'll tell you, even if he does ask me!)

|||

Just been talking with Steve about this offline.

Turns out that, based on his requirement, he needed an asynch component. However, the same number of rows will be coming in as are going out; it's just that the "shape" of the output (i.e. the metadata) needs to change.

Steve doesn't need to cache all the data in memory which is usually what asynch components do. They don't have to though. ProcessInput() can push rows to the output as soon as they are encountered without storing internally - thus giving the "illusion" of it being a synch component.

-Jamie

|||

I would first like to clear up some confusion on asynchronous outputs (I have tried before but it persists :)). Having an async output does not mean that the component waits until all the data has been seen before it outputs any data. All it means is that the data coming out of the component is a copy in a new buffer. The Union All transform has an async output, but it clearly does not wait until it receives all the data from all its inputs before outputting any data. In fact, most of the stock components with async outputs start generating output data well before all the input data has been seen.
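To make the point concrete, here is a minimal sketch of an async output that streams rows as they arrive rather than blocking. It is not taken from any shipped sample: the class name is hypothetical, and it assumes the SQL Server 2008 or later interface names (the "100" suffix; the 2005 release used "90") and a project that references the SSIS pipeline assemblies.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Hypothetical component: one input, one asynchronous output, no row caching.
[DtsPipelineComponent(DisplayName = "Streaming Async Sketch",
                      ComponentType = ComponentType.Transform)]
public class StreamingAsyncSketch : PipelineComponent
{
    private PipelineBuffer outputBuffer;

    public override void ProvideComponentProperties()
    {
        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Input";

        IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
        output.Name = "Output";
        output.SynchronousInputID = 0; // 0 marks the output as asynchronous
    }

    // Called once before any input arrives; keep hold of the output buffer.
    public override void PrimeOutput(int outputs, int[] outputIDs, PipelineBuffer[] buffers)
    {
        if (buffers.Length != 0)
        {
            outputBuffer = buffers[0];
        }
    }

    // Called once per input buffer; each row is written out immediately.
    public override void ProcessInput(int inputID, PipelineBuffer buffer)
    {
        while (buffer.NextRow())
        {
            outputBuffer.AddRow();
            // Set the output column values for this row here.
        }

        if (buffer.EndOfRowset)
        {
            outputBuffer.SetEndOfRowset(); // tell downstream no more rows follow
        }
    }
}
```

Nothing is held between calls to ProcessInput, so memory use stays at roughly one input buffer plus one output buffer, however many rows flow through.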

In specific answer to your question, you can do one of two things. The first is to add additional output columns to the match output and keep it in sync with the input. The second is to make your match output an async output. Normally the first is better if you can do it, because you avoid a memory copy. However, since you stated you don't want to keep any of the input data on a match, in your case making the output async is the better alternative: you will wind up saving memory by doing so (sync outputs with additional columns cause the buffering system to widen the buffer rows).
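A rough sketch of the second option, with a sync unmatched output and an async match output, might look like the following. Everything named here is an assumption for illustration: the class name, the output and column names, the single string key column taken from the first selected input column, and the in-memory dictionary standing in for the external system. It also relies on my understanding that a row on a sync output with a non-zero ExclusionGroup is only passed on when DirectRow is called for it, which is what lets the matched input rows be dropped.

```csharp
using System.Collections.Generic;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[DtsPipelineComponent(DisplayName = "External Lookup Sketch",
                      ComponentType = ComponentType.Transform)]
public class ExternalLookupSketch : PipelineComponent
{
    private PipelineBuffer matchBuffer;
    private int unmatchedOutputID;
    private int keyIndex;     // key column position in the input buffer
    private int lookupIndex;  // result column position in the match buffer

    // Hypothetical stand-in for the call to the external system.
    private readonly Dictionary<string, string> external =
        new Dictionary<string, string> { { "A1", "Alpha" } };

    public override void ProvideComponentProperties()
    {
        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Input";

        // Sync output: same columns as the input, no copy.
        IDTSOutput100 unmatched = ComponentMetaData.OutputCollection.New();
        unmatched.Name = "Unmatched";
        unmatched.SynchronousInputID = input.ID;
        unmatched.ExclusionGroup = 1; // rows reach it only via DirectRow

        // Async output: its own buffer with its own columns.
        IDTSOutput100 match = ComponentMetaData.OutputCollection.New();
        match.Name = "Match";
        match.SynchronousInputID = 0;

        IDTSOutputColumn100 column = match.OutputColumnCollection.New();
        column.Name = "LookupValue";
        column.SetDataTypeProperties(DataType.DT_WSTR, 50, 0, 0, 0);
    }

    public override void PreExecute()
    {
        IDTSInput100 input = ComponentMetaData.InputCollection[0];
        IDTSOutput100 unmatched = ComponentMetaData.OutputCollection[0];
        IDTSOutput100 match = ComponentMetaData.OutputCollection[1];

        unmatchedOutputID = unmatched.ID;
        // Assumes the first selected input column is the string key.
        keyIndex = BufferManager.FindColumnByLineageID(
            input.Buffer, input.InputColumnCollection[0].LineageID);
        lookupIndex = BufferManager.FindColumnByLineageID(
            match.Buffer, match.OutputColumnCollection[0].LineageID);
    }

    // Only the async output gets a buffer here.
    public override void PrimeOutput(int outputs, int[] outputIDs, PipelineBuffer[] buffers)
    {
        if (buffers.Length != 0)
        {
            matchBuffer = buffers[0];
        }
    }

    public override void ProcessInput(int inputID, PipelineBuffer buffer)
    {
        while (buffer.NextRow())
        {
            string key = buffer.IsNull(keyIndex) ? null : buffer.GetString(keyIndex);
            string found;

            if (key != null && external.TryGetValue(key, out found))
            {
                // Match: emit the lookup result; the input row is not directed anywhere.
                matchBuffer.AddRow();
                matchBuffer.SetString(lookupIndex, found);
            }
            else
            {
                // No match: pass the original row through unchanged.
                buffer.DirectRow(unmatchedOutputID);
            }
        }

        if (buffer.EndOfRowset)
        {
            matchBuffer.SetEndOfRowset();
        }
    }
}
```

A real component would also need the usual design-time validation and would replace the dictionary with the actual call to the external system.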

As for samples, as Darren pointed out, there are plenty to choose from. There probably are none that have a component with one async and one sync output, but it should take little more than looking at a sync sample and an async sample and mixing and matching the code you need.

Thanks,

Matt

|||

I had a look and could not see a sample with two synch outputs, which is what I thought was required at first. Steve now has a simple example I wrote for him earlier, but for reference the only (public) asynch sample I know of is the RemoveDuplicates component, which ships in the box. It does cache data, which makes it a good example of that style of transform. But since all the work is done in ProcessInput, even though it spans several invocations of that method, it is fairly easy to see how you could remove the blocking behaviour and start passing rows straight to the output buffer as soon as they are received from the input.

|||

Interesting thread. I feel a blog post coming on but in this case I'm gonna leave it to Steve as he is keen to share what he's learnt here. Keep a look out!

-Jamie

|||

I was hoping for a little bit more information about Asynchronous output, especially where the input is sent to the output right away instead of using blocking.

Is there any more information on developing Async components besides what has already been listed? I've already looked at Remove Duplicates from the samples but find it a little confusing without a document to go with it explaining the process and why certain techniques were used.

I'm trying to create an Async component with 1 input and 3 outputs. I basically want some processing to be done that will determine which of the three outputs each row goes to. I'm not concerned about the decision-making part (which output to go to) but more about how to move the data to the outputs right away.

|||

If all the rows that come in, and no more than those that come in, end up in one of the outputs, then this would be best done as a synch component.

You would create your 3 outputs in ProvideComponentProperties, setting the SynchronousInputID property of each to the ID of your input. Since you have several outputs you also need to set the ExclusionGroup property for each output; give them all the same non-zero value so that each row is sent to only one of them. You then use the DirectRow method of the PipelineBuffer to send each row to the output you decide on.
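As a rough sketch of that approach (not the help topic's own code): the class name, the output names and the round-robin ChooseOutput helper are all made up for illustration, and the interface names assume SQL Server 2008 or later. The sketch gives all three outputs the same non-zero ExclusionGroup value, which is how the documentation sample marks outputs as mutually exclusive.

```csharp
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

[DtsPipelineComponent(DisplayName = "Three Way Split Sketch",
                      ComponentType = ComponentType.Transform)]
public class ThreeWaySplitSketch : PipelineComponent
{
    private int[] outputIDs;
    private int rowCounter;

    public override void ProvideComponentProperties()
    {
        IDTSInput100 input = ComponentMetaData.InputCollection.New();
        input.Name = "Input";

        foreach (string name in new[] { "OutputA", "OutputB", "OutputC" })
        {
            IDTSOutput100 output = ComponentMetaData.OutputCollection.New();
            output.Name = name;
            output.SynchronousInputID = input.ID; // synchronous to the input
            output.ExclusionGroup = 1;            // same group: one output per row
        }
    }

    // Cache the output IDs once; DirectRow takes an output ID, not an index.
    public override void PreExecute()
    {
        outputIDs = new int[ComponentMetaData.OutputCollection.Count];
        for (int i = 0; i < outputIDs.Length; i++)
        {
            outputIDs[i] = ComponentMetaData.OutputCollection[i].ID;
        }
    }

    public override void ProcessInput(int inputID, PipelineBuffer buffer)
    {
        while (buffer.NextRow())
        {
            buffer.DirectRow(outputIDs[ChooseOutput()]);
        }
    }

    // Hypothetical decision logic: simple round robin over the outputs.
    private int ChooseOutput()
    {
        return rowCounter++ % outputIDs.Length;
    }
}
```

Because the outputs are synchronous, no PrimeOutput override is needed and no rows are copied; DirectRow just marks which output each row in the shared buffer belongs to.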

Look up the ExclusionGroup property help topic for a simple example.
