How can I improve the performance of the Assert activity in Azure Data Factory?

I have a 24-column data stream that I am validating with the Assert activity. Checks for field length, nulls, or data type seem to execute relatively quickly. However, if I add conditionals such as the following, the process slows down remarkably.

expectTrue(contains(['Y','N','R','E'], #item == myfield), false, 'myfield', null, 'The myfield must be Y, N, R, or E.')

Having five or six of these brings it to a halt, and the debugger fails to let me perform a data preview.

I've tried varied syntax including:

or(or(myfield == 'Y', myfield == 'N'), myfield == 'R')

myfield == 'Y' || myfield == 'N' || myfield == 'R'
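
For comparison, the same membership check can also be written with the `in` function from the data flow expression language, which tests whether an item is in an array without the `#item` lambda. This is only a sketch of an equivalent form, assuming `myfield` is a string column and reusing the `expectTrue` arguments from the expression above; whether it evaluates any faster in the Assert step is untested here.

```
expectTrue(in(['Y','N','R','E'], myfield), false, 'myfield', null, 'The myfield must be Y, N, R, or E.')
```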

I've thinned my sample file down to three rows and have the sampling size set at 10 records. I've also set the debug settings to work off a sample file and set a low record threshold there as well.

It doesn't seem like a 24 field text file where each field is validated should be a stretch for this kind of tooling, yet I am stymied.

Of note, we have the validation working in a prior version of the pipeline, where all the validation logic was put in a conditional split, as you would have done before the Assert activity became available. It's slow there as well, but not enough to time out as it does here.

I don't want to farm this step out to a Python Azure Function or some such silliness, because the Assert activity is for just this type of thing and is a much neater, more integrated solution.

I also moved our IR from 4+4 to 8+8 with no perceivable difference in data flow performance (but significant performance gains in outside activities such as copy ops, etc.)

I'd sure appreciate any guidance if someone has a trick to make this do what I need it to do.

Thanks in advance!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
