How can I improve the performance of the Assert activity in Azure Data Factory?
I have a 24-column data stream that I am validating with the Assert activity. Checks such as field length, null, or data type seem to execute relatively quickly. However, if I add conditionals such as the following, the process slows down dramatically:
expectTrue(contains(['Y','N','R', 'E'],#item == myfield), false, 'myfield', null, 'The myfield must be Y, N, R, or E.')
Having five or six of these brings it to a halt, and the debugger no longer lets me run a data preview.
I've tried varied syntax including:
or(or(myfield == 'Y', myfield == 'N'), myfield == 'R')
myfield == 'Y' || myfield == 'N' || myfield == 'R'
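For anyone reading along, here is a plain-Python model of what these rules are meant to check per row. It is not Azure Data Factory code and says nothing about how ADF evaluates the expressions internally (in particular, ADF's null semantics may differ from Python's `None` handling). The `validate` helper and the list-of-dicts input are hypothetical stand-ins for the data stream. It just shows that the `contains(...)` form and the `or`/`||` chain express the same membership test, with the chain extended here to include 'E'.

```python
# Plain-Python model of the membership rule; NOT Azure Data Factory code.
ALLOWED = ("Y", "N", "R", "E")


def contains_style(value):
    # Mirrors contains(['Y','N','R','E'], #item == myfield):
    # true if any element of the array equals the field value.
    return any(item == value for item in ALLOWED)


def or_chain_style(value):
    # Mirrors the nested or(...) / || forms, extended here to include 'E'.
    return value == "Y" or value == "N" or value == "R" or value == "E"


def validate(rows, field="myfield"):
    """Hypothetical helper: return (row_index, message) for each failing row.

    'rows' is a list of dicts standing in for the data flow's rows.
    """
    failures = []
    for i, row in enumerate(rows):
        value = row.get(field)
        if not contains_style(value):
            failures.append((i, f"The {field} must be Y, N, R, or E."))
    return failures


sample = [{"myfield": "Y"}, {"myfield": "X"}, {"myfield": None}]
print(validate(sample))
```

With the three sample rows, the second ('X') and third (missing value) rows are reported as failures. The rule itself is a cheap per-row comparison, which is why the slowdown described here is surprising.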
I've thinned my sample file down to three rows and set the sampling size to 10 records. I've also set the debug settings to work off a sample file, with a low row limit there as well.
It doesn't seem like a 24 field text file where each field is validated should be a stretch for this kind of tooling, yet I am stymied.
Of note, we have the validation working in a prior version of the pipeline, where all the validation logic was put in a conditional split, as you would have done before the Assert activity became available. It's slow there as well, but not slow enough to time out as it does here.
I don't want to farm this step out to a Python Azure Function or some such silliness, because the Assert activity is meant for exactly this type of thing and is a much neater, more integrated solution.
I also moved our IR from 4+4 to 8+8 with no perceivable difference in data flow performance (but significant performance gains in outside activities such as copy ops, etc.)
I'd sure appreciate any guidance if someone has a trick to make this do what I need it to do.
Thanks in advance!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
