Is it possible to combine multiple input files with different schemas using Schema Drift / Dynamic Columns?

I have around 20 tab-separated input files. Each has in the region of 500 columns, but the set of columns differs slightly from file to file.

The sink output schema is known and will contain all the possible input columns.

As a simplified example:

File 1

Name Age DOB Nationality
Bob 21 01/01/1972 British

File 2

Name Nationality NINO
Joe British AA995654A

File 3

Name DOB Nationality
Sam 01/01/1990 British

Is it possible to have one Data Flow with multiple inputs, where the schema is not known until runtime, that would cope with changes in the input files and, in this case, would output:

Name Age DOB NINO Nationality
Bob 21 01/01/1972 NULL British
Joe NULL NULL AA995654A British
Sam NULL 01/01/1990 NULL British
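
To make the target behaviour concrete, here is a minimal Python sketch of the merge described above. It is not Data Factory code and says nothing about how a Data Flow would be configured; it only illustrates aligning each file's rows to the known sink schema by column name. The function name `merge_files` and the `SINK_COLUMNS` list are hypothetical, and the sample data is assumed to be tab-separated as stated in the question.

```python
import csv
import io

# Known sink schema: every column that can appear in any input file.
# (Illustrative list taken from the simplified example.)
SINK_COLUMNS = ["Name", "Age", "DOB", "NINO", "Nationality"]


def merge_files(sources, sink_columns=SINK_COLUMNS):
    """Read each tab-separated source and align its rows to sink_columns."""
    merged = []
    for source in sources:
        # The header row of each file determines its schema at read time.
        reader = csv.DictReader(source, delimiter="\t")
        for row in reader:
            # Columns missing from this file become None (NULL in the sink).
            merged.append({col: row.get(col) for col in sink_columns})
    return merged


# In-memory stand-ins for the three example files.
file1 = io.StringIO("Name\tAge\tDOB\tNationality\nBob\t21\t01/01/1972\tBritish\n")
file2 = io.StringIO("Name\tNationality\tNINO\nJoe\tBritish\tAA995654A\n")
file3 = io.StringIO("Name\tDOB\tNationality\nSam\t01/01/1990\tBritish\n")

rows = merge_files([file1, file2, file3])
for row in rows:
    print("\t".join("NULL" if row[c] is None else row[c] for c in SINK_COLUMNS))
```

The point of the sketch is that columns are matched by name rather than position, so a file that adds, drops, or reorders columns still lands in the right sink columns.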

I have looked at column pattern matching and schema drift, but don't see how/if it is possible to achieve this.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
