PySpark CSV format - mergeSchema

I have a large data dump spanning terabytes. The files contain daily activity data in CSV format, and the schema varies by day: day 1 might have 2 columns while day 2 has 3. I need to read all these files and load them into a table without losing any columns where the schemas differ. I know this can be achieved in Parquet through mergeSchema, but converting the files to Parquet one by one isn't feasible given the data volume. Is there any way to merge schemas with CSV as the format?
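One approach worth sketching (not from the original post): read each day's CSVs as its own DataFrame and combine them with DataFrame.unionByName, which since Spark 3.1 accepts allowMissingColumns=True and fills columns absent from either side with nulls, effectively merging the schemas. The paths and table name below are hypothetical placeholders.

```python
# A minimal sketch, assuming Spark 3.1+ (for allowMissingColumns) and
# hypothetical per-day directories like /data/activity/day1/.
from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-merge-schema").getOrCreate()

# Hypothetical paths; in practice these would be the daily dump locations.
paths = ["/data/activity/day1/", "/data/activity/day2/"]

# Read each day's CSV files with its own inferred schema.
dfs = [spark.read.csv(p, header=True, inferSchema=True) for p in paths]

# Union by column name; columns missing on either side become null.
merged = reduce(
    lambda a, b: a.unionByName(b, allowMissingColumns=True),
    dfs,
)

# Load the merged result into a table (hypothetical table name).
merged.write.mode("overwrite").saveAsTable("activity")
```

Note that inferSchema triggers an extra pass over each file, which is costly at terabyte scale; if the superset of columns is known up front, passing an explicit schema to spark.read.csv avoids that pass entirely.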



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0 in line with Stack Overflow's attribution requirements.
