How to load my tables into Spark DataFrames faster with Scala?
I wrote code that is supposed to load many tables (listed via the method LTables) into different DataFrames in Scala using Spark.
Here is my code:
```scala
LTables.iterator.foreach { Table =>
  // Run the SQL query and filtering for this table
  TableProcessor.execute(sparkSession, filterTenant, Table)
  // Only export tables that returned at least one row
  if (Table.TableDf.count() > 0) {
    GenerateCsv.execute(sparkSession, Table.TableDf, Table.OutputFilename, filterTenant)
  }
}
```
In my foreach loop, TableProcessor.execute runs an SQL query, puts the result into a DataFrame, and applies some filtering; GenerateCsv then writes the filtered data to a CSV file.
The thing is, I have a lot of tables with large amounts of data to process, so the full run is very slow (I tried with a list of 160 tables). I know Spark is great at processing one big DataFrame and not as good at dealing with many DataFrames, but I have to fetch the tables separately using SQL queries.
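One direction I have considered (a rough sketch, not tested against my real classes) is submitting the per-table work concurrently, since each table's query and CSV export is independent and Spark can schedule several jobs at once. TableProcessor, GenerateCsv, LTables, filterTenant, and sparkSession are the names from my code above; the Future-based wrapper and the pool size of 8 are assumptions to tune:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Sketch: run each table's query + export as an independent Spark job.
// The fixed pool size (8) is an assumption; tune it to what the cluster
// can actually run in parallel.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

val jobs = LTables.map { Table =>
  Future {
    TableProcessor.execute(sparkSession, filterTenant, Table)
    // isEmpty stops after finding one row, unlike count(), which scans everything
    if (!Table.TableDf.isEmpty) {
      GenerateCsv.execute(sparkSession, Table.TableDf, Table.OutputFilename, filterTenant)
    }
  }
}

// Block until every table has been exported
Await.result(Future.sequence(jobs), Duration.Inf)
```

I am not sure whether this is the right approach, or whether the driver-side concurrency just moves the bottleneck.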
If you have a solution or advice to help me make this code run faster, that would be great.
Thanks for helping
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
