How to save the database so that it is readable for the dataframe?
The program takes some .csv databases, performs computations on them, and then needs to save the resulting database so that it can be read with Dask.Dataframe. When the saved file is read back in Python, the column types that the dataframe had in the loop should be preserved. I assume that I need to use CSV files plus a separate configuration file that specifies the column types.
Another question: how can I read a large file into one dataframe?
The main function looks like this:
from datetime import datetime

import dask.dataframe as dd  # presumably what the `dd` alias refers to
import pandas as pds  # presumably what the `pds` alias refers to
from tqdm import tqdm

# ImportCSV, CountTableRows, Calculate, SQL_columns and SQL_Where are defined elsewhere in the program

def LoadDataFromDB(con, table):
    # In this block, the outgoing data needs to be recorded
    date_str = datetime.now().strftime("%d_%b_%Y_%H_%M_%S")
    chunkS = 100
    filename = "./genfiles/" + date_str + ".gz"
    ExRates = ImportCSV("exchrates/Currency rates.csv")
    log = open("logs/log_" + date_str + ".txt", "w+")
    pbar = tqdm(total=CountTableRows(con) / chunkS)
    dfSQL = pds.read_sql_query((SQL_columns + table + SQL_Where), con, chunksize=chunkS)
    for i, chunk in enumerate(dfSQL):
        # In this loop, once res has been computed, the data should be saved to a file
        print("Reading a Block of Data...")
        res = Calculate(chunk, ExRates, log)
        df = dd.from_pandas(res, npartitions=3)
        print(chunk.dtypes)
        pbar.update()
    pbar.close()
    log.close()
    return filename
Solution 1:[1]
Assuming you are using a NoSQL database, I have previously saved a dataframe to a database by using the following function to convert it into a JSON-like format:
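from typing import Any, Dict
import pandas as pd

def dataframe_to_dict(df: pd.DataFrame) -> Dict[Any, Any]:
    data_dict = df.to_dict("list")
    # Keep the dtype names and the index (as a plain list) next to the data so they can be restored
    d = {"data": data_dict, "dtypes": {col: df[col].dtype.name for col in df.columns}, "index": df.index.tolist()}
    return d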
Then, when reading from the database, I use this function to convert it back to a dataframe:
def dict_to_dataframe(d: Dict[Any, Any]) -> pd.DataFrame:
    df = pd.DataFrame.from_dict(d["data"])
    df.index = d["index"]
    # Cast every column back to the dtype recorded when it was saved
    for col, dtype in d["dtypes"].items():
        df[col] = df[col].astype(dtype)
    return df
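As a rough illustration of the round trip, the sketch below pushes a small dataframe through `json.dumps` and `json.loads` and casts the columns back afterwards. The sample data and column names are hypothetical, the two functions are repeated only so the snippet runs on its own, and the index is converted to a plain list here on the assumption that the payload has to be JSON-serializable.

```python
import json
from typing import Any, Dict

import pandas as pd


def dataframe_to_dict(df: pd.DataFrame) -> Dict[str, Any]:
    # Same idea as the function above; the index is stored as a plain list
    return {
        "data": df.to_dict("list"),
        "dtypes": {col: df[col].dtype.name for col in df.columns},
        "index": df.index.tolist(),
    }


def dict_to_dataframe(d: Dict[str, Any]) -> pd.DataFrame:
    df = pd.DataFrame.from_dict(d["data"])
    df.index = d["index"]
    # Cast every column back to the dtype recorded at save time
    for col, dtype in d["dtypes"].items():
        df[col] = df[col].astype(dtype)
    return df


# Hypothetical sample data, not taken from the question
original = pd.DataFrame(
    {
        "amount": pd.Series([10, 20, 30], dtype="int32"),
        "rate": [1.5, 2.5, 3.5],
        "currency": pd.Series(["USD", "EUR", "USD"], dtype="category"),
    }
)

payload = json.dumps(dataframe_to_dict(original))  # what would be stored
restored = dict_to_dataframe(json.loads(payload))

# Compare the dtype names before and after the round trip
print(original.dtypes.astype(str).to_dict())
print(restored.dtypes.astype(str).to_dict())
```

Without the `astype` step, the `int32` and `category` columns would come back with whatever types pandas infers from the plain lists. Columns that are not JSON-serializable, such as datetimes, would need extra handling that this sketch does not cover.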
You may need to fiddle around with dask if your dataframes are so large that they can't fit into a single dataframe, but this should help you get started.
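The idea from the question, a CSV file plus a separate file describing the column types, can be sketched in the same spirit. This is only one possible approach under stated assumptions: the chunks, file names, and columns below are hypothetical stand-ins for the results of `Calculate`, and plain pandas is used for reading back. `dask.dataframe.read_csv` also accepts a `dtype` mapping, so the same sidecar file should be usable there.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical chunks standing in for the per-chunk results of the loop
chunks = [
    pd.DataFrame({"id": [1, 2], "rate": [1.5, 2.5], "currency": ["USD", "EUR"]}),
    pd.DataFrame({"id": [3, 4], "rate": [3.5, 4.5], "currency": ["EUR", "USD"]}),
]
chunks = [c.astype({"id": "int32", "currency": "category"}) for c in chunks]

# Hypothetical output locations in a temporary directory
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "result.csv")
dtypes_path = os.path.join(out_dir, "result_dtypes.json")

for i, chunk in enumerate(chunks):
    # Append each chunk to the same CSV; write the header only once
    chunk.to_csv(csv_path, mode="a", header=(i == 0), index=False)
    if i == 0:
        # Sidecar file mapping column name -> dtype name
        with open(dtypes_path, "w") as f:
            json.dump({col: chunk[col].dtype.name for col in chunk.columns}, f)

# Reading back: load the recorded dtypes and pass them to the CSV reader
with open(dtypes_path) as f:
    dtypes = json.load(f)

restored = pd.read_csv(csv_path, dtype=dtypes)
print(restored.dtypes)
```

Appending chunk by chunk keeps memory use bounded by the chunk size while writing. Datetime columns are not covered by this sketch; with `read_csv` they are normally handled through `parse_dates` rather than the `dtype` mapping.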
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tom McLean |
