How to save the database so that it is readable for the dataframe?

The program takes a .csv database, performs some computations on it, and then needs to save the resulting data so that it can be read back with Dask.DataFrame. When the saved file is read back in Python, the column types the dataframe had inside the loop should be preserved. I assume this calls for CSV files plus a separate configuration file that specifies the column types; a sketch of that idea is shown below.
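One way to implement that idea, as a minimal sketch: write the data as a plain CSV and the dtypes as a JSON sidecar, then feed the stored dtypes back to dd.read_csv when loading. The helper names and file paths here are made up for illustration, and datetime columns are routed through parse_dates because pandas' dtype= argument does not accept them:

import json
import pandas as pd
import dask.dataframe as dd

def save_with_dtypes(df: pd.DataFrame, csv_path: str, meta_path: str) -> None:
    df.to_csv(csv_path, index=False)
    # Sidecar configuration file: column name -> dtype string.
    with open(meta_path, "w") as f:
        json.dump({col: str(dtype) for col, dtype in df.dtypes.items()}, f)

def load_with_dtypes(csv_path: str, meta_path: str) -> dd.DataFrame:
    with open(meta_path) as f:
        dtypes = json.load(f)
    # read_csv rejects datetime64 in dtype=, so hand those columns to parse_dates.
    date_cols = [c for c, t in dtypes.items() if t.startswith("datetime")]
    plain = {c: t for c, t in dtypes.items() if c not in date_cols}
    return dd.read_csv(csv_path, dtype=plain, parse_dates=date_cols)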

Another question: how can I read a large file into one dataframe?
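For that second question, Dask itself is the usual answer: dd.read_csv splits one large file (or a glob of many files) into partitions and evaluates them lazily, so nothing is loaded into memory until a result is requested. A small sketch with placeholder paths:

import dask.dataframe as dd

# One logical dataframe backed by many partitions; the data is read lazily.
ddf = dd.read_csv("./genfiles/*.csv", blocksize="64MB")
print(ddf.npartitions)

# Work happens only when a result is materialized, e.g. taking the length.
row_count = len(ddf)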

The main function looks like this:

import pandas as pds
import dask.dataframe as dd
from datetime import datetime
from tqdm import tqdm

# ImportCSV, CountTableRows, Calculate, SQL_columns and SQL_Where are defined elsewhere.

def LoadDataFromDB(con, table):  # In this block, the outgoing data needs to be written out

    date_str = datetime.now().strftime("%d_%b_%Y_%H_%M_%S")
    chunkS = 100
    filename = "./genfiles/" + date_str + ".gz"

    ExRates = ImportCSV("exchrates/Currency rates.csv")
    log = open("logs/log_" + date_str + ".txt", "w+")

    pbar = tqdm(total=CountTableRows(con) / chunkS)
    dfSQL = pds.read_sql_query((SQL_columns + table + SQL_Where), con, chunksize=chunkS)

    # In this loop, after Calculate(), the data should be saved to a file.
    for i, chunk in enumerate(dfSQL):
        print("Reading a block of data...")
        res = Calculate(chunk, ExRates, log)
        df = dd.from_pandas(res, npartitions=3)
        print(chunk.dtypes)
        pbar.update()

    pbar.close()
    log.close()
    return filename
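Note that the loop above builds df but never actually writes it anywhere. If the CSV-plus-config approach feels fragile, an alternative worth considering is Parquet, which stores the column types inside the file itself, so no separate configuration file is needed. A sketch of how the loop body might persist each chunk, reusing res and i from the loop above (the output path is hypothetical, and append requires a consistent schema across chunks):

import dask.dataframe as dd

# Inside the loop, after res = Calculate(chunk, ExRates, log):
ddf = dd.from_pandas(res, npartitions=1)
ddf.to_parquet(
    "./genfiles/output.parquet",   # hypothetical output path
    append=(i > 0),                # create on the first chunk, append afterwards
    write_index=False,
)

# Reading it back later restores the dtypes with no sidecar file:
restored = dd.read_parquet("./genfiles/output.parquet")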


Solution 1:[1]

Assuming you are using a NoSQL database: I have previously saved a dataframe to a database by using the following function to convert it into a JSON-serializable dictionary first:

from typing import Any, Dict
import pandas as pd

def dataframe_to_dict(df: pd.DataFrame) -> Dict[Any, Any]:
    data_dict = df.to_dict("list")
    # Record each column's dtype name so types can be restored; tolist() keeps the index JSON-serializable.
    d = {"data": data_dict, "dtypes": {col: df[col].dtype.name for col in df.columns}, "index": df.index.tolist()}
    return d

and then, when reading from the database, I use this function to convert it back into a dataframe:

def dict_to_dataframe(d: Dict[Any, Any]) -> pd.DataFrame:
    df = pd.DataFrame.from_dict(d["data"])
    df.index = d["index"]

    # Cast each column back to its recorded dtype.
    for col, dtype in d["dtypes"].items():
        df[col] = df[col].astype(dtype)
    return df
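A quick round trip with these two helpers might look like this (the sample frame is invented for the demo):

import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0], "qty": [3, 4]})
df["qty"] = df["qty"].astype("int32")

d = dataframe_to_dict(df)           # this dict is what gets stored in the database
restored = dict_to_dataframe(d)

assert restored["qty"].dtype == "int32"   # the narrow dtype survived the round trip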

You may need to fiddle around with Dask if your dataframes are so large that they can't even fit into a single pandas DataFrame, but this should help you get started.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Tom McLean