Why does memory usage increase when reopening a Parquet file with pandas?

I generated a pandas DataFrame with 8,481,288 rows and 451 columns, where most of the columns hold integer values. While this DataFrame is in memory, the total memory consumption on my PC is (more or less) 50% of my total memory. But if I save the DataFrame to Parquet, restart the kernel, and read the file back, memory consumption climbs to nearly 99%, which makes the DataFrame almost unusable.
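
To put numbers on this, here is roughly how the DataFrame's own footprint can be compared with what the whole Python process takes from the OS (just a sketch; psutil is an extra dependency I am assuming here):

import os
import psutil

# df is the generated frame with ~8.5 million rows and 451 columns
frame_gb = df.memory_usage(deep=True).sum() / 1e9
process_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
print(f"frame: {frame_gb:.2f} GB, process (RSS): {process_gb:.2f} GB")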

More specifically, I am saving with:

df.to_parquet('filepath.parquet')

And then, I restart the kernel and reopen with:

df = pd.read_parquet('filepath.parquet')

Then my memory consumption explodes.
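
For completeness, I know the same file can also be read through pyarrow directly. This is just a sketch of that path, with read_table and the to_pandas keyword arguments taken from pyarrow's API (it is not what I originally ran):

import pyarrow.parquet as pq

table = pq.read_table('filepath.parquet')
# self_destruct releases the Arrow buffers as each column is converted,
# and split_blocks avoids consolidating all columns into a few huge blocks
df = table.to_pandas(split_blocks=True, self_destruct=True)
del table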

Sorry for the silly question, but I couldn't find the answer in other questions. A similar thing happens if I save in Feather format instead.

Thank you

EDIT: Two curious things also happen. First, when I delete the DataFrame after reopening it (del df), Python's memory usage stays high, though at about half the level it was at before the deletion. Second, when I reopen the DataFrame and merge it with another one, memory usage drops back to the normal level (similar to what it was before saving and reopening). This corroborates juanpa.arrivillaga's answer.
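
For reference, the first observation can be checked roughly like this (again just a sketch; psutil is assumed to be installed, and gc.collect() is only there to make sure nothing is kept alive by stray references):

import gc
import os
import psutil

del df           # drop the only reference to the reopened DataFrame
gc.collect()     # force a collection so its buffers are actually freed by Python
rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
print(f"process RSS after del: {rss_gb:.2f} GB")  # in my case this stays high, at about half the previous level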



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
