General question about Polars memory management
I have some general questions about memory management in Polars. It would be great if you could spend a few sentences on how it works, for example when memory is allocated and when it is reclaimed.
In particular, I would like to know how to free the memory held by a DataFrame. I want to do it in a way that is immediate and, if possible, does not go through Python's garbage collection mechanism. Having to call gc.collect() immediately afterwards is acceptable, but not preferable.
Solution 1:[1]
I don't really understand your question, but I'll have a go at it.
In python-polars, the deletion of a Series or DataFrame is determined by Python's reference-counting garbage collection, just like any other Python object.
In addition, Polars memory is itself reference counted. If we create a new DataFrame that takes data from an existing DataFrame or Series, that data is not copied; only a reference count is incremented.
For instance, in the example below we have 2 DataFrames totalling 4 columns, but only 3 columns in memory, because column "a" is shared between both DataFrames. It is only deleted once its reference count reaches 0.
The same principle applies to slicing a Series. A slice never copies data; it merely increments a reference count and updates an offset and a length field.
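import polars as pl

df_a = pl.DataFrame({
    "a": [1, 2, 3],
    "b": ["a", "b", "c"],
})
# "a" is selected unchanged, so both frames share that column;
# "b" is transformed, so df_b gets a new column.
df_b = df_a.select(["a", pl.col("b") + "py"])
print(df_a)
print(df_b)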
shape: (3, 2)
?????????????
? a ? b ?
? --- ? --- ?
? i64 ? str ?
?????????????
? 1 ? a ?
?????????????
? 2 ? b ?
?????????????
? 3 ? c ?
?????????????
shape: (3, 2)
?????????????
? a ? b ?
? --- ? --- ?
? i64 ? str ?
?????????????
? 1 ? apy ?
?????????????
? 2 ? bpy ?
?????????????
? 3 ? cpy ?
?????????????
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ritchie46 |
