Python: force objects to be released from memory in a for-loop (or clear caches)

I'm doing a little research into Python object memory sizes, and I've run into a problem with memory management (object/memory recycling).

I declare a class A and a helper function that creates objects with varying numbers of attributes:

from sys import getsizeof

class A:
    def __init__(self, mydict):
        for k, v in mydict.items():
            # set attributes dynamically
            setattr(self, k, v)

def get_object_dict(count):
    # create an object from a dict (key-value pairs for attributes)
    kv_pairs = {f'k{i}': i for i in range(count)}
    o = A(kv_pairs)
    # inspect the attribute dict
    d = o.__dict__
    print(
        f'attr: {count:3}  || ',
        f'o id: {id(o)}  || ', 
        f'__dict__ id: {id(d)}, size: {getsizeof(d):4}, len: {len(d):3}', 
    )
    return d

For example:

# an object with 3 attributes
obj = get_object_dict(3)
print(obj)
attr:   3  ||  o id: 4362505424  ||  __dict__ id: 4366370560, size:  104, len:   3
{'k0': 0, 'k1': 1, 'k2': 2}

Now I inspect objects with different attribute counts across a range:

for i in range(20):
    get_object_dict(i)
attr:   0  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   0
attr:   1  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   1
attr:   2  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   2
attr:   3  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   3
attr:   4  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   4
attr:   5  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  104, len:   5
attr:   6  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  144, len:   6
attr:   7  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  144, len:   7
attr:   8  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  144, len:   8
attr:   9  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  144, len:   9
attr:  10  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  144, len:  10
attr:  11  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  11
attr:  12  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  12
attr:  13  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  13
attr:  14  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  14
attr:  15  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  15
attr:  16  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  16
attr:  17  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  17
attr:  18  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  18
attr:  19  ||  o id: 4362516656  ||  __dict__ id: 4366171648, size:  232, len:  19

The getsizeof(o.__dict__) values increase from 104 to 232 as the attribute count grows. What if I reverse the loop range?

for i in reversed(range(20)):
    get_object_dict(i)
attr:  19  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  19
attr:  18  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  18
attr:  17  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  17
attr:  16  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  16
attr:  15  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  15
attr:  14  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  14
attr:  13  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  13
attr:  12  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  12
attr:  11  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  11
attr:  10  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:  10
attr:   9  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   9
attr:   8  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   8
attr:   7  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   7
attr:   6  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   6
attr:   5  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   5
attr:   4  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   4
attr:   3  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   3
attr:   2  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   2
attr:   1  ||  o id: 4362505424  ||  __dict__ id: 4365799872, size:  232, len:   1
attr:   0  ||  o id: 4362505424  ||  __dict__ id: 4366123136, size:  232, len:   0

Surprisingly, getsizeof(o.__dict__) is now always 232.

I also notice that the object ids and __dict__ ids are identical across iterations. I suspect this is due to objects being recycled from a pool, so I tried calling gc.collect() after each iteration to release the objects through Python's garbage collection.

import gc
for i in reversed(range(20)):
    get_object_dict(i)
    gc.collect()
attr:  19  ||  o id: 4362544336  ||  __dict__ id: 4366166656, size:  232, len:  19
attr:  18  ||  o id: 4362544384  ||  __dict__ id: 4366423552, size:  232, len:  18
attr:  17  ||  o id: 4364598768  ||  __dict__ id: 4366429440, size:  232, len:  17
attr:  16  ||  o id: 4364598768  ||  __dict__ id: 4366285760, size:  232, len:  16
attr:  15  ||  o id: 4364598768  ||  __dict__ id: 4366423552, size:  232, len:  15
attr:  14  ||  o id: 4364598768  ||  __dict__ id: 4366422016, size:  232, len:  14
attr:  13  ||  o id: 4364598768  ||  __dict__ id: 4366419648, size:  232, len:  13
attr:  12  ||  o id: 4364598768  ||  __dict__ id: 4366431040, size:  232, len:  12
attr:  11  ||  o id: 4364598768  ||  __dict__ id: 4366419840, size:  232, len:  11
attr:  10  ||  o id: 4364598576  ||  __dict__ id: 4366421056, size:  232, len:  10
attr:   9  ||  o id: 4364598576  ||  __dict__ id: 4366419840, size:  232, len:   9
attr:   8  ||  o id: 4364598768  ||  __dict__ id: 4366418368, size:  232, len:   8
attr:   7  ||  o id: 4364598576  ||  __dict__ id: 4366420352, size:  232, len:   7
attr:   6  ||  o id: 4364598768  ||  __dict__ id: 4366422016, size:  232, len:   6
attr:   5  ||  o id: 4364598576  ||  __dict__ id: 4366429568, size:  232, len:   5
attr:   4  ||  o id: 4364598672  ||  __dict__ id: 4366431680, size:  232, len:   4
attr:   3  ||  o id: 4364599200  ||  __dict__ id: 4365801920, size:  232, len:   3
attr:   2  ||  o id: 4307971440  ||  __dict__ id: 4366281472, size:  232, len:   2
attr:   1  ||  o id: 4364598384  ||  __dict__ id: 4365401920, size:  232, len:   1
attr:   0  ||  o id: 4364599200  ||  __dict__ id: 4365013376, size:  232, len:   0
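
As a point of comparison (this wasn't part of my original test), I would expect standalone dicts, built directly rather than through an instance's __dict__, not to show this history dependence: each fresh dict's getsizeof should depend only on its own key count, even in a reversed loop. A minimal sketch:

```python
from sys import getsizeof

# Fresh, standalone dicts: each one's reported size depends only on its
# own key count, not on what was allocated in earlier iterations.
for i in reversed(range(20)):
    d = {f'k{j}': j for j in range(i)}
    print(f'len: {len(d):3}, size: {getsizeof(d):4}')
```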

OK, I gave up and came here to ask:

  • Even though I call gc.collect(), the reported __dict__ size stays at 232 throughout the loop; Python somehow seems to cache the previous allocation size. How do I avoid this caching? It breaks my measurements, because I create many objects from the same class and want to inspect their memory sizes, yet right now the results depend on previously executed code (a cache?).

  • After calling gc.collect(), the o and __dict__ ids change on MOST iterations, but some ids still repeat across iterations. Is that normal? Are the objects actually released and their memory re-allocated on each iteration?
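
For reference, here is a minimal sketch I used to convince myself that a repeated id alone doesn't prove an object survived (this relies on an assumption about CPython's allocator, not on documented behavior): when an object is freed, the next allocation of the same size can receive the same memory address, so two objects with non-overlapping lifetimes can share an id. The class B below is just a throwaway example:

```python
class B:
    pass

# Create an object, record its address, then drop the only reference.
a = B()
addr1 = id(a)
del a  # the instance is freed immediately (refcount hits zero)

# The next allocation of the same size class often reuses the freed block.
b = B()
addr2 = id(b)
print(addr1 == addr2)  # frequently True in CPython, though not guaranteed
```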


PS: I'm using macOS 12.3 (arm64, M1 Max) with Python 3.10.4. I'm not sure whether the execution environment matters.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow