'Memory usage of a list of millions of strings in Python

As seen in Find the memory size of a set of strings vs. set of bytestrings, it's difficult to precisely measure the memory used by a set or list containing strings. But here is a good estimation/upper bound:

import os, psutil
process = psutil.Process(os.getpid())
a = process.memory_info().rss
L = [b"a%09i" % i for i in range(10_000_000)]
b = process.memory_info().rss
print(L[:10])  # [b'a000000000', b'a000000001', b'a000000002', b'a000000003', b'a000000004', b'a000000005', b'a000000006', b'a000000007', b'a000000008', b'a000000009']
print(b-a)
# 568762368 bytes

i.e. 569 MB for 100 MB of actual data.

Solutions to improve this (for example with other data structures) have been found in Memory-efficient data structure for a set of short bytes-strings and Set of 10-char strings in Python is 10 times bigger in RAM as expected, so here my question is not "how to improve", but:

How can we precisely explain this size in the case of a standard list of byte-string?

How many bytes for each byte-string, for each (linked?) list item to finally obtain 569 MB?

This will help to understand the internals of lists and bytes-strings in CPython (platform: Windows 64 bit).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'Memory usage of a list of millions of strings in Python

Sources

Related Questions