dataclass references the same object rather than initializing a new one

I have a dataclass that looks like this (simplified and renamed):

from dataclasses import dataclass, field
from io import BytesIO

@dataclass
class SharedReference:
    bytes_io: BytesIO = BytesIO()

I'm running into an issue where every instance ends up sharing the same stream, so closing the IO stream on one instance closes it for all of them.

Printing the field shows that both instances reference the same object (same memory address):

shared = SharedReference()
print(shared.bytes_io)  # 0x7f...5e0
shared.bytes_io.close()

shared_2 = SharedReference()
print(shared_2.bytes_io)  # 0x7f...5e0 (same id)
print(shared_2.bytes_io.closed)  # True (got accidentally closed)

But this dataclass:

@dataclass
class SeparateReference:
    bytes_io: BytesIO = field(init=False)

    def __post_init__(self):
        self.bytes_io = BytesIO()

works properly:

separate = SeparateReference()
print(separate.bytes_io)  # 0x7f...d10
separate.bytes_io.close()

separate = SeparateReference() 
print(separate.bytes_io)  # 0x7f...680 (different id)
print(separate.bytes_io.closed)  # False (didn't get accidentally closed)

Why does the second work but not the first?



Solution 1:[1]

I had assumed that my first example was equivalent to the following code:

class SeparateReference:
    def __init__(self):
        self.bytes_io = BytesIO()

but looking again at the dataclass docs, it was actually giving me something like:

class SharedReference:
    def __init__(self, bytes_io = BytesIO()):
        self.bytes_io = bytes_io

where BytesIO() is evaluated once, when the function is defined, rather than each time an instance is created. Every call that omits bytes_io therefore receives that same object.
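A quick way to confirm this is to compare the objects with `is` instead of reading the printed addresses. The sketch below reuses the SharedReference class from the question; the variable names first and second are just for illustration.

```python
from dataclasses import dataclass
from io import BytesIO


@dataclass
class SharedReference:
    # BytesIO() runs once, while the class body is being executed.
    bytes_io: BytesIO = BytesIO()


first = SharedReference()
second = SharedReference()

# Both instances received the very same default object.
print(first.bytes_io is second.bytes_io)  # True

# The default is also stored as a class attribute, so it exists
# before any instance is created.
print(first.bytes_io is SharedReference.bytes_io)  # True
```

This is the same behaviour as a mutable default argument on an ordinary function: the default expression belongs to the definition, not to each call.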

field's default_factory argument also addresses this: the factory is called each time an instance is created without that argument, so every instance gets its own stream (someone else mentioned this but then removed their answer):

@dataclass
class SeparateReference:
    bytes_io: BytesIO = field(default_factory=BytesIO)
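As a small check of the default_factory version, the sketch below closes one instance's stream and confirms the other is unaffected. The variable names (first, second, custom) are only illustrative.

```python
from dataclasses import dataclass, field
from io import BytesIO


@dataclass
class SeparateReference:
    # BytesIO is called once per instance, inside the generated __init__.
    bytes_io: BytesIO = field(default_factory=BytesIO)


first = SeparateReference()
second = SeparateReference()
first.bytes_io.close()

print(first.bytes_io is second.bytes_io)  # False
print(second.bytes_io.closed)  # False

# Unlike the init=False / __post_init__ version, the field is still an
# __init__ parameter, so a caller can pass in their own stream.
custom = SeparateReference(bytes_io=BytesIO(b"abc"))
print(custom.bytes_io.read())  # b'abc'
```

Which variant fits better depends on whether callers should be able to supply their own stream: default_factory keeps bytes_io as a constructor argument, while field(init=False) with __post_init__ removes it from the constructor.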

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1