Data Classes vs typing.NamedTuple primary use cases
Long story short
PEP 557 introduced data classes into the Python standard library. They can fill basically the same role as collections.namedtuple and typing.NamedTuple, and now I'm wondering how to identify the use cases in which namedtuple is still the better solution.
Data classes advantages over NamedTuple
Of course, all the credit goes to dataclass if we need:
- mutable objects
- inheritance support
- property decorators, manageable attributes
- generated method definitions out of the box or customizable method definitions
The advantages of data classes are briefly explained in the same PEP, in the section "Why not just use namedtuple?".
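The points in the list above can be illustrated with a minimal sketch. The class names `Page` and `TitledPage` and their fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    width: int
    height: int
    # Mutable default value, built separately for every instance.
    tags: list = field(default_factory=list)

    @property
    def area(self) -> int:
        # Computed attribute derived from the stored fields.
        return self.width * self.height


@dataclass
class TitledPage(Page):
    # Inheritance: base-class fields come first in the generated __init__.
    title: str = "untitled"


page = TitledPage(210, 297, title="A4")
page.width = 100  # instances are mutable by default
print(page.area)  # uses the updated width
print(page)       # generated __repr__
print(page == TitledPage(100, 297, title="A4"))  # generated __eq__
```

None of this is available on a named tuple without extra work: its fields cannot be reassigned, and typing.NamedTuple restricts inheritance.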
Q: In which cases is namedtuple still a better choice?
But what about the opposite question for namedtuples: why not just use dataclass? My guess is that namedtuple is better from a performance standpoint, but I have found no confirmation of that yet.
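One way to check the performance guess on a given interpreter is to measure it directly. The following is a rough sketch, not a rigorous benchmark; the class names `PageNT` and `PageDC` are hypothetical, and the numbers will vary by machine and Python version.

```python
import sys
import timeit
from dataclasses import dataclass
from typing import NamedTuple


class PageNT(NamedTuple):
    width: int
    height: int


@dataclass
class PageDC:
    width: int
    height: int


nt = PageNT(210, 297)
dc = PageDC(210, 297)

# Shallow sizes only; a regular dataclass instance also owns a __dict__.
size_nt = sys.getsizeof(nt)
size_dc = sys.getsizeof(dc) + sys.getsizeof(dc.__dict__)

# Time instance creation and attribute access.
create_nt = timeit.timeit(lambda: PageNT(210, 297), number=100_000)
create_dc = timeit.timeit(lambda: PageDC(210, 297), number=100_000)
access_nt = timeit.timeit(lambda: nt.width, number=100_000)
access_dc = timeit.timeit(lambda: dc.width, number=100_000)

print(f"size:   NamedTuple={size_nt} B, dataclass={size_dc} B")
print(f"create: NamedTuple={create_nt:.4f} s, dataclass={create_dc:.4f} s")
print(f"access: NamedTuple={access_nt:.4f} s, dataclass={access_dc:.4f} s")
```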
Example
Let's consider the following situation:
We are going to store page dimensions in a small container with statically defined fields, type hinting, and named access. No hashing, comparison, or similar features are needed.
NamedTuple approach:
from typing import NamedTuple
PageDimensions = NamedTuple("PageDimensions", [('width', int), ('height', int)])
DataClass approach:
from dataclasses import dataclass
@dataclass
class PageDimensions:
    width: int
    height: int
Which solution is preferable and why?
P.S. the question isn't a duplicate of that one in any way, because here I'm asking about the cases in which namedtuple is better, not about the difference (I've checked docs and sources before asking)
Solution 1:[1]
In programming in general, anything that CAN be immutable SHOULD be immutable. We gain two things:
- The program is easier to read: we don't need to worry about values changing, because once an object is instantiated, it never changes (namedtuple)
- Less chance for weird bugs
That's why, if the data is immutable, you should use a named tuple instead of a dataclass.
I wrote it in the comment, but I'll mention it here:
You're definitely right that there is an overlap, especially with frozen=True in dataclasses, but namedtuples still have features such as unpacking, and they are always immutable. I doubt namedtuples will be removed.
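To make the overlap concrete, here is a minimal sketch comparing a named tuple with a frozen dataclass. The names `PointNT` and `PointDC` are hypothetical and used only for illustration.

```python
from dataclasses import FrozenInstanceError, astuple, dataclass
from typing import NamedTuple


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass(frozen=True)
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

# Both reject attribute assignment, with different exception types.
try:
    nt.x = 10
except AttributeError as exc:
    print("NamedTuple:", exc)

try:
    dc.x = 10
except FrozenInstanceError as exc:
    print("frozen dataclass:", exc)

# Unpacking works directly on the named tuple...
x, y = nt
# ...while the dataclass has to be converted to a tuple first.
x2, y2 = astuple(dc)
```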
Solution 2:[2]
I had this same question, so I ran a few tests and documented them here: https://shayallenhill.com/python-struct-options/
Summary:
- NamedTuple is better for unpacking, exploding, and size.
- DataClass is faster and more flexible.
- The differences aren't tremendous, and I wouldn't refactor stable code to move from one to another.
- NamedTuple is also great for soft typing when you'd like to be able to pass a tuple instead.
To do this, define a type inheriting from it...
class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float
...then unpack it inside your functions. Don't use the .attributes, and you'll have a nice "type hint" without any PITA for the caller.
*focus, radius = circle_arg_instance # or tuple
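Put together, the pattern might look like the sketch below. The function `describe` is a hypothetical example; the only point is that the body unpacks by position, so a plain tuple works as well as a `CircleArg`. A strict static type checker may still flag the plain-tuple call, since the annotation names `CircleArg`.

```python
from typing import NamedTuple


class CircleArg(NamedTuple):
    x: float
    y: float
    radius: float


def describe(circle_arg: CircleArg) -> str:
    # Unpack by position instead of using .x / .y / .radius,
    # so any 3-item tuple is accepted at runtime.
    *focus, radius = circle_arg
    return f"center={tuple(focus)}, radius={radius}"


print(describe(CircleArg(0.0, 1.0, 2.5)))  # named tuple instance
print(describe((0.0, 1.0, 2.5)))           # plain tuple, same result
```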
Solution 3:[3]
Another important limitation of NamedTuple is that it cannot be generic (this restriction was lifted in Python 3.11, which added support for generic named tuples):
import typing as t
T = t.TypeVar('T')
class C(t.Generic[T], t.NamedTuple): ...
TypeError: Multiple inheritance with NamedTuple is not supported
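By contrast, a dataclass can subclass typing.Generic. A minimal sketch, where the class name `Box` is hypothetical:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class Box(Generic[T]):
    # The field type is the type parameter of the class.
    item: T


int_box: Box[int] = Box(42)    # parameterized through the annotation
str_box = Box[str]("hello")    # or through the subscripted class itself

print(int_box)
print(str_box.item)
```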
Solution 4:[4]
One use case for me is frameworks that do not support dataclasses, in particular TensorFlow. There, a tf.function can work with a typing.NamedTuple but not with a dataclass.
class MyFancyData(typing.NamedTuple):
    some_tensor: tf.Tensor
    some_other_stuf: ...

@tf.function
def train_step(self, my_fancy_data: MyFancyData):
    ...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | smci |
| Solution 3 | KFL |
| Solution 4 | fabian789 |
