Python Data Classes vs Named Tuples: Differences You Should Know
This article explains Python data structures. Earthly enhances Python builds by combining the simplicity of docker with greater flexibility. Check it out.
Data classes, introduced in Python 3.7, provide a convenient way to define classes that are a collection of fields. But for such use cases, named tuples, built into the collections module in the Python standard library, are good choices too. Named tuples have been around since Python 2.6, and several features have been added in the recent Python 3.x releases.
Given that Python data classes are popular, are named tuples still relevant? What are the key differences between the two? Are there advantages of using one over the other—depending on what we’d like to do?
Let’s take a closer look at both data classes and named tuples, and try to answer these questions.
To follow along, you need Python 3.8 or later. To run the example on slots, you need Python 3.10 or later. You can find the code examples used in this tutorial on GitHub.
Python Data Classes and Named Tuples: An Overview
We’ll start by reviewing the basics of data classes and named tuples.
Python Data Classes
In Python, data classes are a good choice when you need classes that mainly store information and carry little behavior of their own. Compared with regular Python classes, data classes require less boilerplate code and come with default implementations of methods for string representation and equality comparison.
We’ll use the following BookDC data class that contains fields such as title, author, genre, and more.
from dataclasses import dataclass

@dataclass
class BookDC:
    title: str
    author: str
    genre: str
    standalone: bool
After creating the BookDC data class, we can create instances by passing in the values for the various fields in the constructor:
book1 = BookDC('To the Lighthouse','Virginia Woolf','Modernism',True)
print(book1)
BookDC(title='To the Lighthouse', author='Virginia Woolf', genre='Modernism', standalone=True)
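To see what the decorator saves you, here is a rough sketch of the methods you would otherwise write by hand. BookManual is a hypothetical name used only for this comparison, and the methods that @dataclass actually generates differ in detail.

```python
class BookManual:
    """Hand-written counterpart of BookDC (hypothetical, for comparison only)."""

    def __init__(self, title: str, author: str, genre: str, standalone: bool):
        self.title = title
        self.author = author
        self.genre = genre
        self.standalone = standalone

    def __repr__(self) -> str:
        return (
            f"BookManual(title={self.title!r}, author={self.author!r}, "
            f"genre={self.genre!r}, standalone={self.standalone!r})"
        )

    def __eq__(self, other: object) -> bool:
        # Only instances of the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.title, self.author, self.genre, self.standalone) == (
            other.title, other.author, other.genre, other.standalone
        )


book = BookManual('To the Lighthouse', 'Virginia Woolf', 'Modernism', True)
print(book)
print(book == BookManual('To the Lighthouse', 'Virginia Woolf', 'Modernism', True))
```

With @dataclass, the four annotated fields replace all three of these methods.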
What Are Named Tuples?
When we want to store attributes and efficiently look up and use the values, do we need classes at all? Won’t basic data structures like lists, tuples, and dictionaries suffice?
We often need such objects to be immutable, so perhaps we can use tuples? However, with tuples, we need to remember what each field stands for and access the fields by index.
We can consider switching to a dictionary because the keys will now indicate what the fields are. But we can modify a dictionary in place, so we may accidentally modify fields that we do not intend to. And each tuple or dictionary we create is an independent entity; there is no template that we can use to create objects of a similar type.
Here’s where named tuples can help. Named tuples are tuples with named attributes. So they give you the immutability of tuples and readability of dictionaries. In addition, once you define a named tuple of a specific type, you can use that to create many instances of that named tuple type.
To create a named tuple, you can use namedtuple from the collections module that is built into the Python standard library. You can pass in the named tuple type (this is analogous to the class name) and the fields as a space-delimited string. You can also pass in the field names as a list of strings.
BookNT is the functional named tuple equivalent of the BookDC data class:
from collections import namedtuple
BookNT = namedtuple('BookNT','title author genre standalone')
You can now create instances of BookNT:
book2 = BookNT('Deep Work','Cal Newport','Nonfiction', True)
print(book2)
BookNT(title='Deep Work', author='Cal Newport', genre='Nonfiction', standalone=True)
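As a small sketch of the alternative mentioned above, the field names can be given as a list of strings. The example also shows that a named tuple instance supports both attribute access and ordinary tuple operations such as indexing and unpacking.

```python
from collections import namedtuple

# Field names given as a list of strings instead of a space-delimited string
BookNT = namedtuple('BookNT', ['title', 'author', 'genre', 'standalone'])

book = BookNT('Deep Work', 'Cal Newport', 'Nonfiction', True)

print(book.title)      # access by field name
print(book[1])         # access by position, as with any tuple
print(BookNT._fields)  # the field names, in definition order

# Unpacking works because the instance is still a tuple
title, author, genre, standalone = book
```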
Data Classes vs Named Tuples: A Comprehensive Comparison
🔖 TL;DR: If you want an immutable container data type with a small subset of fields taking default values, consider named tuples. If you want all the features and extensibility of Python classes, use data classes instead.
Factoring in the memory footprint: named tuples are much more memory efficient than regular data classes, but data classes with slots are more memory efficient still.
Immutability
Data class instances are mutable by default. So you can modify the value of one or more fields after the instance has been created.
Consider the following instance of the BookDC data class:
book3 = BookDC('Elantris','Brandon Sanderson','Epic Fantasy',True)
Let’s update the title and standalone fields of book3:
book3.title = 'Mistborn'
book3.standalone = False
print(book3)
BookDC(title='Mistborn', author='Brandon Sanderson', genre='Epic Fantasy', standalone=False)
Named tuples are tuples, too, so they are immutable: you cannot modify a named tuple instance in place after it has been created. If you try to do so, you will run into an error.
Try updating the title field of the book2 instance we created:
book2 = BookNT('Deep Work','Cal Newport','Nonfiction', True)
book2.title = 'Digital Minimalism'
You’ll see that it results in an AttributeError exception:
Traceback (most recent call last):
File "main.py", line 30, in <module>
book2.title = 'Digital Minimalism'
AttributeError: can't set attribute
📑 So far, we know that data class instances are mutable by default, and named tuple instances are immutable. But can we have immutable data class instances and mutable named tuple instances?
- You can make data class instances immutable by setting frozen to True in the @dataclass decorator.
- But you cannot have mutable named tuple instances.
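A minimal sketch of the first point is shown here. FrozenBookDC is a hypothetical name chosen so that it does not clash with the BookDC class used elsewhere in this article.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenBookDC:  # hypothetical name, used only in this sketch
    title: str
    author: str
    genre: str
    standalone: bool

book = FrozenBookDC('Elantris', 'Brandon Sanderson', 'Epic Fantasy', True)

try:
    book.title = 'Mistborn'
except FrozenInstanceError as err:
    # FrozenInstanceError is a subclass of AttributeError
    print(f"Cannot modify a frozen instance: {err}")
```

Because FrozenInstanceError subclasses AttributeError, code that already catches AttributeError for named tuples also handles frozen data classes.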
📌 A Note on _replace()
Using the _replace() method, you can get a shallow copy of a named tuple instance where the value of a particular field is replaced with an updated value. As an example, create a shallow copy of the book2 instance with a modified title field:
book2 = BookNT('Deep Work','Cal Newport','Nonfiction', True)
book2_copy = book2._replace(title='Digital Minimalism')
The title field of the shallow copy book2_copy has been updated while the title of book2 remains unchanged:
print(book2.title)
print(book2_copy.title)
Deep Work
Digital Minimalism
Data class instances do not have a _replace() method. To get a modified copy of a data class instance, use the replace() function from the dataclasses module instead.
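For data classes, a short sketch with dataclasses.replace() looks like this. The BookDC definition is repeated so that the snippet runs on its own.

```python
from dataclasses import dataclass, replace

@dataclass
class BookDC:
    title: str
    author: str
    genre: str
    standalone: bool

book3 = BookDC('Elantris', 'Brandon Sanderson', 'Epic Fantasy', True)

# replace() builds a new instance; the original is left untouched
book3_copy = replace(book3, title='Mistborn', standalone=False)

print(book3)
print(book3_copy)
```

replace() also works on frozen data classes, since it constructs a new object rather than mutating the existing one.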
Setting Default Values
When you create data classes, you can specify default values for one or more fields.
Here we set the standalone field in the BookDC data class to take a default value of True:
from dataclasses import dataclass

@dataclass
class BookDC:
    title: str
    author: str
    genre: str
    standalone: bool = True
We instantiate an object for Neil Gaiman’s book Coraline without specifying the value of standalone in the constructor:
book4 = BookDC('Coraline','Neil Gaiman','Fantasy')
print(book4)
And the standalone field takes the default value of True:
BookDC(title='Coraline', author='Neil Gaiman', genre='Fantasy', standalone=True)
But can you set default values in named tuples?
Though this may not be obvious, in Python 3.7+, you can use the defaults parameter of the namedtuple() factory function to set default values. You can set defaults to a list of k values to specify the default values for the last k fields.
Let’s specify the default value for standalone in the BookNT named tuple. Here the defaults list contains only one element, True, the default value of the last field, standalone:
from collections import namedtuple
BookNT = namedtuple('BookNT','title author genre standalone',defaults=[True])
The standalone field of any instance created without specifying its value in the function call will now be set to True:
book5 = BookNT('Piranesi','Susanna Clarke','Fantasy')
print(book5)
BookNT(title='Piranesi', author='Susanna Clarke', genre='Fantasy', standalone=True)
When you want to look up all the default values, you can check the _field_defaults attribute, which is available on the named tuple type and on its instances:
print(book5._field_defaults)
The _field_defaults attribute is a dictionary that maps each field with a default to its default value:
{'standalone': True}
Though we can add literal defaults in named tuples, it can be hard to maintain if there are too many fields.
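To illustrate how the defaults list lines up with the fields, here is a small sketch with two defaults. The choice of 'Fantasy' as a default genre is only an assumption made for this example.

```python
from collections import namedtuple

# Two defaults: they apply to the LAST two fields, genre and standalone
BookNT = namedtuple(
    'BookNT', 'title author genre standalone', defaults=['Fantasy', True]
)

book = BookNT('Piranesi', 'Susanna Clarke')
print(book)
# BookNT(title='Piranesi', author='Susanna Clarke', genre='Fantasy', standalone=True)

print(BookNT._field_defaults)
# {'genre': 'Fantasy', 'standalone': True}
```

Because the defaults are matched by position from the right, adding or reordering fields means re-checking the defaults list, which is where the maintenance burden comes from.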
📑 Initializing Default Values With Default Factory
Both data classes and named tuples support setting literal defaults. With Python data classes, you can also use default_factory to have any callable initialize a field with a default value.
For the BookDC class, we can add a rating field that is initialized with a default value whenever a data class instance is created without specifying the rating field.
Here get_rating() is a simple function that returns an integer between 3 and 5 (yeah, not the best way to rate a book!). The default_factory initializes the rating field with a default value by calling the get_rating() function.
from dataclasses import dataclass, field
import random

def get_rating():
    # Returns 3, 4, or 5 at random
    return random.choice(range(3,6))

@dataclass
class BookDC:
    title: str
    author: str
    genre: str
    standalone: bool = True
    # get_rating() returns an integer, so the field is annotated as int
    rating: int = field(default_factory=get_rating)
Now both standalone and rating are optional fields in the constructor (the rating varies from run to run):
book4 = BookDC('Coraline','Neil Gaiman','Fantasy')
print(book4)
BookDC(title='Coraline', author='Neil Gaiman', genre='Fantasy', standalone=True, rating=5)
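A common use of default_factory is giving each instance its own mutable default, such as a list. The sketch below uses a hypothetical TaggedBookDC class with a hypothetical tags field; neither is part of the examples above.

```python
from dataclasses import dataclass, field

@dataclass
class TaggedBookDC:  # hypothetical class, for illustration only
    title: str
    # list() is called once per instance, so instances do not share a list
    tags: list = field(default_factory=list)

book_x = TaggedBookDC('Coraline')
book_y = TaggedBookDC('Piranesi')

book_x.tags.append('fantasy')

print(book_x.tags)
print(book_y.tags)
```

Writing `tags: list = []` directly would be rejected by @dataclass with a ValueError, precisely to avoid one list being shared across all instances.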
Comparing Instances
Unlike a regular Python class that requires you to define dunder methods such as __repr__ and __eq__, both data classes and named tuples come with some built-in support for representation and object comparison.
Suppose we have AnotherBookDC, another data class with the same fields as BookDC.
from dataclasses import dataclass

@dataclass
class AnotherBookDC:
    title: str
    author: str
    genre: str
    standalone: bool = True
In this example, book_a and book_b are instances of BookDC and AnotherBookDC, respectively:
book_a = BookDC('Coraline','Neil Gaiman','Fantasy')
print(book_a)
book_b = AnotherBookDC('Coraline','Neil Gaiman','Fantasy')
print(book_b)
And both the instances take the same values for all the fields:
BookDC(title='Coraline', author='Neil Gaiman', genre='Fantasy', standalone=True)
AnotherBookDC(title='Coraline', author='Neil Gaiman', genre='Fantasy', standalone=True)
But when we check for equality, we get False:
print(book_a == book_b)
# False
This is expected because they are instances of two different data classes, even though they have identical values.
But what happens when you try to do the same for named tuples? Well, named tuples are just tuples. So comparing two named tuples with identical values returns True, whether they are instances of the same or of different named tuple types.
from collections import namedtuple
AnotherBookNT = namedtuple('AnotherBookNT','title author genre standalone',defaults=[True])
Create instances of both BookNT and AnotherBookNT. Make sure they have identical values for the fields:
book_a = BookNT('Piranesi','Susanna Clarke','Fantasy')
print(book_a)
book_b = AnotherBookNT('Piranesi','Susanna Clarke','Fantasy')
print(book_b)
BookNT(title='Piranesi', author='Susanna Clarke', genre='Fantasy', standalone=True)
AnotherBookNT(title='Piranesi', author='Susanna Clarke', genre='Fantasy', standalone=True)
Though they are instances of two different named tuple types, element-wise equality between them holds and the comparison returns True.
print(book_a == book_b)
# True
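The same tuple semantics mean a named tuple also compares equal to a plain tuple holding the same values. If you need a type-sensitive comparison, one possible approach is to check the types explicitly; same_type_and_values() below is a hypothetical helper written for this sketch, not a standard library function.

```python
from collections import namedtuple

BookNT = namedtuple('BookNT', 'title author genre standalone')
AnotherBookNT = namedtuple('AnotherBookNT', 'title author genre standalone')

book_a = BookNT('Piranesi', 'Susanna Clarke', 'Fantasy', True)
book_b = AnotherBookNT('Piranesi', 'Susanna Clarke', 'Fantasy', True)
plain = ('Piranesi', 'Susanna Clarke', 'Fantasy', True)

# A named tuple equals a plain tuple with the same elements
print(book_a == plain)  # True


def same_type_and_values(x, y):
    """Hypothetical helper: equal only if the types match as well."""
    return type(x) is type(y) and x == y


print(same_type_and_values(book_a, book_b))  # False
print(same_type_and_values(book_a, BookNT(*plain)))  # True
```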
Type Hints
From the way we create data classes and named tuples, it’s easy to see how data classes support type hints out of the box.
Since Python 3.6, you can use NamedTuple from the typing module to add type hints for fields. You can pass in the field names and their corresponding types as a list of tuples. Here’s how you can add type hints to the BookNT named tuple:
from typing import NamedTuple
BookNT = NamedTuple(
    'BookNT', [('title', str), ('author', str), ('genre', str), ('standalone', bool)]
)
book = BookNT('Six of Crows', 'Leigh Bardugo', 'Fantasy', False)
print(book)
BookNT(title='Six of Crows', author='Leigh Bardugo', genre='Fantasy', standalone=False)
You can also use the familiar class syntax to define named tuple types with type hints. This is very similar to how you create data classes:
from typing import NamedTuple

class BookNT(NamedTuple):
    title: str
    author: str
    genre: str
    standalone: bool = True

book = BookNT('Six of Crows','Leigh Bardugo','Fantasy',False)
print(book)
BookNT(title='Six of Crows', author='Leigh Bardugo', genre='Fantasy', standalone=False)
🏷️ All NamedTuple Types Are Tuple Subclasses
Consider the following code snippet:
class Derived(Base):
    pass
We use such a construct when creating subclasses that inherit from a base class; the Derived class inherits from the Base class. Notice that we use a similar syntax when creating named tuple types.
class SomeNamedTuple(NamedTuple):
    pass
Though this may look like SomeNamedTuple is a subclass of NamedTuple, SomeNamedTuple is a subclass of tuple and not NamedTuple. You can verify this using the built-in issubclass() function:
print(issubclass(BookNT,tuple))
# True
print(issubclass(BookNT,NamedTuple))
# False on Python 3.8; on Python 3.9 and later, NamedTuple is implemented
# as a function, so this call raises a TypeError instead
Memory Footprint and Attribute Access
How do data classes and named tuples compare in terms of memory footprint? Is one more memory efficient than the other? We’ll answer these questions in a bit.
To get the approximate size of the objects in memory, we’ll use Pympler’s asizeof module. Install pympler using pip: pip install pympler.
In the snippet below, book_dc and book_nt are instances of the BookDC data class and the BookNT named tuple, respectively.
from pympler.asizeof import asizeof

book_dc = BookDC('Hyperfocus','Chris Bailey','Nonfiction',True)
book_nt = BookNT('Hyperfocus','Chris Bailey','Nonfiction',True)

s1 = asizeof(book_dc)
s2 = asizeof(book_nt)

print(f"Size of BookDC data class: {s1}")
print(f"Size of BookNT named tuple: {s2}")
We see that the named tuple instance book_nt takes up much less memory than the data class instance book_dc:
Size of BookDC data class: 608
Size of BookNT named tuple: 296
🔖 Named Tuples and Tuples Have the Same Memory Footprint
The size of a named tuple instance is the same as that of a simple tuple holding the same values. Let’s verify this:
from pympler.asizeof import asizeof

book_t = ('Hyperfocus','Chris Bailey','Nonfiction',True)
size_book_t = asizeof(book_t)
print(size_book_t)
# 296 (equal to the size of `book_nt`)
You can use slots to make data classes more memory efficient. Using slots prevents the creation of the per-instance attribute dictionary, resulting in substantial memory savings.
To use slots, set slots to True in the @dataclass decorator:
from dataclasses import dataclass

@dataclass(slots=True)
class BookDC:
    title: str
    author: str
    genre: str
    standalone: bool = True
Create an instance of the data class that uses slots and measure its size:
book_dc_slots = BookDC('Hyperfocus','Chris Bailey','Nonfiction',True)
s3 = asizeof(book_dc_slots)
print(f"Size of BookDC data class with slots: {s3}")
The size of a data class instance with slots is smaller than that of a named tuple:
Size of BookDC data class with slots: 288
When comparing attribute access speeds, data classes and named tuples perform about the same. In this example, we access the title field of both the data class and named tuple instance:
from functools import partial
import timeit

def get(book):
    book.title

t1 = min(timeit.repeat(partial(get,book_dc)))
t2 = min(timeit.repeat(partial(get,book_nt)))

print(f"Attribute access time for data class instance: {t1:.2f}")
print(f"Attribute access time for named tuple instance: {t2:.2f}")
The following results are for Python 3.10 on Ubuntu 22.04 LTS:
Attribute access time for data class instance: 0.05
Attribute access time for named tuple instance: 0.06
Summing Up the Discussion
Let’s wrap up our discussion by summarizing the key differences between data classes and named tuples.
Features | Data Classes | Named Tuples |
---|---|---|
Immutability of instances | Mutable by default; set frozen=True in the @dataclass decorator to create immutable instances. | Instances are immutable by default. |
Default Values | Can set both literal defaults and complex defaults using default_factory. | Use defaults to specify a list of default values for the last k fields. |
Type Hints | Out-of-the-box support for type hints. | Use typing.NamedTuple to specify type hints for fields. |
Comparison | Comparison works as expected between two instances of the same data class. | Comparison between two instances of any named tuple type returns True so long as the attributes are equal. |
Memory Efficiency | Data classes with slots have a lower memory footprint. | More efficient than regular data classes. |
Maintainability | (Almost always) easy to maintain. | Can be hard to maintain, especially when there are many default fields. |
Conclusion
In this post, we dove into data classes and named tuples in Python, comparing their features like immutability and memory usage. These tools are handy for structured data, but don’t forget about third-party packages like Pydantic and attrs too. They can automate some best practices, so feel free to explore and use them in your projects.
Speaking of automation, if you’re looking to streamline your coding process even further, don’t stop at Python. Try out Earthly for build automation. This tool can be a game-changer in managing your builds and ensuring consistency across different environments.