Structural Pattern Matching in Python
Table of Contents
This article explores Python’s structural pattern matching. Earthly streamlines the build process for Python developers. Check it out.
Structural Pattern Matching is a new feature introduced in Python 3.10 in the PEP 634 specification. The feature verifies if the value of an expression, called the subject, matches a given structure called the pattern
.
Structural Pattern Matching provides an elegant method to destructure, extract, and match the values or attributes of an object using an easy-to-read syntax.
In this tutorial, you’ll learn how structural pattern matching works in Python by coding several examples in the context of working with web services. We’ll explore this new feature to match the structure and attributes of the response from the JSONPlaceholder Fake API. JSONPlaceholder is a free online REST API that you can use whenever you need some fake data.
In this tutorial, we’ll use the following endpoints from the API:
GET /posts #returns 100 posts
GET /posts/<id> #returns a single post
By the end of this tutorial, you’ll have a better understanding of structural pattern matching and you’ll be able to use it in your next Python project or a codebase that you’re working on.
The code examples in this tutorial are in the pattern-matching.py
file; you can download it from this GitHub repository and follow along.
Prerequisites
- Proficiency in Python
- Python 3.10 or the most recent stable release of Python 3.11
This tutorial uses the Python Requests library to retrieve fake blog post data from the JSONPlaceholder API. Therefore, a working knowledge of the library will be helpful but not required.
You may install the Requests library using the pip
package manager:
pip install requests
Basics of Pattern Matching
Understanding the match-case
Syntax
<expression>:
match <pattern 1> [<if guard>]:
case <block to execute if pattern 1 matches>
<pattern n> [<if guard>]:
case <code block to execute if pattern n matches>
The match
keyword is a soft keyword whose expression evaluates to produce a value called the subject. The subject is then matched against the pattern of each case
clause. The case
block corresponding to the first match is executed; all subsequent case statements are ignored.
The guard
is an optional if condition
in a case clause
. It’s evaluated after a pattern
matches the subject. The block of code associated with the case clause will execute only if the guard
evaluates to True
. Otherwise, the next pattern will be compared until there is another match with a guard
that evaluates to True (if we specified a guard
).
If you’ve coded in Javascript or C, the match-case
statement might look similar to the switch-case
statement. However, there are certain differences. The match-case
statement differs from the switch-case
statement in that it does not require an explicit break
statement after a pattern has been matched. It also has a lot of powerful features that cannot be found in the switch-case
statement in other languages. We will explore these features later on.
The patterns are shapes or structures that the subjects are compared against. The values of the subject can also be captured and bound
to a variable that we specified in the pattern.
Binding variables to values is a little different than assigning variables to values. The value captured in the pattern can not be set as a value to an attribute of an object in the case clause
using the dot notation:
object.attribute = value
However, variable bindings outlive the scope of the respective case or match statements just like a normal variable.
= ["Structural Pattern Matching", "DrA", 232113]
book_data
match book_data:
case title, author:= None
isbn if type(isbn) == 'int':
case title, author, isbn pass
In the example above:
- The
book_data
variable that precedes thematch
statement is thesubject
- The variables in front of the
case
statements are a form of pattern we will discuss later. - The second
case
clause will be matched and the elements in thebook_data
list will be bound to these variables (title
,author
, andisbn
). - The condition in the
if statement
is theguard
and the Python interpreter only evaluates it if the pattern has matched thesubject
.
There are several classes of structural patterns that can be matched and they include the following:
- Literal Patterns
- As Patterns
- Wildcard Patterns
- OR Patterns
- Value Patterns
- Sequence Patterns
- Mapping Patterns
- Class Patterns
Matching Literal Patterns in Python
Literal patterns are constants (alphabetic, numeric, or boolean) that only match the exact values. They include a subject with one of the basic data types (integer, float, string, and Boolean) matched against a pattern of the same data type.
The behavior of the match-case
statement, in this case, is similar to the switch-case
statement in Javascript.
The match-case
statement compares the value of the subject with the literal values specified as patterns in the case clauses.
This comparison is done using the equality == operator for all the literal patterns except for True
, False
, and None
which are compared using Python’s is
keyword.
All forms of strings (byte, raw, triple quoted string) can be specified as a pattern except the F string which are not really literals.
However, based on the consistent rule of equality ==
in Python, literal patterns like 1 and 1.0 will match (so do all expressions that evaluate to True
) when compared with the equality sign:
1]: 1 == 1.0
In [1]: True
Out[
2]: 0 == False
In [2]: True Out[
Matching Integer Values
Let’s take a look at an example of matching a literal integer value:
import requests
def main(response):
= response.status_code #200
status_code
match status_code:200:
case print("The response is OK")
400:
case print("The response is Bad")
= requests.get("https://jsonplaceholder.typicode.com/posts/1")
response main(response)
If we execute the code above, we will get the following output:
The response is OK
In the case above, Python compares the subject which is the status_code
to the literal integer pattern we specified in the case clauses (200 and 400). After the status code matches the subject, the interpreter executes the code under the case clause.
Matching String Values
We can match a string like the encoding of the response as shown below:
import requests
def main(response):
= response.encoding
encoding
match encoding:"utf-8":
case print("The encoding is utf-8")
"utf-16":
case print("The encoding is utf-16")
= requests.get("https://jsonplaceholder.typicode.com/posts/1")
response main(response)
We have this output:
The encoding is utf-8
Matching Boolean Values
Similarly, we can match a bool
value:
import requests
def main(response):
= response.ok
check
match check:True:
case print("The response is ok")
False:
case print("The response is not ok")
= requests.get("https://jsonplaceholder.typicode.com/posts/1")
response main(response)
We will get the output as shown below:
The response is ok
To match a floating value or a None
keyword, you can specify an expression that evaluates to either of them in the match
statement to compare against them as a literal value in the case clauses.
Matching Capture Patterns
We can capture and bind some or all the values of the subject to a variable.
Capture patterns
are names that capture values that match the structure of the subject in a variable. These variables outlive the scope of their respective case clauses; they can be accessed outside the match-case
block.
The binding of values to variables is similar to how arguments are the values of parameters in a function.
def main(response):
= [response.status_code, response.encoding, response.json()]
values
match values:
case [status_code, encoding]:print("The first pattern matches the subject")
print(status_code, encoding)
if status_code <= 399:
case [status_code, encoding, response_data] print("The second pattern matches the subject")
print(status_code, encoding, response_data)
In the snippet above, the subject is a sequence with three elements, therefore only a pattern with this same structure can match the subject successfully.
The
pattern
in the firstcase
matches only two of the three elements in the subject, as a result, the match will not be successful.The pattern in the second
case
matches the structure of the subject and theguard
evaluates toTrue
. In addition, the interpreter binds the names we specified in the pattern to the values of the elements in the sequence.
When we run the code above, we will get the following output:
The second pattern matches the subject
200 utf-8 {'userId': 1, 'id': 1, 'title': \
'sunt aut facere repellat provident occaecati \
excepturi optio reprehenderit', 'body': \
'quia et suscipit\nsuscipit recusandae consequuntur \
expedita et cum\nreprehenderit molestiae ut ut quas \
totam\nnostrum rerum est autem sunt rem eveniet architecto'}
Only the second pattern matches the subject and the bound values are printed out in the case
block.
The capture pattern matches and binds values to the names we specified in the pattern. However, individual data type matching of the elements in the sequence above is not possible. This can be achieved with the As pattern
, which we will discuss in the next section.
As Pattern
The As pattern allows us to specify a pattern to match the subject or individual elements in a subject against and also a name to bind the value of the subject.
The As pattern
uses the as
keyword to bind a variable to the value after the structure of the subject matches the pattern.
Let us modify the snippet above to match the number of elements in the sequence and the data-types of the individual elements.
def main(response):
= [response.status_code, response.encoding, response.json()]
values
match values:int() as status_code, str() as encoding]:
case [print("The first pattern matches the subject")
int() as status_code, str() as encoding, str() as \
case [
response_data]:print("The second pattern matches the subject")
int() as status_code, str() as encoding, dict() as \
case [
response_data]:print("The Third pattern matches the subject")
print(f"status_code:{status_code}, encoding:{encoding}, \
response_data:{response_data}")
In this example:
- The first pattern fails because it does not match the number of elements in the sequence, but matches the data type of the
status_code
and theencoding
which are integer and string respectively. - The second pattern matches the number of the elements but fails to match the data type of the
response_data
which is adict
data type. - The third pattern matches both the number of elements and the data type of each element.
When we run the code above, we will get the following output:
The Third pattern matches the subject
status_code:200, encoding:utf-8, response_data: \
'userId': 1, 'id': 1, 'title': 'sunt aut facere \
{repellat provident occaecati excepturi optio reprehenderit', \
'body': 'quia et suscipit\nsuscipit recusandae consequuntur \
expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum \
rerum est autem sunt rem eveniet architecto'}
The above pattern can be made more concise by passing in the variable names as arguments into the data type classes, as shown below:
int(status_code), str(encoding)]: case [
Matching Wildcard Patterns
The wildcard pattern, denoted by an underscore ( _
) matches any structure but doesn’t bind the value. It is often used as a fallback pattern if no pattern matches the structure of the subject. Here’s an example:
def main(response):
= response.status_code
status_code
match status_code:300:
case print("The response is 300")
400:
case print("The response is 400")
case _:print("No pattern matches the response status !")
When you run the code above, you will get an output as shown below:
No pattern matches the response status !
The status code of the response is expected to be 200; so none of the literal patterns matched the status code. Since the wildcard pattern ( _
) matches any structure, the interpreter will execute the statement in the last case clause.
We can match the example in the captured pattern above with the wildcard pattern if we only want to bind the value of the encoding
as shown below:
def main(response):
= [response.status_code, response.encoding, response.json()]
values
match values:
case [ _, encoding]:print("The first pattern matches the subject")
print(encoding)
case [ _, encoding, _ ]:print("The second pattern matches the subject")
print(encoding)
The wildcard pattern will match the status code and response data:
The second pattern matches the subject
utf-8
An attempt to access the value of the wildcard pattern ( _
) will give a NameError
as shown below:
NameError: name '_' is not defined
Matching OR Patterns
The OR pattern allows you to combine structurally equivalent alternatives into a new pattern, allowing several patterns to share a common handler.
“If any of an OR pattern’s subpatterns matches the subject, the entire OR pattern succeeds.” - PEP635
The OR pattern is specified with a pipe
( | ) character in between the structurally equivalent alternatives.
Once one of the patterns that are separated by the or character matches the structure of the subject, the pattern matching is successful and the interpreter executes the code under the respective case clause.
If more than one of the alternatives matches, only the first value will be picked and optionally bound to a variable if we specify a variable to bind to. This behavior conforms with Python’s or
operator short-circuit evaluation which stops evaluation as soon as it finds the first condition that evaluates to True
.
The example below checks the encoding scheme of the response by matching the subject against patterns with alternatives.
As with other pattern, we can bind the value of the pattern that matches the structure of the subject to a variable as shown below:
def main(response):
= response.encoding
encoding
match encoding:"utf-8" | "utf-16" as encoding:
case print(f"The response was encoded with {encoding} \
encoding scheme")
"base64" | "ascii" as encoding:
case print(f"The response was encoded with {encoding} \
encoding scheme")
case _:print("No pattern matches the response encoding !")
The first pattern will match if the encoding scheme is either utf-8
or utf-16
.The second pattern will match if the encoding scheme is either base64
or ascii
and the value of the encoding will be bound to the encoding
variable we specified.
If we run the code above, we will get the output below:
The response was encoded with utf-8 encoding scheme
Matching Value Patterns
Value patterns involve accessing the attributes of an object. Python matches the subject against the value of the attribute that we accessed in the pattern.
This is similar in syntax to the capture pattern. However, the structural pattern matching specification adopted a rule that any attribute access is to be interpreted as a value pattern, and the value of the subject is matched against the value of the attribute.
from http import HTTPStatus
import requests
def main(response):
match (response.status_code, response.json()):
case (HTTPStatus.OK.value, body):print(f"The response is OK")
| HTTPStatus.NOT_FOUND.value, _):
case (HTTPStatus.BAD_REQUEST.value print(f"Bad request or Not found")
case _:print("No pattern matches the response status code !")
= requests.get("https://jsonplaceholder.typicode.com/posts/0") #new
response main(response)
In the example above:
- We request an article with an
id
of0
which returns a status code of404 Not Found
. - We specify the subject as a sequence of the response status code and the response json data.
- The pattern in each case clause is a sequence whose first element is the attribute access.
The subject is matched against the value the attribute access evaluates to—rather than binding the subject as a value to the attribute.
The first case pattern evaluates to:
200, body) case (
The second case pattern evaluates to:
400 | 404, _) case (
The evaluation of the second pattern contains a literal pattern, an or pattern, and the wildcard pattern.
The response status code is 404; so the second pattern matches and the interpreter executes the code block associated with it:
Bad request or Not found
Matching Sequence Patterns
Sequence patterns are patterns with comma-separated values enclosed within ( … )
or [ … ]
.
Depending on whether or not the sequence pattern contains a wildcard, it could be a fixed-length or a variable-length sequence pattern.
The fixed-length sequence pattern has to match the subject length-wise and element-wise. The pattern fails if the length of the subject sequence is not equal to the length of the sequence in the pattern.
The variable-length sequence pattern uses the Python iterable packing and unpacking syntax ( the star character *
) to pack a slice of the sequence into a variable. A variable-length sequence can contain at most one starred subpattern.
As in iterable unpacking, the specification does not distinguish between ‘tuple’ and ‘list’ notation. [1, 2, 3]
is equivalent to (1, 2, 3)
as well as 1, 2, 3
. If we need to match the sequence against its type, we need to wrap the sequence with the data type class: list([1,2,3]) or tuple(1,2,3).
In the context of pattern matching, only the following are recognized as sequences:
array.array
collections.deque
list
memoryview
range
tuple
import requests
def main(response):
match response.json():
case [last_post]:print(last_post)
*_, last_post:
case first_post, print("first_post: ", first_post)
print("last_post:", last_post)
case _:print("No pattern matches the response status code !")
= requests.get("https://jsonplaceholder.typicode.com/posts")
response main(response)
The code above gives the output shown below:
first_post: {'userId': 1, 'id': 1, 'title': \
'sunt aut facere repellat provident occaecati excepturi \
optio reprehenderit', 'body': 'quia et suscipit\nsuscipit \
recusandae consequuntur expedita et cum\nreprehenderit molestiae \
ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}
last_post: {'userId': 10, 'id': 100, 'title': 'at nam consequatur ea \
labore ea harum', 'body': 'cupiditate quo est a modi nesciunt soluta\nipsa \
voluptas error itaque dicta in\nautem qui minus magnam et distinctio \
eum\naccusamus ratione error aut'}
The code snippet above requests a list of posts from the /posts
endpoint which returns 100 lists of posts.
The first pattern match is unsuccessful because the length of the sequence pattern does not match the length of the sequence in the subject.
The second pattern binds the value of the first element in the sequence of the subject to the first_post
pattern and binds the last element in the last_post
variable. The elements in between are packed and bound into the wildcard pattern. Hence the second pattern matches the structure of the subject.
Matching Mapping Patterns
The mapping pattern allows us to match and extract both values and keys from mapping data structures like Python dict data type. The values and keys are matched against a given subpattern. The keys of the mapping pattern must be literals or value patterns while the value could be any of the patterns we’ve discussed earlier.
As an example, let’s match the data of the json response of the posts/1
endpoint. The response data has the following structure:
{
"userId": 1,
"id": 1,
"title": "sunt aut facere…",
"body": "quia et suscipit\nsuscipit recusandae…"
}
Matching Keys
The patterns in the key could be literal or value patterns.
All or some of the keys in the mapping data structure could be specified. If only some of the keys are specified, other keys are ignored and the pattern will match if a key that matches such a pattern is in the mapping data structure that we specify as the subject.
import requests
def main(response):
= response.json()
post_data
match post_data:"user_id":1}:
case {print("Pattern 1 matched")
"userId":1, "postId":1}:
case {print("pattern 2 matched")
"userId":1, "id":1}:
case {print("pattern 3 matched")
case _:print("No pattern matched")
= requests.get("https://jsonplaceholder.typicode.com/posts/1")
response main(response)
In the example above:
- The first pattern failed because
user_id
is not a key in the response data ( the subject ). - The second pattern failed because the subject does not have a
post_id
key. - The third pattern matches because both of the keys we specified are present in the subject.
Output :
pattern 3 matched
Matching Values
The patterns in the value could be any form of pattern we have discussed so far.
def main(response):
= response.json()
post_data
match post_data:"userId":2}:
case {print("Pattern 1 matched")
"userId":user_id, "id":post_id} if user_id < post_id:
case {print("pattern 2 matched")
"userId":user_id, "id":1|2|3} if user_id >= 1:
case {print("pattern 3 matched")
case _:print("No pattern matched")
In the code above:
- The first pattern will fail because the literal pattern
2
does not match the value of theuserId
key in the subject. - The second pattern will fail because the guard clause will fail since the
user_id
is not less than thepost_id
. - The third pattern will match because all the keys we specified are present, the
guard
will evaluate toTrue
and theor pattern
will pass.
Output :
pattern 3 matched
Key-Value Packing
When we match some part of the key-value pairs in a mapping data structure, as shown above, the interpreter ignores the other key-value pairs. If we need them, we can leverage Python’s sequence packing feature to match and bind several keys and values in the subject to a variable as shown below:
def main(response):
= response.json()
post_data
match post_data:"user_id":1, **others}:
case {print("Pattern 1 matched ", others)
"userId":user_id, "id":post_id, **others} \
case {if user_id < post_id:
print("pattern 2 matched", others)
"userId":user_id, "id":1|2|3, **others} \
case {if "title" in others.keys():
print("pattern 3 matched")
print(others)
case _:print("No pattern matched")
When we execute the code above, we get the following output:
pattern 3 matched
'title': 'sunt aut facere repellat provident occaecati\
{ excepturi optio reprehenderit', 'body': 'quia et \
suscipit\nsuscipit recusandae consequuntur expedita et \
cum\nreprehenderit molestiae ut ut quas totam\nnostrum \
rerum est autem sunt rem eveniet architecto'}
The third pattern matches.
The value that the interpreter binds to the others
variable is a mapping data structure that has all the attributes of a normal mapping data structure. Hence we can construct a guard
checking if a key is present in it.
Matching Class Pattern
Class patterns check whether the subject is an instance of a specific class. If there are no arguments, the pattern matches if the subject is an instance of the class specified in the pattern.
class Post:
def __init__(self, userId, id, title, body):
self.userId = userId
self.title = title
self.body = body
self.post_id = id
class Post2:
pass
We have two classes:
The
Post
class with attributes that match with the keys in the single post json response.The
Post2
class with no attribute.
Matching an Instance of a Class
In the code below, the match-case
statement behaves like the Python isinstance
function.
We create an instance of the Post
class with the json response and match the instance against the two classes we created earlier. If the subject is an instance of the class we specify as the pattern in the case clause, the pattern matching will be successful. In this case, the second pattern matches.
import requests
def main(response):
= Post(**response.json())
post
match post:
case Post2():print("Pattern 1")
case Post():print("Pattern 2")
case _:print("No pattern matches the post class !")
= requests.get("https://jsonplaceholder.typicode.com/posts/1")
response main(response)
Matching Keyword Arguments
The patterns above only match if the subject is an instance of the pattern
.
In addition to matching class instances, we can specify keyword arguments in the pattern to match against the keyword arguments in the subject.
If keywords arguments are present in the class pattern:
- They are looked up as an attribute on the subject.
- If the attribute lookup raises an
AttributeError
, the pattern fails. - If not, the subpattern associated with the keyword pattern is matched against the attribute value of the subject, if it succeeds, the whole pattern matching succeeds, otherwise it fails.
def main(response):
= Post(**response.json())
post
match post:= 4 | 5):
case Post(post_id print("Pattern 1")
= 1 | 2, userId= id) if id >= 2:
case Post(post_id print("Pattern 2")
= 1 | 2, userId= id) if id == 1:
case Post(post_id print("Pattern 3")
case _:print("No pattern matches the post class !")
In the case above:
- The first pattern above matches the class of the subject but the value of the
post_id
keyword argument does not match because thepost_id
is neither 4 nor 5 hence the overall pattern fails. - The second pattern matches the class and the
post_id
but theguard
clause fails because theuserId
is less than 2 - The third pattern matches the class, the
post_id
, and theguard
, hence the whole pattern matching succeeds.
Matching Positional Arguments
Let’s specify the arguments as positional arguments instead:
def main(response):
= Post(**response.json())
post
match post:4 | 5):
case Post(print("Pattern 1")
1 | 2, id) if id >= 2:
case Post(print("Pattern 2")
1 | 2, id) if id == 1:
case Post(print("Pattern 3")
case _:print("No pattern matches the response status code !")
We will get the following error:
Traceback (most recent call last):
File "/home/dracode/Adventure/pattern_matching.py", \
line 302, in <module>
main(response)
File "/home/dracode/Adventure/pattern_matching.py", \
line 292, in main
case Post(4 | 5):
TypeError: Post() accepts 0 positional sub-patterns (1 given)
Python classes do not have a natural ordering of their attributes, we need to specify the order of the attributes using the __match_args__
attribute before we can use the positional arguments in the patterns.
class Post:
= ("post_id", "userId", "title", "body")
__match_args__ def __init__(self, userId, id, title, body):
self.userId = userId
self.title = title
self.body = body
self.post_id = id
The __match_args__
allows us to order the attributes based on our preference.
- In the case above, the first argument in the pattern will match against the equivalent first value in the
__match__args
. - The
post_id
will be the first positional argument, theuserId
will be the second while thetitle
andbody
will be the third and fourth positional arguments respectively.
If positional patterns are present in a class, they are converted to keyword patterns based on the arrangement in the __match_args__
attribute.
Conclusion
In this tutorial, we delved into Python 3.10’s structural pattern matching feature. You learned about various patterns, like literal, capture, wildcard, AS, OR, sequence, mapping, and class. And the practicality of this feature isn’t confined to API response matching in web development, it’s useful in any scenario where value structure matching is required.
As you continue to explore Python’s capabilities, you might also be interested in optimizing your builds. If so, give Earthly a try! Earthly can be a great asset in your Python development toolkit.
Earthly Cloud: Consistent, Fast Builds, Any CI
Consistent, repeatable builds across all environments. Advanced caching for faster builds. Easy integration with any CI. 6,000 build minutes per month included.