Structural pattern matching in Python 3.10
September 2021
Summary: Python 3.10, which is due out in early October 2021, will include a large new language feature called structural pattern matching. This article is a critical but (hopefully) informative presentation of the feature, with examples based on real-world code.
Go to: What it is | Where it shines | My code | Other projects | Problems | Wrapping up
At a recent local Python meetup, a friend was presenting some of the new features in Python 3.8 and 3.9, and afterwards we got to talking about the pattern matching feature coming in Python 3.10. I went on a mild rant about how I thought Python had lost the plot: first assignment expressions using :=
, and now this rather sprawling feature.
My friend interpreted my rant rather generously, and soon said, “it sounds like you want to give a talk about it at our next meetup”. Okay … well, why not!
In the meantime, I thought I’d get to know the feature better by writing up my thoughts and some code examples in article form. As you can gather, I’m rather biased, but I’ll try to present the positives as well as just criticism.
The pattern matching feature has no fewer than three PEPs (Python Enhancement Proposals) to describe it:
The tutorial in particular provides a good overview of the feature, so if you just want to read one of the PEPs, read that one. I’ll also demo the features below.
The cynical side of me noticed that the rationale is by far the longest (clocking in at 8500 words). Methinks thou dost protest too much? To be fair, however, it looks like they just moved the usual “Rejected Ideas” section of the PEP to its own PEP due to the length.
But I think what’s missing in the PEPs is an evaluation of the costs versus the benefits. The costs are significant new language semantics for developers to learn, as well as the cost of implementation (for CPython and other Python implementations). The benefits need to be discussed in light of real-world code: the kind of code people use Python for on a day-to-day basis, not just the fairly contrived examples in the “Motivation” section of the PEP.
Part of what I want to do here is evaluate some real code and see how much (or little) pattern matching improves it. But first, let’s take a brief look at what structural pattern matching in Python looks like.
What it is
It’s tempting to think of pattern matching as a switch
statement on steroids. However, as the rationale PEP points out, it’s better thought of as a “generalized concept of iterable unpacking”. Many people have asked for a switch
in Python over the years, though I can see why that has never been added. It just doesn’t provide enough value over a bunch of if ... elif
statements to pay for itself. The new match ... case
feature provides the basics of switch
, plus the “structural” matching part – and some.
The basic syntax is shown in the following switch-like example (imagine we’re rolling our own Git CLI):
parser = argparse.ArgumentParser()
parser.add_argument('command', choices=['push', 'pull', 'commit'])
args = parser.parse_args()
match args.command:
case 'push':
print('pushing')
case 'pull':
print('pulling')
case _:
parser.error(f'{args.command!r} not yet implemented')
Python evaluates the match
expression, and then tries each case
from the top, executing the first one that matches, or the _
default case if no others match.
But here’s where the structural part comes in: the case
patterns don’t just have to be literals. The patterns can also:
- Use variable names that are set if a
case
matches - Match sequences using list or tuple syntax (like Python’s existing iterable unpacking feature)
- Match mappings using dict syntax
- Use
*
to match the rest of a list - Use
**
to match other keys in a dict - Match objects and their attributes using class syntax
- Include “or” patterns with
|
- Capture sub-patterns with
as
- Include an
if
“guard” clause
Wow! That’s a lot of features. Let’s see if we can use them all in one go, to see what they look like in a very contrived example (for a more gradual introduction, read the tutorial):
class Car:
__match_args__ = ('key', 'name')
def __init__(self, key, name):
self.key = key
self.name = name
expr = eval(input('Expr: '))
match expr:
case (0, x): # seq of 2 elems with first 0
print(f'(0, {x})') # (new variable x set to second elem)
case ['a', x, 'c']: # seq of 3 elems: 'a', anything, 'c'
print(f"'a', {x!r}, 'c'")
case {'foo': bar}: # dict with key 'foo' (may have others)
print(f"{{'foo': {bar}}}")
case [1, 2, *rest]: # seq of: 1, 2, ... other elements
print(f'[1, 2, *{rest}]')
case {'x': x, **kw}: # dict with key 'x' (others go to kw)
print(f"{{'x': {x}, **{kw}}}")
case Car(key=key, name='Tesla'): # Car with name 'Tesla' (any key)
print(f"Car({key!r}, 'TESLA!')")
case Car(key, name): # similar to above, but use __match_args__
print(f"Car({key!r}, {name!r})")
case 1 | 'one' | 'I': # int 1 or str 'one' or 'I'
print('one')
case ['a'|'b' as ab, c]: # seq of 2 elems with first 'a' or 'b'
print(f'{ab!r}, {c!r}')
case (x, y) if x == y: # seq of 2 elems with first equal to second
print(f'({x}, {y}) with x==y')
case _:
print('no match')
As you can see, it’s complex but also powerful. The exact details of how matching is done are provided in the spec. Thankfully, much of the above is fairly self-explanatory, though the __match_args__
attribute requires an explanation: if position arguments are used in a class pattern, the items in the class’s __match_args__
tuple provide the names of the attributes. This is a shorthand to avoid specifying the attribute names in the class pattern.
One thing to note is that match
and case
are not real keywords but “soft keywords”, meaning they only operate as keywords in a match ... case
block. This is by design, because people use match
as a variable name all the time – I almost always use a variable named match
as the result of a regex match.
Where it shines
As I mentioned, I don’t think match
pays off if you’re just using it as a glorified switch
. So where does it pay off?
In the tutorial PEP there are several examples that make it shine: matching commands and their arguments in a simple text-based game. I’ve merged some of those examples together and copied them below:
command = input("What are you doing next? ")
match command.split():
case ["quit"]:
print("Goodbye!")
quit_game()
case ["look"]:
current_room.describe()
case ["get", obj]:
character.get(obj, current_room)
case ["drop", *objects]:
for obj in objects:
character.drop(obj, current_room)
case ["go", direction] if direction in current_room.exits:
current_room = current_room.neighbor(direction)
case ["go", _]:
print("Sorry, you can't go that way")
case _:
print(f"Sorry, I couldn't understand {command!r}")
As a comparison, let’s do a quick rewrite of that snippet without pattern matching, old-school style. You’d almost certainly use a bunch of if ... elif
blocks. I’m going to set a new variable fields
to the pre-split fields, and n
to the number of fields, to make the conditions simpler:
command = input("What are you doing next? ")
fields = text.split()
n = len(fields)
if fields == ["quit"]:
print("Goodbye!")
quit_game()
elif fields == ["look"]:
current_room.describe()
elif n == 2 and fields[0] == "get":
obj = fields[1]
character.get(obj, current_room)
elif n >= 1 and fields[0] == "drop":
objects = fields[1:]
for obj in objects:
character.drop(obj, current_room)
elif n == 2 and fields[0] == "go":
direction = fields[1]
if direction in current_room.exits:
current_room = current_room.neighbor(direction)
else:
print("Sorry, you can't go that way")
else:
print(f"Sorry, I couldn't understand {command!r}")
Apart from being a bit shorter, I think it’s fair to say that the structural matching version is easier to read, and the variable binding avoids the manual indexing like fields[1]
. In pure readability terms, this example seems like a clear win for pattern matching.
The tutorial also provides examples of class-based matching, in what is presumably part of the game’s event loop:
match event.get():
case Click((x, y), button=Button.LEFT): # This is a left click
handle_click_at(x, y)
case Click():
pass # ignore other clicks
case KeyPress(key_name="Q") | Quit():
game.quit()
case KeyPress(key_name="up arrow"):
game.go_north()
...
case KeyPress():
pass # Ignore other keystrokes
case other_event:
raise ValueError(f"Unrecognized event: {other_event}")
Let’s try rewriting this one using plain if ... elif
. Similar to how I handled the “go” command in the rewrite above, I’ll merge cases together where it makes sense (events of the same class):
e = event.get()
if isinstance(e, Click):
x, y = e.position
if e.button == Button.LEFT:
handle_click_at(x, y)
# ignore other clicks
elif isinstance(e, KeyPress):
key = e.key_name
if key == "Q":
game.quit()
elif key == "up arrow":
game.go_north()
# ignore other keystrokes
elif isinstance(e, Quit):
game.quit()
else:
raise ValueError(f"Unrecognized event: {e}")
To me this one seems more borderline. It’s definitely a bit nicer with pattern matching, but not by much. The match
has the advantage of the case
s all lining up; the if ... elif
has the advantage that the event types are grouped more strongly, and we avoid repeating the types.
Despite my skepticism, I’m trying to be fair: these examples do look nice, and even if you haven’t read the pattern matching spec, it’s reasonably clear what they do – with the possible exception of the __match_args__
magic.
There’s also an expression parser and evaluator that Guido van Rossum wrote to show-case the feature. It uses match ... case
heavily (11 times in a fairly small file). Here’s one example:
def eval_expr(expr):
"""Evaluate an expression and return the result."""
match expr:
case BinaryOp('+', left, right):
return eval_expr(left) + eval_expr(right)
case BinaryOp('-', left, right):
return eval_expr(left) - eval_expr(right)
case BinaryOp('*', left, right):
return eval_expr(left) * eval_expr(right)
case BinaryOp('/', left, right):
return eval_expr(left) / eval_expr(right)
case UnaryOp('+', arg):
return eval_expr(arg)
case UnaryOp('-', arg):
return -eval_expr(arg)
case VarExpr(name):
raise ValueError(f"Unknown value of: {name}")
case float() | int():
return expr
case _:
raise ValueError(f"Invalid expression value: {repr(expr)}")
What would that look like with if ... elif
? Once again, you’d probably structure it slightly differently, with all the BinaryOp
cases together. Note that the nested if
blocks don’t actually increase the indentation level, due to the double nesting required for the case
clauses in a match
:
def eval_expr(expr):
"""Evaluate an expression and return the result."""
if isinstance(expr, BinaryOp):
op, left, right = expr.op, expr.left, expr.right
if op == '+':
return eval_expr(left) + eval_expr(right)
elif op == '-':
return eval_expr(left) - eval_expr(right)
elif op == '*':
return eval_expr(left) * eval_expr(right)
elif op == '/':
return eval_expr(left) / eval_expr(right)
elif isinstance(expr, UnaryOp):
op, arg = expr.op, expr.arg
if op == '+':
return eval_expr(arg)
elif op == '-':
return -eval_expr(arg)
elif isinstance(expr, VarExpr):
raise ValueError(f"Unknown value of: {name}")
elif isinstance(expr, (float, int)):
return expr
raise ValueError(f"Invalid expression value: {repr(expr)}")
It’s two more lines due to the manual attribute unpacking for the BinaryOp
and UnaryOp
fields. Maybe it’s just me, but I find this one just as easy to read as the match
version, and more explicit.
Another place match
might be useful is when validating the structure of JSON from an HTTP request (this is my own made-up example):
try:
obj = json.loads(request.body)
except ValueError:
raise HTTPBadRequest(f'invalid JSON: {request.body!r}')
match obj:
case {
'action': 'sign-in',
'username': str(username),
'password': str(password),
'details': {'email': email, **other_details},
} if username and password:
sign_in(username, password, email=email, **other_details)
case {'action': 'sign-out'}:
sign_out()
case _:
raise HTTPBadRequest(f'invalid JSON structure: {obj}')
This is quite nice. However, one drawback is that it doesn’t give you good validation errors: ideally an API would tell the caller what fields were missing, or what types were incorrect.
Using it in my code
Let’s look at converting some existing code to use the new feature. I’m basically scanning for if ... elif
blocks to see if it makes sense to convert them. I’ll start with a few examples of code I’ve written.
The first couple of examples are from pygit, a toy subset of git
that’s just enough of a Git client to create a repo, commit, and push itself to GitHub (full source code).
I’ve collapsed the code blocks below by default. Just click the arrow or summary paragraph to expand.
if answer() == 42:
print('The meaning of life, the universe and everything!')
Example from find_object()
. Some aspects are a bit nicer, but overall I think rewriting it to use match
is over-use of the feature.
def find_object(sha1_prefix):
...
objects = [n for n in os.listdir(obj_dir) if n.startswith(rest)]
if not objects:
raise ValueError('object {!r} not found'.format(sha1_prefix))
if len(objects) >= 2:
raise ValueError('multiple objects ({}) with prefix {!r}'.format(
len(objects), sha1_prefix))
return os.path.join(obj_dir, objects[0])
It’s quite clear as is, but let’s see if it’s simpler using match
:
def find_object(sha1_prefix):
...
objects = [n for n in os.listdir(obj_dir) if n.startswith(rest)]
match objects:
case []:
raise ValueError('object {!r} not found'.format(sha1_prefix))
case [obj]:
return os.path.join(obj_dir, obj)
case _:
raise ValueError('multiple objects ({}) with prefix {!r}'
.format(len(objects), sha1_prefix))
The cases themselves are a bit nicer, especially how obj
is bound automatically instead of using objects[0]
.
However, one thing that’s not great is how the “success case” ends up sandwiched in the middle, so the normal code path gets a bit lost. You could hack it to the end with the following (but then it’s definitely not as clear as the original):
match objects:
case []:
raise ValueError('object {!r} not found'.format(sha1_prefix))
case [_, _, *_]:
raise ValueError('multiple objects ({}) with prefix {!r}'
.format(len(objects), sha1_prefix))
case [obj]:
return os.path.join(obj_dir, obj)
Alternatively, you could put the success (most specific) case first, which is a bit nicer:
match objects:
case [obj]:
return os.path.join(obj_dir, obj)
case []:
raise ValueError('object {!r} not found'.format(sha1_prefix))
case _:
raise ValueError('multiple objects ({}) with prefix {!r}'
.format(len(objects), sha1_prefix))
Example from cat_file()
, shown two different ways. Seems like a slight win.
def cat_file(mode, sha1_prefix):
obj_type, data = read_object(sha1_prefix)
if mode in ['commit', 'tree', 'blob']:
if obj_type != mode:
raise ValueError('expected type {}, got {}'.format(
mode, obj_type))
sys.stdout.buffer.write(data)
elif mode == 'size':
print(len(data))
elif mode == 'type':
print(obj_type)
elif mode == 'pretty':
if obj_type in ['commit', 'blob']:
sys.stdout.buffer.write(data)
elif obj_type == 'tree':
... # pretty print tree
else:
assert False, 'unhandled type {!r}'.format(obj_type)
else:
raise ValueError('unexpected mode {!r}'.format(mode))
A direct translation would be the following (note the nested match
in the “pretty” case):
def cat_file(mode, sha1_prefix):
obj_type, data = read_object(sha1_prefix)
match mode:
case 'commit' | 'tree' | 'blob':
if obj_type != mode:
raise ValueError('expected type {}, got {}'.format(
mode, obj_type))
sys.stdout.buffer.write(data)
case 'size':
print(len(data))
case 'type':
print(obj_type)
case 'pretty':
match obj_type:
case 'commit' | 'blob':
sys.stdout.buffer.write(data)
case 'tree':
... # pretty print tree
case _:
assert False, 'unhandled type {!r}'.format(obj_type)
case _:
raise ValueError('unexpected mode {!r}'.format(mode))
We’re using match
as a simple switch, but it’s a very slight win. What about if we try reworking it to match mode
and obj_type
at the same time as a tuple:
def cat_file(mode, sha1_prefix):
obj_type, data = read_object(sha1_prefix)
match (mode, obj_type):
case ('commit' | 'tree' | 'blob', _) if obj_type == mode:
sys.stdout.buffer.write(data)
case ('size', _):
print(len(data))
case ('type', _):
print(obj_type)
case ('pretty', 'commit' | 'blob'):
sys.stdout.buffer.write(data)
case ('pretty', 'tree'):
... # pretty print tree
case _:
raise ValueError('unexpected mode {!r} or type {!r}'.format(
mode, obj_type))
Now it’s more streamlined than the original, though arguably no more clear!
Example from argument parsing, when switching on the CLI sub-command. Makes sense to use match
, but it’s a simple switch, not using the structural features.
args = parser.parse_args()
if args.command == 'add':
... # do add
elif args.command == 'cat-file':
... # do cat-file
elif args.command == 'commit':
... # do commit
...
Using match
would reduce the visual noise:
args = parser.parse_args()
match args.command:
case 'add':
... # do add
case 'cat-file':
... # do cat-file
case 'commit':
... # do commit
...
But we don’t really need a whole new feature for that. You can reduce most of the visual noise by assigning a short variable name:
args = parser.parse_args()
cmd = args.command
if cmd == 'add':
... # do add
elif cmd == 'cat-file':
... # do cat-file
elif cmd == 'commit':
... # do commit
...
Here are a couple more examples from Canonical’s ops
library, in pebble.py
, which I wrote for work.
Example from add_layer()
. It handles the various types allowed for the layer
parameter. Less visual noise, though also less explicit.
def add_layer(self, label, layer, *, combine=False):
...
if isinstance(layer, str):
layer_yaml = layer
elif isinstance(layer, dict):
layer_yaml = Layer(layer).to_yaml()
elif isinstance(layer, Layer):
layer_yaml = layer.to_yaml()
else:
raise TypeError('layer must be str, dict, or pebble.Layer')
# use layer_yaml
The match
version uses the class matching syntax:
def add_layer(self, label, layer, *, combine=False):
...
match layer:
case str():
layer_yaml = layer
case dict(): # could also be written "case {}:"
layer_yaml = Layer(layer).to_yaml()
case Layer():
layer_yaml = layer.to_yaml()
case _:
raise TypeError('layer must be str, dict, or pebble.Layer')
# use layer_yaml
Is the match
version clearer? It’s less noisy, but I kind of like the explicitness of the isinstance()
calls. In addition, the empty parentheses in the various cases are a bit weird – they look unnecessary with no positional arguments or attributes, but without them match
would bind new variables named str
or dict
.
At first I thought it was weird how the variables bound (and assigned) in a case
block outlive the entire match
block. But as shown in the above, it makes sense – you’ll often want to use the variables in the code below the match
.
Example from exec()
, code I’m working on now. No clearer in this case.
def exec(command, stdin=None, encoding='utf-8', ...):
if isinstance(command, (bytes, str)):
raise TypeError('command must be a list of str, not {}'
.format(type(command).__name__))
if len(command) < 1:
raise ValueError('command must contain at least one item')
if stdin is not None:
if isinstance(stdin, str):
if encoding is None:
raise ValueError('encoding must be set if stdin is str')
stdin = io.BytesIO(stdin.encode(encoding))
elif isinstance(stdin, bytes):
if encoding is not None:
raise ValueError('encoding must be None if stdin is bytes')
stdin = io.BytesIO(stdin)
elif not hasattr(stdin, 'read'):
raise TypeError('stdin must be str, bytes, or a readable file-like object')
...
Does match
help simplify those checks? Let’s see:
def exec(command, stdin=None, encoding='utf-8', ...):
match command:
case bytes() | str():
raise TypeError('command must be a list of str, not {}'
.format(type(command).__name__))
case []:
raise ValueError('command must contain at least one item')
match stdin:
case str():
if encoding is None:
raise ValueError('encoding must be set if stdin is str')
stdin = io.BytesIO(stdin.encode(encoding))
case bytes():
if encoding is not None:
raise ValueError('encoding must be None if stdin is bytes')
stdin = io.BytesIO(stdin)
case None:
pass
case _ if not hasattr(stdin, 'read'):
raise TypeError('stdin must be str, bytes, or a readable file-like object')
...
I’d argue this is no clearer. The case None
is awkward – we could avoid it by wrapping the whole thing in if stdin is not None:
like the original code did, but that adds a third nesting level, which is less than ideal.
The default case with a guard, case _ if not hasattr(stdin, 'read')
, is also a bit more obscure than the original elif
version. You could of course just use case _
and then nest the if not hasattr
.
Maybe I just don’t write much of the kind of code that would benefit from this feature, but my guess is that there are a lot of people that fall into this camp. However, let’s scan code from a few popular Python projects to see what we can find.
Using it in other projects
Go to: Standard library | Django | Warehouse | Mercurial | Ansible
I’m going to pick examples from three different types of code: library code (from the standard library), framework code (from the Django web framework), and application code (from Warehouse, the server that powers the Python Package Index, from Mercurial, and from Ansible).
In an effort to be fair, I tried to find examples that would really benefit from match
and were more than a glorified switch
(there were a lot of those, but they’re not using the structural part of pattern matching, so converting them isn’t a huge win). I went looking for elif
blocks which look like they test the structure of the data. There might be some good uses of match
in code which just uses if
without elif
, but I think that would be rare.
The standard library
Python’s standard library has about 709,000 lines of code, including tests (measured using scc). The ripgrep search tool (rg --type=py 'elif ' | wc
) tells us that 2529 of those lines are elif
statements, or 0.4%. I realize this would find “elif ” in comments, but presumably that’s rare.
Example from ast.literal_eval()
, in the _convert()
helper. Not surprisingly, the first really good use case I found was in AST processing. Definitely a win.
def _convert(node):
if isinstance(node, Constant):
return node.value
elif isinstance(node, Tuple):
return tuple(map(_convert, node.elts))
elif isinstance(node, List):
return list(map(_convert, node.elts))
elif isinstance(node, Set):
return set(map(_convert, node.elts))
elif (isinstance(node, Call) and isinstance(node.func, Name) and
node.func.id == 'set' and node.args == node.keywords == []):
return set()
elif isinstance(node, Dict):
if len(node.keys) != len(node.values):
_raise_malformed_node(node)
return dict(zip(map(_convert, node.keys),
map(_convert, node.values)))
elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
left = _convert_signed_num(node.left)
right = _convert_num(node.right)
if isinstance(left, (int, float)) and isinstance(right, complex):
if isinstance(node.op, Add):
return left + right
else:
return left - right
return _convert_signed_num(node)
Converting that to use match
:
def _convert(node):
match node:
case Constant(value):
return value
case Tuple(elts):
return tuple(map(_convert, elts))
case List(elts):
return list(map(_convert, elts))
case Set(elts):
return set(map(_convert, elts))
case Call(Name('set'), args=[], keywords=[]):
return set()
case Dict(keys, values):
if len(keys) != len(values):
_raise_malformed_node(node)
return dict(zip(map(_convert, keys),
map(_convert, values)))
case BinOp(left, (Add() | Sub()) as op, right):
left = _convert_signed_num(left)
right = _convert_num(right)
match (op, left, right):
case (Add(), int() | float(), complex()):
return left + right
case (Sub(), int() | float(), complex()):
return left - right
return _convert_signed_num(node)
Definitely a win! Syntax tree processing seems like the ideal use case for match
. In Python 3.10, the ast
module’s node types already have __match_args__
set, so that makes it even cleaner by avoiding Constant(value=value)
type of repetition.
Still, I want to find one outside the ast
module. I won’t include it here, but there’s a nice long if ... elif
chain in curses/textpad.py
in do_command()
: it’s mostly a simple switch, but it would benefit from match ... case
with a few if
guards.
Example from dataclasses
, in _asdict_inner()
. Reduces visual noise for a nice little improvement.
def _asdict_inner(obj, dict_factory):
if _is_dataclass_instance(obj):
result = []
for f in fields(obj):
value = _asdict_inner(getattr(obj, f.name), dict_factory)
result.append((f.name, value))
return dict_factory(result)
elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
elif isinstance(obj, (list, tuple)):
return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
elif isinstance(obj, dict):
return type(obj)((_asdict_inner(k, dict_factory),
_asdict_inner(v, dict_factory))
for k, v in obj.items())
else:
return copy.deepcopy(obj)
Let’s try converting that to match
:
def _asdict_inner(obj, dict_factory):
match obj:
case _ if _is_dataclass_instance(obj):
result = []
for f in fields(obj):
value = _asdict_inner(getattr(obj, f.name), dict_factory)
result.append((f.name, value))
return dict_factory(result)
case tuple(_fields=_):
return type(obj)(*[_asdict_inner(v, dict_factory) for v in obj])
case list() | tuple():
return type(obj)(_asdict_inner(v, dict_factory) for v in obj)
case {}:
return type(obj)((_asdict_inner(k, dict_factory),
_asdict_inner(v, dict_factory))
for k, v in obj.items())
case _:
return copy.deepcopy(obj)
A nice little improvement, though the first case _
with the if
guard is a bit weird. It may be able to be moved to a regular if
statement inside the final case _
, but I don’t know the code well enough to know if that ordering would still do what’s required.
Example from email.utils
, in parsedate_tz()
. Matching with tuple unpacking makes it a good bit cleaner.
def _parsedate_tz(data):
...
tm = tm.split(':')
if len(tm) == 2:
[thh, tmm] = tm
tss = '0'
elif len(tm) == 3:
[thh, tmm, tss] = tm
elif len(tm) == 1 and '.' in tm[0]:
# Some non-compliant MUAs use '.' to separate time elements.
tm = tm[0].split('.')
if len(tm) == 2:
[thh, tmm] = tm
tss = 0
elif len(tm) == 3:
[thh, tmm, tss] = tm
else:
return None
# use thh, tmm, tss
Let’s have a shot at converting it to use match
:
def _parsedate_tz(tm):
...
match tm.split(':'):
case [thh, tmm]:
tss = '0'
case [thh, tmm, tss]:
pass
case [s] if '.' in s:
match s.split('.'):
case [thh, tmm]:
tss = 0
case [thh, tmm, tss]:
pass
case _:
return None
case _:
return None
# use thh, tmm, tss
This is definitely a good bit cleaner. It’s always a bit of a pain when you use str.split()
to have to test the length before unpacking a tuple (you could also catch the ValueError
exception, but it’s not as clear, and the nesting levels get a bit much).
Side note: the str.partition()
method is often useful in cases like this, but only when you have two items with a separator in between.
Interestingly, while testing parsedate_tz()
I found that this code has a bug that raises an UnboundLocalError
on invalid user input: if you pass a time with more than 3 dotted segments like 12.34.56.78
, the thh
/tmm
/tss
variables won’t be defined for the following code. Check this out:
$ python3.10 -c 'import email.utils; \
email.utils.parsedate_tz("Wed, 3 Apr 2002 12.34.56.78+0800")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.10/email/_parseaddr.py", line 50, in parsedate_tz
res = _parsedate_tz(data)
File "/usr/local/lib/python3.10/email/_parseaddr.py", line 134, in _parsedate_tz
thh = int(thh)
UnboundLocalError: local variable 'thh' referenced before assignment
All it needs is another else: return None
in the dot case. I’ve opened an issue and a pull request that adds a test case for this and fixes the bug.
Django
Django has 327,000 lines of code, including tests. Of these, there are 905 uses of elif
, or 0.3%.
Example from Django admin checks, in _check_fieldsets_item()
. The structural matching is great here, but doesn’t help produce good error messages.
def _check_fieldsets_item(self, obj, fieldset, label, seen_fields):
if not isinstance(fieldset, (list, tuple)):
return must_be('a list or tuple', option=label, obj=obj, id='admin.E008')
elif len(fieldset) != 2:
return must_be('of length 2', option=label, obj=obj, id='admin.E009')
elif not isinstance(fieldset[1], dict):
return must_be('a dictionary', option='%s[1]' % label, obj=obj, id='admin.E010')
elif 'fields' not in fieldset[1]:
return [
checks.Error(
"The value of '%s[1]' must contain the key 'fields'." % label,
obj=obj.__class__,
id='admin.E011',
)
]
elif not isinstance(fieldset[1]['fields'], (list, tuple)):
return must_be('a list or tuple', option="%s[1]['fields']" % label, obj=obj, id='admin.E008')
seen_fields.extend(flatten(fieldset[1]['fields']))
...
This is interesting: it’s doing a lot of nested structural matching, which seems like a great fit. Let’s see about converting it. This kind of does the job:
def _check_fieldsets_item(self, obj, fieldset, label, seen_fields):
match fieldset:
case (_, {'fields': [*fields]}):
pass
case _:
return must_be('a list or tuple of length 2 with a fields dict')
seen_fields.extend(flatten(fields))
...
If the specific error messages don’t matter, that’s really nice! However, in this case they probably do, otherwise it wouldn’t be carefully broken out the way it is. To fix that, we need to specify all the cases, but in the opposite order from the original, so that the most specific is matched first:
def _check_fieldsets_item(self, obj, fieldset, label, seen_fields):
match fieldset:
case [_, {'fields': [*fields]}]:
pass # valid, fall through
case [_, {'fields': _}]:
return must_be('a list or tuple', option="%s[1]['fields']" % label, obj=obj, id='admin.E008')
case [_, {}]:
return [
checks.Error(
"The value of '%s[1]' must contain the key 'fields'." % label,
obj=obj.__class__,
id='admin.E011',
)
]
case [_, _]:
return must_be('a dictionary', option='%s[1]' % label, obj=obj, id='admin.E010')
case [*_]:
return must_be('of length 2', option=label, obj=obj, id='admin.E009')
case _:
return must_be('a list or tuple', option=label, obj=obj, id='admin.E008')
seen_fields.extend(flatten(fields))
...
Is it clearer? Not really. It’s a little strange repeating yourself, getting less and less specific. It also seems less obvious to me with the cases “backwards”, falling through to the looser matches. And [_, _]
followed by [*_]
to mean “not of length 2” is not exactly explicit.
Warehouse
Warehouse, PyPI’s server code, has 59,000 lines of Python code, including tests. There are 35 uses of elif
, or 0.06%. Interestingly, that’s an order of magnitude less than either the standard library or Django, which fits with my conjecture that match
won’t pay off as much in “regular” code.
Example from BigQuery syncing, in sync_bigquery_release_files()
. This is the only example I found in Warehouse that (at first!) looked like it would benefit from match
, but turns out it doesn’t.
for sch in table_schema:
if hasattr(file, sch.name):
field_data = getattr(file, sch.name)
elif hasattr(release, sch.name) and sch.name == "description":
field_data = getattr(release, sch.name).raw
elif sch.name == "description_content_type":
field_data = getattr(release, "description").content_type
elif hasattr(release, sch.name):
field_data = getattr(release, sch.name)
elif hasattr(project, sch.name):
field_data = getattr(project, sch.name)
else:
field_data = None
However, on closer inspection, these structural tests are being done on three different values (file
, release
, and project
), and the structure they’re being tested for is dynamic. At first I was thinking object(name=name)
would do what we want, but the code is actually matching on an attribute with a name of whatever sch.name
’s value is. Tricky!
It seems Warehouse wasn’t exactly crying out for match
. I decided to keep it here anyway, as I think it’s a good counterpoint. Let’s find a couple more examples by skimming through two other large applications: Mercurial and Ansible.
Mercurial
Mercurial, the version control system, has 268,000 lines of Python code, including tests. There are 1941 uses of elif
, or 0.7% – the highest ratio yet.
Example from context.py
, in ancestor()
. Small improvement using tuple unpacking.
def ancestor(self, c2, warn=False):
n2 = c2._node
if n2 is None:
n2 = c2._parents[0]._node
cahs = self._repo.changelog.commonancestorsheads(self._node, n2)
if not cahs:
anc = self._repo.nodeconstants.nullid
elif len(cahs) == 1:
anc = cahs[0]
else:
anc = ...
return self._repo[anc]
Changing that to use match
:
def ancestor(self, c2, warn=False):
n2 = c2._node
if n2 is None:
n2 = c2._parents[0]._node
cahs = self._repo.changelog.commonancestorsheads(self._node, n2)
match cahs:
case []:
anc = self._repo.nodeconstants.nullid
case [anc]:
pass
case _:
anc = ...
return self._repo[anc]
There are quite a few cases like this where it might not be a huge win, but it is a small “quality of life” improvement for developers.
Ansible
Ansible is a widely-used configuration management system written in Python. It has 217,000 lines of Python code, including tests. There are 1594 uses of elif
, which again is 0.7%. Below are a couple of cases I saw which might benefit from pattern matching.
Example from module_utils/basic.py
, in _return_formatted()
. Small readability improvement.
def _return_formatted(self, kwargs):
...
for d in kwargs['deprecations']:
if isinstance(d, SEQUENCETYPE) and len(d) == 2:
self.deprecate(d[0], version=d[1])
elif isinstance(d, Mapping):
self.deprecate(d['msg'], version=d.get('version'), date=d.get('date'),
collection_name=d.get('collection_name'))
else:
self.deprecate(d)
...
Using match
with some light structural patterns gives us a small readability improvement – though I’m not sure the best way to handle the other types in SEQUENCETYPE
:
def _return_formatted(self, kwargs):
...
for d in kwargs['deprecations']:
match d:
case (msg, version):
self.deprecate(msg, version=version)
case {'msg': msg}:
self.deprecate(msg, version=d.get('version'), date=d.get('date'),
collection_name=d.get('collection_name'))
case _:
self.deprecate(d)
...
Example from utils/version.py
, in _Alpha.__lt__()
, some version-comparison code. Type checking is a little bit nicer with match
.
class _Alpha:
...
def __lt__(self, other):
if isinstance(other, _Alpha):
return self.specifier < other.specifier
elif isinstance(other, str):
return self.specifier < other
elif isinstance(other, _Numeric):
return False
raise ValueError
Once again, it’s a little bit nicer with match
:
class _Alpha:
__match_args__ = ('specifier',)
...
def __lt__(self, other):
match other:
case _Alpha(specifier):
return self.specifier < specifier
case str():
return self.specifier < other
case _Numeric():
return False
case _:
raise ValueError
In all of these projects, there are many more cases that could be converted to use match
, but I’ve tried to pick out a few different kinds of code where it made sense to at least try.
Some problems with the feature
As I’ve shown, pattern matching does make code clearer in a few cases, but there are a number of concerns I have with this feature. Obviously the ship has already sailed – Python 3.10 is due out in a few days! – but I think it’s valuable to consider the problems for future designs. (Python definitely doesn’t ship every feature people want: the rejected PEPs are interesting to peruse.)
There’s some trivial stuff like how match ... case
requires two indentation levels: the PEP authors considered various alternatives, and I believe they chose the right route – that’s only a minor annoyance. But what about larger problems?
Learning curve and surface area. As you can see from the size of the spec PEP, there is a lot to this feature, with about 10 sub-features packed into one. Python has always been an easy-to-learn language, and this feature, while it can look good on the page, has a lot of complexity in its semantics.
Another way to do things. The Zen of Python says, “There should be one – and preferably only one – obvious way to do it.” In reality, Python has always had many different ways to do things. But now there’s one that adds a fair bit of cognitive load to developers: as shown in many of the examples, developers may often need to try both with and without match
, and still be left debating which is more “obvious”.
Only useful in rarer domains. As shown above, there are cases where match
really shines. But they are few and far between, mostly when handling syntax trees and writing parsers. A lot of code does have if ... elif
chains, but these are often either plain switch-on-value, where elif
works almost as well, or the conditions they’re testing are a more complex combination of tests that don’t fit into case
patterns (unless you use awkward case _ if cond
clauses, but that’s strictly worse than elif
).
My hunch is that the PEP authors (Brandt Bucher and Guido van Rossum, both Python core developers) regularly write the kind of code that does benefit from pattern matching, but most application developers and script writers will need match
far less often. Guido van Rossum in particular has been working on the Mypy type checker for a while, and now he’s working on speeding up CPython – compiler work no doubt involving ASTs.
Syntax works differently. There are at least two parts of this feature where syntax that looks like one thing in “normal Python” acts differently inside a pattern:
- Variable names: a variable in a
case
clause doesn’t return its value like in ordinary code, it binds it as a name. This meanscase RED
doesn’t work as you expect – it will set a new variable namedRED
, not match your colour constant. To match on constants, they have to have a dot in them – socase Colors.RED
works. In writing some of the code above I actually made this mistake: I wrotecase ('commit' | 'tree' | 'blob', mode)
, expecting it to match if the tuple’s second item was equal tomode
, but of course it would have setmode
to the second item. - Class patterns: these look like function calls, but they’re really
isinstance
andhasattr
tests. It looks nice, but it’s sometimes confusing. It also means you can’t match on the result of an actual function call – that would have to be in anif
guard.
The rationale PEP does acknowledge these syntax variations in the “Patterns” section:
Although patterns might superficially look like expressions, it is important to keep in mind that there is a clear distinction. In fact, no pattern is or contains an expression. It is more productive to think of patterns as declarative elements similar to the formal parameters in a function definition.
The __match_args__ magic. In my opinion the __match_args__
feature is too magical, and requires developers to decide which of a class’s attributes should be position-matchable, if any. It’s also strange that the __match_args__
order could be different from the order of the class’s __init__
parameters (though in practice you’d try not to do that). I can see why they’ve included this feature, as it makes the likes of AST node matching really nice, but it’s not very explicit.
Cost for other implementations. CPython is by far the most commonly-used Python interpreter, but there are also others, such as PyPy and MicroPython, that will have to decide whether or not to implement this feature. Other interpreters are always playing catch-up anyway, but a feature of this size at this stage in Python’s history will make it even harder for other implementations to keep up.
Originally I was also concerned that match
’s class patterns don’t play well with Python’s use of duck typing, where you just access attributes and call methods on an object, without checking its type first (for example, when using file-like objects). With class patterns, however, you specify the type, and it performs an isinstance
check. Duck typing is still possible using object()
, but it would be a bit strange.
However, now that I’ve used the feature, I think this is mostly a theoretical concern – the places you’d use class patterns don’t really overlap with the places you’d use duck typing.
This duck typing concern is discussed briefly in the rationale PEP:
Paying tribute to Python’s dynamic nature with ‘duck typing’, however, we also added a more direct way to specify the presence of, or constraints on specific attributes. Instead of
Node(x, y)
you could also writeobject(left=x, right=y)
, effectively eliminating theisinstance()
check and thus supporting any object withleft
andright
attributes.
Wrapping up
I do like some aspects of pattern matching, and certain code is definitely cleaner with match ... case
than with if ... elif
. But does the feature provide enough value to justify the complexity, not to mention the cognitive burden it places on people learning Python or reading Python code?
That said, Python has always been a pragmatic programming language, not a purist’s dream. As Bjarne Stroustrup, the creator of C++, said, “There are only two kinds of languages: the ones people complain about and the ones nobody uses.” I have always liked Python, and I’ve used it successfully for many years. I’ll almost certainly continue to use it for many tasks. It’s not perfect, but if it were, no one would use it.
I’ve also been using Go a lot recently, and there’s definitely something good about how slowly the language changes (by design). Most of the release notes start with “there are no changes to the language” – for example, in Go 1.16 all the changes were in the tooling and standard library. That said, Go will have its own large new feature in a few months with generics coming in Go 1.18.
Overall I’m a bit pessimistic about structural pattern matching in Python. It’s just a big feature to add so late in the game (Python is 30 years old this year). Is the language starting to implode under its own weight?
Or, as my friend predicted, is it one of those features that will be over-used for everything for a couple of years, and then the community will settle down and only use it where it really improves the code? We shall see!
Comment over at Lobsters or Hacker News.
I’d love it if you sponsored me on GitHub – it will motivate me to work on my open source projects and write more good content. Thanks!