Python has sum types

I am absolutely thrilled about this discovery.


Python doesn't have sum types

One of my biggest gripes with Python is that it doesn't have sum types, or a way to emulate them. For example:

>>> from typing import List
>>> x = []
>>> isinstance(x, List[int])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/nvxp3xmlrxj9sw66dk7l0grz9m4889jn-python3-3.9.12/lib/python3.9/typing.py", line 720, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "/nix/store/nvxp3xmlrxj9sw66dk7l0grz9m4889jn-python3-3.9.12/lib/python3.9/typing.py", line 723, in __subclasscheck__
    raise TypeError("Subscripted generics cannot be used with"
TypeError: Subscripted generics cannot be used with class and instance checks

This means that Python has no way to check if a variable is an instance of a specific type1...

Unless it does

I present to you: typing.Literal. "How is that relevant?" you ask. Excellent question. First, let's remember that sum types are also known as "tagged unions". Python has unions in the form of typing.Union (or the | syntax in newer versions). Given this, we can create the union half of a sum type like this:

from typing import Union, List

StrOrIntList = Union[str, List[int]]
#                    ^^^  ^^^^^^^^^ A **variant**
#                    |
#                    Another **variant**

The next problem is figuring out how to detect which variant we have. The obvious strategy is to use isinstance, but as established, isinstance is not flexible enough. I also looked around to see if there's a way to get type annotation information at runtime so one could check against that, but this doesn't seem to be possible. Even if you could, it would also not cover the case where you want multiple variants with the same data type: Union[int, int, str] is the same as Union[int, str] to the typechecker2, so there's no way to tell the two ints apart.

Next, we need some mechanism to associate each variant with a tag that's accessible at both runtime and typecheck-time, so that we can do control flow at runtime and allow the typechecker to assert that we're checking the right things before even running the code. For the association portion, we know that types can be paired together in Python by using typing.Tuple like this:

from typing import Tuple, List

StrAndInt = Tuple[str, List[int]]
#                 ^^^  ^^^^^^^^^ We'd like this to be the **variant's** data
#                 |
#                 We'd like to use this as our **tag**

This has a major problem, which is that the typechecker has no insight into the possible values of our tag, the str. This practically defeats the entire purpose, since it robs us of assertions that we're checking against valid tags, and that we're checking against all valid tags. After all, the goal is to ensure more things can be checked before runtime, and having to run the code to make sure you have no typos in strings where you're checking against a tag is pretty self-defeating. For a week or so after coming up with what I've discussed so far, I had no solution to this problem, and I thought most hope was lost. But then I had a realization, which led directly to this blog post.

For some reason3, Python allows you to use value literals as types. Importantly, value literals can be used not only as a type, but also as a value. Using this slightly odd behavior4, we can create a tag that's accessible at both runtime and at typecheck-time. For example:

from typing import List, Tuple, Literal

NamedInt = Tuple[Literal["a tag"], List[int]]
#                ^^^^^^^^^^^^^^^^ A statically-analyzable
#                                 *and* runtime-accessible **tag**!

On its own, this construct is completely and utterly useless. But, if we combine our tags with a union, we get...

Tagged unions

Which are also known as...

Sum types

In Python, sum types are constructed as a union of tuples of a tag and the variant's data. Here's an example:

from typing import List, Tuple, Literal, Union

MySumType = Union[
#           ^^^^^ The union
    Tuple[Literal["string"], str],
    Tuple[Literal["list of ints"], List[int]],
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A variant
#         ^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^ The variant's data
#         |
#         The variant's tag
]

Now that we have our sum type, I'll demonstrate how it's used. The type of x[1] where x is of type MySumType can be either str or List[int]. We can determine the type of x[1] we have by looking at the tag, which is x[0]. Then, if we do things right, our typechecker will be able to automatically narrow the type of x[1] to either str or List[int] based on the type and value of x[0] in our control flow. Let's give it a try:

def gives_sum_type() -> MySumType:
    # Just return one of the two "variants"
    return ("list of ints", [1, 2, 2, 3])

    # We could also do this, for example:
    # return ("string", "foo")


def uses_sum_type():
    match gives_sum_type():
        case ("string", x):
            print("the variant contains a string:", x)
        case ("list of ints", x):
            print("the variant contains a list of ints:", x)

The typechecker narrows the type of x in each branch to either str or List[int] based on the tag matched in the case arm. Values in the tag position that do not correspond to any tag in the sum type's definition will cause the typechecker to emit an error; typos and the removal of variants from the sum type's definition are both covered by this case. Exhaustiveness is also checked in match statements, so if you later add variants to MySumType, the typechecker will emit an error since not all variants are covered.

If you're not on Python 3.10 or later (which was when match was introduced), then you can use the following hack to get the same guarantees:

from typing import NoReturn

def unreachable(_: NoReturn) -> NoReturn:
    raise AssertionError("unreachable")


def uses_sum_type():
    x = gives_sum_type(False)

    if x[0] == "string":
        y = x[1]
        print("the variant contains a string:", y)
    elif x[0] == "list of ints":
        y = x[1]
        print("the variant contains an list of ints:", y)
    else:
        unreachable(x)

Avoiding a typesafety footgun

Given the following:

from typing import List, Tuple, Literal, Union

MySumType = Union[
    Tuple[Literal["string"], str],
    Tuple[Literal["list of ints"], List[int]],
]

MyOtherSumType = Union[
    Tuple[Literal["string"], str],
    Tuple[Literal["list of strings"], List[str]],
]

The code ("string", "foo") will pass the typechecker as either MySumType or MyOtherSumType because they both have a variant with the same name, which is not great. One way to get around this would be to create a wrapper class and instantiate your sum type inside its constructor. For example:

from typing import List, Tuple, Literal, Union

class MySumType:
    def __init__(
        self,
        adt: Union[
            Tuple[Literal["string"], str],
            Tuple[Literal["list of ints"], List[int]],
        ],
    ) -> None:
        self.adt = adt

Now you'd write MySumType(("string", "foo")) instead, which will never typecheck as another sum type that happens to have variants with the same name. In order to use match with this type, you'd simply use match.adt to get access to the inner type. This has the added bonus that you can now add class methods to sum types, which is pretty cool.

Adding documentation

You might also want to add documentation to your sum type variants. We can accomplish this by moving away from Literal["..."]s and creating new types:

from typing import List, Tuple, Literal, Union

class String:
    """
    A variant containing a string
    """


class ListOfInts:
    """
    A variant containing a list of ints
    """


class MySumType:
    def __init__(
        self,
        adt: Union[
            Tuple[String, str],
            Tuple[ListOfInts, List[int]],
        ],
    ) -> None:
        self.adt = adt

This is slightly more awkward to write, as MySumType((String(), "foo")), but the benefit of documentation outweighs it in my opinion. Matching still works as expected as well.

Danger zone

One famously successful use-case for sum types is error handling: their exhaustive and explicit properties make it easy to determine which failures are possible, and from there, which failures you can handle, and then to handle those in a type-safe, high-confidence manner. We can now accomplish this in Python, by replacing the use of exceptions with the following foundation5:

from typing import TypeVar, Union, Tuple, Literal, Generic

class Ok:
    """
    A variant containing a success value
    """


class Err:
    """
    A variant containing an error value
    """


T = TypeVar("T")
E = TypeVar("E")


class Result(Generic[T, E]):
    def __init__(
        self,
        adt: Union[
            Tuple[Ok, T],
            Tuple[Err, E],
        ],
    ) -> None:
        self.adt = adt

    # More methods...

This is what scientists call "carcinisation", which is the phenomenon wherein, given a programming language and enough time, it will eventually become Rust.


1

If the types have generic parameters. Obviously, this works for "regular" types, because otherwise isinstance() would be completely useless.

2

Even further, Union[T, T, T] is the same as T; note how the Union is dropped entirely.

3

I am indifferent to the rationale that caused this behavior to exist, but I am very happy that it does because it is incidentally core to making proper sum types in Python.

4

Hmm, maybe it's not that odd, I guess you could use it to do const generics? I bet that's why it exists.

5

I would also like to have an ABC for sum types that provide a single adt() method that performs a conversion like Result -> ResultRaw so that this sort of interface can be identical for all sum types, which would make writing manual match statements more consistent. I can't think of an easy way to statically type such an ABC right now though, and the .adt convention is good enough for my immediate purposes.