Python annoyances

Day (date.today() - date(2021, 7, 14)).days of not understanding why Python is used in production


pathlib

>>> from pathlib import PurePath
>>> PurePath(b"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 665, in __new__
    return cls._from_parts(args)
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 697, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 686, in _parse_args
    raise TypeError(
TypeError: argument should be a str object or an os.PathLike object returning str, not <class 'bytes'>

This is fine because every file system on the planet is UTF-8, clearly.

I heard a counterargument that "it says in the docs that for 'low-level path manipulation on strings, you can also use the os.path module.'" I take a few issues with this: nowhere in the docs are there explicit mentions that exceptions will be raised when passing bytes instead of str; nowhere in the docs are there explicit type annotations suggesting that you can only use str; and the phrasing of that little warning uses such passive language that it doesn't seem like there's any real reason to care about this case in the first place.

I would expect a library designed specifically for dealing with paths to be able to deal with paths, so I find this behavior to be... surprising. A counterargument I heard to this is that "pathlib just provides a high level OOP interface to paths" but I don't understand how that's mutually exclusive with handling bytes/non-UTF-8.
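For what it's worth, the least-bad workaround I know of is to smuggle the bytes through os.fsdecode on the way in and os.fsencode on the way out. Here's a minimal sketch, assuming a POSIX system (where Python decodes file names with the surrogateescape error handler). The two helper names are made up for illustration and aren't part of pathlib:

```python
import os
from pathlib import PurePath


def pure_path_from_bytes(raw: bytes) -> PurePath:
    # Hypothetical helper: os.fsdecode maps undecodable bytes to lone
    # surrogates instead of raising, so the result is a str pathlib accepts.
    return PurePath(os.fsdecode(raw))


def bytes_from_pure_path(path: PurePath) -> bytes:
    # Hypothetical helper: os.fsencode accepts os.PathLike objects and
    # reverses the mapping done by os.fsdecode.
    return os.fsencode(path)


raw = b"dir/caf\xe9.txt"  # a Latin-1 encoded name, which is not valid UTF-8
path = pure_path_from_bytes(raw)
assert bytes_from_pure_path(path) == raw
```

It works, but the str in the middle contains lone surrogates, so anything that tries to print or re-encode it as strict UTF-8 can still fall over. I shouldn't have to know any of this to use a path library.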

datetime

I'd like to convert an ISO 8601 timestamp string to the appropriate Python object. Looks like datetime.fromisoformat is the way to do that. But wait:

Caution: This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.

If you're going to add a function for this to the standard library, you'd think you'd want to avoid half-assing it, no? Since it's built-in, it's way more likely to be used than any third party package. Anyway, now that I've got my ISO 8601 string (which includes timezone information) converted into a datetime object, let's compare it against the current time:

(Pdb) p cache_expires_at
datetime.datetime(2022, 4, 20, 21, 41, 52, 955721, tzinfo=datetime.timezone.utc)
(Pdb) p cache_expires_at < datetime.utcnow()
*** TypeError: can't compare offset-naive and offset-aware datetimes
(Pdb) datetime.utcnow()
datetime.datetime(2022, 4, 20, 20, 43, 23, 982491)

... what!? Why does datetime.utcnow() not have timezone information? Shouldn't it know what timezone the datetime it's creating is in since it literally has utc in the name? Okay, it looks like the docs actually address this:

Warning: Because naive datetime objects are treated by many datetime methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing the current time in UTC is by calling datetime.now(timezone.utc).

Well, sort of, anyway. Why even provide this method if it omits timezone information, then? Why are naive datetimes treated as local time? I bet there are some horrifying edge cases there. Another big point of pain is that since naive and aware timestamps are both the same type, tools like pyright can't even warn about this stuff statically. You need good code coverage (hard, rare) or manual testing (ew) to be able to detect this sort of error. Similarly, pydantic can't easily enforce timestamps to be timezone-aware since again, there's a single type for both cases. It's incredibly silly to allow this sort of error to even happen when it could so easily be prevented by having two separate types.
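To show what I mean by two separate types, here's a rough sketch using typing.NewType. The names AwareDatetime, require_aware, and is_expired are my own inventions for this example, and the check still happens at runtime at the point of conversion. The difference is that a static checker should then complain when a plain datetime is passed where an AwareDatetime is expected:

```python
from datetime import datetime, timezone
from typing import NewType

# Hypothetical distinct type for timezone-aware values. At runtime it is
# still a plain datetime; only static checkers see the difference.
AwareDatetime = NewType("AwareDatetime", datetime)


def require_aware(value: datetime) -> AwareDatetime:
    # Reject naive datetimes once, at the boundary.
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return AwareDatetime(value)


def is_expired(expires_at: AwareDatetime, now: AwareDatetime) -> bool:
    # Both sides are aware, so this comparison cannot raise TypeError.
    return expires_at < now


cache_expires_at = require_aware(
    datetime.fromisoformat("2022-04-20T21:41:52.955721+00:00")
)
now = require_aware(datetime.now(timezone.utc))
expired = is_expired(cache_expires_at, now)

# A naive value like the one utcnow() produced above gets rejected up front.
naive = datetime(2022, 4, 20, 20, 43, 23, 982491)
try:
    require_aware(naive)
except ValueError as error:
    rejection = str(error)
```

This is the sort of thing the standard library could have done for me.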

PyPI

For some reason, PyPI allows packages to be uploaded with version requirements that almost definitely will not work. If I make a package that depends on * or >1 or such of some other dependency, PyPI will happily accept my upload. The problem is that, as soon as that dependency releases 2.0, my package is sure to break. For a real world example of this, see here.

PEP 440 defines Python's own special versioning scheme (instead of just using SemVer like everyone else) with liberal usage of the word "MUST" but then official Python tooling (like PyPI) opts not to enforce it at all. What's even the point, then? Also, what even is a "post release"? Asking Google "post release meaning" gives me a bunch of stuff about prisoners, and appending "software" to the query doesn't help either. After eventually finding the explanation in the PEP, the answer is "it's functionally identical to SemVer Patch releases except we decided to make it a separate thing for no reason".

Since Python decided not to use SemVer, it now also needs to invent its own syntax for specifying allowable dependency versions. It's a mess, and quite easy to misuse since nobody knows what ~= means, nor realizes you can use , to add additional constraints. This could all have been neatly avoided by adopting SemVer instead. Speaking of ~=, here's a cheap shot:

The spelling of the compatible release clause (~=) is inspired by the Ruby (~>) and PHP (~) equivalents.

PEP 440

Ah yes, PHP, the paragon of good design. Smartly, Poetry lets you just use the standard SemVer syntax for this (^).
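For anyone else who doesn't know what ~= means: PEP 440 defines ~=2.2 as equivalent to >=2.2, ==2.* and ~=1.4.5 as equivalent to >=1.4.5, ==1.4.* (the comma means "and"). Here's a minimal sketch of that rule, assuming versions made of plain dotted integers only. It ignores pre-releases, post-releases, epochs, and everything else the PEP allows, and the function names are mine:

```python
from typing import Tuple

Release = Tuple[int, ...]


def parse_release(version: str) -> Release:
    # Only plain dotted integers; PEP 440 permits far more than this.
    return tuple(int(part) for part in version.split("."))


def strip_trailing_zeros(release: Release) -> Release:
    # PEP 440 treats 2.2 and 2.2.0 as equal, so compare without trailing zeros.
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    return release


def compatible_release_bounds(version: str) -> Tuple[Release, Release]:
    # ~=X.Y means >=X.Y and <(X+1); ~=X.Y.Z means >=X.Y.Z and <X.(Y+1).
    release = parse_release(version)
    if len(release) < 2:
        raise ValueError("~= requires at least two release segments")
    upper = release[:-2] + (release[-2] + 1,)
    return release, upper


def satisfies_compatible_release(candidate: str, spec_version: str) -> bool:
    lower, upper = compatible_release_bounds(spec_version)
    found = strip_trailing_zeros(parse_release(candidate))
    return strip_trailing_zeros(lower) <= found < strip_trailing_zeros(upper)
```

So satisfies_compatible_release("2.10", "2.2") is true and satisfies_compatible_release("3.0", "2.2") is false, which is roughly what most people want when they write >1 by mistake.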

Poetry

poetry remove has no --lock option.

Adding dependencies

poetry add can take forever. Trying to add new dependencies is a nightmare, and that's due to both the aforementioned performance issues and the fact that, due to the way that Python imports work, it is impossible to have multiple versions of a single package installed at a time. As a direct result of these things, I just spent over five minutes trying to install dependencies. Observe:

$ poetry add --lock --source REDACTED [REDACTED_0..REDACTED_6]
...
Updating dependencies
Resolving dependencies... (40.4s)
...
SolverProblemError

Then after some vim pyproject.toml to comment out things that caused the SolverProblemError:

$ poetry add --lock --source REDACTED [REDACTED_0..REDACTED_6]
...
Updating dependencies
Resolving dependencies... (115.3s)
...
Writing lock file

Cool, this time it worked, but I'm still not done getting the dependencies I need. So let's add them back:

$ poetry add --lock attrs marshmallow
...
Updating dependencies
Resolving dependencies... (39.0s)
...
SolverProblemError

Okay fine so I need to manually specify an older version of marshmallow because for some reason poetry just picks the newest one instead of trying to find the newest compatible one. Let's try again with the version it says is causing the conflict:

$ poetry add --lock attrs 'marshmallow^2'
...
Updating dependencies
Resolving dependencies... (35.8s)
...
SolverProblemError

Okay so now attrs is having the same problem. Following the same pattern:

$ poetry add --lock 'attrs^19' 'marshmallow^2'
...
Updating dependencies
Resolving dependencies... (106.8s)
...
Writing lock file

Thank fuck, it's finally over. Well, for this project. We have a lot of projects that need to be converted to poetry. It'll be worth it though because pip/pip-compile is worse, and poetry2nix is nice.

Just for fun, let's try something similar in a different language:

$ time cargo add rand syn rand_core libc cfg-if quote proc-macro2 unicode-xid serde bitflags
...
... 1.968 total
$ time cargo update # to rebuild the lockfile
...
... 0.704 total

Under 2 seconds. No literally unfixable issues with incompatible transitive dependencies. It Just Works™. Incredible.

Black <22.3.0 incompatible with Click >=8.1

[T]he most recent release of Click, 8.1.0, is breaking Black. This is because Black imports an internal module so Python 3.6 users with misconfigured LANG continues to work mostly properly. The code that patches click was supposed to be resilient to the module disappearing but the code was catching the wrong exception.

psf/black#2964

I find the quantity of backlinks to this issue to be greatly amusing. (There are probably way more than shown too due to the existence of private repositories.) This is what happens when hobbyists and the industry take a language seriously even though it lacks:

  1. A language-enforced concept of item privacy
  2. The ability to have multiple versions of a package in the dependency tree
  3. Statically checkable error types
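To illustrate how easy it is to end up "catching the wrong exception" when guarding an import of someone else's internals, here's a sketch. This is not Black's actual code: json stands in for click, and _hypothetical_private_module is a name I made up that doesn't exist. The point is that from package import missing_name raises a plain ImportError, which an except ModuleNotFoundError clause won't catch:

```python
def broken_guard() -> str:
    # Intended to skip patching quietly when the private module is gone.
    try:
        from json import _hypothetical_private_module  # noqa: F401
    except ModuleNotFoundError:
        # Never reached here: the statement above raises ImportError,
        # which is the parent class of ModuleNotFoundError.
        return "skipped patching"
    return "patched"


def fixed_guard() -> str:
    try:
        from json import _hypothetical_private_module  # noqa: F401
    except ImportError:
        # ImportError also covers ModuleNotFoundError, so both cases land here.
        return "skipped patching"
    return "patched"


try:
    broken_guard()
    outcome = "no exception"
except ImportError as error:
    outcome = type(error).__name__
```

Nothing stops you from importing the underscore-prefixed module in the first place, and nothing tells you statically which exception the import can raise.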

Combinatorial exhaustiveness

Let's see what various typecheckers think about the following code:

from typing import Literal, Tuple, Union

SumType = Union[
    Tuple[Literal["foo"], str],
    Tuple[Literal["bar"], int],
]


def assert_int(_: int): pass
def assert_str(_: str): pass


def assert_combinatorial_exhaustion(
    first: SumType,
    second: SumType,
):
    match (first, second):
        case (("foo", x), ("foo", y)):
            assert_str(x)
            assert_str(y)
        case (("foo", x), ("bar", y)):
            assert_str(x)
            assert_int(y)
        case (("bar", x), ("foo", y)):
            assert_int(x)
            assert_str(y)
        case (("bar", x), ("bar", y)):
            assert_int(x)
            assert_int(y)

Pytype

I couldn't get this to run on NixOS, so I don't know.

Rating: ?/10

Pyre

I couldn't get this to run on NixOS either, but they do have a web-based version for some reason. Here's what it says:

21:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
22:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
24:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
25:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
27:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
28:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
30:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
31:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.

pyre is clearly unable to do type narrowing in the match arms. There are no warnings about exhaustiveness, however; is that working properly? Let's pass it the simplest possible code to test for that:

def assert_exhaustion(
    x: bool,
) -> None:
    match x:
        case True:
            pass

Output:

No Errors!

Rating: 0/10

Mypy

playground/main.py:23: error: INTERNAL ERROR -- Please try using mypy master on Github:
https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
If this issue continues with mypy master, please report a bug at https://github.com/python/mypy/issues
version: 0.941
playground/main.py:23: : note: please use --show-traceback to print a traceback when reporting a bug

Yep, you're reading that right; mypy just crashes.

Rating: Comically bad/10

Pyright

error: Cases within match statement do not exhaustively handle all values
    Unhandled type: "tuple[SumType, SumType]"
    If exhaustive handling is not intended, add "case _: pass" (reportMatchNotExhaustive)

pyright's lack of errors about the assert_{str,int} functions indicates that it is correctly doing type narrowing, so that's good. However, it states pretty clearly that it thinks this match is not exhaustive. Tragically, someone reported this already and it got closed as wontfix.

Rating: 5 pity points since it can at least narrow types and not crash/10

rustc

#![allow(unused)]

enum SumType {
    Foo(String),
    Bar(i32),
}

fn assert_int(_: i32) {}
fn assert_str(_: String) {}

fn assert_combinatorial_exhaustiveness(
    first: SumType,
    second: SumType,
) {
    match (first, second) {
        (SumType::Foo(x), SumType::Foo(y)) => {
            assert_str(x);
            assert_str(y);
        },
        (SumType::Foo(x), SumType::Bar(y)) => {
            assert_str(x);
            assert_int(y);
        },
        (SumType::Bar(x), SumType::Foo(y)) => {
            assert_int(x);
            assert_str(y);
        },
        (SumType::Bar(x), SumType::Bar(y)) => {
            assert_int(x);
            assert_int(y);
        },
    }
}

fn main() {
    println!("The match is exhaustive and the types check out.");
    println!(
        "If this weren't the case, you'd be seeing a compiler error message here."
    );
}

(Hit the play button in the top right corner.)

Rating: 10/10