Charles' Blog
Psyonix' Epic Fail
Psyonix, Epic, anticompete, and you!
So Epic Games is buying Psyonix. Everyone's upset for a multitude of reasons, a lot of which aren't the right ones. A few very special people are even content with this news for the wrong reasons, which is even worse. Now, imagine the next three paragraphs as panels in the expanding brain meme.
- Lots of people are just upset that Rocket League is going to show up on the Epic Games Store (EGS) and aren't reading any further into the issue. This opinion is not so bad because at least they're still doing the right thing, just for the wrong reasons. Also, it seems nearly universally agreed that the EGS is... not very good.
- A lot of people are indifferent and just want everyone to shut up and move on. Obviously, this accomplishes nothing other than making the people who know what's going on talk more (see: this blog post).
- Some people are actually supportive of this, their reasoning being that they "can still play it on Steam since they already own it there, and this will just bring the game to a wider audience". The last part may be true (summary: ~40% of EGS users come from consoles rather than being existing PC/Steam players), but this still misses the actual issue at hand. However, I say "may" because Rocket League's playerbase is already mostly console players.
Very few people are seeing the actual issue at hand, however, which is that by removing the game from Steam, Epic/Psyonix (I don't know which is actually in charge of this decision, but if I had to guess I would pick Epic) would be engaging in extremely anticonsumer and anticompetitive behavior. It's true that neither party has yet confirmed or denied that Rocket League will be removed from Steam after it appears on the EGS.
However, if Epic/Psyonix wanted everyone to calm down, you'd think they'd just announce that Rocket League will remain purchasable from the Steam store and fully supported as it has been, and just also be available on the EGS. There would be no issue if this were the case, as then users would be able to choose whether to play on Steam or on the EGS, which is the ideal situation. The lack of such an announcement says to me that they've already decided internally to remove it from the Steam store and they just don't want to deal with the backlash from the public yet.
Another issue that has not been addressed is whether current owners of the game on Steam will be able to buy Keys, Rocket Passes, or DLC cars once the game appears on the EGS. Whether Psyonix will continue to support Rocket League on Linux is also up in the air; since the EGS has no Linux version, it seems reasonable (from a business perspective) to me that Psyonix would stop ensuring that updates worked on Linux if Rocket League were made unpurchasable on Steam. Valve's Proton is very good, so that might be an option, but if it comes to that I'd bet a lot of us Linux-using software-vegans probably won't be playing the game anyway.
I'm not going to re-explain why exclusives are harmful here since this has already been done by several others at much higher quality than I could manage. If you're tired of reading, this video does a really good job of laying out the issue in terms of streaming services. This translates really well to game marketplaces since both are just DRM CDNs at their core.
TL;DR: To oversimplify, if you're annoyed with having to pay for multiple streaming services that all lack important features, you should also be annoyed with companies like Epic buying exclusive rights to having games on their launchers.
UPDATE: As expected, in late September of 2020, Rocket League was removed from the Steam store. Linux support was also dropped/disabled.
How single-user Git-over-SSH works
I had to read through so much source code to figure this out
Have you ever wondered how GitHub, GitLab, Gogs, Gitea, and so on allow multiple users to push and pull data from repos with only one unix user? Perhaps you want to know how this is done so you can write your own version of the aforementioned type of software that doesn't suck. That's how I got here. Anyway, this is how it's done:
OpenSSH gives you an option (ForceCommand in /etc/ssh/sshd_config) to force the use of a particular command on connection, overriding the one the client actually wants to run. When this option is set, the client's "intended command" gets stored in an environment variable called $SSH_ORIGINAL_COMMAND. This can be used to force the execution of your own script (or binary) that allows and disallows "intended commands" each time anyone tries to do anything with this account over SSH.
Additionally, the ~/.ssh/authorized_keys file allows you to specify a per-key forced command. This means that you can set an option (say, --with-key) for the per-key override that isn't present in the global override in /etc/ssh/sshd_config. Now, in your forced command, you can disallow write access to connections that are missing the --with-key argument, since it's only present if a user has uploaded an SSH key.
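As a rough illustration of the per-key override, here is a minimal sketch of how your software might render one authorized_keys entry when a user uploads a key. The function name and the example key are hypothetical; the command="..." and restrict key options are the ones described in man 8 sshd, and restrict assumes a reasonably recent OpenSSH.

```python
def authorized_keys_line(public_key: str, forced_command: str) -> str:
    """Render one authorized_keys entry that forces `forced_command` for this key.

    Hypothetical helper: a real implementation would also validate that
    `public_key` is a well-formed OpenSSH public key.
    """
    public_key = public_key.strip()
    # Each entry must fit on one line, and the command is wrapped in double
    # quotes, so refuse inputs that could break out of either.
    if "\n" in public_key or "\r" in public_key:
        raise ValueError("public key must be a single line")
    if any(ch in forced_command for ch in '"\\\n\r'):
        raise ValueError("forced command contains characters this sketch does not escape")
    # `restrict` turns off forwarding, PTY allocation, and similar features.
    return f'command="{forced_command}",restrict {public_key}'


example_line = authorized_keys_line(
    "ssh-ed25519 AAAAexample alice@laptop",  # placeholder, not a real key
    "/path/to/your/software --with-key",
)
```

Rejecting awkward characters instead of escaping them keeps the sketch small; the forced command is generated by your own software anyway, so it should never need them.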
Even further, you can provide the --with-key argument a key_id value that allows your software to know exactly which key was used for this connection. With that information, you can associate the SSH key with a user account and, for example, allow them extra read/write access to their private repositories and extra write access to their public ones. It's up to your software to keep a database of key_id <-> SSH key <-> user associations, however.
While you're messing with the sshd_config, you'll need to add PermitEmptyPasswords yes and you'll probably also want to add DisableForwarding yes. Thankfully, you can restrict the effects of those two options to apply only to your software's user and not to everyone on the server by using a Match section. All together, your sshd_config will have a new section like this:
Match User your_software_user
    PermitEmptyPasswords yes
    DisableForwarding yes
    ForceCommand /path/to/your/software
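To make the flow above more concrete, here is a minimal sketch of what the forced command itself might do. Everything about the policy is an assumption for illustration: anyone may read, only connections carrying a key_id may write, and per-repository permissions are left out entirely. The git-upload-pack, git-upload-archive, and git-receive-pack names are the commands Git's SSH transport asks the server to run; the function names are hypothetical.

```python
import os
import shlex
import sys
from typing import List, Optional

# Commands that Git clients request over SSH.
READ_COMMANDS = {"git-upload-pack", "git-upload-archive"}
WRITE_COMMANDS = {"git-receive-pack"}


def key_id_from_args(args: List[str]) -> Optional[str]:
    """Return the key_id if we were invoked as `--with-key <key_id>`."""
    if len(args) == 2 and args[0] == "--with-key":
        return args[1]
    return None


def decide(original_command: str, key_id: Optional[str]) -> bool:
    """Return True if the client's intended command should be allowed."""
    try:
        parts = shlex.split(original_command)
    except ValueError:  # e.g. an unterminated quote
        return False
    if len(parts) != 2:  # expect exactly: <git command> <repository path>
        return False
    verb = parts[0]
    if verb in READ_COMMANDS:
        return True  # assumed policy: reads are open to everyone
    if verb in WRITE_COMMANDS:
        return key_id is not None  # assumed policy: writes need a known key
    return False


def main(argv: List[str]) -> int:
    key_id = key_id_from_args(argv[1:])
    original = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    if not decide(original, key_id):
        print("access denied", file=sys.stderr)
        return 1
    # A real implementation would now look up the user for key_id, validate
    # the repository path, and run the requested git command.
    return 0
```

The script that sshd launches would end with something like sys.exit(main(sys.argv)). Keeping decide() free of I/O makes the allow/deny rules easy to unit test without an SSH server.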
References
- man 5 sshd_config for /etc/ssh/sshd_config options
- man 8 sshd for ~/.ssh/authorized_keys file format and options
- Both of the above for the uses of the $SSH_ORIGINAL_COMMAND environment variable
- The Gogs codebase for reverse-engineering all of the above
Python project progression
Things you may discover as your Python application grows over time
Getting started
So, you have discovered Python. You've singlehandedly written a useful program using only the standard library with relative ease, without the usual fuss associated with learning a new programming language and setting things up. You've found this experience quite enticing.
Managing dependencies
You've decided to grow your program, which requires using a few third-party libraries. First, you need to decide how you'd like to manage your dependencies. You'll likely find several ways to do this, each providing some benefit that the previous method lacked, but also accompanied by its own shortcomings, in approximately this order:
- Use your system's package manager
  - pro: doesn't require any extra software to manage dependencies
  - con: unless you or anyone running your code uses Windows
  - con: you're (typically) at the mercy of the singular version of a given library provided by your package manager (i.e. you do not get to choose the version)
- Manually install things with pip
  - pro: straightforward
  - pro: you can choose the version of each direct dependency
  - con: imperative package management isn't great for repeatability or documentative purposes
- Maintain a requirements.txt file containing your direct dependencies and use pip install -r requirements.txt to install your dependencies
  - pro: declarative package management
  - con: no control over transitive dependency versions between pip installs
- Maintain a requirements.in file containing your direct dependencies, use pip-compile from pip-tools to generate a locked requirements.txt file, and then use pip install -r requirements.txt to install your dependencies
  - pro: transitive dependency version pinning
  - con: no way to separate dependencies only used for tests/examples versus those required for library publication or application deployment
- Do the above twice, once for runtime requirements and a second time for test/example requirements (with different filenames, probably like requirements{,-dev}.{in,txt})
  - pro: separation of runtime and test/example dependencies
  - con: litters the root of your project with four files instead of just two
- Use poetry
  - pro: you're back to only needing two files
  - con: it isn't exactly standard tooling
  - con: it tries to do a lot more than just manage dependencies
Managing dependencies part two
There's a big con of all of the above options (except poetry) that I neglected to mention: the act of installing packages is a global operation (by default), and could potentially break other Python software you're using or developing if the stars don't align (which they won't). Luckily, there are solutions to this problem; unluckily, there are a lot of them:
- pyenv
  - manages versions of CPython per project
- pyenv-virtualenv
  - an extension for the above
  - isolate things installed via pip per project
- pyenv-virtualenvwrapper
  - extension for the above
  - adds some extra commands
- virtualenv
  - isolate things installed via pip per project
- virtualenvwrapper
  - extension for the above
  - adds some extra commands
- pyvenv
  - isolate things installed via pip per project
- pipenv
  - isolate things installed via pip per project
  - manages a lockfile
- venv
  - isolate things installed via pip per project
  - this one was made specifically to be the canonical tool for this purpose
- poetry
  - isolate dependencies per project
- Nix + nix-direnv
  - completely unrelated to the Python ecosystem but can do what all of these tools do except with a few more very useful features
- Bonus round: conda, anaconda, miniconda, mamba, micromamba
Understanding dependencies
Now that you've settled on a combination of tools, it's time to go on the hunt for helpful dependencies for your program. Over time you'll notice that a good fraction of projects present their documentation in different ways. It can be difficult going from one high profile project to another because not only do you need to relearn the API surface, but also how to navigate its documentation in the first place. Speaking of API surface, a lot of documentation is actually written as if it were a guide rather than an API reference, so determining the permissible uses of any given item is much harder and often ambiguous unless you read the source code of the project, or that of projects using the library in question.
Using dependencies
It's now time to actually add the dependencies you've deemed useful and worthy and start writing code using them. Uh oh, version resolution is failing? Looks like two of your direct dependencies require incompatible versions of a transitive dependency. Luckily, this is solved trivially: just kidding, it isn't.
Making your code nice (visually)
After getting out of that mess somehow, your codebase has grown in size significantly and you think it's probably time to start using an auto-formatter and a linter, especially since you'd like to bring in some extra help. For auto-formatters, you have the following choices:
- autopep8
- black
- yapf
- IDE-specific tooling (good luck running this in your CI pipeline)
You notice something peculiar: the tool you've chosen doesn't automatically sort your import statements at all. Indeed, you need a separate tool for this. Here are some options:
- lexicographically sort them like some kind of madman
- Use isort like the CIA probably does
- IDE-specific tooling (good luck running this in your CI pipeline)
Making your code nice (cognitively)
Even though it looks nice, you still feel like there are some questionable lines of code here and there. A good number of these oddities can be automatically detected and complained about by a linter, which will help you clean things up a bit more. Again, you have a number of options:
- autoflake
- bandit
- flake8
- flakehell
- prospector
- pycodestyle
- pydocstyle
  - This one actually lints your docstrings rather than your code, which is useful
- pyflakes
- pylsp
- pylint
  - I've saved this one for last because it appears to be a superset of some of the previous linters (at least it is for flake8), and it seems to be the most commonly used
Configuring stuff
Now that you've chosen all of your tooling, you should probably configure the tools to your liking. Or at the very least, configure them to work with each other, as a lot of them disagree by default. The rapidly growing list of configuration files is worrying you, so you decide to see if there's some alternative. To your delight, you discover PEP 518, which exists for this exact purpose. To your dismay, however, you come to learn that half of your chosen tools don't actually support this.
Gaining momentum
There are now multiple people working on your codebase, perhaps because you've started a company around this program of yours. The number of lines of code increased rapidly and you no longer have eyes on all parts of the project. Perhaps lints stopped failing your CI pipelines in the name of feature velocity. You find yourself wishing more lints were capable of failing builds, that you and others had left more docstrings behind, or at least that you knew anything about the return value of the function you're trying to call. It would be cool if you could generate some sort of easily-navigable API reference to get a summary of how the new code works together, too. You find a few tools, only to discover that some of them are painful to set up on an existing project and that some of them refuse to document items considered "private" entirely. This is unfortunate considering you're writing an application and not a library, meaning basically all of your code is considered private.
Static typing
At the very least, you can start using yet more linters locally. More specifically, static analysis tooling. Again, you have a number of options:
mypy
pytype
pyre
pyright
These tools will help you figure out what's going on in this new code you haven't seen before. They give you a map to get out of the woods you've been feeling lost in. They also help to prevent a few kinds of duck-typing related bugs caused by incorrect assumptions about an object's properties. At this point, though, the code is so far gone that allowing one of these tools to fail CI is implausible due to the sheer number of changes required to get CI to start passing again. And if the tools can't fail CI, you can't guarantee that other people will follow the conventions set by these new tools, which can significantly hinder their utility.
Statically typing exceptions
You've now noticed that PEP 484 provides no method to annotate exception types, nor is any existing static analysis tool capable of determining the list of exception types a given function may raise. You may be told to "just read the docs", to which you point out that there are often no docs, and sometimes they lie. Or maybe you're told to "just read the source code", which you know is ridiculous because that would mean you'd have to read the entire source tree of every function you'd like to know about. Or maybe "you should only handle exception types you know you can handle", to which you ask "how can I decide whether I can handle an exception if I don't know what the possible exceptions are?" and are left without an answer like the OP of this SO post.
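To illustrate the gap, here is a small sketch. The first function's signature promises an int and says nothing about the ValueError it can raise. The second is one possible workaround, not something PEP 484 prescribes: return the failure as an ordinary value so that it appears in the annotated return type and a type checker can insist you deal with it. All names here are made up for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ParseError:
    """Hypothetical error value carried in the return type."""
    reason: str


def parse_age_raising(text: str) -> int:
    # The annotation says "returns int"; the possible ValueError is invisible
    # to callers and to static analysis.
    return int(text)


def parse_age(text: str) -> Union[int, ParseError]:
    # Same logic, but the failure modes are part of the signature.
    try:
        value = int(text)
    except ValueError:
        return ParseError(f"not an integer: {text!r}")
    if value < 0:
        return ParseError(f"negative age: {value}")
    return value


result = parse_age("lol")
if isinstance(result, ParseError):
    message = f"rejected: {result.reason}"
else:
    message = f"age is {result}"
```

This only helps for code you control; exceptions raised by third-party libraries remain just as undocumented as before.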
Memory model
Suddenly you realize that you're not sure whether you're supposed to be able to mutate the original memory of the object being passed to the function you're currently bugfixing. You wonder whether it's possible to annotate or verify the intended ownership semantics of the object and are disappointed to learn that you (currently?) cannot. Perhaps you also wonder if a system like this could improve synchronization primitive ergonomics by forcing you to acquire a lock before granting you read or write access to the underlying data. You look for prior art on this topic and discover it has indeed been done before. Neat.
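For a feel of what "acquire the lock before you get the data" could look like in Python, here is a minimal sketch. The Guarded class is hypothetical and only a convention: unlike the prior art in other languages, nothing stops a caller from keeping a reference to the data after the with block ends, and it only makes sense for mutable values.

```python
import threading
from contextlib import contextmanager
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


class Guarded(Generic[T]):
    """Owns a value and only hands it out while its lock is held."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()

    @contextmanager
    def locked(self) -> Iterator[T]:
        # The lock is held for exactly the duration of the `with` block.
        with self._lock:
            yield self._value


balances = Guarded({"alice": 10})

with balances.locked() as data:
    data["alice"] += 5
```

The leading underscores on _value and _lock are the only thing discouraging direct access, which is exactly the kind of unenforced convention this section complains about.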
Performance
The performance of your application is showing signs of becoming a problem, so naturally you seek to introduce some form of parallelism (presumably you've already been using async/await for IO-bound concurrency). The obvious answer is multithreading, so you dispatch parallelizable work to a threadpool to be executed, adding locks throughout your code as necessary. Since lock acquisition/release isn't enforced particularly strongly as you learned earlier, perhaps you hit some deadlocks or data races or other parallelism-related bugs along the way.
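A minimal sketch of that approach, using only the standard library, might look like the following. The workload (counting words) and all names are placeholders; the point is the shape: work is handed to a ThreadPoolExecutor, and shared state is only touched while holding a lock that nothing forces you to remember.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

total_words = 0
total_lock = threading.Lock()


def count_words(line: str) -> int:
    """Count the words in one line and add them to the shared total."""
    global total_words
    n = len(line.split())
    # Forgetting this `with` would still run, which is the problem: the
    # read-modify-write below is not safe without it.
    with total_lock:
        total_words += n
    return n


lines = ["alpha beta gamma", "delta epsilon", "zeta"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() preserves input order even though work runs on several threads.
    per_line = list(pool.map(count_words, lines))
```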
Looking at your CPU usage, you notice that the program is still only using a single core at a time for some reason. It turns out that this is because threads aren't real (in CPython at least). A popular way to solve this problem is multiprocessing via some IPC mechanism like UDS or TCP or HTTP or AMQP or something. You pick one of these and successfully further your goal of scaling vertically, but deep down, you're keenly aware of the overhead of IPC versus shared memory and locks and it leaves a bad taste in your mouth.
During your research about parallelism in Python, you discover some discourse on the topic of removing the GIL, but it seems like the consensus is that this is a bad idea because, somewhat ironically, it would actually hinder performance quite significantly. The reason for this is that the GIL acts as a single lock around the CPython interpreter's internals, and a single lock is very fast and has low contention. If you remove this lock (and still desire programs to run correctly), you'll need to add a bunch of smaller locks around all the objects. Even though there tends to be low contention on all these objects, Python's refcounting memory model now requires atomics, which absolutely annihilates CPU cache as new objects are created and destroyed all the time.
If only there were an alternative memory management scheme to alleviate this problem, like a different flavor of garbage collector or perhaps some way to statically track object lifetimes. But at that point, it probably wouldn't be Python anymore.
Further reading
- Drew DeVault's Python: Please stop screwing over Linux distros
Python/Pydantic Pitfalls
Someone please throw me a rope
I'm going to be focusing on pydantic in this post, since that's what I know best, but reading through this discussion and having glanced at some of the other serialization frameworks, they seem to have similar problems or otherwise look awful to use. If you're trying to use FastAPI, you're locked into pydantic anyway too. We'll be trying to interact with the following JSON schema, and let's say we need to support PersonId (just the ID), Person (rest of the data without the ID), and PersonWithId (data plus ID, shown here) in the code for dictionary reasons:
{
  "title": "PersonWithId",
  "type": "object",
  "properties": {
    "firstName": {
      "title": "Firstname",
      "type": "string"
    },
    "lastName": {
      "title": "Lastname",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    },
    "id": {
      "title": "Id",
      "type": "string",
      "format": "uuid"
    }
  },
  "required": [
    "firstName",
    "age",
    "id"
  ]
}
This seems pretty straightforward, so let's try defining some models:
from typing import Optional
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

class PersonId(BaseModel):
    # We have to do this because `id` is taken in the global namespace
    which: UUID = Field(..., alias="id")

class Person(BaseModel):
    first_name: str = Field(..., alias="firstName")
    last_name: Optional[str] = Field(..., alias="lastName")
    age: int = Field(...)

class PersonWithId(PersonId, Person):
    pass
Cool, now some code to construct a value and convert it to JSON:
person_id = PersonId(which=uuid4())
person = Person(first_name="Charles", age=23)
person_with_id = PersonWithId(**person.dict(), **person_id.dict())
print(person_with_id.json())
Time to show the programming world what language is boss:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/charles/science/python/pydantic-sucks/pydantic_sucks/main.py", line 28, in main
    person_id = PersonId(which=uuid4())
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PersonId
id
  field required (type=value_error.missing)
Oh, nevermind, I guess. That's weird though, I set which=uuid4() like right there, what do you mean it's not present? Apparently, you must explicitly tell pydantic that you'd like to be able to populate fields by their name. What!? Let's spam Config classes everywhere to fix it:
 class PersonId(BaseModel):
     which: UUID = Field(..., alias="id")
+    class Config:
+        allow_population_by_field_name = True
+
 class Person(BaseModel):
     first_name: str = Field(..., alias="firstName")
     last_name: Optional[str] = Field(..., alias="lastName")
     age: int = Field(...)
+    class Config:
+        allow_population_by_field_name = True
+
 class PersonWithId(PersonId, Person):
     pass
Let's try running the code again:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/charles/science/python/pydantic-sucks/pydantic_sucks/main.py", line 29, in main
    person = Person(first_name="Charles", age=23)
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Person
lastName
  field required (type=value_error.missing)
Great. Why is this happening? I specified the type for last_name to be Optional[T] and the docs say it's aware of that type. Scouring the docs for a while, we learn that pydantic actually distinguishes between "missing field" and "field is set to None/null" for some reason. Whatever, let's fix it:
 class Person(BaseModel):
     first_name: str = Field(..., alias="firstName")
-    last_name: Optional[str] = Field(..., alias="lastName")
+    last_name: Optional[str] = Field(None, alias="lastName")
     age: int = Field(...)
And run our code again:
{
  "first_name": "Charles",
  "last_name": null,
  "age": 23,
  "which": "f49be13a-63f1-4ef6-b8f7-b948a32836ed"
}
Hooray, no more errors! Except wait, this doesn't look like our JSON schema at all! Why are they being serialized by their field names instead of the aliases? Y'know, the aliases I added for the express purpose of appearing in the JSON because the JSON field names are (mostly) (stylistically and sometimes semantically) invalid Python identifiers? The docs say you have to explicitly ask for them to be serialized by their aliases. Why though? If this isn't the default, what are the aliases even for? Whatever, let's try it:
-print(person_with_id.json())
+print(person_with_id.json(by_alias=True))
And the output:
{
  "firstName": "Charles",
  "lastName": null,
  "age": 23,
  "id": "41501697-8ed7-4d2c-8bd0-47aea0d5cd92"
}
We've finally done it. I should be excited to be getting this working but at this point I'm actually just exhausted. Plus I've still got some questions. There are still two calls to some_model.dict() that don't take by_alias=True, but the code appears to work anyway. How am I supposed to remember when it's required to add by_alias=True and when it would break things if I did add it? I think this issue is on Python itself, for baking the concept of a constructor into the language and for allowing everything to be represented as dicts. (More on that later). Also, what does pyright think about our code now? Let's find out:
error: No parameter named "which" (reportGeneralTypeIssues)
error: Argument missing for parameter "id" (reportGeneralTypeIssues)
error: No parameter named "first_name" (reportGeneralTypeIssues)
error: Arguments missing for parameters "firstName", "lastName" (reportGeneralTypeIssues)
A little digging suggests that pyright is now expecting you to initialize the fields based on their aliases instead of their actual names. Again, this seems extremely backwards. How can I fix this without using (stylistically or semantically) illegal identifiers? This discussion and this comment specifically suggest that pyright is thinking too hard about what identifier a field name should be populated by. To be fair to it, it's very weird for a typechecker to have a constraint that a field is initialized by exactly one of two names in a generalized fashion. Oh, was it unclear that you can still populate the fields by both names? Because yeah, you can do that. So anyway, how do I appease pyright? It seems like the only way to do so is to add # type: ignore everywhere. Remember, doing # pylint: disable=invalid-name is not an option because of JSON field names that are semantically invalid Python names, such as id or anything using hyphens for word separators. So, let's clutter the place with sad comments:
-person_id = PersonId(which=uuid4())
-person = Person(first_name="Charles", age=23)
+person_id = PersonId(which=uuid4()) # type: ignore
+person = Person(first_name="Charles", age=23) # type: ignore
Now pyright, pylint, and the Python interpreter (in the case of semantically illegal names) are all satisfied and don't give us any issues. But at what cost? This code now typechecks perfectly fine but will detonate at runtime:
-person = Person(first_name="Charles", age=23) # type: ignore
+person = Person(first_name="Charles", age="lol") # type: ignore
And this code typechecks fine and, somewhat surprisingly, works at runtime too. This is not very Parse, Don't Validate and thus is almost certainly prone to causing problems in the future:
-person = Person(first_name="Charles", age=23) # type: ignore
+person = Person(first_name="Charles", age="23") # type: ignore
This code typechecks and runs fine, but omits the extraneous value in the output that we potentially wanted to show up there:
-person = Person(first_name="Charles", age=23) # type: ignore
+person = Person(first_name="Charles", age=23, fingers=10) # type: ignore
We can at least configure pydantic to give us errors about that situation by adding extra = Extra.forbid to all of your Config interior classes (or whatever they're called), so that a test suite, assuming there is one and assuming it has good coverage, can catch this before we deploy anything. This results in errors like this:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/charles/science/python/pydantic-sucks/pydantic_sucks/main.py", line 31, in main
    person = Person(first_name="Charles", age=23, fingers=10) # type: ignore
  File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Person
fingers
  extra fields not permitted (type=value_error.extra)
Speaking of errors, what happens if our code fails to serialize or deserialize? Maybe our code constructed a value that didn't pass a validator or is the wrong type or the incoming data was malformed or some other such problem. In that case, we have to deal with the pitfalls of Python's error handling facilities, which I've talked more about in a previous post.
Speaking of deserialization, what about doing that with pydantic? You get at least all the same problems as above, and at this point I don't really want to talk about pydantic anymore. Except to make the point again that the way aliases work in pydantic (and similar libraries) is extremely backwards and the default behavior is really bad; I hope at this point I've demonstrated clearly why this is the case, but it's worth repeating just to be sure.
Let's talk about something else
What if we wanted to do this in Rust? Let's define our models:
// Assumes the serde (with the derive feature) and uuid (with the serde and v4 features) crates
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Serialize, Deserialize)]
struct PersonId(Uuid);

#[derive(Serialize, Deserialize)]
struct Person {
    #[serde(rename = "firstName")]
    first_name: String,
    #[serde(rename = "lastName")]
    last_name: Option<String>,
    age: u8,
}

#[derive(Serialize, Deserialize)]
struct PersonWithId {
    id: PersonId,
    #[serde(flatten)]
    person: Person,
}
And some code to construct a value and serialize it:
fn main() -> Result<(), serde_json::Error> {
    let id = PersonId(Uuid::new_v4());
    let person = Person {
        first_name: "Charles".to_owned(),
        last_name: None,
        age: 23,
    };
    let person_with_id = PersonWithId { id, person };
    // Serialization is the only fallible step, so it's the only place `?` appears
    let json = serde_json::to_string(&person_with_id)?;
    println!("{json}");
    Ok(())
}
Now let's see what happens when we run it:
{
  "id": "ecacbd24-2c68-45a0-aaf0-ebee4390c16a",
  "firstName": "Charles",
  "lastName": null,
  "age": 23
}
Why did — oh this isn't Python; it actually worked first try. Sorry, force of habit. Anyway, why does this work, and why is it so much safer and more intuitive?
- #[serde(rename = "whatever")] does what you think it would, unlike pydantic's alias, and no extra configuration is required beyond setting the new name we want. (serde also allows setting different names for serialization and deserialization, which can be handy when, for example, you want to use the same model to deserialize from your database and serialize for your web API)
- There are no weird typechecking issues because there are no constructors getting modified at runtime to accept multiple names for the same things
- PersonWithId doesn't require a roundtrip of its components through a HashMap (aka dict) and because of that and the first point, there's no fear of whether you should have added by_alias=True
- It also means any inability to map fields correctly will cause a compile time error instead of a runtime one
- Any attempt to add a new field in the let person = Person { ... }; section will cause a compile error because the struct does not yet define that new field. You can achieve the same for deserialization by annotating the container with #[serde(deny_unknown_fields)]
- Any place where a runtime error can occur is made clear by the Result type; in this case we're simply handling it with the ? operator
- There's no weird coercion happening; trying to set age: "23" without explicitly converting it from &str to u8 via parse() or such would fail to compile instead of potentially accepting bad values, which is much more in the Parse, Don't Validate spirit
Conclusion
I don't really know how to end this article without either saying something overly sassy about Python or holding my head in my hands in sadness. I hope Python's situation improves, or maybe there's a better library out there that solves all of these problems somehow (you'd still have to deal with constructors) that I just haven't seen yet. In the meantime, I'm going to continue using Rust when I can and Python only when I have to. I guess I went the sadness route.
Addendum
For FastAPI in particular, it looks like it sets by_alias=True by default for returning responses built from pydantic objects. This is notably not the same as pydantic's default, which is yet another violation of the Principle of Least Astonishment in the Python ecosystem. I also see this merged PR suggesting that you cannot disable this behavior without either causing inconsistencies in the schema presented in the generated OpenAPI spec or requiring you to implement a dummy model for the express purpose of showing up in the OpenAPI spec with the correct field names, which is another POLA violation and seems awful to maintain.
A large cause of problems with complex pydantic code is the following pattern:
# NewThing inherits from BaseModel and OldThing, and adds its own extra fields
new_thing = NewThing(**old_thing.dict(), **other_thing.dict())
# Or
new_thing = NewThing(**old_thing.dict(), **{"new_field": new_value})
# Or
new_thing = NewThing(**old_thing.dict(), new_field=new_value)
Unlike Rust, Python has an explicit concept of constructors and also supports inheritance. These two features prove themselves to be problems here, because their existence tricks people into using them. pydantic makes this mistake by requiring object composition to be facilitated through the use of inheritance. Python provides them no choice but to use constructors to construct these values. The result is that, in any of the above examples, it is impossible to statically verify that every field of NewThing gets set properly by its constructor's arguments, and all of them are suspect for causing runtime errors.
This could be fixed by listing out every individual field of NewThing
,
including those inherited from OldThing
, and explicitly matching up every one
of those fields with the new value from the example constructors' arguments. The
massive downside is the maintenance burden: now you have to keep every instance
of this type of conversion in sync, which is fallible since accidentally
omitting Optional[T]
fields will run and typecheck fine even though they
should have been set to some value from other_thing
, for example. Rust and
serde
solve all of these issues by simply storing all of old_thing
inside
NewThing
via #[serde(flatten)]
, alongside other_thing
or any other new
fields NewThing
needs to have, as demonstrated above.
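To make the failure mode concrete, here's a minimal sketch that uses the standard library's dataclasses as a stand-in for pydantic (asdict() plays the role of .dict() here; that substitution is my assumption, and the field names are invented for illustration). The forgotten conversion runs without complaint, and the misspelled key only blows up when that line executes:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OldThing:
    name: str


@dataclass
class OtherThing:
    nickname: Optional[str] = None


@dataclass
class NewThing(OldThing):
    # Meant to be copied over from an OtherThing, but nothing enforces that.
    nickname: Optional[str] = None


old_thing = OldThing(name="foo")
other_thing = OtherThing(nickname="bar")

# The intended conversion: every field is filled in.
complete = NewThing(**asdict(old_thing), **asdict(other_thing))

# Forgetting other_thing still runs; nickname silently becomes None.
forgotten = NewThing(**asdict(old_thing))

# A misspelled key is only caught when this line actually executes.
try:
    NewThing(**asdict(old_thing), **{"nicknme": "bar"})
except TypeError as error:
    typo_error = error
```

Since the splatted dictionaries carry no per-key type information, a typechecker has nothing to compare against NewThing's fields in any of these three calls.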
Python and some libraries (like pydantic
) treat names with
leading underscores specially, which may have adverse effects on the
serializability of fields with certain names. I know Go also uses a similar
method (casing) to convey privacy information, and I wonder if Go has similar
issues at the intersection of those and serialization. Again, Rust/serde
does
not exhibit this issue because naming has no real effect (single leading
underscore can silence unused-code warnings but that's it) and #[serde(rename = "whatever")]
is painless and straightforward to use.
FAQ
-
Q: I haven't heard of anyone else having this problem, are you just bad at programming?
A: You know, I haven't either. Maybe I am. *shrug*
-
Q: Why do serde-like frameworks combine the model definition and its configuration in the first place?
A: Because if they didn't, you'd wind up with a worse version of the problem with by_alias=True. Separating them makes it possible to try to serialize something with the wrong configuration, instead of simply baking it in so it happens every time transparently to you. Maybe it wouldn't be so bad if all you were doing was changing field names, but there's a lot of stuff you can do with serde container and field attributes, pydantic validators, and manually implementing serde traits for complicated things. In serde's case, you'd also take a performance hit because now it would have to check a separate source for what to do with a model for every single model in the tree, instead of what it currently does, which is generate the correctly-configured code at compile time.
Python has sum types
I am absolutely thrilled about this discovery.
Python doesn't have sum types
One of my biggest gripes with Python is that it doesn't have sum types, or a way to emulate them. For example:
>>> from typing import List
>>> x = []
>>> isinstance(x, List[int])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/nvxp3xmlrxj9sw66dk7l0grz9m4889jn-python3-3.9.12/lib/python3.9/typing.py", line 720, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "/nix/store/nvxp3xmlrxj9sw66dk7l0grz9m4889jn-python3-3.9.12/lib/python3.9/typing.py", line 723, in __subclasscheck__
    raise TypeError("Subscripted generics cannot be used with"
TypeError: Subscripted generics cannot be used with class and instance checks
This means that Python has no way to check if a variable is an instance of a specific type[1]...
Unless it does
I present to you: typing.Literal
. "How is that relevant?" you ask. Excellent
question. First, let's remember that sum types are also known as "tagged
unions". Python has unions in the form of typing.Union
(or the |
syntax
in newer versions). Given this, we can create the union half of a sum type
like this:
from typing import Union, List

StrOrIntList = Union[str, List[int]]
#                    ^^^  ^^^^^^^^^ A **variant**
#                    |
#                    Another **variant**
The next problem is figuring out how to detect which variant we have. The
obvious strategy is to use isinstance
, but as established, isinstance
is not
flexible enough. I also looked around to see if there's a way to get type
annotation information at runtime so one could check against that, but this
doesn't seem to be possible. Even if you could, it would also not cover the case
where you want multiple variants with the same data type: Union[int, int, str]
is the same as Union[int, str] to the typechecker[2], so there's no way
to tell the two ints apart.
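The collapsing isn't only a typechecker thing; it's also visible at runtime. A quick sketch (the variable names are just for illustration):

```python
from typing import Union, get_args

# Duplicate members are collapsed when the Union is constructed.
deduplicated = Union[int, int, str]
print(deduplicated == Union[int, str])  # True
print(get_args(deduplicated))  # (<class 'int'>, <class 'str'>)

# With only one distinct member, no Union is left at all.
collapsed = Union[int, int, int]
print(collapsed is int)  # True
```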
Next, we need some mechanism to associate each variant with a tag
that's accessible at both runtime and typecheck-time, so that we can do control
flow at runtime and allow the typechecker to assert that we're checking the
right things before even running the code. For the association portion, we
know that types can be paired together in Python by using typing.Tuple
like
this:
from typing import Tuple, List

StrAndInt = Tuple[str, List[int]]
#                 ^^^  ^^^^^^^^^ We'd like this to be the **variant's** data
#                 |
#                 We'd like to use this as our **tag**
This has a major problem, which is that the typechecker has no insight into the
possible values of our tag, the str
. This practically defeats the entire
purpose, since it robs us of assertions that we're checking against valid
tags, and that we're checking against all valid tags. After all, the
goal is to ensure more things can be checked before runtime, and having to run
the code to make sure you have no typos in strings where you're checking against
a tag is pretty self-defeating. For a week or so after coming up with what
I've discussed so far, I had no solution to this problem, and I thought most
hope was lost. But then I had a realization, which led directly to this blog
post.
For some reason[3], Python allows you to use value literals as types. Importantly, value literals can be used not only as a type, but also as a value. Using this slightly odd behavior[4], we can create a tag that's accessible at both runtime and at typecheck-time. For example:
from typing import List, Tuple, Literal

NamedInt = Tuple[Literal["a tag"], List[int]]
#                ^^^^^^^^^^^^^^^^ A statically-analyzable
#                *and* runtime-accessible **tag**!
On its own, this construct is completely and utterly useless. But, if we combine our tags with a union, we get...
Tagged unions
Which are also known as...
Sum types
In Python, sum types are constructed as a union of tuples of a tag and the variant's data. Here's an example:
from typing import List, Tuple, Literal, Union

MySumType = Union[
#           ^^^^^ The union
    Tuple[Literal["string"], str],
    Tuple[Literal["list of ints"], List[int]],
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A variant
#         ^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^ The variant's data
#         |
#         The variant's tag
]
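Because the tags are ordinary values sitting inside the type, they can also be read back out at runtime. Here's a small sketch using typing.get_args; the tags_of helper is hypothetical, and it assumes every variant is a two-element tuple of (tag, data) like the definition above:

```python
from typing import List, Literal, Tuple, Union, get_args

MySumType = Union[
    Tuple[Literal["string"], str],
    Tuple[Literal["list of ints"], List[int]],
]


def tags_of(sum_type) -> list:
    """Collect the Literal tag of every variant in a union of (tag, data) tuples."""
    tags = []
    for variant in get_args(sum_type):
        # Each variant is Tuple[Literal[...], <data type>].
        tag_type, _data_type = get_args(variant)
        tags.extend(get_args(tag_type))
    return tags


print(tags_of(MySumType))  # ['string', 'list of ints']
```

This could be handy for checking untrusted input against the set of valid tags, for example.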
Now that we have our sum type, I'll demonstrate how it's used. The type of
x[1]
where x
is of type MySumType
can be either str
or List[int]
. We
can determine the type of x[1]
we have by looking at the tag, which is x[0]
.
Then, if we do things right, our typechecker will be able to automatically
narrow the type of x[1]
to either str
or List[int]
based on the type and
value of x[0]
in our control flow. Let's give it a try:
def gives_sum_type() -> MySumType:
    # Just return one of the two "variants"
    return ("list of ints", [1, 2, 2, 3])
    # We could also do this, for example:
    # return ("string", "foo")

def uses_sum_type():
    match gives_sum_type():
        case ("string", x):
            print("the variant contains a string:", x)
        case ("list of ints", x):
            print("the variant contains a list of ints:", x)
The typechecker narrows the type of x
in each branch to either str
or
List[int]
based on the tag matched in the case
arm. Values in the tag
position that do not correspond to any tag in the sum type's definition will
cause the typechecker to emit an error; typos and the removal of variants from
the sum type's definition are both covered by this case. Exhaustiveness is also
checked in match
statements, so if you later add variants to MySumType
, the
typechecker will emit an error since not all variants are covered.
If you're not on Python 3.10 or later (which was when match
was introduced),
then you can use the following hack to get the same guarantees:
from typing import NoReturn

def unreachable(_: NoReturn) -> NoReturn:
    raise AssertionError("unreachable")

def uses_sum_type():
    x = gives_sum_type()
    if x[0] == "string":
        y = x[1]
        print("the variant contains a string:", y)
    elif x[0] == "list of ints":
        y = x[1]
        print("the variant contains a list of ints:", y)
    else:
        unreachable(x)
Avoiding a typesafety footgun
Given the following:
from typing import List, Tuple, Literal, Union
MySumType = Union[
Tuple[Literal["string"], str],
Tuple[Literal["list of ints"], List[int]],
]
MyOtherSumType = Union[
Tuple[Literal["string"], str],
Tuple[Literal["list of strings"], List[str]],
]
The code ("string", "foo")
will pass the typechecker as either MySumType
or MyOtherSumType
because they both have a variant with the same name, which
is not great. One way to get around this would be to create a wrapper class and
instantiate your sum type inside its constructor. For example:
from typing import List, Tuple, Literal, Union

class MySumType:
    def __init__(
        self,
        adt: Union[
            Tuple[Literal["string"], str],
            Tuple[Literal["list of ints"], List[int]],
        ],
    ) -> None:
        self.adt = adt
Now you'd write MySumType(("string", "foo")) instead, which will never
typecheck as another sum type that happens to have variants with the same name.
In order to use match with this type, you'd simply match on its .adt attribute
to get access to the inner type. This has the added bonus that you can now add
methods to sum types, which is pretty cool.
Adding documentation
You might also want to add documentation to your sum type variants. We can
accomplish this by moving away from Literal["..."]
s and creating new types:
from typing import List, Tuple, Literal, Union

class String:
    """
    A variant containing a string
    """

class ListOfInts:
    """
    A variant containing a list of ints
    """

class MySumType:
    def __init__(
        self,
        adt: Union[
            Tuple[String, str],
            Tuple[ListOfInts, List[int]],
        ],
    ) -> None:
        self.adt = adt
This is slightly more awkward to write, as MySumType((String(), "foo"))
, but
the benefit of documentation outweighs it in my opinion. Matching still works as
expected as well.
Danger zone
One famously successful use-case for sum types is error handling: their exhaustive and explicit properties make it easy to determine which failures are possible, and from there, which failures you can handle, and then to handle those in a type-safe, high-confidence manner. We can now accomplish this in Python, by replacing the use of exceptions with the following foundation[5]:
from typing import TypeVar, Union, Tuple, Literal, Generic

class Ok:
    """
    A variant containing a success value
    """

class Err:
    """
    A variant containing an error value
    """

T = TypeVar("T")
E = TypeVar("E")

class Result(Generic[T, E]):
    def __init__(
        self,
        adt: Union[
            Tuple[Ok, T],
            Tuple[Err, E],
        ],
    ) -> None:
        self.adt = adt

    # More methods...
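Here's a rough sketch of how a fallible function might use this. The parse_age and describe functions are invented for illustration, and the isinstance check only demonstrates the runtime behavior; I'm not claiming every typechecker narrows the payload type through it:

```python
from typing import Generic, Tuple, TypeVar, Union


class Ok:
    """A variant containing a success value"""


class Err:
    """A variant containing an error value"""


T = TypeVar("T")
E = TypeVar("E")


class Result(Generic[T, E]):
    def __init__(self, adt: Union[Tuple[Ok, T], Tuple[Err, E]]) -> None:
        self.adt = adt


def parse_age(text: str) -> "Result[int, str]":
    """Hypothetical fallible function that returns a Result instead of raising."""
    try:
        age = int(text)
    except ValueError:
        return Result((Err(), f"not a number: {text!r}"))
    if not 0 <= age <= 255:
        return Result((Err(), f"out of range: {age}"))
    return Result((Ok(), age))


def describe(result: "Result[int, str]") -> str:
    tag, payload = result.adt
    if isinstance(tag, Ok):
        return f"age is {payload}"
    return f"could not parse age ({payload})"


print(describe(parse_age("23")))
print(describe(parse_age("abc")))
```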
This is what scientists call "carcinisation", which is the phenomenon wherein, given a programming language and enough time, it will eventually become Rust.
[1] If the types have generic parameters. Obviously, this works for "regular"
types, because otherwise isinstance() would be completely useless.
[2] Even further, Union[T, T, T] is the same as T; note how the Union is
dropped entirely.
[3] I am indifferent to the rationale that caused this behavior to exist, but I am very happy that it does because it is incidentally core to making proper sum types in Python.
[4] Hmm, maybe it's not that odd, I guess you could use it to do const generics? I bet that's why it exists.
[5] I would also like to have an ABC for sum types that provides a single
adt() method that performs a conversion like Result -> ResultRaw so that
this sort of interface can be identical for all sum types, which would make
writing manual match statements more consistent. I can't think of an easy
way to statically type such an ABC right now though, and the .adt convention
is good enough for my immediate purposes.
Error handling in Rust
Some error handling strategies are more equal than others.
Type-erased errors
Box<dyn std::error::Error>[1] certainly has its uses. It's very convenient if
the API consumer[2] genuinely does not care what an error was, only that
there was an error. If the reactive action the API consumer needs to perform
when an error occurs is exactly the same regardless of what the error actually
was, then Box<dyn Error> is perfectly fine. It's easy to reach for
Box<dyn Error> because getting ? to work on all error types in your function
for free is very attractive from a convenience standpoint. However, once you
need to do something specific when a specific error occurs, you should no
longer be using Box<dyn Error>.
In order to handle errors inside Box<dyn Error>
, you must know the exact type
signature of the error you intend to handle. With generic code, this can
sometimes be difficult, especially if you're new to Rust. The compiler can
provide almost no useful diagnostics about whether you're trying to downcast to
the right type. This also means you have to keep track of which concrete errors
are inside the Box<dyn Error>
yourself so that you don't accidentally try to
handle an error that will never occur there. The loss of (potential for)
exhaustive error handling makes it difficult to have confidence in a program's
robustness because Box<dyn Error>
does not encode descriptions of a function's
failure modes in the type system, which makes it extremely easy to overlook
easily-handled errors, instead turning them into fatal problems for program
functionality.
Something else that came to my attention was the question of what to do in the
situation where you're writing a library whose errors are caused by errors
defined in your dependencies/crates you don't control. One school of thought is
to never expose the concrete type of your dependency's error type, instead
preferring to return an enum variant containing simply Box<dyn Error>
. If your
users care about the specifics of that downstream error, they can add your
dependency to their dependencies, then downcast to the concrete type in the code
using your library. This way, if you ever make a semver-incompatible upgrade to
that package, you don't break downstream compilations.
I think this is an incredibly bad idea, because while it doesn't break
compiletime, it does break runtime. The downcast will no longer work, since
types from two versions of the same crate are not the same type. This sort of
behavior seems antithetical to Rust; we have a borrow checker for a reason.
Specifically, if your error handling path was important, and needs to be run
whenever that error occurs, this could be extremely costly in terms of data
corruption or lost capital or just time spent trying to debug why the heck your
code stopped working when you changed nothing (other than running cargo update
, which may even be done automatically by your CI pipeline). (The
alternative I propose is to simply not do this, and instead just expose the
concrete type directly.)
Strongly typed errors
The[3] alternative to Box<dyn Error> is to create a custom enum with
a variant for each error type. For example, you might have variants like
Deserialize(serde_json::Error), Http(reqwest::Error), and maybe an "unknown"
variant[4] if exhaustiveness is infeasible. If you're an API consumer, the bar
for "infeasible" is as low as "I don't need to handle this anywhere so I'm not
going to make a variant for it". But, as soon as you do need to handle it, you
need to make a variant for it. If you're not an API consumer, you should aim
to be exhaustive. There are some cases where this is unreasonable, but those
situations are rare and, as such, this exception likely does not apply to you.
With error enums, knowing which errors are possible is now absolutely trivial: all you need to do is look at the variants of the enum. The compiler can also give vastly more helpful diagnostics this way, since it will be able to follow the type system around to ensure that you're handling all cases, and that you're not inventing cases that will never actually happen. No longer do you need to rely on possibly-stale manual documentation or have to read the entire call tree to determine what the failure modes are[5].
Another advantage of using enums is allowing multiple errors of the same type to
have different semantics. Maybe you need to load two files (std::io::Error),
but you need to do something different based on which file failed to load. With
enums, you can simply create two variants, one for each behavior. With
Box<dyn Error>, this is not possible[6].
Now, the downside: you must manually implement std::error::Error
for your new
custom error enum. This means Display
, Debug
, and all the From
impls so
?
is still ergonomic. Luckily, the thiserror
crate provides
a procedural macro that allows you to derive all of those traits. When you use the
#[from]
attribute to generate From
impls, it even correctly implements
std::error::Error::source()
for you! This makes acquiring
detailed error messages (e.g. for logging) using nothing but (ostensibly) the
standard library very easy.
Custom error types
There are some rules about creating custom error types that you should follow in order to create the best possible error messages for your users and fellow developers.
-
If your error has an inner error, or your error is caused by another error, you must pick exactly one of the following options for each inner error:
  1. Return the inner error when Error::source() is called.
     With thiserror, this means using the #[from] or #[source] attributes. Generally, reach for #[from] first unless it fails to compile, and in that situation, switch to #[source]. Without a From impl, you can use Result<T, E>::map_err() to convert the inner error into your custom error type.
  2. Include the inner error's message as part of your own error's.
     With thiserror, this means using {0} in your #[error("...")], assuming the variant is a tuple variant with the inner error stored in the zeroth tuple item.
This prevents an error message from containing needlessly duplicated information. These options are also listed in the order that you should prefer to do them, the first one being much more common. If you're not sure which to do, just pick number 1.
-
The human-readable part of your error message should not include the : character. What I mean by "human-readable part" is that if, for example, your error message happens to include JSON, then don't worry about it. Just don't use : in #[error("...")] strings, basically.
The reason for this is that : is commonly used to indicate causality on a single line, for example:
failed to create user: failed to execute SQL statement: invalid SQL
All three of these would be separate concrete error types, each one wrapped inside an enum variant of the error that precedes it.
-
Display impls for std::error::Error implementers should not start with a capital letter, except for special cases such as when the message begins with an initialism. For example, this looks inconsistent and gross:
failed to create user: Failed to execute SQL statement: Invalid SQL
-
Don't use sentence-ending punctuation in error messages. Your error may not be the last one in the chain.
Displaying error messages
Stick the following code into src/error.rs
in your project and add thiserror = "1"
to your dev dependencies:
use std::{
    error::Error,
    fmt::{self, Display, Formatter},
    iter,
};

/// Wraps any [`Error`][e] type so that [`Display`][d] includes its sources
///
/// # Examples
///
/// If `Foo` has a source of `Bar`, and `Bar` has a source of `Baz`, then the
/// formatted output of `Chain(&Foo)` will look like this:
///
/// ```
/// # use crate::error::Chain; // YOU WILL NEED TO CHANGE THIS
/// # use thiserror::Error;
/// # #[derive(Debug, Error)]
/// # #[error("foo")]
/// # struct Foo(#[from] Bar);
/// # #[derive(Debug, Error)]
/// # #[error("bar")]
/// # struct Bar(#[from] Baz);
/// # #[derive(Debug, Error)]
/// # #[error("baz")]
/// # struct Baz;
/// # fn try_foo() -> Result<(), Foo> { Err(Foo(Bar(Baz))) }
/// match try_foo() {
///     Ok(foo) => {
///         // Do something with foo
/// #       drop(foo);
/// #       unreachable!()
///     }
///     Err(e) => {
///         assert_eq!(
///             format!("foo error: {}", Chain(&e)),
///             "foo error: foo: bar: baz"
///         );
///     }
/// }
/// ```
///
/// [e]: Error
/// [d]: Display
#[derive(Debug)]
pub(crate) struct Chain<'a>(pub &'a dyn Error);

impl<'a> Display for Chain<'a> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)?;

        let mut source = self.0.source();

        source
            .into_iter()
            .chain(iter::from_fn(|| {
                source = source.and_then(Error::source);
                source
            }))
            .try_for_each(|source| write!(f, ": {}", source))
    }
}
This representation is most useful for logging, but could be easily adapted for other uses as well. Once the result of this tracking issue lands in stable, this should be even more ergonomic. Especially if they implement this person's suggestion, which is almost exactly what's written above.
A note about panicking
Panicking is not an error handling strategy, panicking is panicking. You should
only resort to panicking when an illegal state has been reached, or for
convenience when it's not possible to prove that this state will never occur
with the type system. If there is no possible way to recover and further program
execution wouldn't make any sense, is dangerous, or would invoke undefined
behavior, then you can panic. If you have proven out-of-band that this state is
unreachable, then you can panic. This also applies to working with Option
,
Result
, and other "maybe a value" types: you should only unwrap (aka get the
value you want and panic if it's not there) if the alternative is an illegal
state. I could probably fill another blog post about specific examples of when
to or when not to unwrap maybe types, so for now I'm just going to leave it at
that.
Further reading
- Rust API Guidelines' C-GOOD-ERR
- The docs for
std::error::Error
- Guidelines for implementing Display and Error::source for library errors
Footnotes
[1] Henceforth Box<dyn Error>. I'm eliding extra constraints like
+ Send + Sync + 'static since they're not strictly relevant. Also, you
can/should substitute in its other analogs such as the anyhow crate.
[2] The "API consumer" is the person who will need to handle the error. If you are writing an application, you are an API consumer. If you are writing a library (at least, the public interface), you are not an API consumer. If your codebase is large and you work with other people, you might also want to consider the people you're working with as API consumers when writing new fallible code.
[3] An? I genuinely can't think of a third option.
[4] If you're an API consumer, using an Unknown(Box<dyn Error>) variant
straight up is fine as long as you've taken the rest of this blog post into
consideration. If you're not an API consumer, this is somewhat more
complicated. If your code is the one creating the error and an API consumer is
expected to handle it, you may need either Unknown, Unknown(Box<dyn Error>),
or some other such variant with an opaque inner type. If you're
defining the contract for a function (say, with a trait, or something to do
with closures), you should use either T directly (no enum) or an Other(T)
variant (where T is a generic parameter) so that your API consumer can
decide what to do, instead of having their hand be forced.
[5] Like you would with a language with an explicit concept of exceptions that either don't require type annotations and/or cannot be type-annotated (*cough* Python), or poorly written Rust.
[6] Unless you abuse the newtype pattern. But isn't the whole point of using
Box<dyn Error> to not care about errors? Having to create a new type sounds
like you care about errors. Also, as a new Rust user, good luck figuring out
why you need to downcast to two different types that aren't the type you want
to get the type you want.
Considered harmful
A collection of things that I hold to be silly. In these pages, I'll try to convince you that these things should be avoided. See here for the etymology of the title of this section.
Extreme minimalism considered harmful
ORMs considered harmful
"Just write SQL lol" but unironically
I think people reach for ORMs because they are afraid of making mistakes, they do not want to learn SQL, or both. People assume that their ORM of choice will generate correct (i.e. does what they intended), valid (i.e. running it won't produce an error, such as a syntax error), and performant SQL for the queries they need to make, and that they can do this without learning SQL itself. I think wanting to avoid mistakes is a very noble goal, but I do not think ORMs are the right way to achieve it. I also think learning SQL is unavoidable, and reliance on ORMs is a net hindrance for yourself and other maintainers. Allow me to explain why I think these things, and perhaps convince you as well.
ORMs are a performance black box. The query you want to perform may not be perfectly supported by your ORM without some workarounds, leading to suboptimal SQL being generated. Updating your ORM may cause it to generate different SQL for the same ORM code. The ORM may not use the most optimal SQL for a given operation, resulting in worse performance. You can improve performance and the consistency thereof by not using an ORM.
ORMs are an extra moving part, which is an extra point of failure. By using an ORM, you trust its test suite (if any) to ensure that it will generate desirable SQL. You also trust that updating the ORM library won't subtly break your program because it changes the SQL it generates under certain conditions. You can easily make your supply chain more secure and your program more reliable by not using an ORM.
ORMs are a waste of developer bandwidth. Instead of being able to read the documentation for the database you're using to learn how to manipulate and query data, you have to do that and read the documentation for your ORM to translate the SQL you want into ORM code... which just turns it back into the SQL you wanted to begin with. Unless of course, there's a bug, or the ORM can't represent the SQL you want, in which case you'll need to go back to writing SQL directly anyway. Also, in my experience, ORM documentation quality pales absolutely in comparison to database documentation. You can save yourself and other maintainers a lot of reading and frustration by not using an ORM.
ORMs are a waste of human memory. Learning ORMs is an O(N) problem, where N
is the number of ORMs across all programming languages. Each language has
a handful of ORMs, so multiple projects in the same programming language may
not even use the same ORM. This raises the bar to contributing quite a bit.
Contrast this with writing SQL directly, which is an O(1) problem across all
programming languages. SQL is the same everywhere, regardless of even
programming language. (And yes, I'm aware there are multiple flavors of SQL, but
that doesn't change this argument at all; ORMs introduce an unnecessary factor
either way.) You can let everyone remember more important things by not using an
ORM.
ORMs are bad at ensuring query validity and correctness. It's entirely possible for an ORM to generate invalid SQL even though the ORM code is correct. ORMs also generally do not validate type information against the database/schema itself, if at all, so typesafety is often lost. ORMs effectively only provide a false sense of security. Statically checked queries (such as those made available by SQLx) solve these problems and provide actual safety benefits. Essentially, the compiler finds the text of your SQL queries and dry-runs them against a running database (or a description of your schema) to validate their syntactic, semantic, and type-level correctness, both within the query itself and also how the returned data is used by the code. If any of these checks fail, your code fails to compile, and you've successfully avoided making mistakes and avoided the problems posed by ORMs by not using an ORM.
Unfortunately, not every programming language and database library allows for statically checked queries, and that is a major failing on their part. You should either upstream a fix where possible, or jump ship to something that isn't gimped. If you can't do either of those things, I still urge you to not use ORMs. The lives of the maintainers of your software, including yourself, will be improved by not using an ORM.
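If you're stuck somewhere without statically checked queries, a rough approximation is to have a test ask the database to compile each query against the schema without running it. Below is a minimal sketch using Python's built-in sqlite3 module. The users table, the check_query helper, and its param_count argument are all made up for illustration, and this is much weaker than what SQLx does: it only catches syntax errors and unknown tables or columns, and it says nothing about how the returned data is used by the code.

```python
import sqlite3
from typing import Optional

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"


def check_query(
    conn: sqlite3.Connection, sql: str, param_count: int = 0
) -> Optional[str]:
    """Ask SQLite to compile `sql` against the schema without running it.

    Returns None if the query compiles, or the error message otherwise.
    """
    try:
        # EXPLAIN makes SQLite prepare the statement and describe its plan
        # instead of executing it; placeholders are bound to dummy NULLs.
        conn.execute("EXPLAIN " + sql, (None,) * param_count)
    except sqlite3.Error as error:
        return str(error)
    return None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

print(check_query(conn, "SELECT name FROM users WHERE id = ?", param_count=1))
print(check_query(conn, "SELECT nme FROM users"))
```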
Using verifiers in tests considered harmful
Don't test X when you care about Y.
A verifier is something that allows the inspection of (parts of) the call stack of a function-under-test, to assert things like "X function was called Y times", "X function was called with Y and Z arguments (and returned W)", and so on. They are generally combined with mocks, but can be used without them.
For example, let's say a test needs to be written for a function that does the
equivalent of mkdir -p foo/bar/baz
. Someone using verifiers would first need
to mock the language's standard library's "create directory" function, to avoid
the side effect of actually creating directories. Then, they might assert that
the "create directory" function was called 3 times, perhaps with the expected
arguments.
Later on, it turns out that creating directories is a major bottleneck in the application for whatever reason, and so this logic needs to be rewritten using io_uring. This probably means pulling in a new dependency that provides its own io_uring-aware "create directory" function. Once the refactoring is completed, the test written above is now failing even though the function-under-test still has the exact same behavior. Unfortunately, the mock and verifier applied in the original test are not aware of this new way of creating directories, and so the test fails because it was reliant on implementation details of the original code.
This situation is both frustrating and avoidable. I don't have any advice for handling the frustration, but I do have advice for avoiding inflicting it: instead of testing that the language's standard library's "create directory" function was called 3 times (mocked or not), one should instead run the function-under-test without mocking away its side effects and then read back the directories on disk and assert that they were created as desired. This way, the exact methodology for creating these side effects is free to change as needed, and the tests will only and always pass as long as the desired observable behavior is maintained.
The important differences between these two ways of writing this test are that the verifiers method:
- becomes a false-negative and a maintenance burden when the implementation changes
- does not guarantee that the function-under-test causes the desired (side) effect
While the other method:
- is resilient to changes in the implementation details
- does guarantee that the function-under-test causes the desired (side) effect
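Here's a minimal sketch of the second kind of test in Python. The make_nested_dirs function is a hypothetical stand-in for the function-under-test; the test runs it for real inside a temporary directory and then reads the result back from disk:

```python
import os
import tempfile


def make_nested_dirs(root: str) -> None:
    """Hypothetical function-under-test: the equivalent of `mkdir -p foo/bar/baz`."""
    os.makedirs(os.path.join(root, "foo", "bar", "baz"), exist_ok=True)


def test_make_nested_dirs() -> None:
    with tempfile.TemporaryDirectory() as root:
        make_nested_dirs(root)

        # Assert on the observable result instead of counting mkdir calls.
        assert os.path.isdir(os.path.join(root, "foo"))
        assert os.path.isdir(os.path.join(root, "foo", "bar"))
        assert os.path.isdir(os.path.join(root, "foo", "bar", "baz"))


test_make_nested_dirs()
```

Nothing in the test mentions how the directories get created, so swapping os.makedirs for something else entirely wouldn't require touching it.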
Short-form annoyances
There are a lot of pain-points in various things I interact with that aren't really worthy of a typical length blog post. This gives me a place to write about these issues because, regardless of article length, I believe they're still important.
Python annoyances
Day (date.today() - date(2021, 7, 14)).days
of not understanding why Python is
used in production
pathlib
>>> from pathlib import PurePath
>>> PurePath(b"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 665, in __new__
    return cls._from_parts(args)
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 697, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/nix/store/y3inmdhijqkb4qj36yphj4cbllljhqzz-python3-3.9.6/lib/python3.9/pathlib.py", line 686, in _parse_args
    raise TypeError(
TypeError: argument should be a str object or an os.PathLike object returning str, not <class 'bytes'>
This is fine because every file system on the planet is UTF-8, clearly.
I heard a counterargument that "it says in the docs that for 'low-level path
manipulation on strings, you can also use the os.path module.'" I take a few
issues with this: nowhere in the docs are there explicit mentions that
exceptions will be raised when passing bytes instead of str; nowhere in the
docs are there explicit type annotations suggesting that you can only use str;
and the phrasing of that little warning uses such passive language that it
doesn't seem like there's any real reason to care about this case in the first
place.
I would expect a library designed specifically for dealing with paths to be
able to deal with paths, so I find this behavior to be... surprising.
A counterargument I heard to this is that "pathlib just provides a high level
OOP interface to paths" but I don't understand how that's mutually exclusive
with handling bytes/non-UTF-8.
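For reference, here's a small sketch of what does and doesn't work today, assuming a POSIX system (the file name is made up). os.path is happy to take bytes, and os.fsdecode/os.fsencode can smuggle non-UTF-8 bytes through a str so that pathlib will accept them, which makes the TypeError above feel even more arbitrary.

```python
import os
from pathlib import PurePath

raw = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8

# pathlib refuses bytes outright.
try:
    PurePath(raw)
except TypeError:
    rejected = True
else:
    rejected = False

# os.path works on bytes directly (POSIX separator assumed here).
joined = os.path.join(b"/tmp", raw)

# Workaround: decode with the filesystem encoding and its error handler
# (surrogateescape on POSIX), then encode again when bytes are needed.
path = PurePath(os.fsdecode(raw))
round_tripped = os.fsencode(path)

print(rejected, joined, round_tripped == raw)
```

So the machinery for round-tripping arbitrary bytes already exists in the standard library; pathlib just makes you do it by hand.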
datetime
I'd like to convert an ISO 8601 timestamp string to the appropriate Python
object. Looks like datetime.fromisoformat is the way to do that. But wait:
Caution: This does not support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of datetime.isoformat(). A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
If you're going to add a function for this to the standard library, you'd think
you'd want to avoid half-assing it, no? Since it's built-in, it's way more
likely to be used than any third party package. Anyway, now that I've got my ISO
8601 string (which includes timezone information) converted into a datetime
object, let's compare it against the current time:
(Pdb) p cache_expires_at
datetime.datetime(2022, 4, 20, 21, 41, 52, 955721, tzinfo=datetime.timezone.utc)
(Pdb) p cache_expires_at < datetime.utcnow()
*** TypeError: can't compare offset-naive and offset-aware datetimes
(Pdb) datetime.utcnow()
datetime.datetime(2022, 4, 20, 20, 43, 23, 982491)
... what!? Why does datetime.utcnow() not have timezone information? Shouldn't
it know what timezone the datetime it's creating is in since it literally has
utc in the name? Okay, it looks like the docs actually address this:
Warning: Because naive datetime objects are treated by many datetime methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing the current time in UTC is by calling datetime.now(timezone.utc).
Well, sort of, anyway. Why even provide this method if it omits timezone
information, then? Why are naive datetimes treated as local time? I bet there
are some horrifying edge cases there. Another big point of pain is that since
naive and aware timestamps are both the same type, tools like pyright can't
even warn about this stuff statically. You need good code coverage (hard, rare)
or manual testing (ew) to be able to detect this sort of error. Similarly,
pydantic can't easily enforce timestamps to be timezone-aware since again,
there's a single type for both cases. It's incredibly silly to allow this sort
of error to even happen when it could so easily be prevented by having two
separate types.
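You can approximate the two-types idea yourself, at least for a static checker. Below is a rough sketch using typing.NewType; the names AwareDatetime, require_aware, utc_now, and is_expired are all invented for this example and are not part of the standard library. At runtime an AwareDatetime is still a plain datetime, so the check in require_aware is the only thing actually enforcing anything.

```python
from datetime import datetime, timezone
from typing import NewType

# Hypothetical marker type: distinct for type checkers, a datetime at runtime.
AwareDatetime = NewType("AwareDatetime", datetime)


def require_aware(dt: datetime) -> AwareDatetime:
    """Reject naive datetimes at the boundary instead of at comparison time."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        raise ValueError("expected a timezone-aware datetime")
    return AwareDatetime(dt)


def utc_now() -> AwareDatetime:
    # The spelling the docs recommend over datetime.utcnow().
    return AwareDatetime(datetime.now(timezone.utc))


def is_expired(expires_at: AwareDatetime) -> bool:
    # Both sides are aware, so this comparison cannot raise TypeError.
    return expires_at < utc_now()


cache_expires_at = require_aware(
    datetime(2022, 4, 20, 21, 41, 52, 955721, tzinfo=timezone.utc)
)
print(is_expired(cache_expires_at))
```

With this in place, passing a bare datetime to is_expired should get flagged by a checker like pyright, but it only works if every datetime entering the program goes through require_aware, which is exactly the kind of discipline the standard library could have enforced for us.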
PyPI
For some reason, PyPI allows packages to be uploaded with version requirements
that almost definitely will not work. If I make a package that depends on * or
>1 or such of some other dependency, PyPI will happily accept my upload. The
problem is that, as soon as that dependency releases 2.0, my package is sure
to break. For a real world example of this, see here.
PEP 440 defines Python's own special versioning scheme (instead of just using SemVer like everyone else) with liberal usage of the word "MUST" but then official Python tooling (like PyPI) opts not to enforce it at all. What's even the point, then? Also, what even is a "post release"? Asking Google "post release meaning" gives me a bunch of stuff about prisoners, and appending "software" to the query doesn't help either. After eventually finding the explanation in the PEP, the answer is "it's functionally identical to SemVer Patch releases except we decided to make it a separate thing for no reason".
Since Python decided not to use SemVer, it now also needs to invent its own
syntax for specifying allowable dependency versions. It's a mess, and
quite easy to misuse since nobody knows what ~= means, nor realizes you can use
, to add additional constraints. This could all have been neatly avoided by
adopting SemVer instead. Speaking of ~=, here's a cheap shot:
The spelling of the compatible release clause (~=) is inspired by the Ruby (~>) and PHP (~) equivalents.
— PEP 440
Ah yes, PHP, the paragon of good design. Smartly, Poetry lets you just use the
standard SemVer syntax for this (^).
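Since nobody knows what ~= means, here's a rough sketch of the rule as I read PEP 440: ~=2.2 is shorthand for >=2.2, ==2.* and ~=1.4.5 is shorthand for >=1.4.5, ==1.4.*. The helper names below are made up, and this only handles plain dotted numeric versions (no epochs, pre-releases, post-releases, or dev releases), so treat it as an illustration rather than a replacement for a real version library.

```python
from itertools import zip_longest


def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as "1.4.5" (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


def at_least(candidate: tuple[int, ...], floor: tuple[int, ...]) -> bool:
    """Compare releases, padding the shorter one with zeros (1.4 == 1.4.0)."""
    padded = list(zip_longest(candidate, floor, fillvalue=0))
    return [c for c, _ in padded] >= [f for _, f in padded]


def compatible_release(candidate: str, spec: str) -> bool:
    """Sketch of `~=spec`: at least spec, and equal on all but its last segment."""
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    cand = parse_release(candidate)
    prefix = base[:-1]
    padded_cand = cand + (0,) * (len(prefix) - len(cand))
    return at_least(cand, base) and padded_cand[: len(prefix)] == prefix


print(compatible_release("2.3", "2.2"))      # inside ~=2.2
print(compatible_release("3.0", "2.2"))      # a bare >2.2 would allow this
print(compatible_release("1.5.0", "1.4.5"))  # outside ~=1.4.5
```

The second call is the PyPI problem from earlier in miniature: a requirement like >1 happily admits the next major version, while ~= at least stops at the boundary implied by how many segments you wrote.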
Poetry
poetry remove has no --lock option.
Adding dependencies
poetry add can take forever. Trying to add new dependencies is a nightmare,
and that's due to both the aforementioned performance issues and the fact that,
due to the way that Python imports work, it is impossible to have multiple
versions of a single package installed at a time. As a direct result of these
things, I just spent over five minutes trying to install dependencies.
Observe:
$ poetry add --lock --source REDACTED [REDACTED_0..REDACTED_6]
...
Updating dependencies
Resolving dependencies... (40.4s)
...
SolverProblemError
Then after some vim pyproject.toml to comment out things that caused the
SolverProblemError:
$ poetry add --lock --source REDACTED [REDACTED_0..REDACTED_6]
...
Updating dependencies
Resolving dependencies... (115.3s)
...
Writing lock file
Cool, this time it worked, but I'm still not done getting the dependencies I need. So let's add them back:
$ poetry add --lock attrs marshmallow
...
Updating dependencies
Resolving dependencies... (39.0s)
...
SolverProblemError
Okay fine so I need to manually specify an older version of marshmallow
because for some reason poetry just picks the newest one instead of trying to
find the newest compatible one. Let's try again with the version it says is
causing the conflict:
$ poetry add --lock attrs 'marshmallow^2'
...
Updating dependencies
Resolving dependencies... (35.8s)
...
SolverProblemError
Okay so now attrs is having the same problem. Following the same pattern:
$ poetry add --lock 'attrs^19' 'marshmallow^2'
...
Updating dependencies
Resolving dependencies... (106.8s)
...
Writing lock file
Thank fuck, it's finally over. Well, for this project. We have a lot of projects
that need to be converted to poetry. It'll be worth it though because
pip/pip-compile is worse, and poetry2nix is nice.
Just for fun, let's try something similar in a different language:
$ time cargo add rand syn rand_core libc cfg-if quote proc-macro2 unicode-xid serde bitflags
...
... 1.968 total
$ time cargo update # to rebuild the lockfile
...
... 0.704 total
Under 2 seconds. No literally unfixable issues with incompatible transitive dependencies. It Just Works™. Incredible.
Black <22.3.0 incompatible with Click >=8.1
[T]he most recent release of Click, 8.1.0, is breaking Black. This is because Black imports an internal module so Python 3.6 users with misconfigured LANG continues to work mostly properly. The code that patches click was supposed to be resilient to the module disappearing but the code was catching the wrong exception.
I find the quantity of backlinks to this issue to be greatly amusing. (There are probably way more than shown too due to the existence of private repositories.) This is what happens when hobbyists and the industry take a language seriously even though it lacks:
- A language-enforced concept of item privacy
- The ability to have multiple versions of a package in the dependency tree
- Statically checkable error types
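To illustrate the "catching the wrong exception" class of bug in general (this is not Black's actual code, and the module name below is deliberately fake), here's a sketch of how easy it is to get wrong. A from-import of a name that has disappeared raises a plain ImportError, which an except ModuleNotFoundError clause will not catch because ModuleNotFoundError is the subclass, not the other way around.

```python
def import_internal_fragile():
    """Tries to tolerate a missing internal module, but catches too little."""
    try:
        # Stand-in for reaching into a library's private module.
        from os import _hypothetical_internal_module
    except ModuleNotFoundError:
        return None
    return _hypothetical_internal_module


def import_internal_resilient():
    """Same idea, but catches the base class that is actually raised."""
    try:
        from os import _hypothetical_internal_module
    except ImportError:
        return None
    return _hypothetical_internal_module


try:
    import_internal_fragile()
except ImportError as exc:
    escaped = type(exc).__name__
else:
    escaped = None

print(escaped, import_internal_resilient())
```

Nothing tells you which exception an import can raise until it blows up in production, which is the third bullet point above in action.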
Combinatorial exhaustiveness
Let's see what various typecheckers think about the following code:
from typing import Literal, Tuple, Union

SumType = Union[
    Tuple[Literal["foo"], str],
    Tuple[Literal["bar"], int],
]

def assert_int(_: int): pass
def assert_str(_: str): pass

def assert_combinatorial_exhaustion(
    first: SumType,
    second: SumType,
):
    match (first, second):
        case (("foo", x), ("foo", y)):
            assert_str(x)
            assert_str(y)
        case (("foo", x), ("bar", y)):
            assert_str(x)
            assert_int(y)
        case (("bar", x), ("foo", y)):
            assert_int(x)
            assert_str(y)
        case (("bar", x), ("bar", y)):
            assert_int(x)
            assert_int(y)
Pytype
I couldn't get this to run on NixOS, so I don't know.
Rating: ?/10
Pyre
I couldn't get this to run on NixOS either, but they do have a web based version for some reason. Here's what it says:
21:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
22:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
24:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
25:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
27:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
28:23: Incompatible parameter type [6]: In call `assert_str`, for 1st positional only parameter expected `str` but got `Union[int, str]`.
30:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
31:23: Incompatible parameter type [6]: In call `assert_int`, for 1st positional only parameter expected `int` but got `Union[int, str]`.
pyre is clearly unable to do type narrowing in the match arms. There are no
warnings about exhaustiveness, however; is that working properly? Let's pass it
the simplest possible code to test for that:
def assert_exhaustion(
    x: bool,
) -> None:
    match x:
        case True:
            pass
Output:
No Errors!
Rating: 0/10
Mypy
playground/main.py:23: error: INTERNAL ERROR -- Please try using mypy master on Github:
https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
If this issue continues with mypy master, please report a bug at https://github.com/python/mypy/issues
version: 0.941
playground/main.py:23: : note: please use --show-traceback to print a traceback when reporting a bug
Yep, you're reading that right; mypy just crashes.
Rating: Comically bad/10
Pyright
error: Cases within match statement do not exhaustively handle all values
Unhandled type: "tuple[SumType, SumType]"
If exhaustive handling is not intended, add "case _: pass" (reportMatchNotExhaustive)
pyright's lack of errors about the assert_{str,int} functions indicates that
it is correctly doing type narrowing, so that's good. However, it states pretty
clearly that it thinks this match is not exhaustive. Tragically, someone
reported this already and it got closed as wontfix.
Rating: 5 pity points since it can at least narrow types and not crash/10
rustc
#![allow(unused)]
fn main() {
enum SumType {
    Foo(String),
    Bar(i32),
}

fn assert_int(_: i32) {}
fn assert_str(_: String) {}

fn assert_combinatorial_exhaustiveness(
    first: SumType,
    second: SumType,
) {
    match (first, second) {
        (SumType::Foo(x), SumType::Foo(y)) => {
            assert_str(x);
            assert_str(y);
        },
        (SumType::Foo(x), SumType::Bar(y)) => {
            assert_str(x);
            assert_int(y);
        },
        (SumType::Bar(x), SumType::Foo(y)) => {
            assert_int(x);
            assert_str(y);
        },
        (SumType::Bar(x), SumType::Bar(y)) => {
            assert_int(x);
            assert_int(y);
        },
    }
}

println!("The match is exhaustive and the types check out.");
println!(
    "If this weren't the case, you'd be seeing a compiler error message here."
);
}
(Hit the play button in the top right corner.)
Rating: 10/10
Nix annoyances
Random stuff I find annoying about Nix that doesn't take an entire article to summarize
poetry2nix
poetry2nix
appears to not work with dependencies with md5 hashes, only sha256.
Due to Reasons™, a large amount of packages I have to deal with straight
up don't work as a result of this. It makes adding Python dependencies even
worse than it already is.
Rust
Okay, so it's not all sunshine and rainbows...
You can't turbofish Into::into()
I was just writing some code that involves a pretty simple conversion from type A to type C via type B. I initially had some nice-looking code like this:
#![allow(unused)]
fn main() {
struct A;
impl A { fn new() -> Self { Self } }
impl A { fn something(self) -> Self { self } }
impl A { fn something_else(self) -> Self { self } }
struct B;
impl From<A> for B { fn from(_other: A) -> Self { B } }
struct C;
impl From<B> for C { fn from(_other: B) -> Self { C } }

let something: A = A::new();

let x = something
    .something()
    .something_else()
    .into::<B>()
    .into::<C>();
}
However, hitting the play button will reveal an error message. It rightfully
tells us that the into() function does not take any generic parameters.
Instead, the Into trait itself is generic. This means, if we want to use
turbofishing, we'd have to write this:
#![allow(unused)]
fn main() {
struct A;
impl A { fn new() -> Self { Self } }
impl A { fn something(self) -> Self { self } }
impl A { fn something_else(self) -> Self { self } }
struct B;
impl From<A> for B { fn from(_other: A) -> Self { B } }
struct C;
impl From<B> for C { fn from(_other: B) -> Self { C } }

let something: A = A::new();

let x = Into::<C>::into(
    Into::<B>::into(
        something.something().something_else()
    )
);
}
There are two problems with this: first, it looks awful, and second, this is
what From is for.
We (and by that, I mean this Reddit comment) can actually do better by
writing our own version of Into where the method is generic, then adding
a blanket impl over the real Into so that we get existing conversions for
free. It looks like this:
#![allow(unused)]
fn main() {
struct A;
impl A { fn new() -> Self { Self } }
impl A { fn something(self) -> Self { self } }
impl A { fn something_else(self) -> Self { self } }
struct B;
impl From<A> for B { fn from(_other: A) -> Self { B } }
struct C;
impl From<B> for C { fn from(_other: B) -> Self { C } }

/// Turbofishable version of [`Into`](Into)
pub(crate) trait IntoT {
    /// Performs the conversion
    fn into_t<T>(self) -> T
    where
        Self: Into<T>;
}

// Blanket impl delegating to `Into`
impl<U> IntoT for U {
    #[inline]
    fn into_t<T>(self) -> T
    where
        Self: Into<T>,
    {
        self.into()
    }
}

let something: A = A::new();

let x = something
    .something()
    .something_else()
    .into_t::<B>()
    .into_t::<C>();
}
And it works too. The downside is that you now have to have this random extra bit of code somewhere in your project and it won't be instantly recognizable by other Rust programmers.
Doctests don't work on private items
Pretty straightforward. You can write doctests on private items but running
cargo test won't execute them. "Private items" here meaning anything not
exposed in the public API or anything in a bin crate. There's an open issue
about this here.
False-positive warnings for dead code in integration tests
Not really sure how I didn't hit this till just today, but either way this is pretty unfortunate. My current workaround is to just keep everything in one massive file, which is not great, but I like the other workarounds even less. Here's the GitHub issue.
3D Printing
I have a lot of thoughts about 3D printing and should probably stop spamming Matrix about it. Maybe this is what the fediverse is for. Anyway.
SliceWorx KP3S Pro S1
Living document that serves as notes on and a review of this 3D printer
The SliceWorx KP3S Pro S1 is a version of the Kingroon KP3S Pro, which itself is a version of the Kingroon KP3S. The selling point of the SliceWorx KP3S Pro S1 over the Kingroon KP3S Pro is that it ships with linear rails on all axes. Kingroon does sell an upgrade kit to convert the Y axis to linear rails, but their kit will cost you ~1cm of build height, whereas SliceWorx redesigned the Y axis a bit so you can have your cake and eat it too.
In my opinion, this printer has great value for the 250 USD I paid for it. As far as I can tell, it's only possible to buy this version of the printer through SliceWorx themselves, since they make some modifications before sending it to you. If you're looking for a relatively well-designed first printer, I'd say this fits the bill. Of course, there's always room for tinkering if you're into that, but the out-of-the-box experience is fine for typical use-cases.
Mainboard
As you can see, this uses a GD32F303VET6, which is unsupported by Marlin.
Marlin
I'm currently running Marlin bugfix-2.1.x on it anyway using my configuration
here. I'm going to be doing a lot of rebasing/force-pushing so
be aware of that if you're going to be using or basing off of my configuration.
Don't expect anything to work properly until you cherry-pick
this patch into your tree, reproduced here just in case:
From 1940418bbe89d07863ed05c6cddb1edf285a5a31 Mon Sep 17 00:00:00 2001
From: Alexander Gavrilenko <jmz52@users.noreply.github.com>
Date: Tue, 24 Jan 2023 20:37:03 +0300
Subject: [PATCH] Update tft_fsmc.cpp
---
Marlin/src/HAL/STM32/tft/tft_fsmc.cpp | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Marlin/src/HAL/STM32/tft/tft_fsmc.cpp b/Marlin/src/HAL/STM32/tft/tft_fsmc.cpp
index cf9e569336bc..aad796231436 100644
--- a/Marlin/src/HAL/STM32/tft/tft_fsmc.cpp
+++ b/Marlin/src/HAL/STM32/tft/tft_fsmc.cpp
@@ -100,11 +100,11 @@ void TFT_FSMC::Init() {
HAL_SRAM_Init(&SRAMx, &Timing, &ExtTiming);
- __HAL_RCC_DMA2_CLK_ENABLE();
-
#ifdef STM32F1xx
- DMAtx.Instance = DMA2_Channel1;
+ __HAL_RCC_DMA1_CLK_ENABLE();
+ DMAtx.Instance = DMA1_Channel1;
#elif defined(STM32F4xx)
+ __HAL_RCC_DMA2_CLK_ENABLE();
DMAtx.Instance = DMA2_Stream0;
DMAtx.Init.Channel = DMA_CHANNEL_0;
DMAtx.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
In order to get a basic config going, you'll want to find and cherry-pick the
configure electronics and copy mechanical configuration from manufacturer
commits into your tree.
Klipper
I hear it's supposed to be possible to put Klipper on this, but I have yet to try. Currently stuck in decision fatigue for which SBC to get.
Further reading
A collection of links to other things worth checking out
Software
-
Guy Steele: Computer Science Metanotation
A video mostly about the language used to talk about PL theory in whitepapers, but also includes some interesting PL history.
-
Richard Feldman: Why Isn't Functional Programming the Norm?
The title of the video is unintentional misdirection; most of the time is spent talking about how OOP came to be popular. In short, it was by complete accident: designers of the early now-popular OOP languages wanted FP features like modules and privacy but discovered OOP as a way to emulate those behaviors before, decades later, coming to agree that what we actually wanted was modules from the beginning. (See C With Classes, and C++ slated to add modules.) Also there's passing mention of the fact that JavaScript was originally going to be a Scheme, but enterprise hype for Java was overwhelming. Can you imagine if that went differently? Insane.
Reliability and robustness
-
Dan Luu: Given that devs spend the effort they do on testing, what can we do to improve testing?
An interesting article about testing software from the perspective of someone with more experience testing hardware. Most of the article is spent discussing tools that automatically generate tests, that is, automatic automated testing. Inspired by this article, I got cargo-fuzz working with my far crate and discovered a bug! I like fuzzers now.
-
Alexis King: Parse, don't validate
Great post that articulates how to leverage type-driven design to ensure program correctness. I highly, highly, highly recommend reading this.
-
Drew DeVault: Software developers have stopped caring about reliability
This is more of a rant than anything else, but if you excuse that aspect of it, I feel like this article makes some very good points and addresses some very important issues. There is a (probably unintentional) allusion to Parse, Don't Validate in there too so that's cool.
-
Dan Luu: One week of bugs
Dan must've read Drew's post above because this seems like a direct response to Drew's call-to-action at the very end. TL;DR: everything is on fire.
-
Graham: When to "address" "technical debt"?
If you're in a hurry, just read the last 3 paragraphs. If you're in an even bigger hurry, the short version is "don't create technical debt in the first place".
-
Ben Kuhn: Why are software tool vendors so unbelievably horrible?
A very funny yet depressing twitter tirade. The moral of the story is to do your best to not end up on another twitter thread like this one.
Developer responsibility
-
Drew DeVault: We are complicit in our employer's deeds
To keep the world from becoming shittier, avoid implementing shitty things.
Rust
-
johnthagen: min-sized-rust
Useful reference for when you're doing embedded and need to make your binary not massive.
Python
-
Ned Batchelder: Facts and Myths about Python names and values
This is a good video about how Python's memory model works.
-
Larry Hastings: Removing Python's GIL
A video on the efforts for removing Python's GIL. In short, Python's memory model is not amenable to performance once you're allowed to use threads because suddenly everything must be atomically reference counted, so you can say goodbye to your CPU cache. If you add a borrow checker and move semantics you could alleviate this problem, but we already have that and it's called Rust.
Git
-
krick: "conventional commits" are the stupidest thing ever
Someone complaining about conventional commits, and I'm very much in agreement with their analysis
Creative Commons Attribution-ShareAlike 4.0 International
Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.
Using Creative Commons Public Licenses
Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses.
-
Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. More considerations for licensors.
-
Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. More considerations for the public
Creative Commons Attribution-ShareAlike 4.0 International Public License
By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions.
Section 1 – Definitions
a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
b. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.
c. BY-SA Compatible License means a license listed at https://creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License.
d. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights.
e. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements.
f. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material.
g. License Elements means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike.
h. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License.
i. Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license.
j. Licensor means the individual(s) or entity(ies) granting rights under this Public License.
k. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them.
l. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world.
m. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning.
Section 2 – Scope
a. License grant.
-
Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:
A. reproduce and Share the Licensed Material, in whole or in part; and
B. produce, reproduce, and Share Adapted Material.
-
Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions.
-
Term. The term of this Public License is specified in Section 6(a).
-
Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material.
-
Downstream recipients.
A. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License.
B. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply.
C. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material.
-
No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i).
b. Other rights.
-
Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise.
-
Patent and trademark rights are not licensed under this Public License.
-
To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties.
Section 3 – License Conditions
Your exercise of the Licensed Rights is expressly made subject to the following conditions.
a. Attribution.
-
If You Share the Licensed Material (including in modified form), You must:
A. retain the following if it is supplied by the Licensor with the Licensed Material:
i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
ii. a copyright notice;
iii. a notice that refers to this Public License;
iv. a notice that refers to the disclaimer of warranties;
v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;
B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and
C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.
-
You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.
-
If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.
b. ShareAlike.
In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply.
-
The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License.
-
You must include the text of, or the URI or hyperlink to, the Adapter’s License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material.
-
You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter’s License You apply.
Section 4 – Sui Generis Database Rights
Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material:
a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database;
b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and
c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database.
For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights.
Section 5 – Disclaimer of Warranties and Limitation of Liability
a. Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.
b. To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.
c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability.
Section 6 – Term and Termination
a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically.
b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates:
-
automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or
-
upon express reinstatement by the Licensor.
For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License.
c. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License.
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public License.
Section 7 – Other Terms and Conditions
a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed.
b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License.
Section 8 – Interpretation
a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License.
b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions.
c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor.
d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority.
Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at https://creativecommons.org/policies, Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses.
Creative Commons may be contacted at creativecommons.org.