welltypedwitch

The way we're thinking about breaking changes is really silly

A major problem plaguing modern compilers is that they have no concept of time. When run on a codebase, a compiler treats it as if it had always been in the exact state it is in at that moment. Updating a dependency literally just updates the code that is downloaded to your hard drive, without modifying its call sites in any way. Any errors that occur as a result are treated as fundamental to your program and not as temporary annoyances caused by the update.

In practice, what this means is that we essentially don't directly allow changing a function's type ever. The only reason non-breaking changes exist at all is because sometimes the same syntax that used to work with a previous version of some function happens to also work with the new one. For example, if a function previously took an Int, modifying it to take an Int | null is not a breaking change: any value of type Int also implicitly has type Int | null, so any code that called it on an Int, like f(5), still happens to work. If we used, say, Maybe Int instead, we would break call sites because the syntactic expression f(5) we had before now gives a type error.

Unfortunately, union types break parametricity and have pretty poor nesting behavior [1], so it's quite likely that you would prefer a Maybe instead. If you wanted to allow changing values of type Int to Maybe Int without breaking callers, you might imagine adding some sort of implicit coercion that automatically wraps arguments of type Int in a Just when a Maybe Int is expected. This will probably cause problems for type inference and might lead to quite a few headaches because now Nothing can technically mean Nothing, Just Nothing, Just (Just Nothing), etc.

But if we take a step back, we'll notice that this entire approach is really silly! We don't care at all about being able to call f on Ints. All we care about is that our existing code keeps its behavior. But because our compilers are so naive, we have to keep supporting the exact syntax we supported before.

Databases solved this ages ago

So how do we prevent old call sites from breaking without having to keep supporting the exact way they were called into all eternity? Migrations!

In the example above, we don't actually want an automatic coercion from Int to Maybe Int that inserts an implicit Just. All we want is for the call sites of the function whose type we changed to apply a migration from Int to Maybe Int that inserts a Just.

There are a few ways one could approach the specifics here.

Option 1: Automatic type migrations

Using a Haskell-y syntax, migrations could be declared directly on a type like this:

data Maybe a
   = Nothing
   | Just a

migration (a --> Maybe a) argument = [e| Just $(argument) |]

A migration declaration like this would declare a (typed!) macro that transforms any argument expression of type a into one of type Maybe a. Now, any time a function changed the type of one of its arguments from Int to Maybe Int, the compiler would automatically apply this macro to every call site, which would be guaranteed to make them compile again (although the burden of verifying that the migration is semantically correct is still on the programmer). Importantly, this can also happen once a dependency is updated.
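To make the "apply this macro to every call site" step a bit more concrete, here is a minimal sketch in plain Haskell. It is not the proposed language feature: Expr, wrapJust and migrateCalls are hypothetical names, the expression type is a toy, and the rewrite is untyped and only handles single-argument calls, whereas the migrations described above would be checked by the type checker.

```haskell
-- Toy expression language; every name in this block is hypothetical.
data Expr
  = Lit Int        -- integer literal
  | Var String     -- variable or function reference
  | Con String     -- data constructor such as Just
  | App Expr Expr  -- function application
  deriving (Eq, Show)

-- The migration (a --> Maybe a): wrap an argument expression in Just.
wrapJust :: Expr -> Expr
wrapJust arg = App (Con "Just") arg

-- Apply a migration to the argument of every direct call to the named
-- function, leaving all other expressions untouched.
migrateCalls :: String -> (Expr -> Expr) -> Expr -> Expr
migrateCalls name migration = go
  where
    go (App (Var g) arg)
      | g == name = App (Var g) (migration (go arg))
    go (App fn arg) = App (go fn) (go arg)
    go e = e
```

With this, migrateCalls "f" wrapJust turns the tree for f 5 into the tree for f (Just 5), while a call to some other function g is left alone.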

Option 2: Migration files

Not every API change just changes its arguments in a universally applicable, context-free way like this. For example, let's say we once had a function f :: Int -> String, but then we realized that one can never have enough ints, so we generalized it to f :: Map String Int -> String where f (fromList [("default", x)]) now does what f x did previously. One way we could approach this would be to add a migration file to the release that changed f's type.

-- blah.atria
f :: Map String Int -> String
f = ...

-- migration/1.2.3/blah.atria.migration
migration f x = [e| f (fromList [("default", $x)]) |]

This migration is again a (typed) macro that the compiler can automatically apply to any matching call site whenever it upgrades a dependency from a previous version to 1.2.3.
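The property the library author still has to check by hand is that a migrated call site behaves like the old one. A small sketch of that check in ordinary Haskell follows. The bodies of fOld and fNew are made up for illustration (the post never says what f does), and migratedCall is a hypothetical name for what the rewritten call site computes.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical pre-1.2.3 version of f.
fOld :: Int -> String
fOld x = "value: " ++ show x

-- Hypothetical 1.2.3 version: reads the "default" key, falling back to 0.
fNew :: Map String Int -> String
fNew m = "value: " ++ show (Map.findWithDefault 0 "default" m)

-- What a call site `f x` computes after the migration has rewritten it.
migratedCall :: Int -> String
migratedCall x = fNew (Map.fromList [("default", x)])
```

Under these assumed definitions, migratedCall x and fOld x agree for every x, which is exactly the guarantee the migration file is promising to downstream users.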

We don't need to stop at function calls

If we actually care about handling and automatically migrating code affected by changes that are supposed to be non-breaking, we can do so much more! Adding a function to a module is a purely additive, backwards compatible change and yet in languages with unqualified imports it can cause ambiguity errors if another function of the same name was already in scope! With migrations, we can have the compiler detect cases like this and hide the new function from the import list of the affected module.
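As an illustration in today's Haskell (my pick of example, not the output of any existing tool): base 4.15 added a singleton function to Data.List, which collides with Data.Map's singleton in any module that imports Data.List openly. The hiding clause below is the kind of edit such a migration could insert automatically. oneEntry and sortedNames are hypothetical names.

```haskell
-- Open, unqualified import. The `hiding (singleton)` clause is what an
-- automatic migration could add when Data.List gains a new export.
import Data.List hiding (singleton)
import Data.Map (Map, singleton)

-- Still refers unambiguously to Data.Map.singleton.
oneEntry :: Map String Int
oneEntry = singleton "default" 5

-- Other Data.List functions remain available as before.
sortedNames :: [String]
sortedNames = sort ["b", "a"]
```

The existing code keeps its meaning, and the new function is simply not in scope until someone asks for it explicitly.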

Similarly, Rust has a pretty controversial problem where adding a new implementation of a trait can sometimes break existing code, because type inference relies on the fact that only a single implementation of some trait applies. If the Rust compiler were smarter, it could automatically fix issues like this by inserting type annotations in any code that was previously unambiguous!

It's not even like this is that revolutionary of an idea. With the rise of language servers, it's honestly somewhat surprising that current languages don't support automatic code migrations like this yet.

  1. Imagine you have a function lookup :: key -> Map key value -> value | null. This is a very natural type to write, except that it totally breaks if value is instantiated to a type of the form _ | null, because you won't be able to distinguish null as the value stored in the map from null as the sentinel for a failed lookup. Without parametricity, you always need to consider all possible instantiations separately and cannot rule out subtle edge cases like this.
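For contrast, here is a small sketch of the same situation with Maybe, using Data.Map from the containers package. The config map and the describe function are hypothetical. Because Maybe nests, a lookup in a map whose values are themselves Maybe Int returns Maybe (Maybe Int), and the two kinds of "nothing" stay distinguishable.

```haskell
import qualified Data.Map as Map

-- A map whose values may themselves be absent.
config :: Map.Map String (Maybe Int)
config = Map.fromList [("timeout", Nothing)]

-- Map.lookup returns Maybe (Maybe Int) here, so a missing key and a key
-- that maps to Nothing are different results.
describe :: String -> String
describe key = case Map.lookup key config of
  Nothing       -> "missing"
  Just Nothing  -> "present but unset"
  Just (Just n) -> "set to " ++ show n
```

With a union type, both the first and the second case would collapse into a single null.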