welltypedwitch

A (very niche) footgun in GHC.Generics

While working on a project and trying to use GHC.Generics to write a function that generates the set of all data constructors of a function (as one does), I ran into a very interesting issue.

For those unaware, GHC.Generics is a very lightweight meta-programming mechanism in Haskell. The idea is essentially that types can derive Generic to implement a type family Rep that turns a type into a generic representation made up of sums, products and metadata, as well as functions to :: Generic a => Rep a x -> a and from :: Generic a => a -> Rep a x.1.

For example, the representation of data Blah = A | B Int is

ghci> data Blah = A | B Int deriving Generic
ghci> :k! Rep Blah
Rep Blah :: * -> *
= M1
    D
    (MetaData "Blah" "Ghci3" "interactive" False)
    (M1 C (MetaCons "A" PrefixI False) U1
     :+: M1
           C
           (MetaCons "B" PrefixI False)
           (M1
              S
              (MetaSel
                 Nothing NoSourceUnpackedness NoSourceStrictness DecidedLazy)
              (K1 R Int)))

Now, this might look a bit scary but it's actually quite simple. All the M1 constructors are just metadata so if we leave them off, we get

U1 :+: (K1 R Int)

U1 stands for nullary constructors (that look like Unit), i.e. A in this case and K1 stands for "constants", i.e. regular types2. (:+:) means that our type is a sum over those two (i.e. it has two constructors).

So what this is really saying is that our type Blah is isomorphic to Either () Int, which is true!

Now to implement a generic function that works on any data type, all we need to do is implement a type class on those generic constructors and we will get an implementation for every type that implements Generic for free!

In our case, to collect the constructor names we really only need to handle three cases:

Now, my implementation looked something like this

constructorNames :: forall a. (ConstructorNamesG (Rep a)) => Seq Text
constructorNames = constructorNamesG @(Rep a)

class ConstructorNamesG f where
    constructorNamesG :: Seq Text

instance (KnownSymbol name) => ConstructorNamesG (C1 (MetaCons name _fixity _strictness)) where
    constructorNamesG = [fromString (symbolVal (Proxy :: Proxy name))]

instance (ConstructorNamesG f, ConstructorNamesG g) => ConstructorNamesG (f :+: g) where
    constructorNamesG = constructorNamesG @f <> constructorNamesG @g

instance {-# OVERLAPPABLE #-} (ConstructorNamesG f) => ConstructorNamesG (M1 _i _c f) where
    constructorNamesG = constructorNamesG @f

So where is the footgun?

If you try to run this code and call constructorNames @Blah, you will see a strange, somewhat cryptic error message

<interactive>:88:1: error: [GHC-39999]
    • No instance for ‘ConstructorNamesG (K1 R Int)’
        arising from a use of ‘constructorNames’
    • In the expression: constructorNames @Blah
      In an equation for ‘it’: it = constructorNames @Blah

It's true that we haven't provided an instance for K1 but that shouldn't matter since we stop at C1 and don't recurse any further.

So what is really going on? Let's look at that instance again.

instance (KnownSymbol name) => ConstructorNamesG (C1 (MetaCons name _fixity _strictness)) where
    constructorNamesG = [fromString (symbolVal (Proxy :: Proxy name))]

Do you see the issue? No?

Look closely: C1 should really take 2 arguments! It's supposed to add metadata on top of whatever the underlying type representation is, but we... never actually bind that variable? Isn't this supposed to throw a type error?

PolyKinds

Let's look at our class definition again.

class ConstructorNamesG f where
    constructorNamesG :: Seq Text

Because we never actually use f anywhere (we don't actually care about any values only about the representation of the type itself), there is technically nothing saying it needs to be a type! So what happens is that ConstructorNamesG becomes a type class with a polymorphic kind ConstructorNamesG :: forall {k}. k -> Constraint. The instance we want declares an instance with k ~ Type -> Type3 but the one we declared uses k ~ (Type -> Type) -> Type -> Type and therefore doesn't match when we call it on Blah!

Now, is this a serious, dangerous footgun? Probably not. But it's definitely an interesting one!

  1. The additional x parameter is there so that Generic1 can use the same constructors. It usually doesn't matter

  2. I don't actually know what the R is about. According to the documentation it apparently stands for "recursion", but I'm not sure how that applies here.

  3. It's not Type because of that unused x parameter from earlier