A (very niche) footgun in GHC.Generics
While working on a project and trying to use GHC.Generics
to write a function that generates the set of all data constructors of a function (as one does), I ran into a very interesting issue.
For those unaware, GHC.Generics
is a very lightweight meta-programming mechanism in Haskell. The idea is essentially that types can derive Generic
to implement a type family Rep
that turns a type into a generic representation made up of sums, products and metadata, as well as functions to :: Generic a => Rep a x -> a
and from :: Generic a => a -> Rep a x
.1.
For example, the representation of data Blah = A | B Int
is
ghci> data Blah = A | B Int deriving Generic
ghci> :k! Rep Blah
Rep Blah :: * -> *
= M1
D
(MetaData "Blah" "Ghci3" "interactive" False)
(M1 C (MetaCons "A" PrefixI False) U1
:+: M1
C
(MetaCons "B" PrefixI False)
(M1
S
(MetaSel
Nothing NoSourceUnpackedness NoSourceStrictness DecidedLazy)
(K1 R Int)))
Now, this might look a bit scary but it's actually quite simple. All the M1
constructors are just metadata so if we leave them off, we get
U1 :+: (K1 R Int)
U1
stands for nullary constructors (that look like Unit), i.e. A
in this case and K1
stands for "constants", i.e. regular types2. (:+:)
means that our type is a sum over those two (i.e. it has two constructors).
So what this is really saying is that our type Blah
is isomorphic to Either () Int
, which is true!
Now to implement a generic function that works on any data type, all we need to do is implement a type class on those generic constructors and we will get an implementation for every type that implements Generic
for free!
In our case, to collect the constructor names we really only need to handle three cases:
(:+:)
to collect distinct constructorsC1
(a special case ofM1
) to collect the name per constructor- (overlappable)
M1
to ignore all the other metadata
Now, my implementation looked something like this
constructorNames :: forall a. (ConstructorNamesG (Rep a)) => Seq Text
constructorNames = constructorNamesG @(Rep a)
class ConstructorNamesG f where
constructorNamesG :: Seq Text
instance (KnownSymbol name) => ConstructorNamesG (C1 (MetaCons name _fixity _strictness)) where
constructorNamesG = [fromString (symbolVal (Proxy :: Proxy name))]
instance (ConstructorNamesG f, ConstructorNamesG g) => ConstructorNamesG (f :+: g) where
constructorNamesG = constructorNamesG @f <> constructorNamesG @g
instance {-# OVERLAPPABLE #-} (ConstructorNamesG f) => ConstructorNamesG (M1 _i _c f) where
constructorNamesG = constructorNamesG @f
So where is the footgun?
If you try to run this code and call constructorNames @Blah
, you will see a strange, somewhat cryptic error message
<interactive>:88:1: error: [GHC-39999]
• No instance for ‘ConstructorNamesG (K1 R Int)’
arising from a use of ‘constructorNames’
• In the expression: constructorNames @Blah
In an equation for ‘it’: it = constructorNames @Blah
It's true that we haven't provided an instance for K1
but that shouldn't matter since we stop at C1
and don't recurse any further.
So what is really going on? Let's look at that instance again.
instance (KnownSymbol name) => ConstructorNamesG (C1 (MetaCons name _fixity _strictness)) where
constructorNamesG = [fromString (symbolVal (Proxy :: Proxy name))]
Do you see the issue? No?
Look closely: C1 should really take 2 arguments! It's supposed to add metadata on top of whatever the underlying type representation is, but we... never actually bind that variable? Isn't this supposed to throw a type error?
PolyKinds
Let's look at our class definition again.
class ConstructorNamesG f where
constructorNamesG :: Seq Text
Because we never actually use f
anywhere (we don't actually care about any values only about the representation of the type itself), there is technically nothing saying it needs to be a type! So what happens is that ConstructorNamesG
becomes a type class with a polymorphic kind ConstructorNamesG :: forall {k}. k -> Constraint
. The instance we want declares an instance with k ~ Type -> Type
3 but the one we declared uses k ~ (Type -> Type) -> Type -> Type
and therefore doesn't match when we call it on Blah
!
Now, is this a serious, dangerous footgun? Probably not. But it's definitely an interesting one!
The additional
x
parameter is there so thatGeneric1
can use the same constructors. It usually doesn't matter↩I don't actually know what the
R
is about. According to the documentation it apparently stands for "recursion", but I'm not sure how that applies here.↩It's not
Type
because of that unusedx
parameter from earlier↩