Posted by Jesper on July 30, 2024
As I wrote in a previous blog post,
agda2hs
is a tool for producing verified and readable Haskell code by
extracting it from a (lightly annotated) Agda program. In this blog post, I will
dig a bit deeper into how agda2hs
decides which parts of an Agda program to
compile to Haskell, and which parts to drop. In the second part, I also want to
dream a bit about how ornaments (or something like it) could help to reason in a
principled way about some of the features of agda2hs
that currently feel
rather ad-hoc.
Erasure annotations in agda2hs
agda2hs
relies heavily on erasure annotations (a.k.a. the 0
quantity from
Quantitative Type Theory) to determine which parts of an Agda program should be
preserved during compilation, and which parts are just “for show”, i.e.
enforcing some kind of invariant that is not enforced statically on the Haskell
side.
As an example, we can define a type of well-scoped syntax as follows:
data Term (@0 xs : List Name) : Set where TVar : (@0 x : Name) → Var x xs → Term xs TApp : Term xs → Term xs → Term xs TLam : (@0 x : Name) → Term (x ∷ xs) → Term xs
This translates to the following Haskell code:
data Term = TVar In | TApp Term Term | TLam Term
The big advantage we get out of these (erased) indices is that we can define functions that are guaranteed to respect the scope of terms by Agda’s type system, which are then translated to Haskell functions operating on this bare type of terms (which would be tricky to get right by hand). For example, a substitution function might have type
substTop : ∀ {@0 x xs} → Term xs → Term (x ∷ xs) → Term xs substTop = ⋯
making it immediately obvious which term is substituted for which variable.
Using Agda’s erasure annotations for indicating which parts of the Agda program
agda2hs
should remove during its translation is nice for several reasons:
Agda ensures that we never accidentally use or pattern match on an argument that doesn’t appear in the generated Haskell code.
We can be very precise in which parts of the Agda code should (not) appear in the generated Haskell code, including for higher-order functions. For example, in the definition of
if_then_else_
ofagda2hs
, the branches are provided erased proofs that the boolean condition is respectivelyTrue
orFalse
.Since erasure annotations are part of the Agda type system, the information about which arguments of a type or function are erased is propagated automatically throughout the program.
Still, the fit is not perfect: agda2hs
implements additional checks to ensure
that not too many arguments are marked as erased. For example, the following
datatype is perfectly valid Agda code
data Box (@0 a : Set) : Set where MkBox : a → Box a
but would translate to the following Haskell code with an out-of-scope type variable:
data Box = MkBox a
To avoid this problem, agda2hs
requires that every type variable that appears
in the generated Haskell code is not marked as erased.
Another subtler problem pops up if we allow erased type arguments to functions:
cast : {a b : Set} → @0 a ≡ b → a → b cast refl x = x
which translates to the ill-typed Haskell
: a → b
cast = x cast x
The way agda2hs
prevents this second example is by requiring additionally that
no non-erased pattern variables are forced by pattern matching. In the
example, the pattern match on refl
forces the (implicit) pattern variable b
to be equal to a
, so the two rules together rule out this definition of
cast
.
These restrictions seem to do their job, but are hardly satisfactory. The root
of the problem seems to be a mismatch between the intented use of erasure
annotations and how agda2hs
actually uses them: erasure annotations assume
that the code is compiled to an untyped language and hence that all types can
be safely erased, but agda2hs
is compiling Agda code to a typed language.
In other words, agda2hs
uses erasure not to erase types, but merely to
erase dependencies.
An interesting question would thus be whether we can have an alternative version
of erasure annotations that do not assume by default that all types can be
erased. A first step would be to change the typing rule for top-level type
signatures so they cannot refer to erased things by default. But perhaps further
changes are needed too. Alternatively, it would be interesting to investigate
whether a different modality such as
shape irrelevance would be a
better fit for the purposes of agda2hs
.
Erasing more than just arguments
Erasure annotations in Agda allow us to annotate arguments to functions,
top-level definitions, parameters and indices to data types, arguments to
constructors, and even specific constructors themselves. However, there are some
things that we would like to annotate in agda2hs
that go beyond the current
capabilities of Agda. Let’s take a look at two examples.
Sometimes when working in a dependently typed language, we like to pair a value
of some type a
together with a proof of a property p
of this value. Since we
don’t want proofs to show up in our Haskell code, we mark it as erased:
record ∃ (a : Set) (@0 p : a → Set) : Set where constructor _⟨_⟩ field value : a @0 proof : p value open ∃ public
For example, we could define the Var
type from before as a natural number
together with a proof that looking up the element of that position results in
the given name:
--Var : Name → List Name → Set Var x xs = ∃ Int λ n → xs !? n ≡ Just x
This is also known as a refinement type from systems such as Liquid Haskell, though Agda does not provide the same level of automation.
In the generated Haskell code, we would like values of this type to show up not
as having type ∃ a
, but simply as having type a
. To make this translation
type-preserving, constructor applications u ⟨ p ⟩
should be translated to u
,
and value x
should be translated to x
. We can accomplish this with a pragma
marking ∃
as an unboxed record type (which is probably a bad name,
suggestions for better names are welcome):
{-# COMPILE AGDA2HS ∃ unboxed #-}
The presence of unboxed types introduces an interesting prospect: we can have (potentially many) different Agda types that are all translated to the same Haskell type. But this also means that we can have functions that only manipulate the “proof part” of their input but leave the value intact. For example:
mapProof : ∀ {@0 p q : a → Set} → @0 (∀ {x} → p x → q x) → ∃ a p → ∃ a q mapProof f (x ⟨ p ⟩) = x ⟨ f p ⟩
When compiled normally, this produces the rather boring definition
mapProof x = x
. What’s worse, sometimes functions like this need to do some
pattern matching purely to put proofs in the right places. For example,
we can attach a proof to a value nested in a Maybe
as follows:
refineMaybe : ∀ {@0 p : a → Set} → (mx : Maybe a) → @0 (∀ {x} → mx ≡ Just x → p x) → Maybe (∃ a p) refineMaybe Nothing f = Nothing refineMaybe (Just x) f = Just (x ⟨ f refl ⟩)
This compiles to the rather unfortunate Haskell
refineMaybe :: Maybe a → Maybe a
Nothing = Nothing
refineMaybe Just x) = Just x refineMaybe (
To avoid having these identity functions showing up in the Haskell code,
agda2hs
allows you to mark a function as being “transparent”:
{-# COMPILE AGDA2HS mapProof transparent #-} {-# COMPILE AGDA2HS refineMaybe transparent #-}
When checking a transparent function, agda2hs
checks that it takes exactly one
(non-erased) argument and that each clause returns this argument unchanged
(after erasure). It then does not generate any Haskell code for this function,
but instead replaces each application of this transparent function by its
argument (and each lone appearance by id
).
In my biased experience, unboxed record types and transparent functions are both
very useful features of agda2hs
, and we use them extensively in the
implementation of Agda Core.
However, they also feel like ad-hoc fixes to a deeper problem. A more
fundamental solution would be to introduce a notion of
ornaments to the language, and
provide a mechanism to identify functions that only modify the “data-logic” of
an argument but preserve the “data-structure”.
I am obviously speculating here, but I believe - at least for the purposes of
agda2hs
- ornaments would be at their most useful if they could also be
integrated at the type level. Concretely, the type system should be aware of the
structure of each type, and for each function whether it modifies the
structure or just the logical parts (i.e. whether or not it is the identity
function after erasing all proofs). How such a type system would work
concretely, I leave as an exercise to the reader (or perhaps as a topic for a
future blog post).
As always, feel free to send comments on the Agda Zulip, to @jesper@agda.club via any Fediverse portal, or email me at jesper@sikanda.be.