Posted by Jesper on May 11, 2024
One important purpose of a type system (and especially a dependent type system) is to increase the trustworthiness of the objects the types are describing, whether those objects are programs or proofs. They do so by making the specification (or part of it) explicit and statically checking that the specification is implemented according to the rules. But then the question arises: how far can we trust the correctness of the type system itself? And what does it even mean for the type system to be correct, i.e. what is its specification?
With each line added to the implementation of Agda and each new bug discovered, this question becomes more and more relevant. The traditional answer to this question for most proof assistants as well as some programming languages (such as Haskell) is to define a core language that is much smaller and more trustworthy than the surface language implementation, and to translate the surface language into this core through a process called elaboration. However, this has not been the case for Agda, which instead has been defined only by its implementation. This has caused people to rightfully ask what precisely is the type theory that Agda implements and how all of its features fit into it.
This blog post is about Agda Core, a project to build a formally specified core language for (a reasonable subset of) Agda together with a reference implementation of a type checker for it. If you have been following my work, you might know that I have been working on this project on and off for the past few years, yet the current version is still very much a work in progress. Hence, in the second part of this post, I will dive into a few examples of some of the more unexpected and interesting obstacles we encountered while trying to turn Agda Core from a dream into reality.
Agda Core: The Dream
Since Agda does not already have a core language, we have the opportunity to design one from scratch! In my previous post on the topic of core languages, I went deeper into the reasons for wanting a core language. Summarizing briefly, the main reasons are:
- To reduce the trusted code base
- To implement independent checkers
- To enable meta-theoretic study
- To decouple surface syntax from the internal language
- To use as a basis for meta-programming
- To use as a basis for exporting programs and proofs
From these six reasons, we can derive the main criteria our core language should satisfy:
- It should have a formal specification of its static and dynamic semantics, so we can study it and implement independent checkers.
- It should have a small type checker that can easily be verified to follow the specification, so we can use it as the main trusted component of a larger system.
- It should support all features of the surface language, either directly or through elaboration, so that we can use it effectively for meta-programming and exporting proofs.
Apart from these three, one final goal I have is to implement everything in Agda itself. Agda is well suited for this kind of project that mixes specification and implementation. Doing everything in Agda is also an excellent stress test for Agda itself, and it makes the end result more accessible for Agda users to read. Also, Agda is just a nice language.
Having set all our goals and having picked an implementation language, we can start implementing a very basic version of Agda Core! For now, we will leave out most of Agda’s interesting/crazy features and focus on the very basics: a dependently typed lambda calculus with a universe hierarchy, data types, and pattern matching. The main components we need are the following:
- A definition of the syntax for expressions and declarations of data types and functions.
- An executable specification of reduction.
- A specification of the rules for typing and conversion.
- An implementation of type checking and conversion checking that produces evidence that the output is well-typed according to the rules as specified.
The final bit here is really crucial: because we have a formal specification of the typing rules and the type checker outputs evidence of well-typedness, we do not need to worry about bugs in the type checker but only in the typing rules themselves. As the authors of the latest paper on MetaCoq say: we move from a Trusted Code Base (TCB) to a Trusted Theory Base (TTB) paradigm. This is really cool and more people should be following this paradigm. Apart from MetaCoq, the only other place where I’ve seen it is in the CakeML project.
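To make this concrete, here is a minimal sketch of the idea in Agda, using a toy expression language rather than the actual Agda Core definitions (all names here are illustrative): the checker does not just answer 'yes', it returns a derivation in the specified typing relation.

```agda
open import Data.Maybe using (Maybe; just; nothing)
open import Data.Product using (Σ; _,_)

data Ty : Set where
  nat bool : Ty

data Term : Set where
  lit  : Term         -- a number literal
  true : Term
  not  : Term → Term

-- The specification: a typing relation on raw terms.
data _⊢_ : Term → Ty → Set where
  t-lit  : lit ⊢ nat
  t-true : true ⊢ bool
  t-not  : ∀ {t} → t ⊢ bool → not t ⊢ bool

-- The implementation: inference returns *evidence* of well-typedness,
-- so a bug in infer can never produce an ill-typed result.
infer : (t : Term) → Maybe (Σ Ty (t ⊢_))
infer lit     = just (nat , t-lit)
infer true    = just (bool , t-true)
infer (not t) with infer t
... | just (bool , p) = just (bool , t-not p)
... | _               = nothing
```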
This is a good point to talk about MetaCoq, since the comparison will inevitably come up. While some of the goals of MetaCoq and Agda Core are the same, they are really quite different in scope. Unlike MetaCoq, Agda Core is (currently) not meant for formal proofs of metatheoretic properties such as confluence, subject reduction, strong normalization, canonicity, or consistency. Rather, the goal is to be 'merely' a formal specification of a possible core language together with a correct-by-construction type checker for it. The reason for this is simple: until recently I was the only person working on Agda Core (I have now started getting help from my two PhD students), and I still want to work on other Agda-related projects, too. Meanwhile, MetaCoq is a long-term project with quite a few very smart people working on it. As you'll see in the second half of this post, even with this somewhat smaller scope there are still plenty of interesting challenges to figure out!
The final thing we need is a way to take our type checker and actually compile and run it, so we can use it for example to double-check the output of the main Agda type checker. Luckily (and contrary to popular belief) it is possible to compile and run Agda programs. The standard way to do this would be to use Agda's built-in GHC backend. Another option - which is the one I decided on - is to use agda2hs, an alternative backend that produces much more readable Haskell code that is guaranteed to contain no `unsafeCoerce`s. In return, it requires you to write your Agda code in a Haskell-like style, and in particular to use erasure annotations to make sure no dependent types end up in the compiled code.
Erasure annotations play a similar role to `Prop` in Rocq: they allow you to be explicit about which parts of your Agda program are needed at runtime, and which parts are 'just' for ensuring correctness. In Agda Core, erasure annotations are applied liberally to scoping and typing information.
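As a small, hypothetical illustration using the standard library (agda2hs comes with its own prelude, so this is not literally agda2hs-ready): in the function below, the length index only serves to rule out out-of-bounds access, so it can be marked as erased and will not show up in the compiled code.

```agda
{-# OPTIONS --erasure #-}

open import Data.Nat using (ℕ)
open import Data.Fin using (Fin; zero; suc)
open import Data.Vec using (Vec; _∷_)

-- The length n is only needed for type checking, so it is marked @0
-- and erased from the compiled code.
lookupVec : {A : Set} {@0 n : ℕ} → Vec A n → Fin n → A
lookupVec (x ∷ _)  zero    = x
lookupVec (_ ∷ xs) (suc i) = lookupVec xs i
```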
Agda Core: The Reality
When I started working on Agda Core, I had a plan very similar to the one outlined above (minus the agda2hs part, which did not yet exist). In my folly, I thought I would be able to finish a minimal bare-bones version of the whole pipeline relatively quickly. However, actually implementing Agda Core turned out to be far more challenging than I expected. One big reason for this was that I had just started in Delft and saw my research time roughly cut in half compared to my PhD and postdoc years. While I might write more about that in another blog post, here I would like to focus on the more technical challenges: those related to the design of Agda Core itself as well as those caused by the infrastructure we rely on. Let us dive into some details.
Design challenge: scoping of variables and defined symbols
In Agda's surface syntax, there are various kinds of symbols that share the same namespace: variables, functions, datatypes, constructors, record fields, and postulates. Internally, variables are represented as de Bruijn indices, while top-level symbols are represented as globally unique names. We could certainly make the same choice for Agda Core, but as we are doing this in Agda, there is also the option of defining a well-scoped syntax. In fact, there are so many options that a while back I wrote a blog post just to get a better understanding of them myself.
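For comparison, the textbook well-scoped representation indexes terms by the number of variables in scope, so that out-of-scope de Bruijn indices are ruled out by the type checker (this is one of the options from that post, not the representation Agda Core ended up using):

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Fin using (Fin)

-- Term n is a term with n free variables.
data Term : ℕ → Set where
  var : ∀ {n} → Fin n → Term n         -- de Bruijn index, guaranteed in scope
  lam : ∀ {n} → Term (suc n) → Term n  -- the binder brings one variable into scope
  app : ∀ {n} → Term n → Term n → Term n
```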
After writing that blog post, I ended up writing my own library for representing scopes and related concepts, based on the primitive notion of proof-relevant separation - see my TYPES '23 abstract for some details, or the source code for many more. While I am reasonably happy with this library, it is definitely one part that took a lot of time, and in hindsight a much simpler solution (e.g. simply representing scopes as lists of names, as sketched below) would have been good enough for a first prototype.
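Such a representation could be as simple as this (a sketch, with the type of names kept abstract):

```agda
open import Data.List using (List; _∷_)

postulate Name : Set

Scope : Set
Scope = List Name

-- A well-scoped variable is a proof that its name occurs in the scope.
data _∈_ (x : Name) : Scope → Set where
  here  : ∀ {α}           → x ∈ (x ∷ α)
  there : ∀ {y α} → x ∈ α → x ∈ (y ∷ α)
```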
Design challenge: representing case trees
One of Agda's most important and characteristic features is its support for dependent pattern matching. To add support for it to Agda Core, we need to choose which stage of the elaboration process to adopt as the core representation:
- Representing definitions directly as a list of clauses is the most faithful to the surface language, but it has very complicated typing rules and does not naturally allow for enforcing completeness. So it is not well suited for a core language.
- On the other end of the spectrum are the eliminators associated with each datatype. While theoretically elegant and minimal, representing full dependent pattern matching in terms of them takes a lot of work - especially to encode the unification process - which leads to a complex elaboration and large proof terms (see my PhD thesis).
- In between these two extremes, case trees are the representation used by Agda's internal syntax. They strike a nice balance between theoretical elegance and efficiency, and are thus well suited for a core language for Agda.
Rather than creating a separate syntactic class for case trees like Agda does, Agda Core simply embeds each kind of case-tree node as a term constructor. In particular, there is a `case` construct representing a single pattern match. In contrast to `case` expressions in Rocq or Lean, these `case` expressions can perform unification when the indices of the scrutinee are not fully general. For example, a `case` can pattern match on a value of type `Vec A (suc n)` with a single branch for the cons constructor `_∷_`.
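In surface Agda, this is the familiar situation where matching on a vector of non-zero length needs no branch for nil; the embedded `case` construct supports the same kind of index refinement:

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Vec using (Vec; _∷_)

-- A single cons branch suffices: unifying the index suc n with the
-- indices of Vec's constructors rules out the nil case.
head : {A : Set} {n : ℕ} → Vec A (suc n) → A
head (x ∷ _) = x
```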
Embedding `case` expressions into the syntax poses one problem: they do not fit neatly into a bidirectional typing discipline. In particular, we can omit the type annotation of a `case` match (a.k.a. the "motive") only if the scrutinee is a variable. However, "being a variable" is a property that is very much not stable under substitution, which would be rather unfortunate. At the moment, this is solved by always requiring a type annotation for each `case` expression. However, this is a decision I would like to come back to later to investigate what other options there are.
Design challenge: preserving sharing during reduction
One of the major flaws in the current implementation of Agda (which I also wrote about in a recent blog post) is that there is no good way to represent explicit sharing in the internal syntax of Agda, through `let`-bindings or otherwise. I decided to try to avoid this mistake in Agda Core and so added a `let` constructor to the syntax, even though it is not strictly necessary to represent the current implementation of Agda.
Little did I know that it is not enough to merely add sharing to the syntax, but that it should also be preserved during reduction (okay, maybe I should have predicted this one). Luckily there are good techniques for doing sharing-preserving reduction. In particular, I am fond of Peter Sestoft’s lazy abstract machine for striking a nice balance between simplicity and efficiency. It defines reduction on an abstract machine consisting of an environment of shared terms, a focus term currently being evaluated, and a stack of frames to continue reduction later. This is what I used as the basis for defining reduction for Agda Core.
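Roughly, a machine configuration looks as follows (a simplified sketch with hypothetical names, not the actual Agda Core definitions):

```agda
open import Data.List using (List)
open import Data.Nat using (ℕ)
open import Data.Product using (_×_)

postulate Term : Set  -- stand-in for the Agda Core term type

-- A frame records how to continue once the focus term is evaluated.
data Frame : Set where
  apply  : Term → Frame  -- apply the result to this argument
  update : ℕ → Frame     -- overwrite this environment entry with the result

record Machine : Set where
  field
    env   : List (ℕ × Term)  -- environment of shared (possibly unevaluated) terms
    focus : Term             -- the term currently being evaluated
    stack : List Frame       -- frames for continuing reduction later
```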
This allows us to preserve sharing during reduction, but at some point we have to convert the abstract machine back into a regular term (unless you are okay with having these abstract machines in your typing judgement). At first I thought we could simply turn each shared term in the environment into a `let`-binding. However, these shared terms are often only there as intermediate results of computation and are not used in the final term, so this strategy resulted in lots and lots of unused `let`-bindings. To make this approach practical, we would need a procedure for garbage collection of unused `let`-bindings (and perhaps inlining of linear `let`-bindings). Implementing such a procedure in a well-scoped way seems like a rather challenging and time-consuming task, which I have not yet attempted (though I'd like to try). So for now, converting from an abstract machine back to an expression relies on substitution rather than `let`-bindings, destroying any sharing in the process.
Design challenge: dealing with non-termination
Another challenge that comes up when defining reduction and conversion is how to deal with the possibility of non-terminating terms. For reasons of modularity, we would like to define conversion separately from typing, so we don't yet know that a term is well-typed when reducing it. And even if we did know it was well-typed, Agda Core does not include a termination checker (yet).
So either way, we need some way to deal with the possibility of non-termination. For declaring the conversion rules, this is straightforward: we simply ask for a proof that reduction to weak-head normal form terminates for the particular term at hand. In the implementation of the executable conversion checker, however, we need to actually construct this evidence. The simplest way to do so is to rely on a 'fuel' argument that gives an upper bound on the number of reduction steps that will be tried. This is not very satisfactory, because it means conversion checking (and hence type checking) can now fail on well-typed terms if they run out of fuel.
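A minimal sketch of the fuel-based approach, with the single-step reduction function left abstract:

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Maybe using (Maybe; just; nothing)

postulate
  Term : Set
  step : Term → Maybe Term  -- one weak-head reduction step, if one exists

-- Fuel-bounded reduction to weak-head normal form: 'nothing' means we
-- ran out of fuel, not that the input was ill-typed.
whnf : ℕ → Term → Maybe Term
whnf zero    _ = nothing
whnf (suc n) t with step t
... | nothing = just t  -- no step possible: t is already in whnf
... | just t' = whnf n t'
```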
Another possibility would be to use a `Delay` monad to model potential non-termination, but Agda's guardedness checker proved to be too restrictive (and its sized types too buggy) for this to work out. For a while, I also pursued a solution that computed in advance the set of terms that needed to be normalized for a given typing/conversion problem. However, this essentially amounted to running every typing and conversion check twice, so I abandoned it. In MetaCoq, they solve this instead by only calling reduction on well-typed terms and postulating strong normalization; however, since Agda Core currently lacks a termination checker, this is not an option either. So, for lack of better options, the type checking monad of Agda Core currently carries a global fuel value that is used for every conversion check.
Design challenge: typed vs. untyped conversion
One major difference between conversion checking in Agda and most other dependently typed languages is that it uses a typed notion of conversion. This means that Agda always keeps track of the type of two terms when comparing them (with the exception of comparing two types, for which Agda does not care about their type). As far as I know, this type information is actually used in three places:
- When checking equality at a function type (and neither side of the equation is a neutral term), Agda will extend the context with a fresh variable and apply both sides of the equation to it.
- When checking equality at a record type with eta-equality enabled (and neither side of the equation is a neutral term), Agda will compare the terms field-wise. For example, for a pair type it will compare the first components and the second components.
- When checking equality at a definitionally irrelevant type (either the domain of an irrelevant function type, or a type in the `Prop` universe), Agda will regard any two terms as equal.
Most of these uses of typed conversion are not really essential and could instead be handled by more syntax-directed rules (as demonstrated by Rocq and Lean). However, the one big exception is the eta unit type `⊤`, which is defined as a record type with no fields. Since there are no fields, the conversion rule says that any two terms of this type are equal. This is the reason why this type is not supported in Rocq (I'm not sure about Lean), and it is the cause of quite a bit of complexity in the implementation of Agda.
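Concretely, this is the definitional equality that makes `⊤` special:

```agda
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

record ⊤ : Set where
  constructor tt
  -- no fields; eta-equality holds for records by default

-- Any two inhabitants of ⊤ are definitionally equal, so refl suffices.
eta-unit : (x y : ⊤) → x ≡ y
eta-unit x y = refl
```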
Since the eta unit type is an important and unique feature of Agda, I would very much like to support it in Agda Core, too. Unfortunately, trying to make the conversion rules type-directed made them significantly more unwieldy to write down, as each conversion rule essentially becomes a two-sided version of the corresponding typing rule. This became too annoying, so we reverted to an untyped conversion judgement for the moment.
An alternative approach I would like to explore is to locally require a typing derivation whenever conversion uses eta-equality, which would make eta-equality much less invasive to the basic typing and conversion infrastructure. Actually, it would be sufficient to require a re-typing derivation (i.e. a derivation that a term has a specific type, assuming that it is already well-typed), which could easily be computed in the conversion checker. The main snag is that this would require the conversion judgement to maintain a local context for running the re-typing, which might not be possible without adding type annotations to lambda expressions - something I was trying to avoid.
Infrastructure challenge: working with erasure annotations
Apart from the more conceptual design challenges I mentioned so far, we also encountered plenty of more technical challenges. Quite a few of these have to do with the `@0` erasure annotations, which (if you remember) are used to draw the line between the erasable specification of the Agda Core typing rules and the executable implementation of its type checker. Because of Agda issue #5703, the error messages when you forget an `@0` are often horrendous: rather than a proper error message about a failed instantiation, you get a long list of mostly irrelevant unsolved constraints.
(As a side note, I tried to solve this issue for a while but got stuck because of an interaction between erasure and eta unit types via the occurs checker. A proper fix would require a type-directed reimplementation of the occurs checking and pruning algorithms used for solving higher-order unification problems. Alas, apart from being really tricky to do, my attempt also caused a performance regression significant enough for me to give up.)
The solution for working with erasure that I found to work best is to not use `@0` directly, but instead to define a record type `Erased A` with a single erased field of type `A`. We also found other types that are useful when working with erasure, such as the type `Rezz x` that carries a runtime representation of the erased value `x`, and three variants of the `Σ` type with different erasure annotations.
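Sketched in Agda, the first two look something like this (the constructor and field names here are illustrative, not necessarily those of the actual libraries):

```agda
{-# OPTIONS --erasure #-}

open import Relation.Binary.PropositionalEquality using (_≡_)

-- A box whose contents exist only at type-checking time.
record Erased (A : Set) : Set where
  constructor erased
  field
    @0 get : A

-- Rezz x: a runtime value together with an (erased) proof that it
-- equals the erased x, i.e. a runtime representation of x.
record Rezz {A : Set} (@0 x : A) : Set where
  constructor rezz
  field
    value    : A
    @0 proof : value ≡ x
```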
Infrastructure challenge: agda2hs woes
While I believe the choice to use `agda2hs` to compile Agda Core rather than Agda's GHC backend was a sensible one, this is the first project of this size that uses `agda2hs`, and it shows. In the process of making Agda Core compatible with `agda2hs`, we encountered numerous compilation bugs, missing features, and missing library functionality. The biggest limitation was perhaps the lack of support for module parameters, which we relied on heavily in Agda Core to represent the global scope of definitions and datatypes. A proper fix for all these issues eventually required us to switch to a type-directed compilation (see #270 and #304). Are you starting to notice a pattern?

So the bad news is that making Agda Core work with `agda2hs` took a lot of work on improving `agda2hs` itself. The good news is that `agda2hs` is now a much better and more robust tool than it was, so if you are curious I would definitely recommend trying it out!
Infrastructure challenge: awkward representations in Agda’s internal syntax
The final technical challenge I want to mention has to do with Agda's implementation itself. As I mentioned in the first part of this post, our ultimate goal is to link up Agda Core as a backend to the implementation of Agda, so it can be used to double-check the well-formedness of its output. Since Agda is implemented in Haskell and Agda Core can be translated to Haskell with `agda2hs`, this should be easy! Or well, possible at least.

Since this is currently the place where the most work still needs to be done, I am unsure which further technical challenges will pop up. But one challenge will definitely be to translate Agda's internal case trees to iterated `case` statements in Agda Core. Agda's case trees are optimized for efficient reduction and thus omit all typing information. For some reason they also use de Bruijn levels instead of de Bruijn indices, and without knowing the size of the context, translating between the two is not straightforward.
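The arithmetic itself is simple - with n variables in scope, level l corresponds to index n - 1 - l - but it cannot even be started without knowing n (a sketch, using truncated subtraction):

```agda
open import Data.Nat using (ℕ; suc; _∸_)

-- With n variables in scope, de Bruijn level l (counting from the
-- outside in) corresponds to index n ∸ suc l (from the inside out).
levelToIndex : (n l : ℕ) → ℕ
levelToIndex n l = n ∸ suc l
```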
The main cause of the problem here is that Agda builds up a lot of typing information while checking definitions by pattern matching, but this information is not stored anywhere in the case tree. So perhaps the right thing to do is to add an intermediate typed representation of case trees to Agda's pipeline. Such a representation could even be useful for other tools besides Agda Core, such as Agda2Dk and Agda's own double-checker CheckInternal (which currently only handles terms but not case trees).
Conclusions
So, whew, now you have an idea of the overall goals of Agda Core as well as some of the challenges that we ran into while working on it. I am personally quite disappointed with how long it took to get to this point and how much work is still left to be done. Still, it has been a fascinating project, and despite all of its shortcomings I find Agda a truly satisfying language to work with.
So rather than seeing all these difficulties as something negative, I would rather view them as pointing out places where there is still room for growth, both in the implementation of Agda and its ecosystem as well as in our own knowledge and best practices of how to implement correct-by-construction type checkers. So what are we still waiting for? Let's go and find out!
As always, comments on this blog post are welcome via Mastodon/Fediverse/ActivityPub at @jesper@agda.club, on the Agda Zulip, or via email to jesper@sikanda.be.
Oh, and the next blog post will be something short and sweet, I promise!