Posted by Jesper on May 11, 2024
One important purpose of a type system (and especially a dependent type system) is to increase the trustworthiness of the objects the types are describing, whether those objects are programs or proofs. They do so by making the specification (or part of it) explicit and statically checking that the specification is implemented according to the rules. But then the question arises: how far can we trust the correctness of the type system itself? And what does it even mean for the type system to be correct, i.e. what is its specification?
With each line added to the implementation of Agda and each new bug discovered, this question becomes more and more relevant. The traditional answer to this question for most proof assistants as well as some programming languages (such as Haskell) is to define a core language that is much smaller and more trustworthy than the surface language implementation, and to translate the surface language into this core through a process called elaboration. However, this has not been the case for Agda, which instead has been defined only by its implementation. This has caused people to rightfully ask what precisely is the type theory that Agda implements and how all of its features fit into it.
This blog post is about Agda Core, a project to build a formally specified core language for (a reasonable subset of) Agda together with a reference implementation of a type checker for it. If you have been following my work, you might know that I have been working on this project on and off for the past few years, yet the current version is still very much a work in progress. Hence, in the second part of this post, I will dive into a few examples of some of the more unexpected and interesting obstacles we encountered while trying to turn Agda Core from a dream into reality.
Agda Core: The Dream
Since Agda does not already have a core language, we have the opportunity to design one from scratch! In my previous post on the topic of core languages, I went deeper into the reasons for wanting a core language. Summarizing briefly, the main reasons are:
- To reduce the trusted code base
- To implement independent checkers
- To enable meta-theoretic study
- To decouple surface syntax from the internal language
- To use as a basis for meta-programming
- To use as a basis for exporting programs and proofs
From these six reasons, we can derive the main criteria our core language should satisfy:
- It should have a formal specification of its static and dynamic semantics, so we can study it and implement independent checkers.
- It should have a small type checker that can easily be verified to follow the specification, so we can use it as the main trusted component of a larger system.
- It should support all features of the surface language, either directly or through elaboration, so that we can use it effectively for meta-programming and exporting proofs.
Apart from these three, one final goal I have is to implement everything in Agda itself. Agda is well suited for this kind of project that mixes specification and implementation. Doing everything in Agda is also an excellent stress test for Agda itself, and it makes the end result more accessible for Agda users to read. Also, Agda is just a nice language.
Having set all our goals and having picked an implementation language, we can start implementing a very basic version of Agda Core! For now, we will leave out most of Agda’s interesting/crazy features and focus on the very basics: a dependently typed lambda calculus with a universe hierarchy, data types, and pattern matching. The main components we need are the following:
- A definition of the syntax for expressions and declarations of data types and functions.
- An executable specification of reduction.
- A specification of the rules for typing and conversion.
- An implementation of type checking and conversion checking that produces evidence that the output is well-typed according to the rules as specified.
The final bit here is really crucial: because we have a formal specification of the typing rules and the type checker outputs evidence of well-typedness, we do not need to worry about bugs in the type checker but only in the typing rules themselves. As the authors of the latest paper on MetaCoq say: we move from a Trusted Code Base (TCB) to a Trusted Theory Base (TTB) paradigm. This is really cool and more people should be following this paradigm. Apart from MetaCoq, the only other place where I’ve seen it is in the CakeML project.
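To make this concrete, here is a minimal sketch of the idea in Agda, using a toy expression language rather than the actual Agda Core definitions (all names here are illustrative): the checker does not just answer 'yes', it returns a derivation in the specified typing relation.

```agda
open import Data.Maybe using (Maybe; just; nothing)
open import Data.Product using (Σ; _,_)

data Ty : Set where
  nat bool : Ty

data Term : Set where
  lit  : Term         -- a number literal
  true : Term
  not  : Term → Term

-- The specification: a typing relation on raw terms.
data _⊢_ : Term → Ty → Set where
  t-lit  : lit ⊢ nat
  t-true : true ⊢ bool
  t-not  : ∀ {t} → t ⊢ bool → not t ⊢ bool

-- The implementation: inference returns *evidence* of well-typedness,
-- so a bug in infer can never produce an ill-typed result.
infer : (t : Term) → Maybe (Σ Ty (t ⊢_))
infer lit     = just (nat , t-lit)
infer true    = just (bool , t-true)
infer (not t) with infer t
... | just (bool , p) = just (bool , t-not p)
... | _               = nothing
```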
This is a good point to talk about MetaCoq, since the comparison will inevitably come up. While some of the goals of MetaCoq and Agda Core are the same, they are really quite different in scope. Unlike MetaCoq, Agda Core is (currently) not meant for formal proofs of metatheoretic properties such as confluence, subject reduction, strong normalization, canonicity, or consistency. Rather, the goal is to be 'merely' a formal specification of a possible core language together with a correct-by-construction type checker for it. The reason for this is simple: until recently I was the only person working on Agda Core (I have now started getting help from my two PhD students), and I still want to work on other Agda-related projects, too. Meanwhile, MetaCoq is a long-term project with quite a few very smart people working on it. As you'll see in the second half of this post, even with this somewhat smaller scope there are still plenty of interesting challenges to figure out!
The final thing we need is a way to take our type checker and actually compile and run it, so we can use it for example to double-check the output of the main Agda type checker. Luckily (and contrary to popular belief) it is possible to compile and run Agda programs. The standard way to do this would be to use Agda's built-in GHC backend. Another option - which is the one I decided on - is to use agda2hs, an alternative backend that produces much more readable Haskell code that is guaranteed to contain no `unsafeCoerce`s. In return, it requires you to write your Agda code in a Haskell-like style, and in particular to use erasure annotations to make sure no dependent types end up in the compiled code.
Erasure annotations play a similar role to `Prop` in Rocq: they allow you to be explicit about which parts of your Agda program are needed at runtime, and which parts are 'just' for ensuring correctness. In Agda Core, erasure annotations are applied liberally to scoping and typing information.
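As a small, hypothetical illustration using the standard library (agda2hs comes with its own prelude, so this is not literally agda2hs-ready): in the function below, the length index only serves to rule out out-of-bounds access, so it can be marked as erased and will not show up in the compiled code.

```agda
{-# OPTIONS --erasure #-}

open import Data.Nat using (ℕ)
open import Data.Fin using (Fin; zero; suc)
open import Data.Vec using (Vec; _∷_)

-- The length n is only needed for type checking, so it is marked @0
-- and erased from the compiled code.
lookupVec : {A : Set} {@0 n : ℕ} → Vec A n → Fin n → A
lookupVec (x ∷ _)  zero    = x
lookupVec (_ ∷ xs) (suc i) = lookupVec xs i
```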
Agda Core: The Reality
When I started working on Agda Core, I had a plan very similar to the one outlined above (minus the agda2hs part, which did not yet exist). In my folly, I thought I would be able to finish a minimal bare-bones version of the whole pipeline relatively quickly. However, actually implementing Agda Core turned out to be far more challenging than I expected. One big reason for this was that I had just started in Delft and saw my research time roughly cut in half compared to my PhD and postdoc years. While I might write more about that in another blog post, here I would like to focus on the more technical challenges: those related to the design of Agda Core itself as well as those caused by the infrastructure we rely on. Let us dive into some details.
Design challenge: scoping of variables and defined symbols
In Agda's surface syntax, there are various kinds of symbols that share the same namespace: variables, functions, datatypes, constructors, record fields, and postulates. Internally, variables are represented as de Bruijn indices, while top-level symbols are represented as globally unique names. We could certainly make the same choice for Agda Core, but as we are doing this in Agda, there is also the option of defining a well-scoped syntax. In fact, there are so many options that a while back I wrote a blog post just to get a better understanding of them myself.
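For comparison, the textbook well-scoped representation indexes terms by the number of variables in scope, so that out-of-scope de Bruijn indices are ruled out by the type checker (this is one of the options from that post, not the representation Agda Core ended up using):

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Fin using (Fin)

-- Term n is a term with n free variables.
data Term : ℕ → Set where
  var : ∀ {n} → Fin n → Term n         -- de Bruijn index, guaranteed in scope
  lam : ∀ {n} → Term (suc n) → Term n  -- the binder brings one variable into scope
  app : ∀ {n} → Term n → Term n → Term n
```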
After writing that blog post, I ended up writing my own library for representing scopes and related concepts, based on the primitive notion of proof-relevant separation - see my TYPES '23 abstract for some details, or the source code for many more. While I am reasonably happy with this library, it is definitely one part that took a lot of time, and in hindsight a much simpler solution (e.g. simply representing scopes as lists of names, as sketched below) would have been good enough for a first prototype.
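Such a representation could be as simple as this (a sketch, with the type of names kept abstract):

```agda
open import Data.List using (List; _∷_)

postulate Name : Set

Scope : Set
Scope = List Name

-- A well-scoped variable is a proof that its name occurs in the scope.
data _∈_ (x : Name) : Scope → Set where
  here  : ∀ {α}           → x ∈ (x ∷ α)
  there : ∀ {y α} → x ∈ α → x ∈ (y ∷ α)
```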
Design challenge: representing case trees
One of Agda's most important and characteristic features is its support for dependent pattern matching. To add support for it to Agda Core, we need to choose which stage of the elaboration process to adopt as the core representation:
- Representing definitions directly as a list of clauses is the most faithful to the surface language, but it has very complicated typing rules and does not naturally allow for enforcing completeness. So it is not well suited for a core language.
- On the other end of the spectrum are the eliminators associated with each datatype. While theoretically elegant and minimal, representing full dependent pattern matching in terms of them takes a lot of work - especially to encode the unification process - which leads to a complex elaboration and large proof terms (see my PhD thesis).
- In between these two extremes, case trees are the representation used by Agda's internal syntax. They strike a nice balance between theoretical elegance and efficiency, and are thus well suited for a core language for Agda.
Rather than creating a separate syntactic class for case trees like Agda does, Agda Core simply embeds each kind of case-tree node as a term constructor. In particular, there is a `case` construct representing a single pattern match. In contrast to `case` expressions in Rocq or Lean, these `case` expressions can perform unification when the indices of the scrutinee are not fully general. For example, a `case` can pattern match on a value of type `Vec A (suc n)` with a single branch for the cons constructor `_∷_`.
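In surface Agda, this is the familiar situation where matching on a vector of non-zero length needs no branch for nil; the embedded `case` construct supports the same kind of index refinement:

```agda
open import Data.Nat using (ℕ; suc)
open import Data.Vec using (Vec; _∷_)

-- A single cons branch suffices: unifying the index suc n with the
-- indices of Vec's constructors rules out the nil case.
head : {A : Set} {n : ℕ} → Vec A (suc n) → A
head (x ∷ _) = x
```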
Embedding `case` expressions into the syntax poses one problem: they do not fit neatly into a bidirectional typing discipline. In particular, we can omit the type annotation of a `case` match (a.k.a. the "motive") only if the scrutinee is a variable. However, "being a variable" is a property that is very much not stable under substitution, which would be rather unfortunate. At the moment, this is solved by always requiring a type annotation for each `case` expression. However, this is a decision I would like to come back to later to investigate what other options there are.
Design challenge: preserving sharing during reduction
One of the major flaws in the current implementation of Agda (which I also wrote about in a recent blog post) is that there is no good way to represent explicit sharing in the internal syntax of Agda, through `let`-bindings or otherwise. I decided to try to avoid this mistake in Agda Core and so added a `let` constructor to the syntax, even though it is not strictly necessary to represent the current implementation of Agda.
Little did I know that it is not enough to merely add sharing to the syntax, but that it should also be preserved during reduction (okay, maybe I should have predicted this one). Luckily there are good techniques for doing sharing-preserving reduction. In particular, I am fond of Peter Sestoft’s lazy abstract machine for striking a nice balance between simplicity and efficiency. It defines reduction on an abstract machine consisting of an environment of shared terms, a focus term currently being evaluated, and a stack of frames to continue reduction later. This is what I used as the basis for defining reduction for Agda Core.
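Roughly, a machine configuration looks as follows (a simplified sketch with hypothetical names, not the actual Agda Core definitions):

```agda
open import Data.List using (List)
open import Data.Nat using (ℕ)
open import Data.Product using (_×_)

postulate Term : Set  -- stand-in for the Agda Core term type

-- A frame records how to continue once the focus term is evaluated.
data Frame : Set where
  apply  : Term → Frame  -- apply the result to this argument
  update : ℕ → Frame     -- overwrite this environment entry with the result

record Machine : Set where
  field
    env   : List (ℕ × Term)  -- environment of shared (possibly unevaluated) terms
    focus : Term             -- the term currently being evaluated
    stack : List Frame       -- frames for continuing reduction later
```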
This allows us to preserve sharing during reduction, but at some point we have to convert the abstract machine back into a regular term (unless you are okay with having these abstract machines in your typing judgement). At first I thought we could simply turn each shared term in the environment into a `let`-binding. However, these shared terms are often only there as intermediate results of computation and are not used in the final term, so this strategy resulted in lots and lots of unused `let`-bindings. To make this approach practical, we would need a procedure for garbage collection of unused `let`-bindings (and perhaps inlining of linear `let`-bindings). Implementing such a procedure in a well-scoped way seems like a rather challenging and time-consuming task, which I have not yet attempted (though I'd like to try). So for now, converting from an abstract machine back to an expression relies on substitution rather than `let`-bindings, destroying any sharing in the process.
Design challenge: dealing with non-termination
Another challenge that comes up when defining reduction and conversion is how to deal with the possibility of non-terminating terms. For reasons of modularity, we would like to define conversion separately from typing, so we don't yet know that a term is well-typed when reducing it. And even if we did know it was well-typed, Agda Core does not include a termination checker (yet).
So either way, we need some way to deal with the possibility of non-termination. For declaring the conversion rules, this is straightforward: we simply ask for a proof that reduction to weak-head normal form terminates for the particular term at hand. In the implementation of the executable conversion checker, however, we need to actually construct this evidence. The simplest way to do so is to rely on a 'fuel' argument that gives an upper bound on the number of reduction steps that will be tried. This is not very satisfactory, because it means conversion checking (and hence type checking) can now fail on well-typed terms if they run out of fuel.
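A minimal sketch of the fuel-based approach, with the single-step reduction function left abstract:

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Maybe using (Maybe; just; nothing)

postulate
  Term : Set
  step : Term → Maybe Term  -- one weak-head reduction step, if one exists

-- Fuel-bounded reduction to weak-head normal form: 'nothing' means we
-- ran out of fuel, not that the input was ill-typed.
whnf : ℕ → Term → Maybe Term
whnf zero    _ = nothing
whnf (suc n) t with step t
... | nothing = just t  -- no step possible: t is already in whnf
... | just t' = whnf n t'
```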
Another possibility would be to use a `Delay` monad to model potential non-termination, but Agda's guardedness checker proved to be too restrictive (and its sized types too buggy) for this to work out. For a while, I also pursued a solution that computed in advance the set of terms that needed to be normalized for a given typing/conversion problem. However, this essentially amounted to running every typing and conversion check twice, so I abandoned it. In MetaCoq, they solve this instead by only calling reduction on well-typed terms and postulating strong normalization; however, since Agda Core currently lacks a termination checker, this is not an option either. So, for lack of better options, the type checking monad of Agda Core currently carries a global fuel value that is used for every conversion check.
Design challenge: typed vs. untyped conversion
One major difference between conversion checking in Agda and most other dependently typed languages is that it uses a typed notion of conversion. This means that Agda always keeps track of the type of two terms when comparing them (with the exception of comparing two types, for which Agda does not care about their type). As far as I know, this type information is actually used in three places:
- When checking equality at a function type (and neither side of the equation is a neutral term), Agda will extend the context with a fresh variable and apply both sides of the equation to it.
- When checking equality at a record type with eta-equality enabled (and neither side of the equation is a neutral term), Agda will compare the terms field-wise. For example, for a pair type it will compare the first components and the second components.
- When checking equality at a definitionally irrelevant type (either the domain of an irrelevant function type, or a type in the `Prop` universe), Agda will regard any two terms as equal.
Most of these uses of typed conversion are not really essential and could instead be handled by more syntax-directed rules (as demonstrated by Rocq and Lean). However, the one big exception is the eta unit type `⊤`, which is defined as a record type with no fields. Since there are no fields, the conversion rule says that any two terms of this type are equal. This is the reason why this type is not supported in Rocq (I'm not sure about Lean), and it is the cause of quite a bit of complexity in the implementation of Agda.
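Concretely, this is the definitional equality that makes `⊤` special:

```agda
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

record ⊤ : Set where
  constructor tt
  -- no fields; eta-equality holds for records by default

-- Any two inhabitants of ⊤ are definitionally equal, so refl suffices.
eta-unit : (x y : ⊤) → x ≡ y
eta-unit x y = refl
```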
Since the eta unit type is an important and unique feature of Agda, I would very much like to support it in Agda Core, too. Unfortunately, trying to make the conversion rules type-directed made them significantly more unwieldy to write down, as each conversion rule essentially becomes a two-sided version of the corresponding typing rule. This became too annoying, so we reverted to an untyped conversion judgement for the moment.
An alternative approach I would like to explore is to locally require a typing derivation whenever conversion uses eta-equality, which would make eta-equality much less invasive to the basic typing and conversion infrastructure. Actually, it would be sufficient to require a re-typing derivation (i.e. a derivation that a term has a specific type, assuming that it is already well-typed), which could easily be computed in the conversion checker. The main snag is that this would require the conversion judgement to maintain a local context for running the re-typing, which might not be possible without adding type annotations to lambda expressions - something I was trying to avoid.
Infrastructure challenge: working with erasure annotations
Apart from the more conceptual design challenges I mentioned so far, we also encountered plenty of more technical challenges. Quite a few of these have to do with the `@0` erasure annotations, which (if you remember) are used to draw the line between the erasable specification of the Agda Core typing rules and the executable implementation of its type checker. Because of Agda issue #5703, the error messages when you forget an `@0` are often horrendous: rather than a proper error message about a failed instantiation, you get a long list of mostly irrelevant unsolved constraints.
(As a side note, I tried to solve this issue for a while but got stuck because of an interaction between erasure and eta unit types via the occurs checker. A proper fix would require a type-directed reimplementation of the occurs checking and pruning algorithms used for solving higher-order unification problems. Alas, apart from being really tricky to do, my attempt also caused a performance regression significant enough for me to give up.)
The solution for working with erasure that I found to work best is to not use `@0` directly, but instead to define a record type `Erased A` with a single erased field of type `A`. We also found other types that are useful when working with erasure, such as the type `Rezz x` that carries a runtime representation of the erased value `x`, and three variants of the `Σ` type with different erasure annotations.
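Sketched in Agda, the first two look something like this (the constructor and field names here are illustrative, not necessarily those of the actual libraries):

```agda
{-# OPTIONS --erasure #-}

open import Relation.Binary.PropositionalEquality using (_≡_)

-- A box whose contents exist only at type-checking time.
record Erased (A : Set) : Set where
  constructor erased
  field
    @0 get : A

-- Rezz x: a runtime value together with an (erased) proof that it
-- equals the erased x, i.e. a runtime representation of x.
record Rezz {A : Set} (@0 x : A) : Set where
  constructor rezz
  field
    value    : A
    @0 proof : value ≡ x
```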
Infrastructure challenge: agda2hs woes
While I believe the choice to use `agda2hs` to compile Agda Core rather than Agda's GHC backend was a sensible one, this is the first project of this size that uses `agda2hs`, and it shows. In the process of making Agda Core compatible with `agda2hs`, we encountered numerous compilation bugs, missing features, and missing library functionality. The biggest limitation was perhaps the lack of support for module parameters, which we relied on heavily in Agda Core to represent the global scope of definitions and datatypes. A proper fix for all these issues eventually required us to switch to a type-directed compilation (see #270 and #304). Are you starting to notice a pattern?

So the bad news is that making Agda Core work with `agda2hs` took a lot of work on improving `agda2hs` itself. The good news is that `agda2hs` is now a much better and more robust tool than it was, so if you are curious I would definitely recommend trying it out!
Infrastructure challenge: awkward representations in Agda’s internal syntax
The final technical challenge I want to mention has to do with Agda's implementation itself. As I mentioned in the first part of this post, our ultimate goal is to link up Agda Core as a backend to the implementation of Agda, so it can be used to double-check the well-formedness of its output. Since Agda is implemented in Haskell and Agda Core can be translated to Haskell with `agda2hs`, this should be easy! Or well, possible at least.

Since this is currently the place where the most work still needs to be done, I am unsure which further technical challenges will pop up. But one challenge will definitely be to translate Agda's internal case trees to iterated `case` statements in Agda Core. Agda's case trees are optimized for efficient reduction and thus omit all typing information. For some reason they also use de Bruijn levels instead of de Bruijn indices, and without knowing the size of the context, translating between the two is not straightforward.
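The arithmetic itself is simple - with n variables in scope, level l corresponds to index n - 1 - l - but it cannot even be started without knowing n (a sketch, using truncated subtraction):

```agda
open import Data.Nat using (ℕ; suc; _∸_)

-- With n variables in scope, de Bruijn level l (counting from the
-- outside in) corresponds to index n ∸ suc l (from the inside out).
levelToIndex : (n l : ℕ) → ℕ
levelToIndex n l = n ∸ suc l
```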
The main cause of the problem here is that Agda builds up a lot of typing information while checking definitions by pattern matching, but this information is not stored anywhere in the case tree. So perhaps the right thing to do is to add an intermediate typed representation of case trees to Agda's pipeline. Such a representation could even be useful for other tools besides Agda Core, such as Agda2Dk and Agda's own double-checker CheckInternal (which currently only handles terms but not case trees).
Conclusions
So, whew, now you have an idea of the overall goals of Agda Core as well as some of the challenges that we ran into while working on it. I am personally quite disappointed with how long it took to get to this point and how much work is still left to be done. Still, it has been a fascinating project, and despite all of its shortcomings I find Agda a truly satisfying language to work with.
So rather than seeing all these difficulties as something negative, I would rather view them as pointing out places where there is still room for growth, both in the implementation of Agda and its ecosystem as well as in our own knowledge and best practices of how to implement correct-by-construction type checkers. So what are we still waiting for? Let's go and find out!
As always, comments on this blog post are welcome via Mastodon/Fediverse/ActivityPub at @jesper@agda.club, on the Agda Zulip, or via email to jesper@sikanda.be.
Oh, and the next blog post will be something short and sweet, I promise!