Formalize all the things (in Agda)

Posted by Jesper on October 4, 2019

I quite often hear from people that they are interested in learning Agda, and that’s great! However, I often feel there are not enough examples out there of how to use the different features of Agda to implement programs, enforce invariants in their types, and prove their properties. So in this blog post I hope to address exactly that problem.

The main goal of this example is to show off the dual purpose of Agda as a strongly typed programming language and as a proof assistant for formalizing mathematics. Since both purposes are part of the same language, each of them reinforces the other: we can prove properties of our programs, and we can write programs that produce proofs. In this example, we will see this power in action by (1) defining the mathematical structure of partial orders, (2) implementing a generic binary search trees and insertion of new elements into them, and (3) proving that this function is implemented correctly.

This blog post was based on a talk I gave at the Initial Types Club at Chalmers.

Preliminaries

For this post we keep the dependencies to a minimum so we don’t rely on the standard library. Instead, we import some of the built-in modules of Agda directly.

open import Agda.Primitive
open import Agda.Builtin.Bool
open import Agda.Builtin.Nat
open import Agda.Builtin.Equality

A variable declaration (since Agda 2.6.0) allows us to use variables without binding them explicitly. This means they are implicitly universally quantified in the types where they occur.

variable
  A B C : Set
  x y z : A
  k l m n : Nat

In the code that follows, we will use instance arguments to automatically construct some proofs. When working with instance arguments, the it function below is often very useful. All it does is ask Agda to please fill in the current argument by using a definition that is marked as an instance. (More about instance arguments later).

it : {{x : A}}  A
it {{x}} = x

(Unary) natural numbers are defined as the datatype Nat with two constructors zero : Nat and suc : Nat → Nat. We use the ones imported from Agda.Builtin.Nat because they allow us to write literal numerals as well as constructor forms.

_ : Nat
_ = zero + 7 * (suc 3 - 1)

(Definitions that are named _ are typechecked by Agda but cannot be used later on. This is often used to define examples or test cases).

We can define parametrized datatypes and functions by pattern matching on them. For example, here is the equivalent of Haskell’s Maybe type.

data Maybe (A : Set) : Set where
  just    : A  Maybe A
  nothing :     Maybe A

mapMaybe : (A  B)  (Maybe A  Maybe B)
mapMaybe f (just x) = just (f x)
mapMaybe f nothing = nothing

Note how A and B are implicitly quantified in the type of mapMaybe!

Quick recap on the Curry-Howard correspondence

The Curry-Howard correspondence is the core idea that allows us to use Agda as both a programming language and a proof assistant. Under the Curry-Howard correspondence, we can interpret logical propositions (A ∧ B, ¬A, A ⇒ B, …) as the types of all their possible proofs.

A proof of ‘A and B’ is a pair (x , y) of a proof x : A and an proof y : B.

record _×_ (A B : Set) : Set where
  constructor _,_
  field
    fst : A
    snd : B
open _×_

A proof of ‘A or B’ is either inl x for a proof x : A or inr y for a proof y : B.

data _⊎_ (A B : Set) : Set where
  inl : A  A  B
  inr : B  A  B

mapInl : (A  B)  A  C  B  C
mapInl f (inl x) = inl (f x)
mapInl f (inr y) = inr y

mapInr : (B  C)  A  B  A  C
mapInr f (inl x) = inl x
mapInr f (inr y) = inr (f y)

A proof of ‘A implies B’ is a transformation from proofs x : A to proofs of B, i.e. a function of type A → B.

‘true’ has exactly one proof tt : ⊤. We could define this as a datatype with a single constructor tt, but here we define it as a record type instead. This has the advantage that Agda will use eta-equality for elements of , i.e. x = y for any two variables x and y of type .

record  : Set where
  constructor tt     -- no fields

‘false’ has no proofs.

data  : Set where   -- no constructor

‘not A’ can be defined as ‘A implies false’.

¬_ : Set  Set
¬ A = A  

Examples

-- “If A then B implies A”
ex₁ : A  (B  A)
ex₁ = λ z _  z

-- “If A and true then A or false”
ex₂ : (A × )  (A  )
ex₂ = λ z  inl (fst z)

-- “If A implies B and B implies C then A implies C”
ex₃ : (A  B)  (B  C)  (A  C)
ex₃ = λ f g z  g (f z)

-- “It is not the case that not (either A or not A)”
ex₄ : ¬ (¬ (A  (¬ A)))
ex₄ = λ f  f (inr  x  f (inl x)))

Since Agda’s logic is constructive, it is not possible to prove the direct version of ex₄ (A ⊎ (¬ A)).

Equality

To state many properties of our programs, we need the notion of equality. In Agda, equality is defined as the datatype _≡_ with one constructor refl : x ≡ x (imported from Agda.Builtin.Equality).

_ : x  x
_ = refl

sym : x  y  y  x
sym refl = refl

trans : x  y  y  z  x  z
trans refl refl = refl

cong : (f : A  B)  x  y  f x  f y
cong f refl = refl

subst : (P : A  Set)  x  y  P x  P y
subst P refl p = p

Ordering natural numbers

The standard ordering on natural numbers can be defined as an indexed datatype with two indices of type Nat:

module Nat-≤ where

  data _≤_ : Nat  Nat  Set where
    ≤-zero :         zero   n
    ≤-suc  : m  n  suc m  suc n

  ≤-refl : n  n
  ≤-refl {n = zero}  = ≤-zero
  ≤-refl {n = suc k} = ≤-suc ≤-refl

  ≤-trans : k  l  l  m  k  m
  ≤-trans ≤-zero      l≤m         = ≤-zero
  ≤-trans (≤-suc k≤l) (≤-suc l≤m) =
    ≤-suc (≤-trans k≤l l≤m)

  ≤-antisym : m  n  n  m  m  n
  ≤-antisym ≤-zero      ≤-zero      = refl
  ≤-antisym (≤-suc m≤n) (≤-suc n≤m) =
    cong suc (≤-antisym m≤n n≤m)

Now we can prove statements like 3 ≤ 5 as follows:

  _ : 3  5
  _ = ≤-suc (≤-suc (≤-suc ≤-zero))

However, to prove an inequality like 9000 ≤ 9001 we would have to write 9000 ≤-suc constructors, which would get very tedious. Instead, we can use Agda’s instance arguments to automatically construct these kind of proofs.

To do this, we define an ‘instance’ that automatically constructs a proof of m ≤ n when m and n are natural number literals. A definition inst : A that is marked as an ‘instance’ will be used to automatically construct the implicit argument to functions with a type of the form {{x : A}} → B.

For efficiency reasons, we don’t mark the constructors ≤-zero and ≤-suc as instances directly. Instead, we make use of the efficient boolean comparison _<_ (imported from Agda.Builtin.Nat) to construct the instance when the precondition So (m < suc n) is satisfied.

  So : Bool  Set
  So false = 
  So true  = 

  instance
    ≤-dec : {p : So (m < suc n)}  m  n
    ≤-dec {m = zero} {n = n} = ≤-zero
    ≤-dec {m = suc m} {n = suc n} {p = p} =
      ≤-suc (≤-dec {p = p})

  _ : 9000  9001
  _ = it

Partial orders

We’d like to talk not just about orderings on concrete types like Nat, but also about the general concept of a ‘partial order’. For this purpose, we define a typeclass Ord that contains the type _≤_ and proofs of its properties.

record Ord (A : Set) : Set₁ where
  field
    _≤_       : A  A  Set
    ≤-refl    : x  x
    ≤-trans   : x  y  y  z  x  z
    ≤-antisym : x  y  y  x  x  y

  _≥_ : A  A  Set
  x  y = y  x

Unlike in Haskell, typeclasses are not a primitive concept in Agda. Instead, we use the special syntax open Ord {{...}} to bring the fields of the record in scope as instance functions with a type of the form {A : Set}{{r : Ord A}} → .... Instance search will then kick in to find the right implementation of the typeclass automatically.

open Ord {{...}}

We now define some concrete instances of the typeclass using copattern matching

instance
  Ord-Nat : Ord Nat
  _≤_       {{Ord-Nat}} = Nat-≤._≤_
  ≤-refl    {{Ord-Nat}} = Nat-≤.≤-refl
  ≤-trans   {{Ord-Nat}} = Nat-≤.≤-trans
  ≤-antisym {{Ord-Nat}} = Nat-≤.≤-antisym

instance
  Ord-⊤ : Ord 
  _≤_       {{Ord-⊤}} = λ _ _  
  ≤-refl    {{Ord-⊤}} = tt
  ≤-trans   {{Ord-⊤}} = λ _ _  tt
  ≤-antisym {{Ord-⊤}} = λ _ _  refl

module Example (A : Set) {{A-≤ : Ord A}} where

  example : {x y z : A}  x  y  y  z  z  x  x  y
  example x≤y y≤z z≤x = ≤-antisym {A = A} x≤y (≤-trans {A = A} y≤z z≤x)

For working with binary search trees, we need to be able to decide for any two elements which is the bigger one, i.e. we need a total, decidable order.

data Tri {{_ : Ord A}} : A  A  Set where
  less    : {{x≤y : x  y}}  Tri x y
  equal   : {{x≡y : x  y}}  Tri x y
  greater : {{x≥y : x  y}}  Tri x y

record TDO (A : Set) : Set₁ where
  field
    {{Ord-A}} : Ord A               -- superclass Ord
    tri       : (x y : A)  Tri x y

open TDO {{...}} public

triNat : (x y : Nat)  Tri x y
triNat zero zero = equal
triNat zero (suc y) = less
triNat (suc x) zero = greater
triNat (suc x) (suc y) with triNat x y
... | less    {{x≤y}} = less    {{x≤y = Nat-≤.≤-suc x≤y}}
... | equal   {{x≡y}} = equal   {{x≡y = cong suc x≡y}}
... | greater {{x≥y}} = greater {{x≥y = Nat-≤.≤-suc x≥y}}

instance
  TDO-Nat : TDO Nat
  Ord-A {{TDO-Nat}} = Ord-Nat
  tri   {{TDO-Nat}} = triNat

Binary search trees

In a dependently typed language, we can encode invariants of our data structures by using indexed datatypes. In this example, we will implement binary search trees by a lower and upper bound to the elements they contain (see How to Keep Your Neighbours in Order by Conor McBride).

Since the lower bound may be -∞ and the upper bound may be +∞, we start by providing a generic way to extend a partially ordered set with those two elements.

data [_]∞ (A : Set) : Set where
  -∞  :     [ A ]∞
  [_] : A  [ A ]∞
  +∞  :     [ A ]∞

variable
  lower upper : [ A ]∞

module Ord-[]∞ {A : Set} {{ A-≤ : Ord A}} where

  data _≤∞_ : [ A ]∞  [ A ]∞  Set where
    -∞-≤ :          -∞   ≤∞   y
    []-≤ : x  y  [ x ] ≤∞ [ y ]
    +∞-≤ :           x   ≤∞  +∞

  []∞-refl : x ≤∞ x
  []∞-refl { -∞}   = -∞-≤
  []∞-refl {[ x ]} = []-≤ (≤-refl {A = A})
  []∞-refl { +∞}   = +∞-≤

  []∞-trans : x ≤∞ y  y ≤∞ z  x ≤∞ z
  []∞-trans -∞-≤       _          = -∞-≤
  []∞-trans ([]-≤ x≤y) ([]-≤ y≤z) = []-≤ (≤-trans {A = A} x≤y y≤z)
  []∞-trans _          +∞-≤       = +∞-≤

  []∞-antisym : x ≤∞ y  y ≤∞ x  x  y
  []∞-antisym -∞-≤       -∞-≤       = refl
  []∞-antisym ([]-≤ x≤y) ([]-≤ y≤x) = cong [_] (≤-antisym x≤y y≤x)
  []∞-antisym +∞-≤       +∞-≤       = refl

  instance
    Ord-[]∞ : {{_ : Ord A}}  Ord [ A ]∞
    _≤_       {{Ord-[]∞}} = _≤∞_
    ≤-refl    {{Ord-[]∞}} = []∞-refl
    ≤-trans   {{Ord-[]∞}} = []∞-trans
    ≤-antisym {{Ord-[]∞}} = []∞-antisym

open Ord-[]∞ public

We define some instances to automatically construct inequality proofs.

module _ {{_ : Ord A}} where

  instance
    -∞-≤-I : {y : [ A ]∞}  -∞  y
    -∞-≤-I = -∞-≤

    +∞-≤-I : {x : [ A ]∞}  x  +∞
    +∞-≤-I = +∞-≤

    []-≤-I : {x y : A} {{x≤y : x  y}}  [ x ]  [ y ]
    []-≤-I {{x≤y = x≤y}} = []-≤ x≤y

Now we are (finally) ready to define binary search trees.

data BST (A : Set) {{_ : Ord A}}
         (lower upper : [ A ]∞)  : Set where

  leaf : {{l≤u : lower  upper}}
        BST A lower upper

  node : (x : A)
        BST A lower [ x ]
        BST A [ x ] upper
        BST A lower upper

_ : BST Nat -∞ +∞
_ = node 42
      (node 6    leaf leaf)
      (node 9000 leaf leaf)

Note how instances help by automatically filling in the proofs that the bounds are satisfied! Somewhat more explicitly, the tree looks as follows:

_ : BST Nat -∞ +∞
_ = node 42
      (node 6    (leaf {{l≤u = -∞≤6}})    (leaf {{l≤u = 6≤42}}))
      (node 9000 (leaf {{l≤u = 42≤9000}}) (leaf {{l≤u = 9000≤+∞}}))

  where
    -∞≤6 : -∞  [ 6 ]
    -∞≤6 = it

    6≤42 : [ 6 ]  [ 42 ]
    6≤42 = it

    42≤9000 : [ 42 ]  [ 9000 ]
    42≤9000 = it

    9000≤+∞ : [ 9000 ]  +∞
    9000≤+∞ = it

Next up: defining a lookup function. The result of this function is not just a boolean true/false, but a proof that the element is indeed in the tree. A proof that x is in the tree t is either a proof that it is here, a proof that it is in the left subtree, or a proof that it is in the right subtree.

module Lookup {{_ : TDO A}} where

  data _∈_ {lower} {upper} (x : A) :
           (t : BST A lower upper)  Set where
    here  :  {t₁ t₂}  x  y   x  node y t₁ t₂
    left  :  {t₁ t₂}  x  t₁  x  node y t₁ t₂
    right :  {t₁ t₂}  x  t₂  x  node y t₁ t₂

The definition of lookup makes use of with-abstraction to inspect the result of the tri function on x and y.

  lookup :  {lower} {upper}
          (x : A) (t : BST A lower upper)  Maybe (x  t)
  lookup x leaf = nothing
  lookup x (node y t₁ t₂) with tri x y
  ... | less    = mapMaybe left (lookup x t₁)
  ... | equal   = just (here it)
  ... | greater = mapMaybe right (lookup x t₂)

Similarly, we can define an insertion function. Here, we need to enforce the precondition that the element we want to insert is between the bounds (alternatively, we could have updated the bounds in the return type to ensure they include the inserted element).

module Insert {{_ : TDO A}} where

  insert : (x : A) (t : BST A lower upper)
          {{l≤x : lower  [ x ]}} {{x≤u : [ x ]  upper}}
          BST A lower upper
  insert x leaf = node x leaf leaf
  insert x (node y t₁ t₂) with tri x y
  ... | less    = node y (insert x t₁) t₂
  ... | equal   = node y t₁ t₂
  ... | greater = node y t₁ (insert x t₂)

To prove correctness of insertion, we have to show that y ∈ insert x t is equivalent to x ≡ y ⊎ y ∈ t. The proofs insert-sound₂ and insert-complete are a bit long because there are two elements x and y that can both independently be here, in the left subtree, or in the right subtree, so we have to distinguish 9 distinct cases. Let me know if you manage to find a shorter proof!

  open Lookup

  insert-sound :
    (x : A) (t : BST A lower upper)
     {{_ : lower  [ x ]}} {{_ : [ x ]  upper}}
     (x  y)  (y  t)  y  insert x t
  insert-sound x t (inl refl) = insert-sound₁ x t

    where

      insert-sound₁ :
        (x : A) (t : BST A lower upper)
         {{_ : lower  [ x ]}} {{_ : [ x ]  upper}}
         x  insert x t
      insert-sound₁ x leaf = here refl
      insert-sound₁ x (node y t₁ t₂) with tri x y
      insert-sound₁ x (node y t₁ t₂) | less    = left (insert-sound₁ x t₁)
      insert-sound₁ x (node y t₁ t₂) | equal   = here it
      insert-sound₁ x (node y t₁ t₂) | greater = right (insert-sound₁ x t₂)

  insert-sound x t (inr y∈t) = insert-sound₂ x t y∈t

    where

      insert-sound₂ :
        (x : A) (t : BST A lower upper)
         {{_ : lower  [ x ]}} {{_ : [ x ]  upper}}
         y  t  y  insert x t
      insert-sound₂ x (node y t₁ t₂) (here  refl) with tri x y
      ... | less    = here refl
      ... | equal   = here refl
      ... | greater = here refl
      insert-sound₂ x (node y t₁ t₂) (left  y∈t₁) with tri x y
      ... | less    = left (insert-sound₂ x t₁ y∈t₁)
      ... | equal   = left y∈t₁
      ... | greater = left y∈t₁
      insert-sound₂ x (node y t₁ t₂) (right y∈t₂) with tri x y
      ... | less    = right y∈t₂
      ... | equal   = right y∈t₂
      ... | greater = right (insert-sound₂ x t₂ y∈t₂)

  insert-complete :
    (x : A) (t : BST A lower upper)
     {{_ : lower  [ x ]}} {{_ : [ x ]  upper}}
     y  insert x t  (x  y)  (y  t)
  insert-complete x leaf           (here refl) = inl refl
  insert-complete x (node y t₁ t₂) y∈t'       with tri x y
  insert-complete x (node y t₁ t₂) (here refl)   | less    = inr (here refl)
  insert-complete x (node y t₁ t₂) (here refl)   | equal   = inl it
  insert-complete x (node y t₁ t₂) (here refl)   | greater = inr (here refl)
  insert-complete x (node y t₁ t₂) (left y∈t₁')  | less    = mapInr left (insert-complete x t₁ y∈t₁')
  insert-complete x (node y t₁ t₂) (left  y∈t₁)  | equal   = inr (left y∈t₁)
  insert-complete x (node y t₁ t₂) (left  y∈t₁)  | greater = inr (left y∈t₁)
  insert-complete x (node y t₁ t₂) (right y∈t₂)  | less    = inr (right y∈t₂)
  insert-complete x (node y t₁ t₂) (right y∈t₂)  | equal   = inr (right y∈t₂)
  insert-complete x (node y t₁ t₂) (right y∈t₂') | greater = mapInr right (insert-complete x t₂ y∈t₂')

Of course, there are many more functions on search trees we could want to implement and prove correct: deletion, merging, flattening … Likewise, there are other invariants we might want to enforce in the type, such as being well-balanced. I strongly recommend to read Conor McBride’s paper on the topic, or try it yourself!