How Do Formal Specification Languages Like Idris, Coq, and Agda Work?

Mar 19, 2026

The Problem with Traditional Specifications

I wrote a specification document. Then I wrote code. The two drifted apart. Tests caught some bugs, but production still broke at 2 AM.

This is the fundamental problem: specifications and implementations live in separate worlds. A spec says “the function should return a non-empty list.” The code returns a list. Nobody checks if it’s non-empty until runtime crashes.

Spec Document          Implementation
     |                      |
     v                      v
"Return non-empty" ---> return items;  // items could be []!
     |                      |
     +---- NO CONNECTION ---+

I asked myself: what if the compiler could check that my code matches my spec? Not through tests, but through type checking?

The Core Insight: A Sufficiently Detailed Spec Is Code

The Reddit discussion on r/programming crystallized something I’d been groping toward:

“A sufficiently detailed spec is code.”

And more specifically:

“Idris2, coq, and agda have entered the chat. A sufficiently detailed spec in these languages is literally an implementation too. They don’t let you write code that does not conform to your spec by design.”

This is the paradigm shift. Instead of hoping code matches specs, you write code that IS the spec. The compiler becomes a proof checker.

What Makes These Languages Different?

Traditional languages have types like List<Int> or String. Formal specification languages have types that depend on values.

Let me show you what this means.

Traditional Approach: Runtime Checks

In TypeScript, I’d write:

function head<T>(list: T[]): T {
  if (list.length === 0) {
    throw new Error("Cannot get head of empty list");
  }
  return list[0];
}

// This compiles but fails at runtime:
const crash = head([]);  // Runtime error!

The type system doesn’t know about list length. I have to add runtime checks.

Dependent Type Approach: Compile-Time Safety

In Agda, the length is part of the type:

data Vec (A : Set) : Nat -> Set where
  []  : Vec A zero
  _::_ : {n : Nat} -> A -> Vec A n -> Vec A (suc n)

-- Type-safe head: impossible to call on empty list
head : {A : Set} {n : Nat} -> Vec A (suc n) -> A
head (x :: xs) = x
-- No case for [] - the type checker guarantees it won't happen!

Look at the signature: Vec A (suc n) means a vector with at least one element. The suc (successor) ensures non-zero length. I cannot call head on an empty vector because Vec A zero doesn’t match Vec A (suc n).

The compiler rejects the empty list case before I can run the code.

How Dependent Types Work

The key innovation is that types can depend on values. Here’s a mental model:

Traditional Types:
  List Int        -- List of integers (length unknown)
  String -> Int   -- Function from string to integer

Dependent Types:
  Vec Int 5       -- Vector of exactly 5 integers
  (n : Nat) -> Vec Int n  -- Function that returns vector of length n

This lets me encode specifications as types:

-- safeLookup: given a vector and a valid index, return the element
-- The index must be less than the vector length
safeLookup : {A : Set} {n : Nat}
           -> (vec : Vec A n)
           -> (idx : Nat)
           -> IsTrue (idx < n)  -- Proof that idx is in bounds
           -> A
safeLookup (x :: xs) zero p = x
safeLookup (x :: xs) (suc idx) p = safeLookup xs idx (drop-proof p)

The type IsTrue (idx < n) is a proof requirement. I must provide evidence that my index is in bounds. No runtime bounds check needed.

Curry-Howard Correspondence: Proofs Are Programs

This is where it gets mind-bending. There’s a mathematical connection between logic and programming:

Logic                    Programming
-----------              -----------
Proposition              Type
Proof                    Program
Proof verification       Type checking
A and B                  A x B (product type)
A or B                   A + B (sum type)
A implies B              A -> B (function type)

When I write a function that type-checks, I’m writing a proof. When I call safeLookup, I’m proving the index is valid.

Let me show you a concrete proof in Coq:

(* Specification: reversing a list twice gives back the original *)
Theorem reverse_involutive : forall (A : Type) (l : list A),
  rev (rev l) = l.
Proof.
  intros A l.
  induction l as [| x xs IH].
  - (* l = [] *)
    simpl. reflexivity.
  - (* l = x :: xs *)
    simpl. rewrite rev_app_distr.
    rewrite IH. reflexivity.
Qed.

This isn’t a test. It’s a mathematical proof that reverse(reverse(l)) = l for all lists. The compiler verifies the proof. If I change my reverse implementation incorrectly, the proof fails to compile.

Practical Workflow: Specification to Implementation

The Reddit comment described the practical approach:

“Oftentimes the runtime code is automatically generated because it is obvious from the spec and that’s good enough. But if your runtime requirements don’t mesh with your proof requirements due to performance concerns, that’s fine - you can hand write parts for efficiency, it just won’t compile unless your efficient version is provably the same as your specified version.”

This means:

Write a high-level specification
Generate or extract the implementation automatically
Optimize manually if needed, but prove the optimized version equivalent

Here’s how this looks in Idris:

-- Type-safe printf: return type depends on format string!
-- printf "Hello %s" has type String -> String
-- printf "Age: %d" has type Int -> String

data PrintfType : String -> Type where
  PrintfStr : (fmt : String) -> PrintfType fmt

-- The format string determines the function signature
printf : (fmt : String) -> PrintfType fmt

The format string "%s %d" creates a function expecting a String and an Int. The type is computed from the value.

Real-World Success Stories

This isn’t just academic. These tools have verified real systems:

CompCert: A verified C compiler. The compiled code is mathematically proven to match the source. No compiler bugs introducing subtle changes.

seL4: A verified microkernel. Memory safety, absence of data leaks, and functional correctness are all proved.

AWS: Uses formal methods for encryption and access control code. AWS’s internal tooling proves properties about IAM policies.

Meta: Uses Coq for verifying distributed systems protocols.

Why Isn’t Everyone Using These?

The Reddit discussion noted:

“We also have (to a significant degree) the tools to spec things out in code, but people aren’t using them. How many are using and investing into advanced type systems?”

I see three barriers:

1. Learning Curve

Dependent types require new mental models. I had to unlearn habits from decades of mainstream programming. The type errors are cryptic at first:

.n != suc m
when checking that the expression xs has type Vec A (suc n)

This error means “the vector length doesn’t match what I expected.” Learning to read these takes practice.

2. Tooling Maturity

Idris is general-purpose but young. Coq and Agda excel at proofs but aren’t designed for building web apps. The ecosystem isn’t there yet.

3. Cost-Benefit Trade-off

For most software, bugs are acceptable. The cost of formal verification exceeds the cost of fixing bugs in production. But for crypto, compilers, and security-critical code, the math flips.

When Should You Consider These Tools?

Traditional Testing          Formal Verification
        |                            |
        v                            v
  +------------------+          +------------------+
  | Most software    |          | Security-critical|
  | Business apps    |          | Compilers        |
  | Web services     |          | Crypto           |
  | Mobile apps      |          | Kernel/OS        |
  +------------------+          | Financial systems|
                                +------------------+

Use formal specification when:

A bug could cost millions (financial systems)
A bug could cause physical harm (medical devices, aviation)
A bug could compromise security (crypto, access control)
You need mathematical certainty (compilers, protocols)

Stick with testing when:

Bugs are cheap to fix
Time to market matters more than certainty
Requirements change frequently
Team lacks expertise

Getting Started

If you want to explore these tools:

For Idris: Start with “Type-Driven Development with Idris” by Edwin Brady. Idris is the most approachable for programmers coming from mainstream languages.

For Coq: Work through the Software Foundations series. It’s free online and teaches both the language and the theory.

For Agda: Start with the official tutorial. Agda’s syntax feels like Haskell with dependent types added.

For a gentler start: Try Liquid Haskell. It adds refinement types to Haskell without full dependent types. It’s a stepping stone to the more powerful systems.

Summary

In this post, I explored how formal specification languages like Idris, Coq, and Agda work. The core insight is that a sufficiently detailed specification becomes executable code. Dependent types let you encode properties in types, and the Curry-Howard correspondence means type checking is proof verification.

The compiler rejects code that violates specifications. No runtime checks needed. But the learning curve is steep, tooling is immature, and the cost-benefit trade-off only makes sense for certain domains.

These tools aren’t for everyone. But if you work on systems where bugs are expensive or dangerous, they offer something traditional methods cannot: mathematical certainty that your code does what you specified.

Final Words + More Resources

My intention with this article was to help others share my knowledge and experience. If you want to contact me, you can contact by email: Email me

Here are also the most important links from this article along with some further resources that will help you in this scope:

👨‍💻 Reddit discussion on specification vs code
👨‍💻 Type-Driven Development with Idris
👨‍💻 Software Foundations (Coq)
👨‍💻 Agda Documentation
👨‍💻 CompCert Verified C Compiler

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!