Mugo, a toy compiler for a subset of Go that can compile itself

April 2021

Summary: This article presents Mugo, a single-pass compiler for a tiny subset of the Go programming language. It outputs (very naive) x86-64 assembly, and supports just enough of the language to implement a Mugo compiler: int and string types, slices, functions, locals, globals, and basic expressions and statements.

Go to: Which subset? | Codegen | Lexer and Parser | Performance | Related projects

I’ve been fascinated with compilers since I started coding. One of my first programming projects was “Third”, a self-hosted Forth compiler for 8086 DOS. Forth is incredibly easy to compile: there are no expressions or statements, and each space-separated token gets compiled directly to a call instruction – often via techniques like direct threading.

Typical languages like C and Go have more complex syntax with expressions and statements, so they require real parsers and code generators. Compilers for these languages are usually complex and powerful, but as we’ll see, if you stick to basic types and non-optimized output, you can still write a simple one.

Mugo is kind of in the spirit of the Obfuscated Tiny C Compiler by Fabrice Bellard, though of course mine’s far more pedestrian, and won’t be winning the IOCCC anytime soon. Bellard’s compiler implements just enough C to compile itself to a native i386 Linux executable.

I wanted to do something a bit like that with Go, minus the obfuscation. The idea started as a shower thought: “I wonder what’s the smallest subset of Go that could compile itself?” Fabrice’s C compiler is implemented in 2048 bytes of obfuscated C, whereas mine is 1600 lines of formatted Go.

While this was a fun exercise to do over a long weekend, it’s very much a toy – it leaves out all the great features of Go: user-defined types, interfaces, goroutines, channels, maps, garbage collection, even bounds checking! My goal with Mugo was educational: for me, and hopefully also for you. Doing exercises like this helps demystify how our tools work.

Which subset of Go?

Mugo is a subset of Go, so the source code can be compiled with Go as well as with Mugo. In my opinion, this makes it much more interesting. It also made it easier to test: when the assembly output of the Go-compiled version was identical to the output of the Mugo-compiled version, I knew it was working – it was a beautiful thing when diff mugo2.asm mugo3.asm showed no output!

Before I started, I mulled over which subset of features I should include. I knew I’d need some kind of container type to store compiler state: for example, the names and types of variables, and function signatures and return types. But which container?

Go has pointers, and they’re safer but not nearly as powerful as C’s, because you can’t do pointer arithmetic. Bellard’s compiler makes heavy use of C pointers, but those weren’t going to work in Go.

What about structs or maps? Well, those were going to be more complex to implement, and don’t really solve the most common problem of storing a list of things. So I decided I could do without all those, and just needed slices.

Here’s what Mugo supports:

And that’s about it! If I’ve left it out of the above list, it’s probably not included. Like I said, a small subset.

I consulted the nice and concise Go language spec a bunch while building this, though I’ve almost certainly got some things wrong. However, what is implemented does seem to work like Go, as shown by my “diff test”.

Code generation

Mugo is a single-pass compiler that outputs x86-64 assembly from the parser as it goes. (It’s written for Linux, but it wouldn’t be hard to get it working on macOS or Windows.) There’s no in-memory abstract syntax tree – it would be tricky to build that with only slices in any case.

It’s also very naive. There’s no optimization – I basically turn my powerful register-based CPU into a dumb stack machine, and push and pop intermediate values to and from the stack. Probably half the complexity of a real compiler is in the code generation – the other half is in the type checking – and both of those things are incredibly simplified in Mugo.

One of the tricks I had to play was for local variable declarations (with Go’s := syntax). Because there’s only one pass, you don’t know how many locals you’ll have or what their types are until you’ve finished parsing the function. So my function prologue, apart from the usual rbp frame pointer dance, subtracts 64 bytes from the stack pointer to allow space for up to 8 cells of local variables (the most used in Mugo is 7 cells).

Update: “a1369209993” on Hacker News pointed out that I could have referenced an assembler constant that was defined at the end of the function, once we know the size. I’m letting the assembler handle that for forward jumps in if and else already. Thanks!

Here’s the full output for an integer add function:

; func add(x int, y int) int {
;     return x + y
; }

; function prologue
push rbp             ; rbp is the frame pointer
mov rbp, rsp
sub rsp, 64          ; make space for any more locals
                     ; (not used by this function)

; fetch and push local variable x, then y
push qword [rbp+24]
push qword [rbp+16]

; the + operation
pop rbx
pop rax
add rax, rbx
push rax

; pop result back into rax for "return"
pop rax

; function epilogue (restore stack and frame pointer)
mov rsp, rbp
pop rbp
ret 16              ; return, and free space due to
                    ; caller pushing x and y

Compared to gcc’s non-optimized output, we’re not doing too badly:

push    rbp
mov     rbp, rsp
mov     qword [rbp-8], rdi
mov     qword [rbp-16], rsi
mov     rdx, qword [rbp-8]
mov     rax, qword [rbp-16]
add     rax, rdx
pop     rbp

However, gcc’s optimized output produces a single, register-based instruction:

lea     rax, [rdi+rsi]

On the caller side, to generate a call to add, Mugo produces the following code:

; add(1, 2)

push qword 1  ; push first arg
push qword 2  ; push second arg
call add      ; call the function
push rax      ; push return value back to stack

As you can see, Mugo uses its own very inefficient ABI that’s nothing like the x86-64 ABI – the standard ABI puts the first six “cells” (64-bit values) in registers.

For simplicity, Mugo’s ABI pushes arguments onto the stack. They’re pushed in order, so they end up on the stack in reverse order in memory. However, Mugo does use registers for return values: rax, and then rbx and rcx if there are more cells. Just like in Go, an int is one cell, a string is two (address and length), and a slice is three (address, length, and capacity).

For allocating memory for string concatenation and slice appending, Mugo uses a trivial “bump allocator”. In other words, it bumps a pointer along in a fixed 1MB chunk of memory, and exits with an out-of-memory message if that gets used up. It never frees memory, and there’s no garbage collector. Ideal for short-running programs!

To generate the assembly, Mugo simply calls print to write to standard output:

func genFuncStart(name string) {
    print(name + ":\n")
    print("push rbp\n")
    print("mov rbp, rsp\n")
    print("sub rsp, " + itoa(localSpace) + "\n") // space for locals

Once Mugo has run, I use NASM to assemble the output, and the ld linker to build an executable. Here’s an example from the Makefile showing how I build three versions of the compiler:

# Build the compiler with Go
    go build -o build/mugo

# Build the compiler with the Go-built Mugo
    build/mugo <mugo.go >build/mugo2.asm
    nasm -felf64 -o build/mugo2.o build/mugo2.asm
    ld -o build/mugo2 build/mugo2.o

# Build the compiler with the Mugo-built Mugo
    build/mugo2 <mugo.go >build/mugo3.asm
    nasm -felf64 -o build/mugo3.o build/mugo3.asm
    ld -o build/mugo3 build/mugo3.o
    diff build/mugo2.asm build/mugo3.asm  # ensure output matches!

There’s also a make target to produce a coverage report from a simple test that runs the compiler on itself. The test simply calls Mugo’s main(), so we run the test binary with a coverage profile turned on, and send mugo.go to the process’s standard input. The “test” is to compile the full source code of the compiler and record what coverage we get:

    go test -c -o build/mugo_test -cover
    build/mugo_test -test.coverprofile build/coverage.out \
        <mugo.go >/dev/null
    go tool cover -html build/coverage.out -o build/coverage.html

I initially included a couple of features that the compiler didn’t use, so they weren’t tested, and showed up as red in the coverage report. Apart from a couple of things that it just seemed consistent to have, like the ! not operator and string slice assignment, I removed unused features. Now I have full coverage of the features, excluding error handling.

Once or twice I had to break out gdb for debugging. My x86 assembly skills are definitely rusty, and I’ve never written serious 64-bit assembly. I’m sure there are many ways to improve the output, even with the one-pass constraint – but I’ve left those improvements as an exercise for the reader. :-)

Lexer and parser

Go has nice, simple syntax that’s easy to tokenize and parse. Mugo uses a single character of lookahead in its lexer, and a typical recursive-descent parser.

The lexer is basically a big set of if statements on the next character, which is stored in a global integer c. Here’s a snippet of what it looks like:

func next() {
    // Skip whitespace and comments, and look for / operator
    for c == '/' || c == ' ' || c == '\t' || c == '\r' || c == '\n' {
        if c == '/' {
            if c != '/' {
                token = tDivide
            // Comment, skip till end of line
            for c >= 0 && c != '\n' {
        } else if c == '\n' {
            // Semicolon insertion:
            if token == tIdent || token == tIntLit || token == tStrLit ||
                token == tReturn || token == tRParen ||
                token == tRBracket || token == tRBrace {
                token = tSemicolon
        } else {
    if c < 0 {
        // End of file
        token = tEOF

    // Integer literal
    if isDigit(c) {
        tokenInt = c - '0'
        for isDigit(c) {
            tokenInt = tokenInt*10 + c - '0'
        token = tIntLit

    // ... handle other tokens (snipped) ...

In the parser, I’ve tried to use the names of the productions from the grammar in the Go spec, for example Expression, VarSpec, and Operand. Many of them are slimmed-down from what’s in the Go spec, of course, because we’re dealing with a subset of the language. Here are a couple of examples of parsing functions – note how the calls to the the code generation functions are intermingled:

func Literal() int {
    if token == tIntLit {
        return typeInt
    } else if token == tStrLit {
        return typeString
    } else {
        error("expected integer or string literal")
        return 0

func SimpleStmt() {
    // Funky parsing here to handle assignments
    identName := tokenStr
    expect(tIdent, "assignment or call statement")
    if token == tAssign {
        lhsType := varType(identName)
        rhsType := Expression()
        if lhsType != rhsType {
            error("can't assign " + typeName(rhsType) + " to " +
    } else if token == tDeclAssign {
        typ := Expression()
        defineLocal(typ, identName)
    } else if token == tLParen {
        typ := Arguments()
        genDiscard(typ) // discard return value
    } else if token == tLBracket {
        expect(tRBracket, "]")
        expect(tAssign, "=")
    } else {
        error("expected assignment or call not " + tokenName(token))

func Statement() {
    if token == tIf {
    } else if token == tFor {
    } else if token == tReturn {
    } else {

One of the slightly messy things is the “simple statement” above – with a parser that generated a syntax tree, you’d probably call Expression to parse the left-hand side, and then see if there’s an = or := for assignment, and parse the right-hand side. But we can’t call Expression, or it will compile code to fetch that expression instead of assign to it. So we have to parse an identifier and then detect whether what’s coming next is an assignment, function call, or slice expression. I’m sure the above code doesn’t handle all the edge cases correctly, but it’s good enough.

Operator precedence is handled using recursive-descent, like && and || operators in the following (the names orExpr and andExpr don’t appear in the Go spec):

func andExpr() int {
    typ := comparisonExpr()
    for token == tAnd {
        op := token
        typRight := comparisonExpr()
        typ = genBinary(op, typ, typRight)
    return typ

func orExpr() int {
    typ := andExpr()
    for token == tOr {
        op := token
        typRight := andExpr()
        typ = genBinary(op, typ, typRight)
    return typ

func Expression() int {
    return orExpr()

There are two recursive forward references in the recursive-descent parser: Expression (various functions inside the expression-parsing have to call Expression) and Block (block elements end up nesting into Block). Go doesn’t need or allow forward references, so Mugo pre-defines those two functions with the correct signature when it starts up.

The compiler keeps track of variable names and type information in a set of global slices:

var (
    globals        []string // global names and types
    globalTypes    []int
    locals         []string // local names and types
    localTypes     []int
    funcs          []string // function names
    funcSigIndexes []int    // indexes into funcSigs
    funcSigs       []int    // each func: retType N arg1Type ... argNType

The first four are pretty self-explanatory, but the funcSigs slice is a bit, well, funky. It’s really a slice of structs. In real Go code you’d probably define a funcSig struct and roll all three of those “func” slices into a map of function name to struct:

var funcSigs map[string]funcSig

type funcSig struct {
    retType  int
    argTypes []int

But Mugo doesn’t support structs or maps, so I had to stuff these fields into a flat slice of int, with funcSigIndexes[i] pointing to the start of the pseudo-struct in the funcSigs slice (for the function at index i).


Because it does no optimization, Mugo is obviously going to be a lot slower than Go, so I’m not going to do extensive performance testing. But just for fun, I wrote a little program to test the performance of a basic loop with some integer arithmetic – it sums the numbers from one to a billion:

var (
    result int

func main() {
    sum := 0
    i := 1
    for i <= 1000000000 {
        sum = sum + i
        i = i + 1
    result = sum // so Go doesn't optimize it out

On my machine, the Go version of this runs in 0.34 seconds. The Mugo version runs in 5.7 seconds – about 17 times as long. I guess it’s not bad for some of the worst assembly code ever produced. For reference, a Python version of this same loop runs in 1 minute and 38 seconds … a dynamically-typed bytecode interpreter is not a good choice for heavy integer arithmetic.

Interestingly, if I make sum itself a global variable instead of a local, the Mugo version is unchanged, but the Go version runs in 1.7 seconds instead of 0.34. I suspect a big part of the reason Mugo is so much slower is it’s doing everything in memory on the stack – even if the stack is in the CPU cache, registers are always going to be faster.

Another aspect of performance is code size: the executables Mugo builds are much smaller than those built with Go. The Mugo binary built with Go is 1.6MB. Mugo built with Mugo is only 56KB – about 1/29th the size! It’s not exactly a fair fight – Mugo doesn’t build in a goroutine scheduler, garbage collector, or runtime type information (see this Go FAQ question). However, this does raise an interesting question: for simple CLI tools written in Go, I wonder if we could reduce binary size with a dead-simple scheduler and GC?

As I mentioned, I’ve been interested in interpreters and compilers for a long time. If you enjoyed this article, here are some of my related projects:

I hope you enjoyed this article – I definitely had fun creating Mugo. Feel free to send me feedback! You can also read the discussion on Hacker News.