nctref/DOCUMENTATION.md

# Nectar Reference Compiler Source Documentation

If you know this compiler has been in development since 2019, you will correctly guess that I don't really know what I am doing. Compiler literature, and online discussion, is abstract to the point where it is not useful for real-world processors. As a result, much of what you see in the source is the result of a lot of experimentation. I'm sure better methods are available to do the things within.

Basically, the compiler works by progressively iterating through the AST, turning it into a more primitive form step by step. This is necessary because machine code itself is primitive, and instructions typically have only 1-3 operands. Thanks to both this and Nectar itself being a very low-level language, the need for any IRs disappears. On the other hand, making sure the AST is in a correct state between steps isn't easy, and is the prime source of bugs.

Currently the compiler is designed with only i386+ processors in mind. I intend to add support for i286- and other exotic processors, but I honestly don't see it happening ever, especially if this remains a solo project. More RISC architectures with regular register files will be easier to add support for.

## AST structure

Starting with a Nectar source file, the compiler begins with the two common passes: lexing and parsing. Parsing exploits Nectar's syntax quirks, and may jump back and forth multiple times to fully parse a source file. This is necessary to avoid having to forward declare items. At the end, parsing returns what is called an AST in the source, although formally speaking the term is incorrectly used.

An AST node *may not be shared* by multiple other nodes.

Each basic block of code is called a "chunk", likely a term I took from Lua. Basic blocks may contain one another; the least deep one within a function is called the top-level chunk (very important). Top-level chunks may contain other top-level chunks, because user-defined functions are within the "global scope", which is considered a function in itself. After a top-level chunk is finished parsing, all local variables in the current scope are in the flattened variables list of the top-level chunk `ASTChunk` structure. Names may conflict, but at this point they're no longer important.

There are enough types of passes to push us to have a generic way to invoke the visitor pattern on the AST. Because passes may do many different things to the AST, including modifying it, the definition of a generic visitor is very broad. Each pass leaves most of its functionality unused.

    void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, void(*handler)(AST**, AST*, AST*, AST*, AST*, void*));

`*nptr` is the actual node that is currently being visited. It is behind an additional indirection, because the node may be replaced by another.

If the current node is within a statement (most are), `stmt` is equal to that statement. `stmtPrev` is the previous statement. This is necessary for patching in the linked list of statements within a chunk during modification passes. If there is no previous statement, then the head pointer of the singly-linked list must be patched through the `chu` pointer. The `tlc` is the top-level chunk, which may be equal to `chu`.
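
To make the role of each parameter concrete, here is a minimal sketch of a visitor with the same parameter list as the signature above, plus a constant-folding handler that replaces nodes through `nptr`. The node layout, the names `visit_sketch` and `fold_constants`, and the post-order traversal are all assumptions made for illustration; the real nctref AST and visitor differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced node layout; not the real nctref AST. */
typedef enum { NODE_CHUNK, NODE_STMT_EXPR, NODE_BINOP, NODE_INT } NodeKind;

typedef struct AST AST;
struct AST {
    NodeKind kind;
    int value;       /* NODE_INT: literal value */
    char op;         /* NODE_BINOP: '+' or '-' */
    AST *lhs, *rhs;  /* NODE_BINOP: operands */
    AST *expr;       /* NODE_STMT_EXPR: the statement's expression */
    AST *next;       /* NODE_STMT_EXPR: next statement in the chunk */
    AST *first;      /* NODE_CHUNK: head of the statement list */
};

typedef void (*Handler)(AST **, AST *, AST *, AST *, AST *, void *);

/* Post-order walk (an assumption): children first, then the node itself. */
void visit_sketch(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc,
                  void *ud, Handler handler) {
    AST *n = *nptr;
    switch (n->kind) {
    case NODE_CHUNK: {
        AST *prev = NULL;
        /* s points at the link itself, so a handler may replace a statement. */
        for (AST **s = &n->first; *s; s = &(*s)->next) {
            visit_sketch(s, *s, prev, n, tlc, ud, handler);
            prev = *s;
        }
        break;
    }
    case NODE_STMT_EXPR:
        visit_sketch(&n->expr, stmt, stmtPrev, chu, tlc, ud, handler);
        break;
    case NODE_BINOP:
        visit_sketch(&n->lhs, stmt, stmtPrev, chu, tlc, ud, handler);
        visit_sketch(&n->rhs, stmt, stmtPrev, chu, tlc, ud, handler);
        break;
    case NODE_INT:
        break;
    }
    handler(nptr, stmt, stmtPrev, chu, tlc, ud);
}

/* Example handler: replaces `int op int` with a single literal.
 * `ud` is assumed to point at an int that counts the folds. */
void fold_constants(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc,
                    void *ud) {
    (void) stmt; (void) stmtPrev; (void) chu; (void) tlc;
    AST *n = *nptr;
    if (n->kind != NODE_BINOP || n->lhs->kind != NODE_INT ||
        n->rhs->kind != NODE_INT)
        return;
    AST *lit = calloc(1, sizeof(*lit));
    if (!lit) return;
    lit->kind = NODE_INT;
    lit->value = n->op == '+' ? n->lhs->value + n->rhs->value
                              : n->lhs->value - n->rhs->value;
    *nptr = lit;  /* this is why the node sits behind an extra indirection */
    ++*(int *) ud;  /* the replaced nodes are leaked to keep the sketch short */
}
```

Like most passes, this handler ignores `stmt`, `stmtPrev`, `chu` and `tlc`; only a pass that inserts or removes whole statements would need them.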

## Dumbification

Once the AST is parsed, we move to machine-specific passes (in this case, i386). The idea of progressively turning the AST more primitive is called "dumbification" in the source. The simplest example would be the following:

    a = b + c

which should become

    a = b
    a = a + c

because the `add` instruction on x86 is two-operand. If targeting a three-operand architecture like MIPS, this specific dumbification would not be used.
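
The rewrite above could be sketched roughly as follows. The flat `Stmt` form and the name `dumbify_two_operand` are hypothetical, chosen to keep the example short; nctref performs this on its AST instead. Note the extra case where the destination is also the right operand, which a naive copy would clobber.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat statement: dst = lhs op rhs, or dst = lhs when op is 0. */
typedef struct Stmt {
    const char *dst, *lhs, *rhs;
    char op;  /* '+', '-', or 0 for a plain copy */
    struct Stmt *next;
} Stmt;

/* Turns `a = b + c` into `a = b; a = a + c`.
 * Returns 1 if something changed, 0 if the rule did not apply. */
int dumbify_two_operand(Stmt **link) {
    Stmt *s = *link;
    if (s->op == 0 || strcmp(s->dst, s->lhs) == 0)
        return 0;  /* a copy, or already in two-operand form */
    if (strcmp(s->dst, s->rhs) == 0) {
        if (s->op != '+')
            return 0;  /* a = b - a needs a temporary; not handled here */
        const char *t = s->lhs;  /* a = b + a  ->  a = a + b */
        s->lhs = s->rhs;
        s->rhs = t;
        return 1;
    }
    Stmt *copy = malloc(sizeof(*copy));
    if (!copy) return 0;
    *copy = (Stmt){ .dst = s->dst, .lhs = s->lhs, .rhs = NULL, .op = 0, .next = s };
    s->lhs = s->dst;  /* the original statement becomes a = a + c */
    *link = copy;     /* patch the list so the copy comes first */
    return 1;
}
```

Running the rule a second time on either resulting statement returns 0, which is the property that lets repeated dumbification terminate.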

Another rule is to extract function arguments and place them into local variables, but *only* if they do not form an x86 operand (for example, `5` can be pushed immediately).

Dumbification must be repeated until no more changes happen. The dumbification part of the source is responsible for making sure the resulting AST is "trivially compilable" to the machine code. This is actually non-trivial, because what is trivially compilable depends on which registers are used in the end result (a variable colored as `edi`, `esi` or `ebp` cannot be used for 8-bit stores/loads). These details are not taken into account by dumbification.
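
The register restriction mentioned above comes from the i386 encoding: only the first four general-purpose registers have a low-byte form (`al`, `cl`, `dl`, `bl`). A small sketch of the check a later stage might need is shown below; the enum and function name are hypothetical and not taken from the nctref source.

```c
/* General-purpose registers in i386 encoding order. */
typedef enum {
    REG_EAX, REG_ECX, REG_EDX, REG_EBX,
    REG_ESP, REG_EBP, REG_ESI, REG_EDI
} Reg32;

/* Returns 1 if the register has an 8-bit low subregister in 32-bit mode.
 * In byte-sized operands, encodings 4-7 select ah, ch, dh and bh instead
 * of the low bytes of esp, ebp, esi and edi. */
int has_byte_subregister(Reg32 r) {
    return r <= REG_EBX;
}
```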

A common bug when writing a dumbification rule is ending up with one that is always successful. If this happens, the compiler could become stuck endlessly dumbifying, which is nonsense.
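
One way to picture both the repetition and the failure mode is a driver that sweeps over all rules until none reports a change. This is only a sketch: the `Rule` type, the two toy rules and the sweep limit are assumptions for illustration, and nothing here claims nctref has such a limit.

```c
#include <stddef.h>

/* A rule returns nonzero if it changed the AST (here just an opaque pointer). */
typedef int (*Rule)(void *ast);

/* Applies every rule repeatedly until a full sweep changes nothing.
 * Returns the number of sweeps, or -1 if max_sweeps was exhausted,
 * which most likely means some rule always reports success. */
int dumbify_to_fixed_point(void *ast, const Rule *rules, size_t nrules,
                           int max_sweeps) {
    for (int sweep = 1; sweep <= max_sweeps; sweep++) {
        int changed = 0;
        for (size_t i = 0; i < nrules; i++)
            changed |= rules[i](ast) != 0;
        if (!changed)
            return sweep;
    }
    return -1;
}

/* Toy rule: the "AST" is an int counting the work left to do. */
int rule_terminating(void *ast) {
    int *work = ast;
    if (*work == 0) return 0;
    --*work;
    return 1;
}

/* Toy buggy rule: claims a change every time, so the loop never settles. */
int rule_always_succeeds(void *ast) {
    (void) ast;
    return 1;
}
```

With only `rule_terminating` the driver settles once the counter reaches zero; adding `rule_always_succeeds` makes it hit the limit instead of hanging.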

Pre-dumbification is a single-time pass that takes a top-level chunk, and inserts loads and stores for the function arguments. Such unconditional instructions are not efficient, but they work.

Putting all of this together, here is how nctref dumbifies a Fibonacci implementation, as of writing. The main source comes first, followed by the compiler's debug output:

    fibonacci: u32(u32 n) -> {
        if(n <= 1) {
            return n;
        }
        return fibonacci(n - 1) + fibonacci(n - 2);
    };

    @unimp fibonacci: u32(u32) {
    n = *(@stack + 4);
    if((n <= 1)) {
    return n;
    }
    $dumb2 = n;
    $dumb2 = ($dumb2 - 1);
    $dumb0 = fibonacci($dumb2);
    $dumb3 = n;
    $dumb3 = ($dumb3 - 2);
    $dumb1 = fibonacci($dumb3);
    $dumb0 = ($dumb0 + $dumb1);
    return $dumb0;
    };

`@unimp` is printed in place of anything the AST debug printer does not implement yet; here it should say `u32(u32)`. `@stack` is an internal variable that points to the beginning of the current stack frame.

## Use-def chain

I hate these things. If you don't want to use static single assignment form, this is one alternative. Another is def-use chains, but both are horribly underdocumented.

For each variable, its UD chain is a list of its uses in the AST, each paired with a potential definition of the variable at that use. A use that several definitions may reach gets one UD element per potential definition. Users include optimizers and codegen. The UD-chains are continually regenerated when needed by using the UD visitor on the top-level chunk.

As always, it's not that fuckin simple. Imagine the following pseudocode:

    x = 0
    loop {
        do something with x
        x = x + 1
    }

`x = x + 1` is a potential definition for everything in `do something with x`! For loops, special logic exists to make sure the UD-chain isn't stupid.
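
To see why the loop matters, here is a sketch that computes which definitions of `x` may reach each statement of the pseudocode above. It uses the textbook iterative reaching-definitions approach on a tiny hand-built flow graph, which is not necessarily how nctref's UD visitor handles loops; the `Stmt` record and `reaching_defs` are hypothetical names.

```c
#include <stddef.h>

#define MAX_PREDS 2
#define MAX_STMTS 32  /* one bit per statement in an unsigned mask */

/* Hypothetical statement record tracking a single variable x. */
typedef struct {
    int defines_x;         /* statement assigns x */
    int uses_x;            /* statement reads x */
    int preds[MAX_PREDS];  /* indices of predecessor statements, -1 if unused */
} Stmt;

/* On return, bit d of reach_in[i] is set if the definition made by
 * statement d may reach the entry of statement i. Requires n <= MAX_STMTS. */
void reaching_defs(const Stmt *s, size_t n, unsigned *reach_in) {
    unsigned out[MAX_STMTS] = {0};
    int changed = 1;
    for (size_t i = 0; i < n; i++)
        reach_in[i] = 0;
    while (changed) {  /* iterate until the sets stop growing */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned in = 0;
            for (int p = 0; p < MAX_PREDS; p++)
                if (s[i].preds[p] >= 0)
                    in |= out[s[i].preds[p]];
            /* A definition of x kills every earlier definition of x. */
            unsigned o = s[i].defines_x ? 1u << i : in;
            if (in != reach_in[i] || o != out[i])
                changed = 1;
            reach_in[i] = in;
            out[i] = o;
        }
    }
}
```

For the loop above, with statement 0 as `x = 0`, statement 1 as the use and statement 2 as `x = x + 1` jumping back to 1, each statement with `uses_x` set gets one UD element per set bit: both uses end up with two potential definitions, statements 0 and 2.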

## Coloring

TBA.