diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
index 26598c0..e831588 100644
--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -10,9 +10,11 @@ Currently the compiler is designed with only i386+ processors in mind. I intend
 
 Starting with a Nectar source file, the compiler begins with the two common passes: lexing and parsing. Parsing exploits Nectar's syntax quirks, and may jump back and forth multiple times to fully parse a source file. This is necessary to avoid having to forward declare items. At the end, parsing returns what is called an AST in the source, although formally speaking the term is incorrectly used.
 
-An AST node *may not be shared* by multiple other nodes.
+An AST node *may not be shared* by multiple other nodes. Also, the internal Nectar AST does not have scaling for pointer arithmetic; all pointers behave as `u8*`.
 
-Each basic block of code is called a "chunk", likely a term I took from Lua. Basic blocks may contain one another; the least deep one within a function is called the top-level chunk (very important). Top-level chunks may contain other top-level chunks, because user-defined functions are within the "global scope", which is considered a function in itself. After a top-level chunk is finished parsing, all local variables in the current scope are in the flattened variables list of the top-level chunk `ASTChunk` structure. Names may conflict, but at this point they're no longer important.
+Each basic block is called a "chunk", likely a term I took from Lua. Basic blocks may contain one another; the least deep one within a function is called the top-level chunk (very important). Top-level chunks may contain other top-level chunks, because user-defined functions are within the "global scope", which is considered a function in itself.
+
+After a chunk is finished parsing, all local variables in its scope are added to the flattened variables list of the top-level chunk's `ASTChunk` structure. Names may conflict, but at this point they're no longer important. Also worth mentioning is that this flat list contains `VarTableEntry` structs, even though VarTables are now irrelevant.
 
 There's enough types of passes to push us to have a generic way to invoke the visitor pattern on the AST. Because passes may do many different things to the AST, including modify it, the definition of a generic visitor is very broad. Most functionality is unused by each pass.
 
@@ -20,28 +22,28 @@ There's enough types of passes to push us to have a generic way to invoke the vi
 
 `*nptr` is the actual node that is currently being visited. It is behind an additional indirection, because the node may be replaced by another.
 
-If the current node is within a statement (most are), `stmt` is equal to that statement. `stmtPrev` is the previous statement. This is necessary for patching in the linked list of statements within a chunk during modification passes. If there is no previous statement, then the head pointer of the singly-linked list must be patched through the `chu` pointer. The `tlc` is the top-level chunk, which may be equal to `chu`.
+If the current node is within a statement (most are), `stmt` is equal to that statement. `stmtPrev` is the previous statement. This is necessary for patching in the linked list of statements within a chunk during modification passes. If there is no previous statement, then the head pointer of the singly-linked list must be patched through the `chu` node. The `tlc` is the top-level chunk, which may be equal to `chu`.
 
 ## Dumbification
 
 Once the AST is parsed, we move to machine-specific passes (in this case, i386). The idea of turning the AST progressively primitive is called "dumbification" in the source. The most simple example would be the following:
 
-    a = b + c
+    a = -b
 
 which should become
 
     a = b
-    a = a + c
+    a = -a
 
-Because the `add` instruction on x86 is two-operand. If targeting a 3-op arch like MIPS, this specific dumbification would not be used.
+Because the `neg` instruction on x86 is single-operand. If targeting an arch like MIPS, this specific dumbification would not be used, because one can use the 3-operand `subu` with the zero register.
 
-Another rule is to extract function arguments and place them into local variables, but *only* if they do not form an x86 operand (for example, `5` can be pushed immediately).
+Another rule is to extract function arguments and place them into local variables, but *only* if they do not form an x86 operand (for example, `5` is OK because `push 5` exists).
 
-Dumbification must be repeated until no more changes happen. The dumbification part of the source is responsible for making sure the resulting AST is "trivially compilable" to the machine code. This is actually non-trivial, because what is trivially compilable depends on which registers are used in the end result (a variable colored as `edi`, `esi` or `ebp` cannot be used for 8-bit stores/loads). These details are not taken into account by dumbification.
+Dumbification must be repeated until there are no more changes. The dumbification part of the source is responsible for making sure the resulting AST is "trivially compilable" to the machine code. This is actually non-trivial, because what is trivially compilable depends on which registers are used in the end (a variable colored as `edi`, `esi` or `ebp` cannot be used for 8-bit stores/loads). These details are not taken into account by dumbification.
 
 A common bug when writing a dumbification rule is ending up with one that is always successful. If this happens, the compiler could become stuck endlessly dumbifying, which is nonsense.
 
-Pre-dumbification is a single-time pass that takes a top-level chunk, and inserts loads and stores for the function arguments. Such unconditional instructions are not efficient, but they work.
+Pre-dumbification is a single-time pass that takes a top-level chunk, and inserts loads for the function arguments. Such unconditional instructions are not efficient, but they work.
 
 Putting all of this together, here is an example of nctref's dumbification of the following Fibonacci implementation, as of writing. Here is the main source, and the compiler's debug output:
 
@@ -73,7 +75,7 @@ Putting all of this together, here is an example of nctref's dumbification of th
 
 I hate these things. If you don't want to use static single assignment form, this is one alternative. Another is def-use chains, but both are horribly underdocumented.
 
-For each variable, it's UD chain is a list of each use in the AST, with the corresponding potential definition of the variable at that use. For each potential definition that exists at that point, there is one UD element in the chain. Users include optimizers and codegen. The UD-chains are continually regenerated when needed by using the UD visitor on the top-level chunk.
+For each variable, its UD chain is a list of each usage in the AST, with the corresponding potential definition of the variable at that use. For each potential definition that exists at that point, there is one UD element in the chain. Users include optimizers and codegen. The UD-chains are continually regenerated when needed by using the UD visitor on the top-level chunk.
 
 As always, it's not that fuckin simple. Imagine the following pseudocode:
 
@@ -83,8 +85,66 @@ As always, it's not that fuckin simple. Imagine the following pseudocode:
         x = x + 1
     }
 
-`x = x + 1` is a potential definition for everything in `do something with x`! For loops, special logic exists to make sure the UD-chain isn't stupid.
+Despite appearing later in the source, `x = x + 1` is a potential definition for everything in `do something with x`! This means loops require special logic, so that the UD-chains aren't stupid.
 
 ## Coloring
 
-TBA.
+At this point we have a very distorted kind of Nectar AST in our function. We've got basic blocks and other familiar things, but all variables are in a flat list. These variables are essentially the "virtual registers" you hear a lot about. Because x86 only has six general-purpose registers, we must assign each of these variables (VarTableEntry structures, abbr. VTE) to a physical machine register.
+
+This problem is a large area of study in itself, but a common approach is to imagine it as a graph coloring problem, where vertices are VTEs, and edges connect conflicting VTEs that cannot have the same color. Said edges are determined using the UD-chains of both VTEs.
+
+The actual coloring algorithm used is Welsh-Powell, which sorts the VTEs/vertices by degree before attempting greedy coloring.
+
+If there are more colors than physical registers, then we have a conflict, and must spill. There are two ways to do so: spill2var and spill2stack. The former is necessary on boundaries where suddenly a specific register/color must be used (e.g. returning in `eax`). The latter transforms every use of a local variable (`ASTExprVar` where its VTE is of type `VARTABLEENTRY_VAR`) into the form `@stack + n`.
+
+If spill2stack is used, then code generation (CG) must fail once so that dumbification can be applied again.
+
+## Pre-coloring
+
+I skipped forward a bit. In reality, coloring assumes that all registers have equal importance, which is never true. A return value must be in `eax`, the remainder of division must be in `edx`, etc. In 64-bit, the index of an argument determines in which register it may end up.
+
+The pre-coloring visitor applies said rules to the AST, setting the colors in the VTE. It is completely plausible that a conflict can occur here, too, from two variables having overlapping live ranges and the same color, but it can also be from demanding more than one color from the same variable. In the latter case, the precoloring visitor gives up as soon as it's detected. In both cases we do spill2var, not spill2stack, because spilling to the stack doesn't solve the pre-coloring problem.
+
+## Callee-saved pass
+
+If a function uses a callee-saved register, it must be stored and loaded at the correct times. This is done by modifying the AST in a special pass.
+
+Of the four currently used registers, only `ebx` is callee-saved. A random variable assigned to `ebx` is chosen, and is saved to/loaded from the stack. The rule is written such that dumbification isn't necessary, unlike spill2stack.
+
+## Code generation
+
+FINALLY. This pass doesn't use `generic_visitor`, because it may consume multiple sibling AST nodes for emitting code. At this point there's nothing arcane or obscure; the code is pretty self-explanatory.
+
+Using the same Fibonacci example as above, this is the result.
+
+    global fibonacci
+    fibonacci:
+    mov edx, [esp + 4]
+    cmp edx, 1
+    ja .L0
+    mov eax, edx
+    ret
+    .L0:
+    mov eax, edx
+    dec eax
+    push ecx
+    push edx
+    push eax
+    call fibonacci
+    add esp, 4
+    pop edx
+    pop ecx
+    mov ecx, eax
+    mov eax, edx
+    sub eax, 2
+    push ecx
+    push edx
+    push eax
+    call fibonacci
+    add esp, 4
+    pop edx
+    pop ecx
+    add ecx, eax
+    mov eax, ecx
+    ret
+
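
Side note on the Coloring section in this patch: the Welsh-Powell step it describes (sort vertices by degree, then color greedily) can be sketched roughly as below. This is only an illustration in Python, not nctref's actual code. `welsh_powell` and the `conflicts` dictionary (each variable mapped to the set of variables it conflicts with) are hypothetical names, and the conflict edges are assumed to have been derived from the UD-chains already.

```python
def welsh_powell(conflicts):
    """Greedy coloring in the Welsh-Powell order.

    conflicts: dict mapping each vertex to the set of vertices it
    conflicts with (assumed symmetric). Hypothetical input format.
    Returns a dict mapping each vertex to a color index (0, 1, 2, ...).
    """
    # Visit vertices from highest degree to lowest.
    order = sorted(conflicts, key=lambda v: len(conflicts[v]), reverse=True)
    color = {}
    next_color = 0
    for v in order:
        if v in color:
            continue
        # Open a new color class with the first uncolored vertex.
        color[v] = next_color
        members = {v}
        # Add every later uncolored vertex that conflicts with
        # nothing already in this color class.
        for u in order:
            if u not in color and not (conflicts[u] & members):
                color[u] = next_color
                members.add(u)
        next_color += 1
    return color
```

If the number of distinct colors returned is larger than the number of registers available, that is the situation in which the text says a spill is required.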