diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
index 6815b8e..eea92c1 100644
--- a/DOCUMENTATION.md
+++ b/DOCUMENTATION.md
@@ -12,7 +12,7 @@ Currently the compiler is designed with only i386+ processors in mind. I intend
 
 ## AST structure
 
-Starting with a Nectar source file, the compiler begins with the two common passes: lexing and parsing. Parsing exploits Nectar's syntax quirks, and may jump back and forth multiple times to fully parse a source file. This is necessary to avoid having to forward declare items. At the end, parsing returns what is called an AST in the source, although formally speaking the term is incorrectly used.
+Starting with a Nectar source file, the compiler begins with the two common passes: lexing and parsing. The skimming in-between exploits Nectar's syntax quirks, and may jump back and forth multiple times to find all declared symbols and types in advance. This is necessary to avoid having to forward declare items. At the end, parsing returns what is called an AST in the source.
 
 An AST node may not be shared by multiple parent nodes. Also, the internal Nectar AST does not have scaling for pointer arithmetic; all pointers behave as `u8*`. This is the first of many simplifications.
 
@@ -32,6 +32,12 @@ If the current node is within a statement (most are), `stmt` is equal to that st
 
 A handler may be called before or after delving deeper into the tree (hence the pre and post handlers). Most passes use the prehandler, but type checking will be better with a posthandler, since we want type checks to happen bottom to top.
 
+## Desegmentation
+
+Pointers in Nectar might not be regular integers. The basic ones (called near) are, but we also have far pointers to support x86 segmentation. Internally these are record types with two fields: a `u16` segment and a `T*` near pointer.
+
+## SRoA
+
 ## Pre-dumbification
 
 Before dumbification we need to make sure the code at least matches the semantics of the x86 architecture.
@@ -173,7 +179,7 @@ It is insufficient for a compiler to assume these are in any way comparable. The
 
 From these we form sets of registers called "register classes", which can be thought of as "ways in which a register can be used". The resource mask of a register class is the union (bitwise OR) of all bits used by all of its registers.
 
-This compiler currently considers 3 register classes: `REG_CLASS_8`, for `al`, `ah`, `bl`, `bh`, `cl`, `ch`, `dl`, `dh`; `REG_CLASS_NOT_8` for `ax`, `eax`, `bx`, `ebx`, `cx`, `ecx`, `dx`, `edx`, `di`, `edi`, `si`, `esi`; `REG_CLASS_IA16_PTRS` for `di`, `si`, `bx`. It can be seen registers are not unique under this abstraction, but this is necessary as this abstraction assumes the CPU to be a soup.
+This compiler currently considers 3 register classes: `REG_CLASS_8`, for `al`, `ah`, `bl`, `bh`, `cl`, `ch`, `dl`, `dh`; `REG_CLASS_NOT_8` for `ax`, `eax`, `bx`, `ebx`, `cx`, `ecx`, `dx`, `edx`, `di`, `edi`, `si`, `esi`; `REG_CLASS_IA16_PTRS` for `di`, `si`, `bx`. It can be seen registers are not unique under this abstraction, but this is necessary as we gave up finding any patterns and assume the CPU to be a soup.
 
 (64-bit is not considered)
 
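
The register-class paragraph in the last hunk says a class's resource mask is the bitwise OR of the bits used by its registers. A minimal C sketch of that idea follows. The bit layout is an assumption made for illustration only (three resource bits per register slot: low byte, high byte, upper 16 bits), and `RES_*`, `SLOT`, `class_mask`, `masks_conflict` and the lowercase arrays are hypothetical names, not the compiler's actual definitions. Only the class memberships of `REG_CLASS_8` and `REG_CLASS_IA16_PTRS` are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: illustrative resource layout, not the compiler's real one.
 * Each register slot owns three bits: low byte, high byte, upper 16 bits. */
#define RES_LO 0x1u
#define RES_HI 0x2u
#define RES_UP 0x4u
#define SLOT(n, bits) ((uint32_t)(bits) << (3u * (n)))

enum { SLOT_A, SLOT_B, SLOT_C, SLOT_D, SLOT_SI, SLOT_DI };

/* Resource masks of individual registers under the assumed layout. */
#define R_AL  SLOT(SLOT_A, RES_LO)
#define R_AH  SLOT(SLOT_A, RES_HI)
#define R_EAX SLOT(SLOT_A, RES_LO | RES_HI | RES_UP)
#define R_BL  SLOT(SLOT_B, RES_LO)
#define R_BH  SLOT(SLOT_B, RES_HI)
#define R_BX  SLOT(SLOT_B, RES_LO | RES_HI)
#define R_CL  SLOT(SLOT_C, RES_LO)
#define R_CH  SLOT(SLOT_C, RES_HI)
#define R_DL  SLOT(SLOT_D, RES_LO)
#define R_DH  SLOT(SLOT_D, RES_HI)
#define R_SI  SLOT(SLOT_SI, RES_LO | RES_HI)
#define R_DI  SLOT(SLOT_DI, RES_LO | RES_HI)

/* Members of two of the classes listed in the text. */
const uint32_t reg_class_8[] = {
    R_AL, R_AH, R_BL, R_BH, R_CL, R_CH, R_DL, R_DH
};
const uint32_t reg_class_ia16_ptrs[] = { R_DI, R_SI, R_BX };

/* Resource mask of a class: the union (bitwise OR) of its registers' bits. */
uint32_t class_mask(const uint32_t *regs, size_t count) {
    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        mask |= regs[i];
    }
    return mask;
}

/* Two registers (or classes) compete when they share any resource bit. */
int masks_conflict(uint32_t a, uint32_t b) {
    return (a & b) != 0;
}

/* Small self-check of the properties described in the text. */
void check_register_masks(void) {
    assert(!masks_conflict(R_AL, R_AH));  /* disjoint halves of ax */
    assert(masks_conflict(R_AL, R_EAX));  /* al is part of eax */
    /* bx overlaps bl and bh, so the two classes are not disjoint. */
    assert(masks_conflict(class_mask(reg_class_8, 8),
                          class_mask(reg_class_ia16_ptrs, 3)));
}
```

Under this layout `al` and `ah` can be live at the same time, while `bx` in the pointer class competes with `bl` and `bh` in the 8-bit class, which is one way to read the remark that registers are not unique across classes.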