Compare commits

...

3 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Mid | 13f43cadeb | More tests | 2025-05-03 10:03:51 +03:00 |
| Mid | 0d808de34c | Updated DOCUMENTATION.md | 2025-05-03 10:03:26 +03:00 |
| Mid | 56c10daaa7 | parametrize your mom? | 2025-05-03 10:00:20 +03:00 |
25 changed files with 1400 additions and 234 deletions

View File

@ -2,7 +2,7 @@
When writing a program, I usually make the most primitive and smallest code I can that does the job. If it turns out I miscalculated the complexity, or I must add some feature that isn't compatible with the codebase, I'll obviously have to refactor it. Still, I've been programming this way for probably my entire life.
That being said, if you know this compiler took since 2019 to get to its current state, you will correctly guess that I DO NOT KNOW WHAT I AM DOING. Compiler literature, and online discussion, is abstract to the point where it is not useful for real-world processors. Imagine how long it took me to realize real-world IRs are actually not at all generic, and are actually quite close to their target architectures. As a result, much of what you see in the source is the result of a lot of experimentation. There's definitely better ways to do the things I show here, but I figured it's better to have at least some resource on how a "real" compiler works.
That being said, if you know this compiler took since 2019 to get to its current state, you will correctly guess that I DO NOT KNOW WHAT I AM DOING. Compiler literature and online discussion are abstract to the point where they are not useful for real-world architectures. When it gets specific, it's often too simplistic. It's common to say instruction selection should happen before register allocation, but how can you know which instructions to emit when some of them only work with specific registers? Imagine how long it took me to realize real-world IRs are not at all generic, and are actually quite close to their target architectures. As a result, much of what you see in the source is the result of a lot of experimentation. There are definitely better ways to do the things I show here, but I figured it's better to have at least some resource on how a "real" compiler works.
The core idea behind the compiler is to progressively iterate through the AST, turning it into a more primitive form step by step. Once this primitivization ends, the code generator is given the code in a form it will understand. Doing it this way is necessary because machine code itself is primitive, and instructions typically have 0-3 operands. Thanks to both this, and Nectar itself being highly low-level, the need for an IR disappears. On the other hand, making sure the AST is in a correct state between steps is the prime source of bugs.
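As an illustration of the target shape, here is a hypothetical `lower_and_run` (not compiler code, just a sketch) showing what `z = a + b + c` amounts to once every statement fits a 2-operand instruction:

```c
#include <assert.h>

/* Hypothetical sketch of the end state "primitivization" aims for:
 * each statement maps onto one 2-operand x86 instruction. */
static int lower_and_run(int a, int b, int c) {
    int z;
    /* z = a + b + c; dumbified: */
    z = a;      /* mov */
    z = z + b;  /* add */
    z = z + c;  /* add */
    return z;
}
```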
@ -22,12 +22,14 @@ The top-level chunk keeps a list of variables within its `ASTChunk` structure. A
There are enough types of passes to push us to have a generic way to invoke the visitor pattern on the AST. Because passes may do many different things to the AST, including modifying it, the definition of a generic visitor is very broad. Most of its functionality is unused by any single pass, but all of it is needed.
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, void(*handler)(AST**, AST*, AST*, AST*, AST*, void*));
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, GenericVisitorHandler preHandler, GenericVisitorHandler postHandler);
`*nptr` is the actual node that is currently being visited. It is behind an additional indirection, because the node may be replaced by another.
If the current node is within a statement (most are), `stmt` is equal to that statement. `stmtPrev` is the previous statement. This is necessary for patching in the linked list of statements within a chunk during modification passes. If there is no previous statement, then the head pointer of the singly-linked list must be patched through the `chu` node. The `tlc` is the top-level chunk, which may be equal to `chu`.
A handler may be called before or after delving deeper into the tree (hence the pre and post handlers). Most passes use the pre-handler, but type checking works better with a post-handler, since we want type checks to happen bottom-up.
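The pre/post split can be sketched on a toy tree (hypothetical `Node` type, not the real `AST` union): the post handler fires after the children, so a bottom-up pass like type checking sees operand types before their parent's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy sketch of the pre/post handler split on a binary tree. */
typedef struct Node { char name; struct Node *kids[2]; } Node;
typedef void (*Handler)(Node *n, void *ud);

static void visit(Node *n, void *ud, Handler pre, Handler post) {
    if (!n) return;
    if (pre) pre(n, ud);
    visit(n->kids[0], ud, pre, post);
    visit(n->kids[1], ud, pre, post);
    if (post) post(n, ud);  /* children are already done here */
}

static void record(Node *n, void *ud) {
    char *buf = ud;
    buf[strlen(buf)] = n->name;  /* append visit order */
}

/* Visits `a + b` with only a post handler; fills buf with the order. */
static void post_order_demo(char *buf) {
    Node a = {'a', {NULL, NULL}}, b = {'b', {NULL, NULL}};
    Node plus = {'+', {&a, &b}};
    visit(&plus, buf, NULL, record);
}
```

The post-order result `ab+` is exactly the "operands before operator" order a type checker wants.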
## Pre-dumbification
Before dumbification we need to make sure the code at least matches the semantics of the x86 architecture.
@ -91,11 +93,11 @@ NOTE: Later someone called this normalization, which is a much less stupid word
I hate these things. Another such structure is def-use chains, but both are horribly underdocumented. Their only use in most literature is so the author can immediately move on to SSA form.
For each variable, its UD chain is a list of each usage in the AST, with the corresponding potential definition of the variable at that use. For each potential definition that exists at that point, there is one UD element in the chain. If there's only one potential definition at a point, then it's definitely the true one. Users of UD chains include optimizers and codegen. The UD chains are continually regenerated when needed by using the UD visitor on the top-level chunk.
For each variable, its UD chain is a list of each usage in the AST, with the corresponding potential definition of the variable at that use. For each potential definition that exists at that point, there is one UD element in the chain. If there's only one potential definition at a point, then it's definitely the true one. Users of UD chains include optimizers and codegen. UD chains are always regenerated for use between passes by using the UD visitor on the top-level chunk.
At its simplest, the code `u8 x = 0;` has an empty UD-chain, because there are no uses. Its definition could even be classified as dead code.
Clearly, a definition of a variable overrides every definition before it, but that is only within the same basic block. In the following code, a variable has a single potential definition in each branch of the if statement, but afterward it will have two:
Clearly, a definition of a variable overrides every definition before it, but that is only within a basic block. In the following code, a variable has a single potential definition in each branch of the if statement, but afterward it will have two:
u8 x = 0; /* Potential definitions: [x = 0]
* UD-chain of x:
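The join-point rule can be sketched as a set union (toy `DefSet` type, not the real `ReachingDefs`): the potential definitions after an if are the union of the definitions coming out of each branch.

```c
#include <assert.h>

/* Toy sketch of merging reaching definitions at a join point. */
typedef struct { int defs[8]; int count; } DefSet;

static void add_def(DefSet *s, int id) {
    for (int i = 0; i < s->count; i++)
        if (s->defs[i] == id) return;  /* already present */
    s->defs[s->count++] = id;
}

/* Union of the definition sets flowing out of the two branches. */
static DefSet merge(const DefSet *then_branch, const DefSet *else_branch) {
    DefSet r = *then_branch;
    for (int i = 0; i < else_branch->count; i++)
        add_def(&r, else_branch->defs[i]);
    return r;
}
```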
@ -156,15 +158,17 @@ This problem is a large area of study in itself, but a common approach is to ima
The actual coloring algorithm used is Welsh-Powell, which sorts the VTEs/vertices by degree before attempting greedy coloring.
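As a sketch under the assumption of an adjacency-matrix interference graph (the real implementation works on VTEs, which are not modeled here), Welsh-Powell amounts to sorting vertices by degree and then coloring greedily:

```c
#include <assert.h>

#define NV 4  /* number of vertices in this toy graph */

/* Toy Welsh-Powell: returns the number of colors used; if that exceeds
 * the physical register count, we must spill. */
static int wp_color(int adj[NV][NV], int color[NV]) {
    int order[NV], deg[NV];
    for (int v = 0; v < NV; v++) {
        order[v] = v;
        deg[v] = 0;
        color[v] = -1;
        for (int u = 0; u < NV; u++) deg[v] += adj[v][u];
    }
    /* sort vertices by descending degree */
    for (int i = 0; i < NV; i++)
        for (int j = i + 1; j < NV; j++)
            if (deg[order[j]] > deg[order[i]]) {
                int t = order[i]; order[i] = order[j]; order[j] = t;
            }
    int maxc = 0;
    for (int i = 0; i < NV; i++) {
        int v = order[i], used[NV] = {0};
        for (int u = 0; u < NV; u++)
            if (adj[v][u] && color[u] >= 0) used[color[u]] = 1;
        int c = 0;
        while (used[c]) c++;  /* lowest color unused by any neighbor */
        color[v] = c;
        if (c + 1 > maxc) maxc = c + 1;
    }
    return maxc;
}
```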
If there's more colors than there are physical registers, then we have a conflict, and must spill. There are two ways to do so: spill2var and spill2stack. The former is necessary on boundaries where suddenly a specific register/color must be used (e.g. returning in `eax`). The latter transforms every use of a local variable (`ASTExprVar` where its VTE is of type `VARTABLEENTRY_VAR`) into the form `@stack + n`.
If there's more colors than there are physical registers, then we have a conflict, and must spill. There are two ways to do so: ~~spill2var~~ and spill2stack. ~~The former is necessary on boundaries where suddenly a specific register/color must be used (e.g. returning in `eax`).~~ The latter transforms every use of a local variable (`ASTExprVar` where its VTE is of type `VARTABLEENTRY_VAR`) into the form `@stack + n`.
If spill2stack is used, then CG must fail once so that dumbification can be applied again.
## Pre-coloring
I skipped forward a bit. Coloring assumes that all registers have equal importance, which is never true. A return value must be in `eax`, the remainder of division must be in `edx`, etc. In 64-bit, the index of an argument determines in which register it may end up.
NOTE: spill2var turned out to be pushing the problem a step back rather than solving it. Because it is known in advance what must be pre-colored, any such expressions are immediately placed in their own variable by another pass (dumbification?). If the assignment turns out to have been redundant, the register allocator should coalesce the moves.
The pre-coloring visitor applies said rules to the AST, setting the colors in the VTE. It is completely plausible that a conflict can occur here, too, from two variables having overlapping live ranges and the same color, but it can also come from demanding more than one color from the same variable. In the latter case, the pre-coloring visitor gives up as soon as it's detected. In both cases we do spill2var, not spill2stack, because spilling to the stack doesn't solve the pre-coloring problem.
~~I skipped forward a bit. Coloring assumes that all registers have equal importance, which is never true. A return value must be in `eax`, the remainder of division must be in `edx`, etc. In 64-bit, the index of an argument determines in which register it may end up.~~
~~The pre-coloring visitor applies said rules to the AST, setting the colors in the VTE. It is completely plausible that a conflict can occur here, too, from two variables having overlapping live ranges and the same color, but it can also come from demanding more than one color from the same variable. In the latter case, the pre-coloring visitor gives up as soon as it's detected. In both cases we do spill2var, not spill2stack, because spilling to the stack doesn't solve the pre-coloring problem.~~
## Callee-saved pass
@ -209,19 +213,52 @@ Using the same Fibonacci example as above, this is the result.
mov eax, ecx
ret
## Other problems with this approach
## Generics
Short-circuit evaluation is when the evaluation of an expression is guaranteed to stop once the output is already known. For example, if in `A || B` `A` is already truthy, then `B` is not evaluated. This is not just an optimization, but an important semantic feature, as evaluation of the operands may have side effects.
**NOTE: I intend to place this section in a different Markdown file entirely. It will be simply too big.**
record Foo[T, U, V] {
T t;
U u;
V v;
}
Nectar does generics similarly to C++. Structures are simple to make generic. When parsing a generic structure definition we must introduce a new scope, so we can introduce the generic types as instances of `TypeGeneric`. If we encounter a parametrization like `Foo[u8, u16, u32]`, we walk up the tree formed by the type of `Foo`, and replace all `TypeGeneric` instances with the concrete types. This is done by `type_parametrize`, which takes a `Parametrization` structure. Note that the generic type names themselves are not used, only the indices at which they appear.
bar: [T]T(T a, T b) -> {
return a + b;
};
If a function is defined with a generic type, parsing it is skipped until an explicit instantiation. This is because type checking is coupled with parsing. It needn't be this way, but it's a refactoring I'm not interested in doing at the moment. This ended up bringing other complexities. Because of the parser-type checker coupling, we must know what a generic type's name originally was, so `TypeGeneric`s must store this in addition to the index.
@instantiate bar[u32];
Upon parsing the above statement, the parser:
1. Creates a new scope
2. Finds the generic type names (using an output value of `type_parametrize` not mentioned until now)
3. Inserts the concrete types into the scope under the generic type names
4. Jumps to the generic function definition (in fact, to *right after the `[...]` block* to ignore the genericness)
5. Begins parsing the function's code block
6. Pops the scope
7. Jumps back to the end of the `@instantiate` statement
8. Inserts the function code block into a new symbol, appending the concrete type names to the original function name separated by underscores (`bar_u32`)
How's that for a hack?
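Step 8 can be sketched as a tiny name-mangling helper (`mangle` is a hypothetical name, not the real function): append each concrete type name to the base name with underscores.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of step 8: "bar" + ["u32"] -> "bar_u32".
 * Caller frees the returned string. */
static char *mangle(const char *base, const char **types, size_t n) {
    size_t len = strlen(base);
    for (size_t i = 0; i < n; i++)
        len += 1 + strlen(types[i]);  /* underscore + type name */
    char *out = malloc(len + 1);
    strcpy(out, base);
    for (size_t i = 0; i < n; i++) {
        strcat(out, "_");
        strcat(out, types[i]);
    }
    return out;
}
```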
## Other problems with this approach (1)
Short-circuit evaluation is when the evaluation of an expression is guaranteed to stop once the output is already known. For example, if in `A || B` `A` is already truthy, then `B` is not evaluated. This is not just an optimization, but an important semantic detail, as evaluation of the operands may have side effects.
Let us write `if(x == 1 || y == 1) { do stuff; }` in x86:
cmp eax, 1
je .L1
cmp ebx, 1
jne .L2
.L1:
; do stuff
.L2:
Note that the two jump instructions are basically goto statements. As the Nectar IR is defined without gotos, it is practically impossible for the compiler to output the neat code shown above. You could insert special logic for this case, but in general it'll fail.
@ -234,15 +271,39 @@ Even worse, the dumbification pass will try to move the condition into a variabl
do stuff;
}
And now suddenly we need 2 new registers for no reason..
And now we need 2 new registers for no reason..
In conclusion, an optimized IR should not use self-contained blocks, but should actually be flat, like the real thing, and have goto statements. Fixing this in nctref will require extreme refactoring, as the notion of blocks forming a tree is ingrained. Also, statements within a block form a singly-linked list. Even if there were a simple `ASTStmtGoto` node, it could not be a simple pointer to a statement, because passes need to modify the AST. For the time being, I have given up on short-circuit evaluation, and I do not actually support either `||` or `&&`.
Lack of gotos also makes function inlining impossible (!!).
In conclusion, what? Should a good IR actually be 100% flat and have nothing but jumps? Can this be solved by modelling the code as a graph of basic blocks? I don't know, but for now I have given up on short-circuit evaluation, and I do not actually support either `||` or `&&`.
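As a sketch of the basic-block view, the `if` from above can be written with explicit jumps; `or_cfg` is a hypothetical illustration (the corresponding x86 from earlier is in the comments), showing the short-circuit shape that the block tree cannot express:

```c
/* Toy sketch: `if (a == 1 || b == 1) { do stuff; }` as explicit jumps. */
static int or_cfg(int a, int b) {
    int reached = 0;
    if (a == 1) goto then;   /* cmp eax, 1 ; je .L1  */
    if (b != 1) goto end;    /* cmp ebx, 1 ; jne .L2 */
then:
    reached = 1;             /* .L1: do stuff */
end:
    return reached;          /* .L2: */
}
```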
## Other problems with this approach (2)
The `denoop_visitor` pass is incredibly important in normalizing the AST to something other passes will accept. Here's one case I found when trying to implement a statically allocated list class:
T* data = &((*this).data[0]);
It seems innocent enough, but it actually becomes:
T* data = &*(&*((&*this + 4) as T[4]*) + 0);
As of writing, `denoop_visitor` had produced this:
T* data = (this + 4) as T*;
The code generator failed to accept this, because the `as T*` cast meant that it could not match any pattern. The dumbifier also failed to decompose this to `data = this; data = data + 4;` for the same reason.
What was my solution? IGNORE ALL POINTER CASTS! As I wrote above, the Nectar AST does not support pointer arithmetic like that of C. By this point, all complex types should have already been converted into integers. Therefore, it does not even matter.
By adding the rule (`x as A*` -> `x` *only* if x's type is a pointer), we obtain the following after denooping:
T* data = this + 4;
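The rule can be sketched on a toy expression node (hypothetical `Expr` type, not the real AST): a cast to a pointer type is dropped only when the castee is already a pointer.

```c
#include <stddef.h>

/* Toy sketch of the denoop rule: `x as A*` -> `x`, only when x's type
 * is already a pointer. */
enum Kind { EXPR_VAR, EXPR_CAST };
enum TypeKind { TY_INT, TY_POINTER };

typedef struct Expr {
    enum Kind kind;
    enum TypeKind type;   /* result type of this expression */
    struct Expr *castee;  /* for EXPR_CAST: the expression being cast */
} Expr;

static Expr *denoop_cast(Expr *e) {
    if (e->kind == EXPR_CAST && e->type == TY_POINTER &&
        e->castee->type == TY_POINTER)
        return e->castee;  /* pointer-to-pointer cast is a no-op */
    return e;
}
```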
## Adding a Feature
When adding a feature, first write it out in Nectar in the ideal dumbified form. Make sure this compiles correctly. Afterward, implement dumbification rules so that code can be written in any fashion. If specific colorings are required, then the pre-coloring and spill2var passes must be updated. The following is an example with multiplication, as this is what I'm adding as of writing.
Note the way `mul` works on x86. Firstly, one of the operands is the destination, because `mul` is a 2-op instruction. Secondly, the other operand cannot be an immediate, because it is defined as r/m (register or memory), so if the second operand is a constant, it must be split into a variable (`varify` in `dumberdowner.c`). Thirdly, the destination must be the A register, so one of the operands must be pre-colored to A. Fourthly, `mul` clobbers the D register with the high half of the product. In other words, we have an instruction with *two* output registers, which the Nectar AST does not support. But we can't have the register allocator assign anything to D here.
Note the way `mul` works on x86 (yes, I'm aware `imul` exists). Firstly, one of the operands is the destination, because `mul` is a 2-op instruction. Secondly, the other operand cannot be an immediate, because it is defined as r/m (register or memory), so if the second operand is a constant, it must be split into a variable (`varify` in `dumberdowner.c`). Thirdly, the destination must be the A register, so one of the operands must be pre-colored to A. Fourthly, `mul` clobbers the D register with the high half of the product. In other words, we have an instruction with *two* output registers, which the Nectar AST does not support. But we can't have the register allocator assign anything to D here.
To account for this, we can have a second assignment statement right next to the multiplication. Because the main multiplication clobbers the source operand, the mulhi assignment must come before the mul. Putting all this together, this is the canonical way to do `z = x * y` with an x86 target:
@ -250,7 +311,7 @@ To account for this, we can have a second assignment statement right next to the
w = z *^ y;
z = z * y;
But this is without pre-coloring. We want precolored nodes to live as little as possible, because separately solving pre-coloring collisions whilst also keeping the code dumbified *and* not horrible is pretty much impossible. I've tried.
But this is without pre-coloring. We want precolored nodes to live as little as possible, because separately solving pre-coloring collisions whilst also keeping the code dumbified *and* not horrible turned out to be practically impossible (spill2var).
k = x;
w = k *^ y;
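Since `*^` is central here, a sketch of its semantics may help (assuming 32-bit unsigned operands): x86 `mul` produces a double-width product, with the low half in A and the high half clobbering D.

```c
#include <stdint.h>

/* What `*^` computes: the high (D-register) half of the product. */
static uint32_t mulhi32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* What `*` computes: the low (A-register) half of the product. */
static uint32_t mullo32(uint32_t a, uint32_t b) {
    return a * b;
}
```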

145
src/ast.c
View File

@ -10,8 +10,8 @@ const char *AST_KIND_STR[] = {
AST_KINDS(GEN_STRI)
};
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, void(*handler)(AST**, AST*, AST*, AST*, AST*, void*)) {
handler(nptr, stmt, stmtPrev, chu, tlc, ud);
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, GenericVisitorHandler preHandler, GenericVisitorHandler postHandler) {
if(preHandler) preHandler(nptr, stmt, stmtPrev, chu, tlc, ud);
AST *n = *nptr;
@ -19,51 +19,51 @@ void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, v
AST *sPrev = NULL;
AST **s = &n->chunk.statementFirst;
while(*s) {
generic_visitor(s, *s, sPrev, n, tlc, ud, handler);
generic_visitor(s, *s, sPrev, n, tlc, ud, preHandler, postHandler);
sPrev = *s;
s = &sPrev->statement.next;
}
} else if(n->nodeKind == AST_STMT_ASSIGN) {
generic_visitor(&n->stmtAssign.what, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtAssign.what, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
if(n->stmtAssign.to) {
generic_visitor(&n->stmtAssign.to, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtAssign.to, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
}
} else if(n->nodeKind == AST_STMT_IF) {
generic_visitor(&n->stmtIf.expression, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtIf.then, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtIf.expression, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
generic_visitor(&n->stmtIf.then, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_STMT_LOOP) {
generic_visitor(&n->stmtLoop.body, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtLoop.body, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_STMT_BREAK) {
} else if(n->nodeKind == AST_STMT_CONTINUE) {
} else if(n->nodeKind == AST_STMT_EXT_ALIGN) {
} else if(n->nodeKind == AST_STMT_DECL) {
if(n->stmtDecl.expression) {
generic_visitor(&n->stmtDecl.expression, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtDecl.expression, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
}
} else if(n->nodeKind == AST_STMT_EXPR) {
generic_visitor(&n->stmtExpr.expr, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtExpr.expr, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_STMT_EXT_ORG) {
} else if(n->nodeKind == AST_STMT_EXT_SECTION) {
} else if(n->nodeKind == AST_STMT_RETURN) {
if(n->stmtReturn.val) {
generic_visitor(&n->stmtReturn.val, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->stmtReturn.val, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
}
} else if(n->nodeKind == AST_EXPR_BINARY_OP) {
generic_visitor(&n->exprBinOp.operands[0], stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprBinOp.operands[1], stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprBinOp.operands[0], stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
generic_visitor(&n->exprBinOp.operands[1], stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_EXPR_CALL) {
generic_visitor(&n->exprCall.what, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprCall.what, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
for(size_t i = 0; i < n->exprCall.what->expression.type->function.argCount; i++) {
generic_visitor(&n->exprCall.args[i], stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprCall.args[i], stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
}
} else if(n->nodeKind == AST_EXPR_CAST) {
generic_visitor(&n->exprCast.what, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprCast.what, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_EXPR_FUNC) {
generic_visitor(&n->exprFunc.chunk, NULL, NULL, n->exprFunc.chunk, n->exprFunc.chunk, ud, handler);
generic_visitor(&n->exprFunc.chunk, NULL, NULL, n->exprFunc.chunk, n->exprFunc.chunk, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_EXPR_UNARY_OP) {
generic_visitor(&n->exprUnOp.operand, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprUnOp.operand, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else if(n->nodeKind == AST_EXPR_VAR) {
} else if(n->nodeKind == AST_EXPR_STACK_POINTER) {
} else if(n->nodeKind == AST_EXPR_PRIMITIVE) {
@ -73,18 +73,16 @@ void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, v
assert(n->expression.type->array.length != 0);
for(size_t i = 0; i < n->expression.type->array.length; i++) {
generic_visitor(&n->exprArray.items[i], stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprArray.items[i], stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
}
} else if(n->nodeKind == AST_EXPR_EXT_SALLOC) {
} else if(n->nodeKind == AST_EXPR_DOT) {
generic_visitor(&n->exprDot.a, stmt, stmtPrev, chu, tlc, ud, handler);
generic_visitor(&n->exprDot.a, stmt, stmtPrev, chu, tlc, ud, preHandler, postHandler);
} else {
abort();
}
}
AST *ast_expression_optimize(AST *ast) {
return ast;
if(postHandler) postHandler(nptr, stmt, stmtPrev, chu, tlc, ud);
}
int ast_expression_equal(AST *a, AST *b) {
@ -100,9 +98,11 @@ int ast_expression_equal(AST *a, AST *b) {
return a->exprBinOp.operator == b->exprBinOp.operator && ast_expression_equal(a->exprBinOp.operands[0], b->exprBinOp.operands[0]) && ast_expression_equal(a->exprBinOp.operands[1], b->exprBinOp.operands[1]);
} else if(a->nodeKind == AST_EXPR_STACK_POINTER) {
return 1;
} else if(a->nodeKind == AST_EXPR_CAST) {
return ast_expression_equal(a->exprCast.what, b->exprCast.what) && type_equal(a->exprCast.to, b->exprCast.to) && a->exprCast.reinterpretation == b->exprCast.reinterpretation;
} else {
stahp_node(a, "ast_expression_equal: unhandled %s", AST_KIND_STR[a->nodeKind]);
}
return 0;
}
// This function may return three values: YES (1), NO (0) or UNKNOWN (-1).
@ -231,7 +231,7 @@ static void adduse(VarTableEntry *vte, AST *use, AST *whole) {
if(!rd) return;
for(size_t d = 0; d < rd->defCount; d++) {
UseDef *ud = malloc(sizeof(*ud));
UseDef *ud = calloc(1, sizeof(*ud));
ud->def = rd->defs[d];
ud->use = use;
ud->stmt = whole;
@ -277,7 +277,7 @@ static void mergedefs(VarTableEntry *vte) {
static void pushdefs(VarTableEntry *vte) {
assert(vte->kind == VARTABLEENTRY_VAR);
ReachingDefs *rdefs = malloc(sizeof(*rdefs));
ReachingDefs *rdefs = calloc(1, sizeof(*rdefs));
rdefs->defCount = 0;
rdefs->defs = NULL;
rdefs->excludeParent = 0;
@ -455,6 +455,20 @@ char *type_to_string(Type *t) {
return r;
} else if(t->type == TYPE_TYPE_RECORD) {
return malp("%s", t->record.name);
} else if(t->type == TYPE_TYPE_GENERIC) {
return malp("%s", t->generic.paramName);
} else if(t->type == TYPE_TYPE_ARRAY) {
char *of = type_to_string(t->array.of);
char *len = NULL;
if(t->array.lengthIsGeneric) {
len = malp("");
} else {
len = malp("%i", t->array.length);
}
char *r = malp("%s[%s]", of, len);
free(of);
free(len);
return r;
}
return strdup("@unimp");
@ -552,6 +566,11 @@ static char *ast_dumpe(AST *e) {
} else if(e->nodeKind == AST_EXPR_FUNC) {
char *out = NULL;
if(type_is_generic(e->expression.type)) {
out = malp("(generic)");
return out;
}
{
char *rettype = type_to_string(e->expression.type->function.ret);
out = malp("%s(", rettype);
@ -660,6 +679,8 @@ static char *ast_dumps(AST *s) {
}
return malp("%s; /* loop guard */\n", name);
} else if(s->nodeKind == AST_STMT_EXPR) {
return ast_dumpe(s->stmtExpr.expr);
} else if(s->nodeKind == AST_STMT_RETURN) {
if(s->stmtReturn.val) {
char *e = ast_dumpe(s->stmtReturn.val);
@ -677,7 +698,7 @@ static char *ast_dumps(AST *s) {
char *ast_dump(AST *tlc) {
AST *stmt = tlc->chunk.statementFirst;
char *ret = NULL;
char *ret = strdup("");
#define CAT(s) do { char *b = s; ret = cat(ret, (b)); free(b); } while(0)
@ -712,15 +733,15 @@ AST *ast_cast_expr(AST *what, Type *to) {
/* Only exists at parse-time, hence not part of type system and is handled separately */
if(what->nodeKind == AST_EXPR_STRING_LITERAL) {
if(to->type == TYPE_TYPE_ARRAY && type_equal(primitive_parse("u8"), to->array.of) && to->array.length == what->exprStrLit.length) {
ASTExprArray *ret = malloc(sizeof(*ret));
ASTExprArray *ret = calloc(1, sizeof(*ret));
ret->nodeKind = AST_EXPR_ARRAY;
ret->items = malloc(sizeof(*ret->items) * to->array.length);
ret->items = calloc(to->array.length, sizeof(*ret->items));
ret->type = to;
for(int i = 0; i < to->array.length; i++) {
uint8_t bajt = what->exprStrLit.data[i];
ASTExprPrimitive *item = malloc(sizeof(*item));
ASTExprPrimitive *item = calloc(1, sizeof(*item));
item->nodeKind = AST_EXPR_PRIMITIVE;
item->type = to->array.of;
item->val = bajt;
@ -731,10 +752,10 @@ AST *ast_cast_expr(AST *what, Type *to) {
return (AST*) ret;
} else if(to->type == TYPE_TYPE_PRIMITIVE) {
if(to->primitive.width != what->exprStrLit.length * 8) {
stahp(0, 0, "Size mismatch between string literal and target type");
stahp_node(what, "Size mismatch between string literal and target type");
}
ASTExprPrimitive *ret = malloc(sizeof(*ret));
ASTExprPrimitive *ret = calloc(1, sizeof(*ret));
ret->nodeKind = AST_EXPR_PRIMITIVE;
ret->type = to;
memcpy(&ret->val, what->exprStrLit.data, sizeof(ret->val));
@ -742,6 +763,12 @@ AST *ast_cast_expr(AST *what, Type *to) {
} else abort();
}
// Make sure an unparametrized generic int parameter hasn't sneaked its way in
while(what->nodeKind == AST_EXPR_VAR && what->exprVar.thing->kind == VARTABLEENTRY_CEXPR && what->exprVar.thing->data.cexpr.concrete) {
what = what->exprVar.thing->data.cexpr.concrete;
}
assert(!(what->nodeKind == AST_EXPR_VAR && what->exprVar.thing->kind == VARTABLEENTRY_CEXPR));
if(type_equal(what->expression.type, to)) return what;
if(!type_is_castable(what->expression.type, to)) {
@ -749,7 +776,7 @@ AST *ast_cast_expr(AST *what, Type *to) {
}
if(what->nodeKind == AST_EXPR_PRIMITIVE && (to->type == TYPE_TYPE_PRIMITIVE || to->type == TYPE_TYPE_POINTER)) {
ASTExprPrimitive *ret = malloc(sizeof(*ret));
ASTExprPrimitive *ret = calloc(1, sizeof(*ret));
ret->nodeKind = AST_EXPR_PRIMITIVE;
ret->type = to;
@ -761,7 +788,7 @@ AST *ast_cast_expr(AST *what, Type *to) {
return (AST*) ret;
} else {
ASTExprCast *ret = malloc(sizeof(*ret));
ASTExprCast *ret = calloc(1, sizeof(*ret));
ret->nodeKind = AST_EXPR_CAST;
ret->type = to;
ret->what = what;
@ -770,7 +797,7 @@ AST *ast_cast_expr(AST *what, Type *to) {
}
fail:
stahp(0, 0, "Cannot cast type %s into %s", type_to_string(what->expression.type), type_to_string(to));
stahp_node(what, "Cannot cast type %s into %s", type_to_string(what->expression.type), type_to_string(to));
}
struct Spill2StackState {
@ -795,23 +822,23 @@ static void spill2stack_visitor(AST **aptr, AST *stmt, AST *stmtPrev, AST *chunk
if(a->exprVar.thing == this->target) {
// DO THE SPILL
ASTExprStackPointer *rsp = malloc(sizeof(*rsp));
ASTExprStackPointer *rsp = calloc(1, sizeof(*rsp));
rsp->nodeKind = AST_EXPR_STACK_POINTER;
rsp->type = primitive_parse("u32");
ASTExprPrimitive *offset = malloc(sizeof(*offset));
ASTExprPrimitive *offset = calloc(1, sizeof(*offset));
offset->nodeKind = AST_EXPR_PRIMITIVE;
offset->type = rsp->type;
offset->val = -this->stackGrowth; // This will be affected by the other part of this pass, so we must reverse
ASTExprBinaryOp *bop = malloc(sizeof(*bop));
ASTExprBinaryOp *bop = calloc(1, sizeof(*bop));
bop->nodeKind = AST_EXPR_BINARY_OP;
bop->type = rsp->type;
bop->operator = BINOP_ADD;
bop->operands[0] = (AST*) rsp;
bop->operands[1] = (AST*) offset;
ASTExprUnaryOp *deref = malloc(sizeof(*deref));
ASTExprUnaryOp *deref = calloc(1, sizeof(*deref));
deref->nodeKind = AST_EXPR_UNARY_OP;
deref->type = a->expression.type;
deref->operator = UNOP_DEREF;
@ -829,8 +856,11 @@ static void spill2stack_visitor(AST **aptr, AST *stmt, AST *stmtPrev, AST *chunk
}
void ast_spill_to_stack(AST *tlc, VarTableEntry *vte) {
assert(vte != NULL);
assert(tlc->nodeKind == AST_CHUNK);
assert(vte != NULL);
assert(vte->kind == VARTABLEENTRY_VAR);
fprintf(stderr, "Spilling %s to stack...\n", vte->data.var.name);
struct Spill2StackState state;
memset(&state, 0, sizeof(state));
@ -838,5 +868,36 @@ void ast_spill_to_stack(AST *tlc, VarTableEntry *vte) {
state.targetTLC = tlc;
state.stackGrowth = (type_size(vte->type) + 7) & ~7;
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, spill2stack_visitor);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, spill2stack_visitor, NULL);
}
static void typecheck_visitor(AST **aptr, AST *stmt, AST *stmtPrev, AST *chunk, AST *tlc, void *ud) {
AST *a = *aptr;
if(a->nodeKind == AST_EXPR_CALL) {
if(a->exprCall.what->expression.type->type != TYPE_TYPE_FUNCTION) {
stahp_node(a, "Only function types may be called.");
}
} else if(a->nodeKind == AST_EXPR_BINARY_OP) {
if(!type_is_number(a->exprBinOp.operands[0]->expression.type) || !type_is_number(a->exprBinOp.operands[1]->expression.type)) {
stahp_node(a, "Operands must be numbers.");
}
if(type_size(a->exprBinOp.operands[0]->expression.type) < type_size(a->exprBinOp.operands[1]->expression.type)) {
a->exprBinOp.operands[0] = ast_cast_expr(a->exprBinOp.operands[0], a->exprBinOp.operands[1]->expression.type);
}
if(type_size(a->exprBinOp.operands[1]->expression.type) < type_size(a->exprBinOp.operands[0]->expression.type)) {
a->exprBinOp.operands[1] = ast_cast_expr(a->exprBinOp.operands[1], a->exprBinOp.operands[0]->expression.type);
}
if(!a->exprBinOp.type) {
a->exprBinOp.type = a->exprBinOp.operands[0]->expression.type;
}
} else if(a->nodeKind == AST_EXPR_UNARY_OP) {
}
}
void ast_typecheck(AST *tlc) {
generic_visitor(&tlc, NULL, NULL, tlc, tlc, NULL, NULL, typecheck_visitor);
}

View File

@ -111,6 +111,12 @@ union AST;
typedef struct {
ASTKind nodeKind;
uint16_t row;
uint16_t col;
} ASTBase;
typedef struct {
ASTBase;
Type *type;
} ASTExpr;
@ -163,10 +169,15 @@ typedef struct {
ASTExpr;
union AST *chunk;
// Necessary for when the parser jumps to a generic function
Token *rangeTokens;
size_t startTokI;
size_t endTokI;
} ASTExprFunc;
typedef struct {
ASTKind nodeKind;
ASTBase;
size_t size;
} ASTType;
@ -185,7 +196,7 @@ typedef struct {
} ASTTypePointer;
typedef struct {
ASTKind nodeKind;
ASTBase;
union AST *next;
} ASTStmt;
@ -198,12 +209,16 @@ typedef struct {
} ASTStmtDecl;
typedef struct {
ASTKind nodeKind;
ASTBase;
/* Flattened variable array for global register allocation */
size_t varCount;
VarTableEntry **vars;
/* extern symbol array */
size_t externCount;
VarTableEntry **externs;
union AST *statementFirst;
union AST *statementLast;
@ -301,7 +316,7 @@ typedef struct {
} ASTStmtReturn;
typedef union AST {
ASTKind nodeKind;
ASTBase;
ASTChunk chunk;
ASTStmt statement;
@ -332,9 +347,9 @@ typedef union AST {
#pragma pack(pop)
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, void(*handler)(AST**, AST*, AST*, AST*, AST*, void*));
typedef void(*GenericVisitorHandler)(AST**, AST*, AST*, AST*, AST*, void*);
void generic_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *tlc, void *ud, GenericVisitorHandler preHandler, GenericVisitorHandler postHandler);
AST *ast_expression_optimize(AST*);
int ast_expression_equal(AST*, AST*);
int ast_stmt_is_after(const AST *chunk, const AST *s1, const AST *s2);
@ -349,6 +364,8 @@ AST *ast_cast_expr(AST *what, Type *to);
void ast_spill_to_stack(AST *tlc, VarTableEntry *vte);
void ast_typecheck(AST *tlc);
__attribute__((format(printf, 1, 2))) char *malp(const char *fmt, ...);
#endif

View File

@ -120,6 +120,10 @@ static AST *is_field_access(AST *e) {
return e;
}
if(e->nodeKind == AST_EXPR_BINARY_OP && e->exprBinOp.operator == BINOP_ADD && e->exprBinOp.operands[1]->nodeKind == AST_EXPR_PRIMITIVE && e->exprBinOp.operands[0]->nodeKind == AST_EXPR_VAR && e->exprBinOp.operands[0]->exprVar.thing->kind == VARTABLEENTRY_VAR) {
return e;
}
return NULL;
}
@ -131,7 +135,8 @@ static const char *xop_sz(AST *tlc, AST *e, int sz) {
char *ret = bufs[bufidx];
if(e->nodeKind == AST_EXPR_CAST && e->exprCast.what->expression.type->type == TYPE_TYPE_POINTER && e->exprCast.to->type == TYPE_TYPE_POINTER) {
//if(e->nodeKind == AST_EXPR_CAST && e->exprCast.what->expression.type->type == TYPE_TYPE_POINTER && e->exprCast.to->type == TYPE_TYPE_POINTER) {
if(e->nodeKind == AST_EXPR_CAST) {
e = e->exprCast.what;
}
@ -154,10 +159,22 @@ static const char *xop_sz(AST *tlc, AST *e, int sz) {
xv_sz(p->exprBinOp.operands[1]->exprVar.thing, 4));
} else if(is_field_access(e)) {
e = is_field_access(e);
snprintf(ret, XOPBUFSZ, "%s [%s + %i]",
spec(sz),
e->exprBinOp.operands[0]->exprUnOp.operand->exprVar.thing->data.symbol.name,
e->exprBinOp.operands[1]->exprPrim.val);
if(e->exprBinOp.operands[0]->nodeKind == AST_EXPR_UNARY_OP) {
assert(e->exprBinOp.operands[0]->exprUnOp.operator == UNOP_REF);
snprintf(ret, XOPBUFSZ, "%s [%s + %i]",
spec(sz),
e->exprBinOp.operands[0]->exprUnOp.operand->exprVar.thing->data.symbol.name,
e->exprBinOp.operands[1]->exprPrim.val);
} else {
assert(e->exprBinOp.operands[0]->nodeKind == AST_EXPR_VAR);
snprintf(ret, XOPBUFSZ, "%s [%s + %i]",
spec(sz),
regs[e->exprBinOp.operands[0]->exprVar.thing->data.var.color][2],
e->exprBinOp.operands[1]->exprPrim.val);
}
} else if(p->nodeKind == AST_EXPR_BINARY_OP && p->exprBinOp.operator == BINOP_ADD && p->exprBinOp.operands[0]->nodeKind == AST_EXPR_UNARY_OP && p->exprBinOp.operands[1]->nodeKind == AST_EXPR_BINARY_OP && p->exprBinOp.operands[0]->exprUnOp.operator == UNOP_REF && p->exprBinOp.operands[0]->exprUnOp.operand->nodeKind == AST_EXPR_VAR && p->exprBinOp.operands[0]->exprUnOp.operand->exprVar.thing->kind == VARTABLEENTRY_SYMBOL && p->exprBinOp.operands[1]->exprBinOp.operator == BINOP_MUL && p->exprBinOp.operands[1]->exprBinOp.operands[1]->nodeKind == AST_EXPR_VAR && p->exprBinOp.operands[1]->exprBinOp.operands[0]->nodeKind == AST_EXPR_PRIMITIVE && p->exprBinOp.operands[1]->exprBinOp.operands[1]->exprVar.thing->kind == VARTABLEENTRY_VAR) {
snprintf(ret, XOPBUFSZ, "%s [%s + %i * %s]",
spec(sz),
@ -238,7 +255,9 @@ void cg_chunk(CGState *cg, AST *a) {
VarTableEntry *v = s->stmtDecl.thing;
if(v->data.symbol.isExternal) {
printf("extern %s\n", v->data.symbol.name);
// Do nothing.
// All external symbols are handled at once in the top-level chunk.
//printf("extern %s\n", v->data.symbol.name);
} else {
if(!v->data.symbol.isLocal) {
printf("global %s\n", v->data.symbol.name);
@ -265,9 +284,14 @@ void cg_chunk(CGState *cg, AST *a) {
assert(s->stmtDecl.expression->nodeKind == AST_EXPR_FUNC);
dumben_go(s->stmtDecl.expression->exprFunc.chunk);
while(!cg_go(s->stmtDecl.expression->exprFunc.chunk)) {
// Generic functions have non-NULL code blocks too, but must not be code-generated until instantiated
if(!type_is_generic(s->stmtDecl.expression->expression.type)) {
dumben_go(s->stmtDecl.expression->exprFunc.chunk);
while(!cg_go(s->stmtDecl.expression->exprFunc.chunk)) {
dumben_go(s->stmtDecl.expression->exprFunc.chunk);
}
}
} else abort();
@ -375,7 +399,7 @@ void cg_chunk(CGState *cg, AST *a) {
printf("neg %s\n", xop(cg->tlc, s->stmtAssign.what));
} else if(is_xop(s->stmtAssign.what) && s->stmtAssign.to->nodeKind == AST_EXPR_CAST) {
} else if(is_xop(s->stmtAssign.what) && s->stmtAssign.to->nodeKind == AST_EXPR_CAST && type_size(s->stmtAssign.what->expression.type) > type_size(s->stmtAssign.to->expression.type)) {
printf("movzx %s, %s\n", xop(cg->tlc, s->stmtAssign.what), xop(cg->tlc, s->stmtAssign.to->exprCast.what));
@ -448,7 +472,15 @@ void cg_chunk(CGState *cg, AST *a) {
/* Loop guard, probably. */
} else abort();
} else if(s->nodeKind == AST_STMT_EXPR && s->stmtExpr.expr->nodeKind == AST_EXPR_UNARY_OP && s->stmtExpr.expr->exprUnOp.operator == UNOP_DEREF && s->stmtExpr.expr->exprUnOp.operand->nodeKind == AST_EXPR_BINARY_OP && s->stmtExpr.expr->exprUnOp.operand->exprBinOp.operator == BINOP_ADD && s->stmtExpr.expr->exprUnOp.operand->exprBinOp.operands[0]->nodeKind == AST_EXPR_STACK_POINTER && s->stmtExpr.expr->exprUnOp.operand->exprBinOp.operands[1]->nodeKind == AST_EXPR_PRIMITIVE) {
/* Loop guard for a spilled variable, probably. */
} else {
stahp_node(s, "Unknown statement caught by code generator.");
}
s = s->statement.next;
}
@ -801,7 +833,7 @@ static void callee_saved(AST *tlc) {
state.targetTLC = tlc;
state.ebxuser = ebxuser;
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, callee_saved_visitor);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, callee_saved_visitor, NULL);
}
}
@ -812,6 +844,15 @@ static int comparator(const void *A, const void *B) {
return ((*a)->data.var.degree * (*a)->data.var.priority) - ((*b)->data.var.degree * (*b)->data.var.priority);
}
int cg_go(AST *a) {
assert(a->nodeKind == AST_CHUNK);
for(size_t e = 0; e < a->chunk.externCount; e++) {
assert(a->chunk.externs[e]->kind == VARTABLEENTRY_SYMBOL);
assert(a->chunk.externs[e]->data.symbol.isExternal);
printf("extern %s\n", a->chunk.externs[e]->data.symbol.name);
}
ast_usedef_reset(a);
size_t adjCount = 0;

View File

@ -42,13 +42,13 @@ static AST *varify(AST *tlc, AST *chunk, AST *stmtPrev, AST *stmt, AST *e) {
ASTExprVar *ev[2];
for(int i = 0; i < 2; i++) {
ev[i] = malloc(sizeof(ASTExprVar));
ev[i] = calloc(1, sizeof(ASTExprVar));
ev[i]->nodeKind = AST_EXPR_VAR;
ev[i]->type = e->expression.type;
ev[i]->thing = vte;
}
ASTStmtAssign *assign = malloc(sizeof(*assign));
ASTStmtAssign *assign = calloc(1, sizeof(*assign));
assign->nodeKind = AST_STMT_ASSIGN;
assign->what = (AST*) ev[0];
assign->to = e;
@ -187,6 +187,12 @@ static void dumben_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *
s->stmtAssign.to = varify(tlc, chu, stmtPrev, s, s->stmtAssign.to);
this->effective = 1;
} else if(s->stmtAssign.what->nodeKind == AST_EXPR_UNARY_OP && s->stmtAssign.what->exprUnOp.operator == UNOP_DEREF && !is_xop(s->stmtAssign.what)) {
s->stmtAssign.what->exprUnOp.operand = varify(tlc, chu, stmtPrev, s, s->stmtAssign.what->exprUnOp.operand);
this->effective = 1;
} else if(s->stmtAssign.what && s->stmtAssign.what->nodeKind == AST_EXPR_VAR && s->stmtAssign.what->exprVar.thing->kind == VARTABLEENTRY_VAR && s->stmtAssign.to->nodeKind == AST_EXPR_CALL) {
ASTExprCall *call = &s->stmtAssign.to->exprCall;
@ -215,7 +221,7 @@ static void dumben_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *
s->stmtAssign.to = negation->exprUnOp.operand;
negation->exprUnOp.operand = ev[0];
AST *assign2 = malloc(sizeof(ASTStmtAssign));
AST *assign2 = calloc(1, sizeof(ASTStmtAssign));
assign2->nodeKind = AST_STMT_ASSIGN;
assign2->stmtAssign.what = ev[1];
assign2->stmtAssign.to = negation;
@ -291,7 +297,7 @@ static void dumben_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chu, AST *
// a = b
// a = a op c
AST *assign2 = malloc(sizeof(ASTStmtAssign));
AST *assign2 = calloc(1, sizeof(ASTStmtAssign));
assign2->nodeKind = AST_STMT_ASSIGN;
assign2->stmtAssign.what = ast_deep_copy(s->stmtAssign.what);
assign2->stmtAssign.to = s->stmtAssign.to->exprBinOp.operands[0];
@ -458,13 +464,19 @@ static void denoop_visitor(AST **nptr, AST *stmt, AST *stmtPrev, AST *chunk, AST
*nptr = n->exprBinOp.operands[0];
*success = true;
} else if(n->nodeKind == AST_EXPR_CAST && n->exprCast.what->expression.type->type == TYPE_TYPE_POINTER && n->exprCast.to->type == TYPE_TYPE_POINTER) {
// Turn (x as A*) into x, since all pointer types are identical in Nectar's AST
*nptr = n->exprCast.what;
*success = true;
}
}
void dumben_pre(AST *tlc) {
generic_visitor(&tlc, NULL, NULL, tlc, tlc, NULL, pre_dumb_visitor);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, NULL, decompose_symbol_record_field_access);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, NULL, pre_dumb_visitor, NULL);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, NULL, decompose_symbol_record_field_access, NULL);
for(size_t t = 0; t < tlc->chunk.varCount; t++) {
if(tlc->chunk.vars[t]->type->type == TYPE_TYPE_RECORD) {
@ -475,7 +487,7 @@ void dumben_pre(AST *tlc) {
bool success;
do {
success = false;
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &success, denoop_visitor);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &success, denoop_visitor, NULL);
} while(success);
}
@ -492,7 +504,7 @@ void dumben_go(AST* tlc) {
struct DumbenState state;
memset(&state, 0, sizeof(state));
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, dumben_visitor);
generic_visitor(&tlc, NULL, NULL, tlc, tlc, &state, dumben_visitor, NULL);
int successful = state.effective;

View File

@ -48,6 +48,7 @@ char *TOKEN_NAMES[] = {
"'record'",
"'.'",
"'as'",
"'use'",
};
static int isAlpha(int c) {
@ -290,6 +291,10 @@ Token nct_tokenize(FILE *f) {
free(content);
tok.type = TOKEN_AS;
return tok;
} else if(!strcmp(content, "use")) {
free(content);
tok.type = TOKEN_USE;
return tok;
}
tok.type = TOKEN_IDENTIFIER;
@ -319,7 +324,7 @@ Token nct_tokenize(FILE *f) {
while(c = nextc(f), (isNum(c) || (base > 10 && c >= 'A' && c < ('A' + base - 10)))) {
if(i == 31) {
stahp(1, 6180, "Numbers have a maximum size of 31.");
stahp_token(&tok, "Numbers have a maximum size of 31.");
}
content[i++] = c;
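The digit loop above, together with literals like `16rC0A80001` in the test files, suggests the numeric grammar is a decimal base prefix, an `r`, then digits in that base. A standalone sketch of that reading (illustrative only, not the compiler's actual tokenizer):

```c
#include <string.h>

/* Parse a Nectar-style literal: optional decimal base before 'r',
   digits in that base after it (e.g. "16rFF" == 255, "2r101" == 5).
   Plain "123" falls back to base 10. Uppercase A-Z digits only,
   matching the (base > 10 && c >= 'A') check in the tokenizer hunk. */
static long parse_literal(const char *s) {
    long base = 10;
    const char *digits = s;
    const char *r = strchr(s, 'r');
    if(r) {
        base = 0;
        for(const char *p = s; p < r; p++) {
            base = base * 10 + (*p - '0');
        }
        digits = r + 1;
    }
    long val = 0;
    for(; *digits; digits++) {
        int d = (*digits >= '0' && *digits <= '9') ? *digits - '0'
                                                   : *digits - 'A' + 10;
        val = val * base + d;
    }
    return val;
}
```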

View File

@ -48,9 +48,10 @@ typedef enum {
TOKEN_RECORD,
TOKEN_DOT,
TOKEN_AS,
TOKEN_USE,
} TokenKind;
typedef struct {
typedef struct Token {
TokenKind type;
int row, column;

File diff suppressed because it is too large

View File

@ -3,15 +3,43 @@
#include<stdarg.h>
#include<stdio.h>
#include<stdlib.h>
#include"ast.h"
#include"lexer.h"
static void stahp_va(int row, int column, const char *error, va_list l) {
fprintf(stderr, "error %i:%i: ", row, column);
vfprintf(stderr, error, l);
fputc('\n', stderr);
}
/* Abort immediately on first error (for now) */
void stahp(int row, int column, const char *error, ...) {
va_list l;
va_start(l, error);
fprintf(stderr, "error %i:%i: ", row, column);
vfprintf(stderr, error, l);
fputc('\n', stderr);
stahp_va(row, column, error, l);
va_end(l);
exit(1);
}
void stahp_node(union AST *node, const char *error, ...) {
va_list l;
va_start(l, error);
stahp_va(node->row, node->col, error, l);
va_end(l);
exit(1);
}
void stahp_token(struct Token *tok, const char *error, ...) {
va_list l;
va_start(l, error);
stahp_va(tok->row, tok->column, error, l);
va_end(l);

View File

@ -5,6 +5,11 @@
#define __attribute__(x)
#endif
union AST;
struct Token;
void __attribute__((noreturn)) stahp(int, int, const char*, ...);
void __attribute__((noreturn)) stahp_node(union AST*, const char*, ...);
void __attribute__((noreturn)) stahp_token(struct Token*, const char*, ...);
#endif

View File

@ -4,6 +4,8 @@
#include<stdlib.h>
#include<string.h>
#include<stdint.h>
#include"ast.h"
#include"reporting.h"
#include"ntc.h"
@ -20,7 +22,7 @@ Type *primitive_parse(const char *src) {
}
}
TypePrimitive *ret = malloc(sizeof(*ret));
TypePrimitive *ret = calloc(1, sizeof(*ret));
ret->type = TYPE_TYPE_PRIMITIVE;
ret->src = strdup(src);
@ -100,7 +102,16 @@ size_t type_size(Type *t) {
} else if(t->type == TYPE_TYPE_ARRAY) {
return type_size(t->array.of) * t->array.length;
} else if(t->type == TYPE_TYPE_RECORD) {
return t->record.size;
size_t max = 0;
for(size_t f = 0; f < t->record.fieldCount; f++) {
size_t end = t->record.fieldOffsets[f] + type_size(t->record.fieldTypes[f]);
if(max < end) {
max = end;
}
}
return max;
}
abort();
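The new record-size rule above replaces a stored `t->record.size` with a scan for the furthest-reaching field end, which matters once explicit offsets (as in `record B { 0: u32 x; 7: A a; }` from the tests) can leave gaps or overlap. A minimal sketch of the same computation, with a hypothetical flat field layout rather than the compiler's own structs:

```c
#include <stddef.h>

/* Illustrative field descriptor: explicit byte offset plus size. */
typedef struct {
    size_t offset;
    size_t size;
} Field;

/* Record size = max over fields of (offset + size), mirroring the
   loop added to type_size for TYPE_TYPE_RECORD. */
static size_t record_size(const Field *fields, size_t count) {
    size_t max = 0;
    for(size_t f = 0; f < count; f++) {
        size_t end = fields[f].offset + fields[f].size;
        if(max < end) {
            max = end;
        }
    }
    return max;
}
```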
@ -123,7 +134,19 @@ int type_equal(Type *O, Type *T) {
} else if(O->type == TYPE_TYPE_POINTER) {
return type_equal(O->pointer.of, T->pointer.of);
} else if(O->type == TYPE_TYPE_ARRAY) {
return type_equal(O->array.of, T->array.of) && O->array.length == T->array.length;
if(!type_equal(O->array.of, T->array.of)) {
return 0;
}
if(O->array.lengthIsGeneric != T->array.lengthIsGeneric) {
return 0;
}
if(O->array.lengthIsGeneric) {
return O->array.lengthGenericParamIdx == T->array.lengthGenericParamIdx && !strcmp(O->array.lengthGenericParamName, T->array.lengthGenericParamName);
} else {
return O->array.length == T->array.length;
}
} else if(O->type == TYPE_TYPE_FUNCTION) {
if(!type_equal(O->function.ret, T->function.ret)) {
return 0;
@ -151,7 +174,7 @@ int type_equal(Type *O, Type *T) {
/* TODO: cache */
Type *type_pointer_wrap(Type *t) {
TypePointer *ret = malloc(sizeof(*ret));
TypePointer *ret = calloc(1, sizeof(*ret));
ret->type = TYPE_TYPE_POINTER;
ret->of = t;
return (Type*) ret;
@ -182,3 +205,180 @@ int type_is_castable(Type *from, Type *to) {
return 0;
}
bool type_is_generic(Type *t) {
if(t->type == TYPE_TYPE_GENERIC) {
return true;
} else if(t->type == TYPE_TYPE_FUNCTION) {
if(type_is_generic(t->function.ret)) {
return true;
}
for(int i = 0; i < t->function.argCount; i++) {
if(type_is_generic(t->function.args[i])) {
return true;
}
}
} else if(t->type == TYPE_TYPE_RECORD) {
for(int i = 0; i < t->record.fieldCount; i++) {
if(type_is_generic(t->record.fieldTypes[i])) {
return true;
}
}
} else if(t->type == TYPE_TYPE_POINTER) {
return type_is_generic(t->pointer.of);
} else if(t->type == TYPE_TYPE_ARRAY) {
return type_is_generic(t->array.of) || t->array.lengthIsGeneric;
}
return false;
}
static void *parametrization_get_by_index(Parametrization *list, size_t idx) {
for(size_t i = 0; list && i < idx; i++) {
list = list->next;
}
return list ? list->param : NULL;
}
static void parametrization_set_by_index(Parametrization **list, size_t idx, void *param) {
if(*list == NULL) {
*list = calloc(1, sizeof(Parametrization));
}
for(size_t i = 1; i <= idx; i++) {
if((*list)->next == NULL) {
(*list)->next = calloc(1, sizeof(Parametrization));
}
list = &(*list)->next;
}
(*list)->param = param;
}
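The two helpers above treat a `Parametrization` chain as a sparse growable array: `get` walks the chain and returns `NULL` past the end, `set` allocates zeroed nodes on demand up to the index. A self-contained sketch of the same pattern (names are illustrative, not the compiler's):

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    void *param;
    struct Node *next;
} Node;

/* Walk idx links; a short or empty list yields NULL instead of crashing. */
static void *get_by_index(Node *list, size_t idx) {
    for(size_t i = 0; list && i < idx; i++) {
        list = list->next;
    }
    return list ? list->param : NULL;
}

/* Extend the chain with calloc'd nodes until idx exists, then store. */
static void set_by_index(Node **list, size_t idx, void *param) {
    if(*list == NULL) {
        *list = calloc(1, sizeof(Node));
    }
    for(size_t i = 1; i <= idx; i++) {
        if((*list)->next == NULL) {
            (*list)->next = calloc(1, sizeof(Node));
        }
        list = &(*list)->next;
    }
    (*list)->param = param;
}
```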
Type *type_parametrize(Type *t, Parametrizations *parametrizations, Parametrizations *renames) {
if(t->type == TYPE_TYPE_RECORD) {
t = type_shallow_copy(t);
for(size_t f = 0; f < t->record.fieldCount; f++) {
t->record.fieldTypes[f] = type_parametrize(t->record.fieldTypes[f], parametrizations, renames);
}
if(!type_is_generic(t)) {
// Now that everything is concrete we may set the field offsets
size_t nextOffset = 0;
for(size_t f = 0; f < t->record.fieldCount; f++) {
t->record.fieldOffsets[f] = nextOffset;
nextOffset += type_size(t->record.fieldTypes[f]);
}
}
} else if(t->type == TYPE_TYPE_FUNCTION) {
t = type_shallow_copy(t);
t->function.ret = type_parametrize(t->function.ret, parametrizations, renames);
for(size_t i = 0; i < t->function.argCount; i++) {
t->function.args[i] = type_parametrize(t->function.args[i], parametrizations, renames);
}
} else if(t->type == TYPE_TYPE_GENERIC) {
Type *newt = parametrization_get_by_index(parametrizations->typeParams, t->generic.paramIdx);
if(renames) {
parametrization_set_by_index(&renames->typeParams, t->generic.paramIdx, t);
}
if(newt) {
return newt;
}
} else if(t->type == TYPE_TYPE_POINTER) {
t = type_shallow_copy(t);
t->pointer.of = type_parametrize(t->pointer.of, parametrizations, renames);
} else if(t->type == TYPE_TYPE_ARRAY) {
t = type_shallow_copy(t);
t->array.of = type_parametrize(t->array.of, parametrizations, renames);
if(t->array.lengthIsGeneric) {
AST *n = parametrization_get_by_index(parametrizations->intParams, t->array.lengthGenericParamIdx);
if(n) {
while(n->nodeKind == AST_EXPR_VAR && n->exprVar.thing->kind == VARTABLEENTRY_CEXPR && n->exprVar.thing->data.cexpr.concrete) {
n = n->exprVar.thing->data.cexpr.concrete;
}
if(n->nodeKind == AST_EXPR_PRIMITIVE) {
t->array.length = n->exprPrim.val;
t->array.lengthIsGeneric = false;
} else if(n->nodeKind == AST_EXPR_VAR && n->exprVar.thing->kind == VARTABLEENTRY_CEXPR) {
t->array.lengthGenericParamIdx = n->exprVar.thing->data.cexpr.paramIdx;
t->array.lengthGenericParamName = n->exprVar.thing->data.cexpr.paramName;
t->array.lengthIsGeneric = true;
} else {
stahp_node(n, "Invalid parametrization expression.");
}
}
if(renames) parametrization_set_by_index(&renames->intParams, t->array.lengthGenericParamIdx, t->array.lengthGenericParamName);
}
}
return t;
}
Type *type_shallow_copy(Type *t) {
if(t->type == TYPE_TYPE_PRIMITIVE) {
Type *n = calloc(1, sizeof(TypePrimitive));
memcpy(n, t, sizeof(TypePrimitive));
return n;
} else if(t->type == TYPE_TYPE_POINTER) {
Type *n = calloc(1, sizeof(TypePointer));
memcpy(n, t, sizeof(TypePointer));
return n;
} else if(t->type == TYPE_TYPE_ARRAY) {
Type *n = calloc(1, sizeof(TypeArray));
memcpy(n, t, sizeof(TypeArray));
return n;
} else if(t->type == TYPE_TYPE_GENERIC) {
Type *n = calloc(1, sizeof(TypeGeneric));
memcpy(n, t, sizeof(TypeGeneric));
return n;
} else if(t->type == TYPE_TYPE_FUNCTION) {
Type *n = calloc(1, sizeof(TypeFunction));
n->type = TYPE_TYPE_FUNCTION;
n->function.ret = t->function.ret;
n->function.argCount = t->function.argCount;
n->function.argNames = calloc(n->function.argCount, sizeof(*n->function.argNames));
memcpy(n->function.argNames, t->function.argNames, n->function.argCount * sizeof(*n->function.argNames));
n->function.args = calloc(n->function.argCount, sizeof(*n->function.args));
memcpy(n->function.args, t->function.args, n->function.argCount * sizeof(*n->function.args));
return n;
} else if(t->type == TYPE_TYPE_RECORD) {
Type *n = calloc(1, sizeof(TypeRecord));
n->type = TYPE_TYPE_RECORD;
n->record.name = strdup(t->record.name);
n->record.fieldCount = t->record.fieldCount;
n->record.fieldNames = calloc(n->record.fieldCount, sizeof(*n->record.fieldNames));
memcpy(n->record.fieldNames, t->record.fieldNames, n->record.fieldCount * sizeof(*n->record.fieldNames));
n->record.fieldTypes = calloc(n->record.fieldCount, sizeof(*n->record.fieldTypes));
memcpy(n->record.fieldTypes, t->record.fieldTypes, n->record.fieldCount * sizeof(*n->record.fieldTypes));
n->record.fieldOffsets = calloc(n->record.fieldCount, sizeof(*n->record.fieldOffsets));
memcpy(n->record.fieldOffsets, t->record.fieldOffsets, n->record.fieldCount * sizeof(*n->record.fieldOffsets));
return n;
}
abort();
}

View File

@ -3,9 +3,10 @@
#include<stddef.h>
#include<stdint.h>
#include<stdbool.h>
typedef enum {
TYPE_TYPE_PRIMITIVE, TYPE_TYPE_RECORD, TYPE_TYPE_POINTER, TYPE_TYPE_FUNCTION, TYPE_TYPE_ARRAY, TYPE_TYPE_ERROR
TYPE_TYPE_PRIMITIVE, TYPE_TYPE_RECORD, TYPE_TYPE_POINTER, TYPE_TYPE_FUNCTION, TYPE_TYPE_ARRAY, TYPE_TYPE_GENERIC, TYPE_TYPE_ERROR
} TypeType;
union Type;
@ -48,7 +49,12 @@ typedef struct TypeArray {
TypeType type;
union Type *of;
size_t length; /* 0 means unknown */
intmax_t length;
bool lengthIsGeneric;
char *lengthGenericParamName;
size_t lengthGenericParamIdx;
} TypeArray;
typedef struct TypeRecord {
@ -56,14 +62,19 @@ typedef struct TypeRecord {
char *name;
size_t size;
union Type **fieldTypes;
size_t *fieldOffsets;
char **fieldNames;
size_t fieldCount;
} TypeRecord;
typedef struct TypeGeneric {
TypeType type;
char *paramName;
size_t paramIdx;
} TypeGeneric;
typedef union Type {
TypeType type;
@ -72,6 +83,7 @@ typedef union Type {
TypeFunction function;
TypeArray array;
TypeRecord record;
TypeGeneric generic;
} Type;
extern Type TYPE_ERROR;
@ -90,4 +102,21 @@ int type_is_number(Type *t);
char *type_to_string(Type*);
typedef struct Parametrization {
void *param;
struct Parametrization *next;
} Parametrization;
typedef struct {
Parametrization *typeParams;
Parametrization *intParams;
} Parametrizations;
Type *type_parametrize(Type *target, Parametrizations *parametrizations, Parametrizations *renames);
Type *type_shallow_copy(Type *t);
bool type_is_generic(Type *t);
#endif

View File

@ -2,6 +2,9 @@
#define NCTREF_UTILS_H
#include<stddef.h>
#include<stdbool.h>
#include<errno.h>
#include<stdlib.h>
inline static size_t djb2(const char *str) {
size_t hash = 5381;
@ -14,4 +17,25 @@ inline static size_t djb2(const char *str) {
return hash;
}
inline static bool unstupid_strtol(const char *str, char **endptr, int base, long *result) {
errno = 0;
char *endptr2 = NULL;
*result = strtol(str, &endptr2, base);
if(endptr2 == str) {
return false;
}
if(errno == ERANGE) {
return false;
}
if(endptr) {
*endptr = endptr2;
}
return true;
}
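`unstupid_strtol` above exists because plain `strtol` conflates "no digits", overflow, and a legitimate result, and only reports the last two through `errno`/`endptr`. A usage sketch of the same contract (the wrapper is re-declared here under a hypothetical name so the block stands alone):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Same contract as unstupid_strtol: false when nothing was parsed or
   the value was out of range, true otherwise with *result filled in. */
static bool parse_long(const char *str, char **endptr, int base, long *result) {
    errno = 0;
    char *end = NULL;
    *result = strtol(str, &end, base);
    if(end == str) {
        return false; /* no digits consumed */
    }
    if(errno == ERANGE) {
        return false; /* overflow or underflow */
    }
    if(endptr) {
        *endptr = end;
    }
    return true;
}
```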
#endif

View File

@ -31,7 +31,7 @@ void reachingdefs_set(struct ReachingDefs *this, union AST *def) {
}
VarTable *vartable_new(VarTable *parent) {
VarTable *ret = malloc(sizeof(*ret));
VarTable *ret = calloc(1, sizeof(*ret));
ret->parent = parent;
ret->count = 0;
ret->names = NULL;
@ -70,7 +70,7 @@ VarTableEntry *vartable_set(VarTable *this, const char *name, VarTableEntry *e)
return e;
}
void vartable_new_reachingdefs_for_all_vars(VarTable *this) {
/*void vartable_new_reachingdefs_for_all_vars(VarTable *this) {
for(size_t i = 0; i < this->count; i++) {
if(this->data[i]->kind == VARTABLEENTRY_VAR) {
this->data[i]->data.var.reachingDefs = reachingdefs_push(this->data[i]->data.var.reachingDefs);
@ -92,11 +92,33 @@ void vartable_coalesce_reachingdefs_for_all_vars(VarTable *this) {
if(this->parent) {
vartable_coalesce_reachingdefs_for_all_vars(this->parent);
}
}*/
VarTable *vartable_merge(VarTable *child) {
VarTable *parent = child->parent;
parent->names = realloc(parent->names, sizeof(*parent->names) * (parent->count + child->count));
parent->data = realloc(parent->data, sizeof(*parent->data) * (parent->count + child->count));
for(size_t i = 0; i < child->count; i++) {
child->data[i]->owner = parent;
parent->names[parent->count] = child->names[i];
parent->data[parent->count] = child->data[i];
parent->count++;
}
free(child->names);
free(child->data);
free(child);
return parent;
}
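`vartable_merge` above replaces the removed reaching-defs push/pop pair: on leaving a scope, the child table's entries are appended to the parent so later passes (notably global register allocation over the flattened `vars` array) see one flat variable set. A minimal model of the append step, with illustrative fields rather than the real `VarTable`:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Table {
    struct Table *parent;
    size_t count;
    const char **names;
} Table;

/* Append every child entry to the parent, free the child shell,
   and hand back the parent as the new current scope. */
static Table *table_merge(Table *child) {
    Table *parent = child->parent;
    parent->names = realloc(parent->names,
                            sizeof(*parent->names) * (parent->count + child->count));
    for(size_t i = 0; i < child->count; i++) {
        parent->names[parent->count++] = child->names[i];
    }
    free(child->names);
    free(child);
    return parent;
}
```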
void vte_precolor(VarTableEntry *vte, int color) {
assert(vte->kind == VARTABLEENTRY_VAR);
assert(!vte->data.var.precolored);
assert(vte->kind == VARTABLEENTRY_VAR && "vte must be var");
assert(!vte->data.var.precolored && "already precolored");
vte->data.var.precolored = true;
vte->data.var.color = color;

View File

@ -4,8 +4,11 @@
#include"types.h"
#include<stdbool.h>
struct Token;
union AST;
typedef enum {
VARTABLEENTRY_SYMBOL, VARTABLEENTRY_VAR, VARTABLEENTRY_TYPE
VARTABLEENTRY_SYMBOL, VARTABLEENTRY_VAR, VARTABLEENTRY_TYPE, VARTABLEENTRY_CEXPR
} VarTableEntryKind;
union AST;
@ -41,6 +44,12 @@ typedef struct VarTableEntry {
char isLocal;
char isExternal;
const char *name;
struct {
struct Token *rangeTokens;
size_t startTokI;
size_t endTokI;
} genfunc;
} symbol;
struct {
// For debugging
@ -67,6 +76,15 @@ typedef struct VarTableEntry {
struct {
Type *ptr;
} type;
struct {
// cexpr is used for expression parametrization as opposed to type parametrization
// I don't like the idea of having a special VarTableEntry kind for these, but all other places were worse
const char *paramName;
size_t paramIdx;
// If the cexpr has been parametrized (as opposed to just being a symbol), this field will be non-NULL
union AST *concrete;
} cexpr;
};
} data;
} VarTableEntry;
@ -84,8 +102,10 @@ VarTableEntry *vartable_get(VarTable*, const char*);
VarTableEntry *vartable_find(VarTable*, const char*);
VarTableEntry *vartable_set(VarTable*, const char*, VarTableEntry*);
void vartable_new_reachingdefs_for_all_vars(VarTable*);
void vartable_coalesce_reachingdefs_for_all_vars(VarTable*);
VarTable *vartable_merge(VarTable *child);
//void vartable_new_reachingdefs_for_all_vars(VarTable*);
//void vartable_coalesce_reachingdefs_for_all_vars(VarTable*);
void vte_precolor(VarTableEntry *vte, int color);

View File

@ -35,8 +35,13 @@ static inline int is_xop(AST *e) {
if(c->nodeKind == AST_EXPR_VAR && c->exprVar.thing->kind == VARTABLEENTRY_VAR) {
return XOP_MEM;
} else if(c->nodeKind == AST_EXPR_BINARY_OP && c->exprBinOp.operator == BINOP_ADD && c->exprBinOp.operands[0]->nodeKind == AST_EXPR_UNARY_OP && c->exprBinOp.operands[0]->exprUnOp.operator == UNOP_REF && c->exprBinOp.operands[0]->exprUnOp.operand->nodeKind == AST_EXPR_VAR) {
if(c->exprBinOp.operands[1]->nodeKind == AST_EXPR_VAR) {
} else if(
(c->nodeKind == AST_EXPR_BINARY_OP && c->exprBinOp.operator == BINOP_ADD && c->exprBinOp.operands[0]->nodeKind == AST_EXPR_UNARY_OP && c->exprBinOp.operands[0]->exprUnOp.operator == UNOP_REF && c->exprBinOp.operands[0]->exprUnOp.operand->nodeKind == AST_EXPR_VAR) ||
(c->nodeKind == AST_EXPR_BINARY_OP && c->exprBinOp.operator == BINOP_ADD && c->exprBinOp.operands[0]->nodeKind == AST_EXPR_VAR)) {
if(c->exprBinOp.operands[1]->nodeKind == AST_EXPR_PRIMITIVE) {
return XOP_MEM;
} else if(c->exprBinOp.operands[1]->nodeKind == AST_EXPR_VAR) {
return XOP_MEM;
} else if(c->exprBinOp.operands[1]->nodeKind == AST_EXPR_BINARY_OP && c->exprBinOp.operands[1]->exprBinOp.operator == BINOP_MUL && c->exprBinOp.operands[1]->exprBinOp.operands[0]->nodeKind == AST_EXPR_PRIMITIVE && c->exprBinOp.operands[1]->exprBinOp.operands[1]->nodeKind == AST_EXPR_VAR) {
int scale = c->exprBinOp.operands[1]->exprBinOp.operands[0]->exprPrim.val;

tests/Exporter.nct Normal file
View File

@ -0,0 +1,5 @@
u32 SYMBOL: 1234;
record Foo {
u32 gaga;
}

tests/GenericStruct.nct Normal file
View File

@ -0,0 +1,44 @@
record StaticList[T, S; capacity] {
S size;
T[capacity] data;
}
StaticList_remove: [T, S; capacity]u0(StaticList[T, S; capacity]* this, S index) -> {
T* data = &((*this).data[index]);
(*this).size = (*this).size - 1;
S sz = (*this).size;
loop {
if(index == sz) {
break;
}
*data = *(data + 1);
data = data + 1;
index = index + 1;
}
return;
};
StaticList_add: [T, S; capacity]u0(StaticList[T, S; capacity]* this, T value) -> {
(*this).data[(*this).size] = value;
(*this).size = (*this).size + 1;
return;
};
@instantiate StaticList_remove[u8, u32; 4];
@instantiate StaticList_add[u8, u32; 4];
/*StaticList[u8, u32; 4] kbbuf;
kbbuf.size = 4;
kbbuf.data[0] = 0;
kbbuf.data[5] = 2;
foo: [T]T(T a, T b) -> {
return a + b;
};
@instantiate foo[u32];
u32 c = foo[u32](5, 3);*/

tests/Importer.nct Normal file
View File

@ -0,0 +1,3 @@
use Exporter;
u32 a = Exporter.SYMBOL;

tests/UDPEcho.nct Normal file
View File

@ -0,0 +1,43 @@
@section(".text");
record sockaddr {
0: u16 family;
2: u16 port;
4: u32 addr;
8: u32 zero0;
12: u32 zero1;
}
extern u32(u32, u32, u32) socket;
extern u32(u32, sockaddr*, u32) bind;
extern u32(u32, u8*, u32, u32, u8*, u32*) recvfrom;
extern u32(u32, u8*, u32, u32, u8*, u32) sendto;
MY_SOCKET = socket(2, 2, 0);
sockaddr sa;
sa.family = 2; /* AF_INET */
sa.port = 43105; /* 25000 in net-endian */
sa.addr = 0; /* Bind to all */
sa.zero0 = 0;
sa.zero1 = 0;
bind(MY_SOCKET, &sa, 16);
loop {
OTHER_SIDE_LEN = 64;
u32 msglength = recvfrom(MY_SOCKET, &BUFFER, 512, 0, &OTHER_SIDE, &OTHER_SIDE_LEN);
sendto(MY_SOCKET, &BUFFER, msglength, 0, &OTHER_SIDE, OTHER_SIDE_LEN);
}
@section(".data");
u32 MY_SOCKET:;
u32 OTHER_SIDE_LEN:;
u8[64] OTHER_SIDE:;
u8[512] BUFFER:;

tests/lvl1localrecord.nct Normal file
View File

@ -0,0 +1,16 @@
record sockaddr {
u32 family;
u16 port;
u32 addr;
u32 zero;
}
@section(".data");
@section(".text");
sockaddr sa;
sa.family = 5;
sa.port = 43105;
sa.addr = 16rC0A80001;
sa.zero = 0;

View File

@ -0,0 +1,17 @@
record sockaddr {
u32 family;
u16 port;
u32 addr;
u32 zero;
}
@section(".data");
sockaddr sa:;
@section(".text");
sa.family = 5;
sa.port = 43105;
sa.addr = 16rC0A80001;
sa.zero = 0;

tests/lvl2localrecord.nct Normal file
View File

@ -0,0 +1,16 @@
record A {
u32 x;
u8 z;
}
record B {
0: u32 x;
7: A a;
}
@section(".text");
B rec;
rec.x = 5;
rec.a.x = 10;
rec.a.z = 9;

View File

@ -0,0 +1,19 @@
record A {
u32 x;
u8 z;
}
record B {
u32 x;
A a;
}
@section(".data");
B rec:;
@section(".text");
rec.x = 5;
rec.a.x = 10;
rec.a.z = 9;

View File

@ -1,8 +1,9 @@
u16 a = 12;
u16 b = a & 6;
u16 c = b ^ a | 3;
u16 d = 11 * c;
u16 o = 5;
u16 d = o * 2;
if(a) {
if(a == 0) {
u16 e = b + c + d;
}