As part of a significant course project, I worked in a group of four to implement an optimizing compiler for Eta, a C-like language that we later extended into Rho. The compiler targets x86-64, though we only got around to supporting the System V calling convention.
We implemented the compiler in Java (although one of my teammates and I were tempted to use Rust), using the CUP parser generator and the JFlex lexer generator. The result came to over 20,000 lines of code, excluding tests written in Eta/Rho itself.
The project was tons of fun, and I meant to expand on it over the following summer but got caught up with internships. We ended up doing fairly well: although we didn’t podium in the final competition, we scored highly on correctness, completion, and style against the original test suite the course staff used for grading.
One thing I think we did well was our tiling implementation. We used dynamic programming for optimal tiling, plus a double visitor hierarchy that essentially boiled down to a form of triple method dispatch, so we could easily create tile patterns to match subtrees of the IR AST: one set of visitors walked the pattern tree, another walked the AST, and a final aggregate visitor walked both trees simultaneously. The real challenge came from striving for utmost reusability and cleanliness in a language without pattern matching.
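To make the dynamic-programming part concrete, here is a minimal sketch of the idea (all names are hypothetical, not our actual classes): the optimal cost of covering an IR node is the minimum, over every tile that matches at that node, of the tile’s own cost plus the optimal cost of the subtrees it leaves uncovered.

```java
import java.util.*;

// Hypothetical minimal IR: constants and binary adds.
abstract class IRNode {
    abstract List<IRNode> children();
}

class Const extends IRNode {
    final int value;
    Const(int value) { this.value = value; }
    List<IRNode> children() { return List.of(); }
}

class Add extends IRNode {
    final IRNode left, right;
    Add(IRNode left, IRNode right) { this.left = left; this.right = right; }
    List<IRNode> children() { return List.of(left, right); }
}

// A tile reports the subtrees it leaves uncovered when it matches at
// `root`, or null when it does not match there.
interface Tile {
    List<IRNode> match(IRNode root);
    int cost();
}

class Tiler {
    private final List<Tile> tiles;
    private final Map<IRNode, Integer> memo = new IdentityHashMap<>();

    Tiler(List<Tile> tiles) { this.tiles = tiles; }

    // Bottom-up DP: best cost for `node` = min over matching tiles of
    // the tile's cost plus the optimal cost of each uncovered subtree.
    int optimalCost(IRNode node) {
        Integer cached = memo.get(node);
        if (cached != null) return cached;
        int best = Integer.MAX_VALUE;
        for (Tile tile : tiles) {
            List<IRNode> uncovered = tile.match(node);
            if (uncovered == null) continue;
            int cost = tile.cost();
            for (IRNode child : uncovered) cost += optimalCost(child);
            best = Math.min(best, cost);
        }
        memo.put(node, best);
        return best;
    }
}
```

In the real compiler the matching itself was done by the visitor hierarchy rather than an `instanceof`-style check, but the cost recurrence is the same.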
We composed patterns and developed template patterns so that new instructions could be added with minimal effort and no code duplication. One such template matches any memory expression and converts it to the best x86 addressing mode it can.
So `MEM(X + Y * 4)` becomes `[X + Y * 4]`, while `MEM(X + Y * Z)` would have to become something like:

```
mov  _t0, Y
imul _t0, Z
mov  ..., [X + _t0]
```
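The fold-or-materialize decision behind that template can be sketched as follows (a hypothetical helper, not our actual code; the `rax` destination is an arbitrary choice for illustration): an index can be folded into an x86-64 addressing mode only when its scale is a constant 1, 2, 4, or 8, and any other scale forces the product into a temporary first.

```java
import java.util.*;

final class MemLowering {
    // x86-64 addressing modes only permit these scale factors.
    static boolean isLegalScale(long scale) {
        return scale == 1 || scale == 2 || scale == 4 || scale == 8;
    }

    // Lower MEM(base + index * scale): one instruction if the scale fits
    // an addressing mode, otherwise materialize the product in a temp.
    static List<String> lower(String base, String index, String scale) {
        try {
            long s = Long.parseLong(scale);
            if (isLegalScale(s)) {
                return List.of("mov rax, [" + base + " + " + index + " * " + s + "]");
            }
        } catch (NumberFormatException e) {
            // Scale is not a constant; fall through and materialize.
        }
        return List.of(
                "mov _t0, " + index,
                "imul _t0, " + scale,
                "mov rax, [" + base + " + _t0]");
    }
}
```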
Then, when building tiles for something like an add, we could construct a pattern tree like:

```
ADD(
    ARG_PATTERN1,
    ARG_PATTERN2,
    CommutativePattern,
    MaxOneMemoryAccess
)
```
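A sketch of how constraint patterns like `CommutativePattern` and `MaxOneMemoryAccess` can compose with the operand patterns (hypothetical API, simplified to strings instead of IR subtrees): the commutative wrapper retries the operand patterns in both orders, and the memory constraint rejects pairs where both operands would become memory accesses.

```java
import java.util.function.Predicate;

final class BinaryTilePattern {
    // Example operand patterns for tests/illustration.
    static final Predicate<String> ANY = s -> true;
    static final Predicate<String> REG_ONLY = s -> !s.startsWith("MEM");

    private static final Predicate<String> IS_MEM = s -> s.startsWith("MEM");

    private final Predicate<String> arg1, arg2;
    private final boolean commutative;

    BinaryTilePattern(Predicate<String> arg1, Predicate<String> arg2, boolean commutative) {
        this.arg1 = arg1;
        this.arg2 = arg2;
        this.commutative = commutative;
    }

    boolean matches(String a, String b) {
        if (IS_MEM.test(a) && IS_MEM.test(b)) return false; // MaxOneMemoryAccess
        if (arg1.test(a) && arg2.test(b)) return true;
        return commutative && arg1.test(b) && arg2.test(a); // CommutativePattern retry
    }
}
```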
Another part of the compiler I was proud of was our implementation of Chaitin-Briggs register allocation. We could have settled for linear-scan register allocation, but I thought it would be tons of fun to implement Chaitin-Briggs, so that’s what I did, very loosely following the discussion in Professor Appel’s Modern Compiler Implementation in Java. Getting it working without register spilling took a few attempts, but I was pleasantly surprised to find that the spilling logic, the final part of the algorithm, worked on the first try.
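For context, a compact sketch of the simplify/select core of that style of graph coloring (our real allocator also handled moves, coalescing, and rebuilding after spills; this is a simplified illustration, not our code): repeatedly remove a node of degree < k from the interference graph and push it on a stack; when no such node exists, optimistically push a spill candidate anyway, in the Briggs spirit. Then pop nodes back, giving each the lowest color unused by its already-colored neighbors; -1 marks an actual spill.

```java
import java.util.*;

final class Colorer {
    // `graph` is a symmetric adjacency map; `k` is the register count.
    static Map<String, Integer> color(Map<String, Set<String>> graph, int k) {
        // Work on a copy so callers keep their graph intact.
        Map<String, Set<String>> work = new HashMap<>();
        graph.forEach((n, adj) -> work.put(n, new HashSet<>(adj)));

        // Simplify: peel off low-degree nodes; if stuck, pick optimistically.
        Deque<String> stack = new ArrayDeque<>();
        while (!work.isEmpty()) {
            String pick = work.keySet().stream()
                    .filter(n -> work.get(n).size() < k)
                    .findFirst()
                    .orElseGet(() -> work.keySet().iterator().next()); // spill candidate
            for (String nb : work.get(pick)) work.get(nb).remove(pick);
            work.remove(pick);
            stack.push(pick);
        }

        // Select: lowest color not used by already-colored neighbors.
        Map<String, Integer> colors = new HashMap<>();
        while (!stack.isEmpty()) {
            String n = stack.pop();
            Set<Integer> used = new HashSet<>();
            for (String nb : graph.get(n)) {
                Integer c = colors.get(nb);
                if (c != null) used.add(c);
            }
            int c = 0;
            while (used.contains(c)) c++;
            colors.put(n, c < k ? c : -1); // -1: optimistic hope failed, spill
        }
        return colors;
    }
}
```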
Unfortunately, with five hours until the deadline, we discovered while running a provided Mandelbrot fractal test program that our compiler had a bug none of our unit tests had caught. We went through the 5,000+ lines of assembly it generated, isolating suspicious code into new unit tests, but couldn’t find the bug in time. :(
Some things I worked on: