Hi, I’m Stephen. I’m mainly interested in programming languages (PL) and compilers, but I’m also fond of computer architecture and graphics. I recently starting working as a compiler engineer on Metal, Apple’s graphics and GPU programming language. I like a broad range of PL-related things with some special interest in compiler technologies and languages for heterogeneous or parallel computing
In school, I was a part of the Capra research group where I created a frontend for Caiman, a research IR and compiler for heterogeneous programming that allows independent specification of device communication, allocations, and values computed to ease design space exploration and optimization.
I also compete in XC mountain bike races (with the occasional road race) and enjoy cooking. I like to think of myself as a mildly competitive amateur XC racer.
Having somewhat recently finished my undergrad and MEng, I wanted to take some time to reflect on my educational experience so far. I’ve only experienced a small slice of the potential pathways in the US education system. I will try to generalize, but who knows how that will go. This is a collection of things I would say to my younger self, advice I have given to people before, and things I just feel like should be said. Read more
I recently added holes in the caiman frontend. These enable a user to only write part of a schedule, and have the compiler determine the details, using user provided specs. A hole is essentially an expression or statement that can stand in for anything the compiler might want to put there. The frontend is responsible for performing type deduction, and part of a type in caiman is a linear representation of whether the type is “usable” (initialized and not used) or “dead” (uninitialized or initialized but consumed). Read more
Register allocation is commonly framed as a vertex coloring problem on an interference graph, where vertices represent virtual registers and edges connect two virtual registers that interfere (belong to the same live-out set of a program point). A classic method, due to Chaitin1, is to do register allocation via graph coloring with a set of heuristics to make the problem more easily computable, as vertex coloring is an NP-Hard problem. Read more
Caiman is a language that simplifies design space exploration and optimization of heterogeneous programs. This is a research project I worked on with Dietrich Geisler and Oliver Daids at Cornell in the CAPRA group led by Adrian Sampson. Image credits go to Dietrich. Read more about it here and here.
As part of a significant course project, I worked in a group of 4 to implement an optimizing compiler for a C-like language Eta which would eventually get extended to become Rho. It compiles into x86_64 and we only got around to supporting the System V calling convention.
We implemented our compiler in Java (although one of my teammates and I were tempted to use Rust) using the CUP parser generator and JFlex lexer generator. The result ended up being over 20,000 lines of code (excluding tests in eta/rho itself).
The project itself was tons of fun and I meant to expand upon it over the following Summer but got a bit caught up with internships and such. We ended up doing fairly well, and although we didn’t podium in the final competition, we did get a high score based on correctness, completion, and style with the original test suite the course staff used for grading.
One thing I think we did well was our tiling implementation. We used
dynamic programming for optimal tiling and a double visitor hierarchy which
essentially boiled down to a form of triple method dispatch so that we can
easily create tile patterns to match subtrees in the IR AST. This is
because we had one set of visitors visiting the pattern tree, another set
of visitors visiting the AST, and a final aggregate visitor visiting both trees
simultaneously. I suppose the challenge really came from striving for utmost
reusability and cleanliness in a language without pattern matching.
We composed patterns
and developed template patterns to easily add new instructions with minimal effort and
without code duplication. One such template pattern matches any
memory expression and converts it to the most optimal x86 addressing mode it can.
So MEM(X + Y * 4)
becomes [X + Y * 4]
while MEM(X + Y * Z)
would have to
become something like
mov _t0, Y
imul _t0, Z
mov ..., [X + _t0]
Then, when building tiles for something like an add, we could construct a pattern tree like:
ADD(
ARG_PATTERN1,
ARG_PATTERN2,
CommutativePattern,
MaxOneMemoryAccess
)
Another part of the compiler I was proud of was our implementation of Chaitin-Briggs register allocation. We could have implemented Linear Scan register allocation, but I thought it would be tons of fun to implement Chaitin-Briggs. So that’s what I did. I very loosely followed the discussion in Professor Appel’s Modern Compiler Implementation in Java. It took a few attempts but after getting it working without Register Spilling, I was quite surprised to discover I managed to add that final part of the algorithm on the first try.
Unfortunately, with 5 hours until the deadline, while running a provided Mandelbrot fractal test program, I discovered that there was some error that we didn’t catch in all our unit tests. We tried going through the 5,000+ lines of assembly our compiler generated and isolating suspicious code into unit tests, but we were unable to find the bug in time. :(
Some things I worked on:
Notable college era projects also include my BRIL compiler passes and GC, Bear Compiler Fuzzer, and Parallel Register Allocator.
Project Oort is a third-person space shooter game implemented in over 20k+ lines of Rust with OpenGL. You control a starfighter that must battle AI enemies in a zero-gravity asteroid field while managing the ship’s energy and shield.
Your ship is armed with a photon cannon, cloaking, and a gravity tether that allows you to swing from or pull asteroids or other ships.
There’s no deep aim of the game, the goal is to shoot down your enemies without getting shot down yourself. Note that, there is no atmosphere and therefore there is no drag that will slow down your ship. The ship moves at constant velocity unless accelerated or decelerated.
Controls: Left click to fire. Right-click to fire a grappling shot.
T
to turn invisible. W
, S
to accelerate forwards or backward,
and mouse movement to control the rotation of the ship.
Shield: You have a shield, denoted by the blue number at the top left. If this goes to 0, your ship is destroyed and you will respawn somewhere randomly on the map. Your shield will regenerate slowly over time.
Energy: Denoted by the yellow number in the top left, your energy is necessary to fire lasers and accelerate your ship. Your energy will regenerate slowly over time.
Minimap: In the bottom right, you have a minimap to see where you are relative to lasers and asteroids. The minimap is a 2D projection of 3D space which is “top-down” respective to your camera angle.
Grappling Hook: You can fire a “hook shot” by pressing and holding right-click. Once landed on an object, a tether will be formed between the target object and the shooter. The distance between the shooter and the object will not exceed the distance between them when the tether was first formed. The amount a tethered object moves to ensure this depends on the momentum of each tethered object. To release the tether, release right-click.
Technical Implementations:
The graphics engine was designed, from the ground up using OpenGL.
The general structure is built on a pipeline system of RenderPass
es,
RenderTarget
s and TextureProcessor
s. A RenderTarget
is, well,
a render target where objects are drawn to and (typically) produces a texture.
A TextureProcessor
is essentially a function on textures that takes input textures
and may produce an output texture. To pass extra state between stages, each stage has
read/write access to a pipeline cache.
A RenderPass
, strings together multiple RenderTarget
s and TextureProcessor
s,
ordering them at the discretion of the Pipeline
, which holds
a dependency graph of the stages in the RenderPass
. The Pipeline
is defined
by the user via a custom Rust DSL.
A Scene
has a list of Entity
s and RenderPass
es. Each Entity
can define
general properties that a RenderPass
can access to render the Entity
properly,
such as its required render order. A final Compositor
can compose the results
of multiple RenderPass
es together.
I have a blog post about how my graphics engine handles transparent objects here. Below is a diagram of the relevant architecture:
I have a very detailed blog post about this here.
A rigid body simulation is used to handle the physics and collision resolution. Each rigid body is given a manually assigned mass or a manually assigned density. In the latter case, we can compute an estimated mass using the volume of the body’s bounding box. For each rigid body, we estimate an inertial tensor based on the vertices of the Rigid Body’s collision mesh.
Then at each step of the simulation, we determine the magnitude, direction, and point of application of the forces that are applied to each object. We then use the impulse-momentum theorem to compute changes in velocity. We subtract the point of application from the object’s center of mass to estimate a lever arm and compute an applied torque for the object. Using this and the inertial tensor, we compute a rotational angular velocity. I chose not to handle rotational velocity updates quite the same for the user-controlled ships for now, because it made the controls upon colliding with something feel unintuitive (ie. you lost control as the collision would impart a torque, rotating your ship).
For collision resolution, we compute a point of contact by averaging the centroids of colliding triangles and an impact force based on the momentum of colliding bodies.
The basic premise of the grappling hook is that if the two objects connected by the “cable” are too far apart, we essentially update the velocities of both objects by treating the cable being pulled as an elastic collision.
For the AI, a behavior tree is used to control the non-player enemy in the game. The behavior tree is built up of Sequence, Fallback, and ParallelSequence control nodes and custom action nodes for moving and firing. Pathfinding is done using a 3D implementation of A*, by tiling the 3D space into little cubes. Once a path has been computed, a simple local navigator just follows the path in segments of straight lines.