About

Hi, I’m Stephen. I’m mainly interested in programming languages (PL) and compilers, but I’m also fond of computer architecture and graphics. I recently started working as a compiler engineer on Metal, Apple’s graphics and GPU programming language. I like a broad range of PL-related things, with a special interest in compiler technologies and languages for heterogeneous or parallel computing.

In school, I was a part of the Capra research group where I created a frontend for Caiman, a research IR and compiler for heterogeneous programming that allows independent specification of device communication, allocations, and values computed to ease design space exploration and optimization.

I also compete in XC mountain bike races (with the occasional road race) and enjoy cooking. I like to think of myself as a mildly competitive amateur XC racer.

Latest Posts

Jan 2, 2025

Unsolicited Advice for (soon-to-be) Undergrads

Having somewhat recently finished my undergrad and MEng, I wanted to take some time to reflect on my educational experience so far. I’ve only experienced a small slice of the potential pathways in the US education system. I will try to generalize, but who knows how that will go. This is a collection of things I would say to my younger self, advice I have given to people before, and things I feel should be said. Read more

Nov 30, 2024

Synthesizing Variable Initializations in CPS Programs

I recently added holes to the Caiman frontend. These enable a user to write only part of a schedule and have the compiler determine the details, using user-provided specs. A hole is essentially an expression or statement that can stand in for anything the compiler might want to put there. The frontend is responsible for performing type deduction, and part of a type in Caiman is a linear representation of whether the type is “usable” (initialized and not used) or “dead” (uninitialized or initialized but consumed). Read more

Jun 3, 2024

Chordal Graph Register Allocation

Register allocation is commonly framed as a vertex coloring problem on an interference graph, where vertices represent virtual registers and edges connect two virtual registers that interfere (belong to the same live-out set of a program point). A classic method, due to Chaitin, is to do register allocation via graph coloring with a set of heuristics to make the problem more easily computable, as vertex coloring is an NP-hard problem. Read more

Projects

Caiman

Eta Compiler

Project Oort

All Projects

Spring 2023

As part of a significant course project, I worked in a group of 4 to implement an optimizing compiler for a C-like language, Eta, which was eventually extended to become Rho. The compiler targets x86_64, and we only got around to supporting the System V calling convention.

We implemented our compiler in Java (although one of my teammates and I were tempted to use Rust) using the CUP parser generator and JFlex lexer generator. The result ended up being over 20,000 lines of code (excluding tests written in Eta/Rho itself).

The project itself was tons of fun, and I meant to expand upon it over the following summer but got a bit caught up with internships and such. We ended up doing fairly well: although we didn’t podium in the final competition, we did get a high score based on correctness, completion, and style with the original test suite the course staff used for grading.

One thing I think we did well was our tiling implementation. We used dynamic programming for optimal tiling and a double visitor hierarchy, which essentially boiled down to a form of triple method dispatch, so that we could easily create tile patterns to match subtrees in the IR AST. The triple dispatch arose because we had one set of visitors visiting the pattern tree, another set of visitors visiting the AST, and a final aggregate visitor visiting both trees simultaneously. I suppose the challenge really came from striving for utmost reusability and cleanliness in a language without pattern matching. We composed patterns and developed template patterns so that new instructions could be added with minimal effort and without code duplication. One such template pattern matches any memory expression and converts it to the best x86 addressing mode it can. So MEM(X + Y * 4) becomes [X + Y * 4], while MEM(X + Y * Z) would have to become something like

mov _t0, Y
imul _t0, Z
mov ..., [X + _t0]

Then, when building tiles for something like an add, we could construct a pattern tree like:

ADD(
    ARG_PATTERN1,
    ARG_PATTERN2,
    CommutativePattern,
    MaxOneMemoryAccess
)
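To make the addressing-mode template concrete, here is a minimal sketch in Rust rather than the Java visitor hierarchy the project actually used. The `Expr` type, the `tile_mem` function, and the `_tN` temporary naming are all hypothetical stand-ins, and only the `base + index * scale` shape from the example above is handled.

```rust
// Hypothetical IR for illustration; the original compiler used Java visitors.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Temp(String),
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Tiles MEM(addr) for the shape `base + index * scale`.
/// Returns (setup instructions, memory operand), or None for other shapes.
fn tile_mem(addr: &Expr, fresh: &mut u32) -> Option<(Vec<String>, String)> {
    // Destructure base + index * scale; `&**x` views a boxed child as &Expr.
    let (base, index, scale) = match addr {
        Expr::Add(lhs, rhs) => match (&**lhs, &**rhs) {
            (Expr::Temp(b), Expr::Mul(i, s)) => match &**i {
                Expr::Temp(i) => (b, i, &**s),
                _ => return None,
            },
            _ => return None,
        },
        _ => return None,
    };
    match scale {
        // x86 can encode scales of 1, 2, 4, or 8 directly in the operand.
        Expr::Const(k) if [1, 2, 4, 8].contains(k) => {
            Some((Vec::new(), format!("[{} + {} * {}]", base, index, k)))
        }
        // A non-constant scale needs an explicit multiply into a fresh temp.
        Expr::Temp(z) => {
            let t = format!("_t{}", *fresh);
            *fresh += 1;
            let setup = vec![
                format!("mov {}, {}", t, index),
                format!("imul {}, {}", t, z),
            ];
            Some((setup, format!("[{} + {}]", base, t)))
        }
        _ => None,
    }
}
```

With language-level pattern matching the two cases collapse into a single `match`, which is roughly the convenience the visitor machinery had to recreate in Java.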

Another part of the compiler I was proud of was our implementation of Chaitin-Briggs register allocation. We could have implemented linear scan register allocation, but I thought it would be tons of fun to implement Chaitin-Briggs. So that’s what I did. I very loosely followed the discussion in Professor Appel’s Modern Compiler Implementation in Java. It took a few attempts, but after getting it working without register spilling, I was quite surprised to discover that I managed to add that final part of the algorithm on the first try.
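For readers unfamiliar with the algorithm, the core simplify/select loop of Chaitin-Briggs coloring can be sketched as follows. This is a simplified illustration in Rust, not the project's Java code: it omits move coalescing and the rewrite-and-retry step for actual spills, and the function name `color_graph` and the adjacency-list input format are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Colors an interference graph with `k` registers.
/// `adj[v]` lists the neighbors of virtual register `v` (symmetric, no duplicates).
/// Returns Some(color) per vertex, or None for a vertex that would be spilled.
fn color_graph(adj: &[Vec<usize>], k: usize) -> Vec<Option<usize>> {
    let n = adj.len();
    let mut removed = vec![false; n];
    let mut degree: Vec<usize> = adj.iter().map(|ns| ns.len()).collect();
    let mut stack = Vec::with_capacity(n);

    // Simplify: remove vertices of degree < k. If none exists, optimistically
    // push a high-degree spill candidate anyway (the Briggs refinement).
    for _ in 0..n {
        let live: Vec<usize> = (0..n).filter(|&v| !removed[v]).collect();
        let pick = live
            .iter()
            .copied()
            .find(|&v| degree[v] < k)
            .or_else(|| live.iter().copied().max_by_key(|&v| degree[v]))
            .expect("one vertex remains on every iteration");
        removed[pick] = true;
        for &u in &adj[pick] {
            if !removed[u] {
                degree[u] -= 1;
            }
        }
        stack.push(pick);
    }

    // Select: pop vertices and give each the lowest color unused by its
    // already-colored neighbors; None marks an actual spill.
    let mut colors = vec![None; n];
    while let Some(v) = stack.pop() {
        let used: HashSet<usize> = adj[v].iter().filter_map(|&u| colors[u]).collect();
        colors[v] = (0..k).find(|c| !used.contains(c));
    }
    colors
}
```

A 4-cycle with `k = 2` shows why the optimistic push matters: every vertex has degree 2, so plain Chaitin would spill, yet the select phase still finds a valid 2-coloring.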

Unfortunately, with 5 hours until the deadline, while running a provided Mandelbrot fractal test program, I discovered that there was some error that we didn’t catch in all our unit tests. We tried going through the 5,000+ lines of assembly our compiler generated and isolating suspicious code into unit tests, but we were unable to find the bug in time. :(

Some things I worked on:

x86 instruction selection and optimal tiling
Chaitin-Briggs register allocation with move coalescing
Conditional copy and constant propagation, dead code elimination, live variables analysis, local value numbering
Syntax-directed IR translation and type-checking
Parser specification

Notable college-era projects also include my BRIL compiler passes and GC, Bear Compiler Fuzzer, and Parallel Register Allocator.

Jan 2022 - June 2023

Project Oort is a third-person space shooter game implemented in over 20k lines of Rust with OpenGL. You control a starfighter that must battle AI enemies in a zero-gravity asteroid field while managing the ship’s energy and shield.

Your ship is armed with a photon cannon, cloaking, and a gravity tether that allows you to swing from or pull asteroids or other ships.

The game has no deep aim: the goal is to shoot down your enemies without getting shot down yourself. Note that there is no atmosphere, and therefore no drag to slow down your ship. The ship moves at constant velocity unless accelerated or decelerated.

Controls: Left-click to fire. Right-click to fire a grappling shot. T to turn invisible. W and S to accelerate forward or backward, and mouse movement to control the rotation of the ship.

Shield: You have a shield, denoted by the blue number at the top left. If this goes to 0, your ship is destroyed and you will respawn somewhere randomly on the map. Your shield will regenerate slowly over time.

Energy: Denoted by the yellow number in the top left, your energy is necessary to fire lasers and accelerate your ship. Your energy will regenerate slowly over time.

Minimap: In the bottom right, you have a minimap to see where you are relative to lasers and asteroids. The minimap is a 2D projection of 3D space that is “top-down” relative to your camera angle.

Grappling Hook: You can fire a “hook shot” by pressing and holding right-click. Once landed on an object, a tether will be formed between the target object and the shooter. The distance between the shooter and the object will not exceed the distance between them when the tether was first formed. The amount a tethered object moves to ensure this depends on the momentum of each tethered object. To release the tether, release right-click.

Summary

Technical Implementations:

A Forward+ ¹ physically based rendering engine that supports area lights, cascaded shadow maps, soft shadows, animated models, and ray-marched volumetrics
A 3-phase collision detection system utilizing an Octree, a Bounding Volume Hierarchy of Oriented Bounding Boxes, and triangle intersections parallelized with compute shaders. You can read more about this here.
A behavior tree AI and a modification of A* for pathfinding
Rigid Body Simulation with rotational motion

Graphics Engine

The graphics engine was designed from the ground up using OpenGL. The general structure is built on a pipeline system of RenderPasses, RenderTargets, and TextureProcessors. A RenderTarget is, well, a render target that objects are drawn to and that (typically) produces a texture. A TextureProcessor is essentially a function on textures that takes input textures and may produce an output texture. To pass extra state between stages, each stage has read/write access to a pipeline cache. A RenderPass strings together multiple RenderTargets and TextureProcessors, ordering them at the discretion of the Pipeline, which holds a dependency graph of the stages in the RenderPass. The Pipeline is defined by the user via a custom Rust DSL.
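One plausible way to derive an execution order from such a dependency graph is a topological sort. The sketch below uses Kahn's algorithm. It is an assumption about how stage ordering could work, not a description of the engine's actual Pipeline code, and `order_stages` with its index-based stage representation is hypothetical.

```rust
use std::collections::VecDeque;

/// `deps` holds (producer, consumer) pairs between stage indices `0..n`.
/// Returns an execution order in which every producer precedes its consumers,
/// or None if the dependencies contain a cycle.
fn order_stages(n: usize, deps: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut consumers: Vec<Vec<usize>> = vec![Vec::new(); n];
    let mut pending = vec![0usize; n]; // unmet producer count per stage
    for &(producer, consumer) in deps {
        consumers[producer].push(consumer);
        pending[consumer] += 1;
    }

    // Stages with no producers can run immediately.
    let mut ready: VecDeque<usize> = (0..n).filter(|&s| pending[s] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(stage) = ready.pop_front() {
        order.push(stage);
        for &next in &consumers[stage] {
            pending[next] -= 1;
            if pending[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Any stage left unscheduled sits on a dependency cycle.
    if order.len() == n {
        Some(order)
    } else {
        None
    }
}
```

For example, two RenderTargets (stages 0 and 1) feeding a TextureProcessor (stage 2) whose output goes to a final stage 3 would be described by the pairs (0, 2), (1, 2), (2, 3).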

A Scene has a list of Entitys and RenderPasses. Each Entity can define general properties that a RenderPass can access to render the Entity properly, such as its required render order. A final Compositor can compose the results of multiple RenderPasses together.

I have a blog post about how my graphics engine handles transparent objects here. Below is a diagram of the relevant architecture:

[Figure: DesignDiagram]

Collision Detection

I have a very detailed blog post about this here.

Physics Engine and AI

A rigid body simulation is used to handle the physics and collision resolution. Each rigid body is given a manually assigned mass or a manually assigned density. In the latter case, we can compute an estimated mass using the volume of the body’s bounding box. For each rigid body, we estimate an inertia tensor based on the vertices of the rigid body’s collision mesh.

Then, at each step of the simulation, we determine the magnitude, direction, and point of application of the forces that are applied to each object. We then use the impulse-momentum theorem to compute changes in velocity. We take the difference between the point of application and the object’s center of mass to estimate a lever arm and compute an applied torque for the object. Using this and the inertia tensor, we compute an angular velocity. I chose not to handle rotational velocity updates quite the same way for the user-controlled ships for now, because it made the controls feel unintuitive after a collision (i.e., you lost control as the collision would impart a torque, rotating your ship).
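The update described above can be sketched as follows. This is a simplified illustration, not the game's code: the `Body` struct and `apply_force` are hypothetical, the lever arm is taken as the point of application minus the center of mass, and the inertia tensor is assumed to be diagonal and aligned with the world axes so that inverting it is a per-component division.

```rust
type Vec3 = [f64; 3];

fn sub(a: Vec3, b: Vec3) -> Vec3 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}

fn cross(a: Vec3, b: Vec3) -> Vec3 {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

// Hypothetical rigid body; `inertia_diag` is the diagonal of the inertia tensor.
struct Body {
    mass: f64,
    inertia_diag: Vec3,
    center_of_mass: Vec3,
    velocity: Vec3,
    angular_velocity: Vec3,
}

/// Applies `force` at world-space `point` for a time step `dt`.
fn apply_force(body: &mut Body, force: Vec3, point: Vec3, dt: f64) {
    // Lever arm from the center of mass to the point of application.
    let lever = sub(point, body.center_of_mass);
    let torque = cross(lever, force);
    for i in 0..3 {
        // Impulse-momentum: m * dv = F * dt.
        body.velocity[i] += force[i] * dt / body.mass;
        // Rotational analogue: I * dw = torque * dt (diagonal I assumed).
        body.angular_velocity[i] += torque[i] * dt / body.inertia_diag[i];
    }
}
```

A force through the center of mass gives a zero lever arm and therefore changes only the linear velocity, which is one way to reproduce the "no torque on player ships" behavior mentioned above.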

For collision resolution, we compute a point of contact by averaging the centroids of colliding triangles and an impact force based on the momentum of colliding bodies.

The basic premise of the grappling hook is that if the two objects connected by the “cable” are too far apart, we essentially update the velocities of both objects by treating the cable being pulled as an elastic collision.
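A minimal sketch of that idea, under stated assumptions: both bodies are treated as point masses, only the velocity components along the cable are exchanged (a one-dimensional elastic collision), and nothing happens if the bodies are already moving toward each other. The `Body` struct and `apply_tether` are hypothetical names, not the game's API.

```rust
type Vec3 = [f64; 3];

fn dot(a: Vec3, b: Vec3) -> f64 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

// Hypothetical point-mass body for illustration.
struct Body {
    mass: f64,
    position: Vec3,
    velocity: Vec3,
}

/// Enforces a tether of length `max_len` between `a` and `b`.
/// Returns true if the velocities were changed.
fn apply_tether(a: &mut Body, b: &mut Body, max_len: f64) -> bool {
    let delta = [
        b.position[0] - a.position[0],
        b.position[1] - a.position[1],
        b.position[2] - a.position[2],
    ];
    let dist = dot(delta, delta).sqrt();
    if dist <= max_len {
        return false; // cable is slack
    }
    // Unit vector along the cable, pointing from a to b.
    let n = [delta[0] / dist, delta[1] / dist, delta[2] / dist];
    let (ua, ub) = (dot(a.velocity, n), dot(b.velocity, n));
    if ub - ua <= 0.0 {
        return false; // already closing, so the cable is not being pulled
    }
    // 1D elastic collision along the cable direction.
    let total = a.mass + b.mass;
    let ua_new = ((a.mass - b.mass) * ua + 2.0 * b.mass * ub) / total;
    let ub_new = ((b.mass - a.mass) * ub + 2.0 * a.mass * ua) / total;
    for i in 0..3 {
        a.velocity[i] += (ua_new - ua) * n[i];
        b.velocity[i] += (ub_new - ub) * n[i];
    }
    true
}
```

Because the exchange conserves momentum along the cable, a light ship tethered to a heavy asteroid gets yanked much harder than the asteroid does, which matches the momentum-dependent behavior described in the gameplay section.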

For the AI, a behavior tree is used to control the non-player enemy in the game. The behavior tree is built from Sequence, Fallback, and ParallelSequence control nodes and custom action nodes for moving and firing. Pathfinding is done using a 3D implementation of A*, by tiling the 3D space into little cubes. Once a path has been computed, a simple local navigator just follows the path in straight-line segments.
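As an illustration of A* over a cube-tiled space, here is a textbook version in Rust. It is not the game's modified A*: it assumes 6-connected unit-cost moves, a Manhattan-distance heuristic, and a set of blocked cells standing in for asteroids. The names `Cell`, `heuristic`, and `find_path` are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap, HashSet};

type Cell = (i32, i32, i32);

/// Manhattan distance, which never overestimates 6-connected unit-cost moves.
fn heuristic(a: Cell, b: Cell) -> i32 {
    (a.0 - b.0).abs() + (a.1 - b.1).abs() + (a.2 - b.2).abs()
}

/// A* over cube cells inside `[0, size)` on each axis, avoiding `blocked`.
fn find_path(start: Cell, goal: Cell, size: i32, blocked: &HashSet<Cell>) -> Option<Vec<Cell>> {
    const STEPS: [Cell; 6] = [
        (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
    ];
    let mut open = BinaryHeap::new(); // min-heap on f = g + h via Reverse
    let mut cost: HashMap<Cell, i32> = HashMap::new();
    let mut parent: HashMap<Cell, Cell> = HashMap::new();
    cost.insert(start, 0);
    open.push(Reverse((heuristic(start, goal), start)));

    while let Some(Reverse((_, cell))) = open.pop() {
        if cell == goal {
            // Walk parent links back to the start, then reverse.
            let mut path = vec![cell];
            let mut cur = cell;
            while let Some(&p) = parent.get(&cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        let g = cost[&cell];
        for &(dx, dy, dz) in STEPS.iter() {
            let next = (cell.0 + dx, cell.1 + dy, cell.2 + dz);
            let inside = [next.0, next.1, next.2].iter().all(|&c| c >= 0 && c < size);
            if !inside || blocked.contains(&next) {
                continue;
            }
            // Relax the edge if this route to `next` is cheaper than any seen.
            if cost.get(&next).map_or(true, |&old| g + 1 < old) {
                cost.insert(next, g + 1);
                parent.insert(next, cell);
                open.push(Reverse((g + 1 + heuristic(next, goal), next)));
            }
        }
    }
    None
}
```

The returned list of cells is exactly the kind of input a local navigator could follow segment by segment.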

¹ Harada, T., McKee, J., & Yang, J. (2012). Forward+: Bringing Deferred Lighting to the Next Level. In Eurographics 2012 - Short Papers. The Eurographics Association.

Source

Stephen Verderame
