Tags: projects, compilers, graph-theory
Register allocation is commonly framed as a vertex coloring problem on an interference graph, where vertices represent virtual registers and edges connect two virtual registers that interfere (belong to the same live-out set of a program point).
A classic method, due to Chaitin1, is to do register allocation via graph coloring with a set of heuristics to make the problem more easily computable, as vertex coloring is an NP-Hard problem.
This does give good results, but it can be too slow for JIT compilers. Furthermore, although I enjoyed implementing Chaitin’s algorithm in my compilers class, some compilers, like LLVM don’t use it. It was also interesting to see that Carnegie Mellon (at least in one iteration of their compiler course) doesn’t teach it, even throwing in this slightly passive-aggressive comment in their lecture notes:2
“Finally, the algorithms needed for coloring chordal graphs are quite easy to implement compared, for example, to the complex algorithm in Appel’s textbook”
I’ll quickly review some notation and concepts from graph theory. I’ll assume a basic knowledge of graph theory, like knowing the terms graphs, subgraphs, vertices, walks, paths, etc.
Definition: A graph is chordal if every cycle of 4 or more vertices contains a chord, an edge connecting two non-adjacent vertices in the cycle. In other words, a chordal graph does not have any subset of vertices whose induced subgraph is isomorphic to $C_n$ for $n \ge 4$. Here’s an example from the CMU lecture notes:
In the first and third cases, the cycle $abcd$ does not contain a chord. So, the vertex set $V = \{a, b, c, d\}$ induces a subgraph that is isomorphic to $C_4$.
A chordless cycle is then a cycle of length 4 or more that doesn’t contain a chord.
Definition: In a graph $G$, a vertex $v \in V(G)$ is simplicial if the subgraph induced by $v \cup N(v)$ is a clique.
For example, in the second graph shown above, $a$ is simplicial, while $d$ is not since its neighbors $a$ and $c$ don’t have an edge between them.
Definition: A perfect elimination order (PEO), $\alpha$, of a graph $G$, is a permutation of the vertex set $G(V) = \{v_1, …, v_n\}$ such that $v_{\alpha(i)}$ is simplicial in the subgraph induced by $\{v_{\alpha(1)}, …, v_{\alpha(i)}\}$. In other words, a vertex is simplicial in the subgraph induced by itself and all vertices that come before it in the ordering.
We will prove that chordal graphs have a perfect elimination order, but first, a helper lemma:
Lemma: Let $G$ be a connected graph. If $G$ is chordal, a minimal $x - y$ separator, $S$, will induce a clique.
Proof.
Let $A_x$ and $A_y$ be the connected components of $x$ and $y$ in $G - S$, respectively. We will assume that $S$ contains at least two vertices, for if it only contained a single vertex it would already be a clique. Take an arbitrary and distinct $u, v$ in $S$. Since $S$ is minimal, it contains no proper subset that is also an $x - y$ separator. So $u$ and $v$ must be adjacent to vertices in both $A_x$ and $A_y$. Let $x_u$ and $x_v$ be two nodes in $A_x$ that are adjacent to $u$ and $v$, respectively. Let $y_u$ and $y_v$ be two vertices of $A_y$ adjacent to $u$ and $v$, respectively.
Now consider the path $P_x = (u, x_u, …, x_v, v)$ and $P_y = (u, y_u, …, y_v, v)$. They must exist since $A_x$ and $A_y$ are connected, so there is a path between $x_u$ and $x_v$, and $y_u$ and $y_v$. Each of these paths has length $\ge 2$ (since $x_u$ and $x_v$ need not be distinct), so the cycle formed by $P_x$ followed by $P_y$ has length $\ge 4$.
Since $G$ is chordal, this cycle must contain a chord. But we know that a vertex in $A_x$ cannot be adjacent to a vertex in $A_y$. This implies that $\{u, v\}$ must be an edge.
Since we took an arbitrary $u$ and $v$, this logic applies for all vertices of $S$, hence $S$ is a clique.
$\square$
Theorem: A connected graph $G$ is chordal if and only if it has a perfect elimination order.
Proof.
We start with the forward direction. We will proceed by induction on the vertices of $G$, denoted $n$.
For our base case, $n = 1$ is trivial.
For the inductive case, let $G$ be a chordal graph of order $n + 1$. If $G$ is a clique, any ordering of the vertices will be a PEO, so assume that $G$ is not a clique. Then there exist non-adjacent vertices $x$ and $y$. Let $S$ be a minimal $x, y$ separator, and let $A_x$ and $A_y$ be connected components of $x$ and $y$ in $G - S$. By the earlier lemma, $S$ is a clique. By the inductive hypothesis, $A_x$ has a perfect elimination order, so there exists a vertex $u \in A_x$ such that $u \cup N(u)$ form a clique in $G[A_x]$. The only vertices in $G$ that can be adjacent $G[A_x]$ are in $S$, and since $S$ forms a clique, that must mean that $u \cup N(u)$ forms a clique in $G$.
By the inductive hypothesis, $G - u$ has a perfect elimination ordering, $\alpha$. Adding the mapping $n + 1 \mapsto u$ to $\alpha$ gives us a perfect elimination order of $G$.
Hint: By minimum counterexample. Take the smallest chordless cycle and let $u$ be the vertex in the cycle that appears last in the elimination order. What can be said about the neighbors of $u \cup N(u)$ in the cycle?
$\square$
So we have that chordal graphs have a perfect elimination order. Let’s review the algorithm for greedy coloring, a simple method for vertex coloring:
Let V be the vertices of the interfence graph
C = {}
for v in ordering(V):
c := the "best" color not used by N(v)
C[v] = c
It turns out that a perfect elimination order is the optimal ordering for which to assign colors, which we’ll show.
Proof.
Let $\alpha = (v_1, …, v_n)$ be a perfect elimination order, and let $\gamma$ be some other ordering. Let $P(n)$ be the proposition that at iteration $n$, $\{v_{\alpha(1)}, .. ,v_{\alpha(n)}\}$ uses at most as many colors as $\{v_{\gamma(1)}, .., v_{\gamma(n)}\}$.
We will prove this by induction. For the base case $n = 1$, both orderings can do no better or worse than using one color.
Now assume that we are on iteration $n + 1$. Consider vertex $v^* = v_{\alpha(n + 1)}$. $v^* $ cannot share any colors with its neighbors that have already been colored. Let $c$ be the best available color as selected by the greedy algorithm. By the definition of a perfect elimination order, the neighbors of $v^* $ that have already been colored (occur earlier in the order), form a clique with $v^*$. Therefore, no valid coloring can select $c$ to be a color that is unavailable when using the perfect elimination order.
By the induction hypothesis, $\{v_{\alpha(1)}, .., v_{\alpha(n)}\}$ is minimally colored. Since the available colors for $c$ is a superset of the available colors for $c$ when using any other ordering, we can conclude that $\{v_{\alpha(1)}, .., v_{\alpha(n + 1)}\}$ is minimally colored.
$\square$
Definition: A perfect graph, $G$ is a graph such that the minimum number of colors needed to color $G$ (denoted $\chi(G)$) is equal to the order of the largest clique in $G$ (denoted $\omega(G)$).
Theorem: If a graph $G$ has a perfect elimination order, then $G$ is perfect.
Proof. By induction on the number of vertices of $G$.
For the base case, $n = 1$, $G$ has a maximum clique size of 1 and chromatic number, or minimum coloring, of 1.
Now let $G$ be a graph with $n + 1$ vertices. Let $\{v_1, …, v_{n + 1}\}$ be a perfect elimination order of $G$. Observe that $G - v_{n + 1}$ still has a perfect elimination order (namely $\{v_1, …, v_n\}$). So by the induction hypothesis, $G - v_{n + 1}$ is perfect. Let $k = \chi(G - v_{n + 1}) = \omega(G - v_{n + 1})$.
Since $N(v_{n + 1})$ induces a clique in $G - v_{n + 1}$, then it must be that $\text{deg}(v_{n + 1}) \le k$. We continue by cases:
In both cases, we can conclude that $\omega(G) = \chi(G)$, and so $G$ is perfect.
$\square$
Corollary: Chordal graphs are perfect.
With these results, we can see that not only is the perfect elimination order the optimal ordering for the greedy algorithm, but with this ordering, the greedy algorithm produces an optimal coloring! This gives us a linear time solution for optimal coloring! It should be noted that we can compute a perfect ordering in polynomial time with the maximum cardinality search algorithm.3 But the results we’ve shown only apply if our graph is chordal…
Interference graphs that are generated from SSA programs are chordal4. But, even without a program in SSA form, 95% of program interference graphs are chordal5!
An informal argument for how all SSA programs have chordal interference graphs can be made as follows:
Let’s suppose that a
interferes with b
, and b
interferes with a
and c
,
and c
interferes with b
and d
. This might get us something that looks like this:
a := foo()
b := bar()
// a and b live out here
print a, b
c := baz()
// b and c live out here
print b, c
d := foo()
// c and d live out here
print c, d
To make a chordless cycle, we’d need d
to also interfere with a
. So we’d need to
do something like this:
d := bar()
loop {
a := foo()
// a and d live out here
print a, d
b := bar()
// a and b live out here
print a, b
c := baz()
// b and c live out here
print b, c
d := foo()
// c and d live out here
print c, d
// d live out here
}
But such a program can’t exist in SSA form, the two definitions of d
would
be split up and a $\phi$ node would be introduced!
We can also use a perfect elimination ordering to decompose an interference graph into indivisible atoms that can be colored independently. Actually, one doesn’t need a PEO to do this. Given a graph $G$, the fill-in is the smallest set of edges that when added to $G$ produce a chordal graph, $G’$. $G’$ is also known as the triangulation of $G$. Thus, a chordal graph is its triangulation.
Given a triangulation $H$, of a graph $G$, we can compute a clique decomposition of the original graph. The idea behind a clique decomposition is that we identify a clique separator, $S$, which is a vertex cut that is also a clique. Thus, $G - S$ will contain at least two components, $A$ and $B$. We then duplicate $S$ into each component and recurse on $G[A \cup S]$ and $G[B \cup S]$. When we get to a component that no longer contains any clique separators, we add that component to the set of atoms.
Observe that $A$ and $B$ have no edges between them since $S$ is a vertex cut. So $A \cup S$ and $B \cup S$ can be colored independently with some special handling for how to color the vertices in $S$.
For example in the following interference graph:
we can identify the clique separator %8
and %19
which breaks up the big component
into two smaller ones:
The triangle %19
, %20
, %8b
cannot be divided further so we can color it.
We can then continue to decompose the bigger component, following the same procedure,
and color it. Once we’re done coloring, we’ll need to merge the results in such a
way that %8a
and %8b
are colored the same.
The decomposition algorithm, due to Tarjan6, is as follows:
This algorithm is not specific to perfect elimination orders, but if our graph is chordal and the ordering we use is a PEO, then we can see that the algorithm will decompose the graph if it contains a clique separator (as $B$ would be non-empty). We further know that in a chordal graph, all minimal separators are clique separators.
To get better parallelization performance, we can follow the work of Gupta et al7 to perform the decomposition from the source code directly without constructing the interference graph first, and Makowski and Pollock8 to use tree parallelism and a recoloring approach to perform coloring independently and in parallel.
To perform recoloring, let’s suppose that we have a clique separator $S$, dividing the graph into two components, $A$ and $B$. We first color $A \cup S$ and $B \cup S$ independently. Then for a register, $r$ in $S$, we have three cases:
After doing this for all registers in the clique, we then need to recolor the rest of the virtual registers in $B \cup S$. This is because recoloring vertices to match the colors of $A \cup S$ may have created interferences. We define a color class as a set of virtual registers that map to the same physical register. Then to perform efficient recoloring, we simply remap the color classes in $B \cup S$ to an available physical register, preferring to keep the old mapping if possible. We implement a color class as an equivalence class in a union-find data structure with path compression. Thus, a color class can be recolored in asymptotically constant time.
When merging the coloring results for $A \cup S$ and $B \cup S$, we won’t need more colors than the maximum number of colors used in coloring $A \cup S$ and $B \cup S$. This is because there is no inference between vertices of $A$ and $B$ except via vertices in $S$. For a perfect graph in particular, one can also prove this fact by observing that $A \cup B \cup S$ cannot contain a larger clique than the largest clique in $A \cup S$ or $B \cup S$.
I and two others attempted to use these facts to write a parallel register allocator in LLVM for a course project. At least in the time we had, we couldn’t get it to perform well or scale. So I can’t say I strongly recommend it.
So does this mean that register allocation is a solved problem, and with SSA IR’s like LLVM, we can just get perfect results with greedy coloring?
Unfortunately no, the more complicated thing is determining when to split, coalesce or recompute values to avoid spilling. Trying all combinations of these techniques to avoid spilling would lead to a very exponential backtracking algorithm.
But, in LLVM, the standard register allocators for debug and release builds
are using the greedy approach. Named GreedyAllocator
, the release
build register allocator is the basic greedy algorithm presented here but with
complicated heuristics to try really hard and avoid spills.
G. J. Chaitin. “Register allocation & spilling via graph coloring”. In: SIGPLAN Not. 17.6 (June 1982), pp. 98–101. issn: 0362-1340. doi: 10.1145/872726.806984. url: https://doi.org/10.1145/872726. 806984 ↩︎
Pfenning, F., Simmons, R., Platzer, A., & Hoffman, J. Lecture Notes on Register Allocation. 15-411: Compiler Design. Retrieved 2024, from https://www.cs.cmu.edu/~janh/courses/411/24/lectures/03-regalloc.pdf. ↩︎
Anne Berry, Romain Pogorelcnik, and Genevi`eve Simonet. “An introduction to clique minimal separator decomposition”. In: Algorithms 3.2 (May 2010), pp. 197–215. doi: 10.3390/a3020197 ↩︎
Hack, S., & Goos, G. (2006). Optimal Register allocation for SSA-form programs in polynomial time. Information Processing Letters, 98(4), 150–155. https://doi.org/10.1016/j.ipl.2006.01.008 ↩︎
Pereira, F. M., & Palsberg, J. (2005). Register allocation via coloring of chordal graphs. Programming Languages and Systems, 315–329. https://doi.org/10.1007/11575467_21 ↩︎
Robert E. Tarjan. “Decomposition by clique separators”. In: Discrete Mathematics 55.2 (1985), pp. 221–232. issn: 0012-365X. doi: https://doi.org/10.1016/0012-365X(85)90051-2. url: https://www.sciencedirect.com/science/article/pii/0012365X85900512 ↩︎
R. Gupta, M. L. Soffa, and T. Steele. “Register allocation via clique separators”. In: ACM SIGPLAN Notices 24.7 (June 1989), pp. 264–274. doi: 10.1145/74818.74842 ↩︎
Christine Makowski and Lori L. Pollock. “Achieving efficient register allocation via parallelism”. In: Proceedings of the 1995 ACM Symposium on Applied Computing. SAC ’95. Nashville, Tennessee, USA: Association for Computing Machinery, 1995, pp. 123–129. isbn: 0897916581. doi: 10.1145/315891.315935. url: https://doi.org/10.1145/315891.315935 ↩︎