Blog

I recently added holes to the Caiman frontend. These enable a user to write only part of a schedule and have the compiler determine the details, using user-provided specs. A hole is essentially an expression or statement that can stand in for anything the compiler might want to put there. The frontend is responsible for performing type deduction, and part of a type in Caiman is a linear representation of whether the type is “usable” (initialized and not used) or “dead” (uninitialized, or initialized but consumed). So the problem I faced when working on this was determining which hole(s) should initialize a mutable variable if the user doesn’t do so.

I’ll explain this further in an example. Consider the following excerpt of a schedule in Caiman:

var x;
???; // hole1
let z = if a < 0 {
    let t = foo();
    ??? // hole2
} else {
    let y = bar();
    ??? // hole3
};
???; // hole4
let k = z + x;
???; // hole5

Based on what we know so far, it would seem that x has to be usable by the addition of x and z, which rules out initialization at hole5. We also know immediately that x must be initialized at some point dominated by its definition. But which set of holes would be best?

Suppose we choose hole1. Well, if the value of x depends on the value of z (for example, if x = z + 1), then initializing x at hole1 requires recomputing z, since it is not yet available there. We could exclude all holes that are not dominated by the definitions of the values x depends on, but we could easily write a schedule that forces something to be recalculated anyway. Continuing with the example of x = z + 1, we could put a use of x before z is defined, which would mean we have no choice but to recalculate z.

So perhaps we should choose hole4, which is “as late as possible”. But what if x depends on t or y? Then it would be better for x to be initialized at {hole2, hole3} to avoid recalculation.

But as an added complication, we aren’t just dealing with one variable, we are dealing with as many as the user puts in their schedule. These values potentially have dependencies between them. For example, if some variable y must be initialized at hole1, and the value of y depends on the value of x, then the value of x is already being computed at hole1, and we might as well assign this value to the variable x.

Note that any value that is recalculated at a hole (due to initialization of a variable that depends on it) should be thought of as existing only for that hole.

For example in the following:

var x;
???; // hole1
var z;
???; // hole2

If the values of x and z both depend on some value y, and we decide to initialize z at hole2 and x at hole1, then y will be recalculated twice. By recalculation, I mean calculation at a hole where it’s needed to initialize a variable.

In the rest of this post, I’ll use the term “[mutable] variable” to refer to a location where a value is held. This corresponds to the source-level concept of a variable. I’ll use the term “value” to refer to the actual value stored in the variable; the value is what is described in the value specifications in Caiman. For simplicity I’ll use the two somewhat interchangeably when the distinction isn’t really necessary. But it may be useful to keep in mind that definitions, initializations, and locations are properties of a variable, while dependencies, recalculations, and true/usable uses are properties of a value.

Let’s first define some terms we’ll use throughout this post. All of the graphs we will consider are instruction-level CFGs; in particular, this means they are directed graphs with a single start node. All of our CFGs are also normalized to have a single exit node. In our discussion, let $r$ be the unique starting root node of the CFG and $f$ the unique final node.

  • $a \text{ dom } b$ if $a$ dominates $b$, which holds when all paths from $r$ to $b$ must contain $a$. The $\text{dom}_s$ relation is used to mean strict domination, which additionally requires $a \not = b$. The notation $\text{dom}(v)$ denotes the set of nodes that $v$ dominates, with the analogous notation $\text{dom}_s(v)$ denoting the set of nodes that $v$ strictly dominates.
  • $a \text{ pdom } b$ if $a$ post-dominates $b$, which holds when all paths from $b$ to $f$ must contain $a$. Likewise for strict post-domination.
  • Set $A$ collectively dominates set $B$ if $\forall b \in B$ every $r$-$b$ path contains a member of $A$. This is often denoted as $A \text{ dom } B$. An analogous definition is used for collective post-domination.
  • A path is a sequence of nodes in the graph denoted $P = (r, …, e)$ for a path starting at $r$ and ending at $e$.
  • $a$ precedes $b$ (denoted $a \rightarrow b$) if there exists an $a$-$b$ path.
  • $a$ strictly precedes $b$ (denoted $a \rightarrow_s b$) if $a \rightarrow b$ and $a \not = b$.
  • $a$ precedes set $B$ (denoted $a \rightarrow B$) if $\forall b \in B. a \rightarrow b$.
  • A set $C$ is minimal with respect to some predicate, if every strict subset of $C$ does not satisfy the predicate, but $C$ itself does.
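To make these relations concrete, here’s a small Python sketch (my own illustration, not anything from the Caiman codebase) that computes dominator and post-dominator sets with the standard iterative dataflow equations, and checks collective domination by searching for a path that avoids the candidate set. The adjacency-list encoding and all names are made up for this post, and it assumes every node is reachable from $r$.

```python
from functools import reduce

def reverse_graph(succ):
    """Reverse an adjacency-list graph: each edge v->w becomes w->v."""
    rev = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            rev[w].append(v)
    return rev

def dominators(succ, r):
    """dom[v] = {v} | intersection of dom[p] over all predecessors p of v."""
    pred = reverse_graph(succ)
    dom = {v: set(succ) for v in succ}
    dom[r] = {r}
    changed = True
    while changed:
        changed = False
        for v in succ:
            if v == r or not pred[v]:  # assumes every other node is reachable
                continue
            new = {v} | reduce(set.__and__, (dom[p] for p in pred[v]))
            if new != dom[v]:
                dom[v], changed = new, True
    return dom

def postdominators(succ, f):
    """Post-domination is domination on the reversed graph, rooted at f."""
    return dominators(reverse_graph(succ), f)

def collectively_dominates(succ, r, A, B):
    """True iff, for every b in B, every r-b path contains a member of A."""
    seen, stack = set(), [r]
    while stack:                 # collect nodes reachable from r without entering A
        v = stack.pop()
        if v in A or v in seen:
            continue
        seen.add(v)
        stack.extend(succ[v])
    return not (set(B) & seen)
```

For example, on the diamond CFG `succ = {"r": ["c"], "c": ["a", "b"], "a": ["j"], "b": ["j"], "j": ["f"], "f": []}`, `collectively_dominates(succ, "r", {"a", "b"}, {"j"})` is true: neither branch dominates the join on its own, but together they cover every path to it.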

The control flow graphs in Caiman are graphs of programs in continuation-passing style. Graph-theoretically, they are what I’m defining as dom-postdom graphs.

Def: A connected, directed acyclic graph $G = (V, E)$ is a dom-postdom (DP) graph if

  • $G$ has 1 node with in-degree 0
  • $G$ has 1 node with out-degree 0
  • $\forall v \in V.\ \text{dom}_s(v) = \emptyset \lor (\exists u.\ v \text{ dom}_s\ u \land u \text { pdom } v)$.

Def: A CFG in the Caiman frontend is called a caiman graph (CG) and it is a DP graph with the added condition that if a node has more than one immediate successor, it dominates those immediate successors.
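These definitions are mechanical enough to sanity-check with the dominator sketch above (taking the unique entry $r$ and exit $f$ as given; `dominators` and `postdominators` are the helpers from that sketch, and this only tests the third DP condition plus the extra CG condition):

```python
def is_dp_graph(succ, r, f):
    """Each node either strictly dominates nothing, or strictly dominates
    some node that post-dominates it."""
    dom, pdom = dominators(succ, r), postdominators(succ, f)
    for v in succ:
        strict = [u for u in succ if u != v and v in dom[u]]  # dom_s(v)
        if strict and not any(u in pdom[v] for u in strict):
            return False
    return True

def is_caiman_graph(succ, r, f):
    """A DP graph where any node with several successors dominates them all."""
    dom = dominators(succ, r)
    return is_dp_graph(succ, r, f) and all(
        v in dom[w]
        for v, ws in succ.items() if len(ws) > 1
        for w in ws)
```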

Here are two useful facts about DP graphs:

Lemma (DP 2): $\forall v, u_1, u_2 \in V(G).\ v \text{ dom } u_1 \land v \text{ dom } u_2 \Rightarrow \exists w.\ v \text{ dom } w \land w \text{ pdom } u_1 \land w \text{ pdom } u_2$.

Proof.

By the definition of DP, either $\text{dom}_s(v) = \emptyset$, in which case $v = u_1 = u_2 = w$ and we’re done, or $\exists w'$ such that $w' \text{ pdom } v$ and $v \text{ dom}_s w'$. WLOG and FSOC, suppose $w'$ does not post-dominate $u_1$. Then $\exists P = (u_1, …, f)$ such that $w' \not \in P$. Suppose $w'$ does not dominate $u_1$. Then there is an $r$-$u_1$ path avoiding $w'$, and since $v \text{ dom } u_1$ that path passes through $v$, so its suffix gives a $P' = (v, …, u_1)$ not containing $w'$. Concatenating $P'$ and $P$ yields a $v$-$f$ path not containing $w'$, which contradicts $w' \text{ pdom } v$. Otherwise, if $w' \text{ dom } u_1$, then repeat this process with a post-dominator of $w'$ chosen as the new $w'$.

$\square$

Lemma (Scoping): $\forall v, w, u$ such that $v$ dom $w \land w$ postdom $v \land v \rightarrow u \land u \rightarrow w \Rightarrow v$ dom $u \land w$ postdom $u$.

Basically, this says the nodes “in-between” $v$ and $w$ are dominated by $v$ and post-dominated by $w$.

[Figure: Scoping]

The way I think of this is that these graphs have “scopes” in which every node within the scope (depicted as the gray box with “…” here) has a shared dominator and shared post-dominator.

Proof.

Take arbitrary $v, w, u$ such that the hypothesis holds. Since $v$ dom $w$, every $r$-$w$ path must pass through $v$. As $v \rightarrow u$ and the graph is acyclic, $u \not \rightarrow v$. But since $u \rightarrow w$, it must be that every $r$-$u$ path contains $v$: otherwise, extending an $r$-$u$ path avoiding $v$ with a $u$-$w$ path (which cannot contain $v$, as $u \not \rightarrow v$) would yield an $r$-$w$ path not containing $v$. So $v$ dom $u$. Now, since $w$ postdom $v$, every $v$-$f$ path must contain $w$. Take any $u$-$f$ path and prepend a $v$-$u$ path to obtain a $v$-$f$ path, which must contain $w$. Since $w \not \rightarrow u$ (again, $u \rightarrow w$ and the graph is acyclic), $w$ cannot lie on the $v$-$u$ prefix, so it lies on the $u$-$f$ suffix. Thus every $u$-$f$ path contains $w$, so $w$ postdom $u$.

$\square$

For the graphs in Caiman, we also can say a little more.

Observation (CG1): $\forall v,w.\ vw \in E(G) \Rightarrow v$ dom $w$ or $w$ postdom $v$.

Proof. If the only successor of $v$ is $w$, then $w$ postdom $v$. If $v$ has more successors besides $w$, then by definition $v$ dom $w$. $\square$

Lemma (CG2): $\forall u, v.\ u \rightarrow v \Rightarrow \exists w. \ w$ pdom $u$ and $w$ dom $v$.

Proof.

By structural induction on $u$:

  • $u$ has no children. We will denote this as u = None(). Then here $u = v = w$.
  • $u$ has one child, $s$. We will denote this as u = Next(s). If $s = v$ or $v = u$ then let $w = u$. Otherwise, by the IH, there exists a $w'$ such that $w'$ postdom $s$ and $w'$ dom $v$. Since $s$ postdom $u$, let $w = w'$.
  • $u$ has more than one child. We will denote this as u = Select(a, b), focusing on the two-child case, which is all we need. However, this generalizes to an arbitrary number of children. If $v = u$, $v = a$, or $v = b$, then let $w = u$. Otherwise let $z$ be a node such that $u$ dom $z$ and $z$ postdom $u$, which exists by the definition of DP graphs. Since $u \rightarrow v$ and every $u$-$f$ path contains $z$, either $v \rightarrow z$ or $z \rightarrow v$. Suppose $v \rightarrow z$. Then by Scoping, $u$ dom $v$; thus, let $w = u$. Otherwise, $z \rightarrow v$. By the IH, $\exists w'$ such that $w'$ postdom $z$ and $w'$ dom $v$. As $z$ pdom $u$, then $w'$ pdom $u$, so let $w = w'$.

$\square$
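Before defining more notation, it’s worth making the node shapes from that induction explicit, since the same None()/Next/Select trichotomy drives the match statements in the algorithms later. A tiny illustrative encoding (my own invention, not the frontend’s actual IR):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class NoneNode:      # the unique exit node f: no successors
    pass

@dataclass(frozen=True)
class Next:          # straight-line node: exactly one successor
    succ: "Node"

@dataclass(frozen=True)
class Select:        # branch node: dominates both successors in a CG
    left: "Node"
    right: "Node"

Node = Union[NoneNode, Next, Select]

def successors(n: Node) -> list:
    match n:
        case NoneNode():
            return []
        case Next(succ=s):
            return [s]
        case Select(left=a, right=b):
            return [a, b]
```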

Let’s also define some other terms and notation we’ll use throughout this post:

  • Let $G$ be the schedule CFG and $G_h$ be the CFG of holes such that $V(G_h) \subseteq V(G)$.
  • Let $L$ be the set of hole locations in the CFG such that $L \subseteq V(G)$. We will assume that the set of holes is disjoint from the set of uses. The idea here is that the uses are caused by “regular instructions”.
  • Define an initializing set $S$ of variable $v$ to be a set such that $\{def(v)\}$ dom $S$ and $S$ dom $use[v]$.
  • Define $use[v]$ as the set of true uses of the variable $v$. Note that once a variable is initialized, it will not be reinitialized. This is because we consider an SSA form of the program where “re-initialization” would result in a new variable being used. We will consider that a node which post-dominates a use is also a use.
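With those two definitions, checking whether a candidate set $S$ is an initializing set is just two collective-domination queries. A sketch, reusing `collectively_dominates` from earlier (`def_v` and `uses_v`, the definition node and true-use set of $v$, are stand-in names):

```python
def is_initializing_set(succ, r, S, def_v, uses_v):
    """{def(v)} dom S, and S dom use[v]."""
    return (collectively_dominates(succ, r, {def_v}, S)
            and collectively_dominates(succ, r, S, uses_v))
```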

In the next sections, we’re going to do something weird. I’m going to style most of the rest of the post as a dialogue. It seems to work well for philosophy, although admittedly I haven’t read enough philosophical dialogues to claim I know what I’m doing here. The dialogue will be between Livia Drusilla (whom I will abbreviate as L) and Marcus Agrippa (whom I will abbreviate as A), since I’m sure this is what dead Romans would be doing if there is some kind of afterlife. The goal here is to try and provide a more natural development of ideas, rather than just presenting them to you. Of course, even this dialogue is cleaned up and reordered quite a bit compared to how the ideas actually evolved.

The algorithm will proceed in two phases, one phase which places initializations of variables “as late as possible”, and another hoist phase which tries to find earlier, more optimal initialization points. So let’s see how this goes…

Livia excitedly walks into the Roman forum, where she spots Agrippa, drinking espresso and admiring the ruins

L: Agrippa! I think I have an optimal, polynomial time solution to the initialization problem.

A: Optimal and polynomial time you say? Well I’d sure like to hear it.

Livia writes down the following on a nearby wall in chalk

init[v] = {}
use[v] = { locations of uses of variable v }
I = { mutables } - { mutables with a value returned dead }
for l in topo_order(L).rev():
    for v in I:
        if use[v].all(lambda u: u not dom l and 
                exists_path_not_containing(init[v], path_src: l, path_dest: u)) 
            and def(v) dom l: #cond1
            init[v] = init[v] + {l}
        if init[v] dom use[v]: #cond2
            I.remove(v)
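(An aside from your author: here is roughly how I’d transcribe the chalkboard version into runnable Python, reusing `dominators` and `collectively_dominates` from the earlier sketches. The `variables` dictionary, `topo_sort`, and `exists_path_avoiding` are stand-ins I’ve made up; the real frontend of course looks nothing like this tidy.)

```python
def topo_sort(succ, r):
    """Topological order of the DAG reachable from r: reversed DFS postorder."""
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for w in succ[v]:
                visit(w)
            order.append(v)
    visit(r)
    return order[::-1]

def exists_path_avoiding(succ, src, dst, avoid):
    """Is there a src-dst path that touches no node of `avoid` after src?"""
    seen, stack = {src}, [src]
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        for m in succ[n]:
            if m not in seen and m not in avoid:
                seen.add(m)
                stack.append(m)
    return False

def phase1(succ, r, holes, variables):
    """variables: v -> (def node of v, set of true-use nodes of v).
    Mutables whose value is returned dead should be filtered out beforehand."""
    dom = dominators(succ, r)
    init = {v: set() for v in variables}
    live = set(variables)                    # the set I from the chalkboard
    for l in reversed(topo_sort(succ, r)):
        if l not in holes:
            continue
        for v in list(live):
            d, uses = variables[v]
            # cond1: def(v) dom l, no use dominates l, and some use is still
            # reachable from l without crossing init[v].  (I'm reading the
            # chalkboard's quantifier this way; it matches how the algorithm
            # is narrated on the example graphs later in the dialogue.)
            if (d in dom[l]
                    and all(u not in dom[l] for u in uses)
                    and any(exists_path_avoiding(succ, l, u, init[v])
                            for u in uses)):
                init[v].add(l)
            if collectively_dominates(succ, r, init[v], uses):
                live.remove(v)               # cond2: init[v] dom use[v]
    return init
```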

A: So your algorithm looks at all holes going up from the final CFG node, “marking” holes which are not dominated by a use and stopping once you reach an initialization set?

L: Yes! It basically initializes the variables “as late as possible” since the only thing initializing them earlier can do is make it so we need to recalculate something that’s available later.

A: I see. Then I think your optimality claim assumes that once a variable is initialized or defined, it stays live until the end of the program. But this isn’t true, implicitly initialized values are live only for the hole they are recalculated at, and variables can be consumed or go out of scope. For example, suppose we have a CFG like so, where $u$ is a use and $h_n$ are holes:

A: Then we could have a situation where the best initializing set is $\{h_1\}$, but your algorithm will choose $\{h_2, h_3\}$. I can construct a program matching this situation like so:

val spec() -> i64 {
    y :- foo() + foo()
    v :- y * y
    z :- y + y
    r :- 0 if z < 0 else 1
    returns r * v
}

fn sched() -> i64 impls spec {
    var v;
    let r = if ??? /*hole1*/ < 0 {
        ??? // hole2
    } else {
        ??? // hole3
    };
    r * v
}

A: Here, it would be best to initialize v at hole1, since hole1 must initialize z, and z and v both depend on y. So hole1 must compute y, and since the user didn’t create a variable for it, the value will not persist after the hole. The algorithm will instead initialize v at hole2 and hole3, which will cause y to be calculated twice along any given path instead of once.

A: We could avoid this problem by allowing ourselves to change the program and create variables to store these intermediate results to reuse them in later holes. But I don’t know if this will work in general, and regardless, part of Caiman is allowing the user to discover the cost of tradeoffs such as recalculation vs caching, so always caching results might be a problem if the user really wanted to recalculate them.

L: Minerva’s owl! You’re right. And considering this, I think an optimal solution must take exponential time if it’s based only on the structure of the graph of holes. Take an arbitrary $S \subseteq L$ such that $S$ is a valid initializing set of $v$. Then take the spec and arbitrarily add $m$ dependencies, for some very large $m$, to the value of $v$ which are never used anywhere else in the program. Then for each $s \in S$, we create variables right before $s$ initializing each of the $m$ values and consume them right after $s$. We thereby create conditions such that the optimal initializing set is exactly $S$, and we’d miss it if we didn’t check the subset $S$. Thus, an optimal solution must examine all subsets, which is exponential in the number of holes in the graph.

L: In any case, I think I still have a proof of correctness…

Thm: If there exists an initializing set of $v$, $init[v]$ will be an initializing set.

Proof.

By contrapositive, assume $init[v]$ is not an initializing set. Then let $P = (r = p_0, …, p_k = u)$ be a path from $r$ to a $u \in use[v]$ such that $P \cap init[v] = \emptyset$ and $P \cap use[v] = \{u\}$. Then $init[v]$ does not dominate $use[v]$, so all locations are considered by the algorithm, as cond2 is always false. Since $(p_0, …, p_{k - 1}) \cap use[v] = \emptyset$, each prefix $(p_0, …, p_i)$ is an $r$-$p_i$ path avoiding uses, so no use dominates any of $p_0, …, p_{k - 1}$. Note that each $p_i$ for $i \le k - 1$ has a path to a use avoiding $init[v]$, namely $(p_i, …, p_k = u)$. Thus, if any of $p_0, …, p_{k - 1}$ were a hole dominated by $def(v)$, the algorithm would have selected it into $init[v]$, as cond1 would hold; since $P \cap init[v] = \emptyset$, the path $P$ contains no hole that could belong to an initializing set (recall every member of an initializing set must be dominated by $def(v)$). So no set of holes can intersect every such $r$-to-first-use path, and we must conclude that there is no valid initializing set.

$\square$

A: I think I can buy that. I wonder, if this algorithm really does initialize the variables as late as possible, then perhaps we could follow it up with something like partial redundancy elimination to hoist the initializations to locations that reduce the dependencies that need to be recalculated.

L: I used the “as late as possible” fact in my bogus proof of optimality, so I’ll give you the proof. Starting with a helper lemma.

Lemma (O1): $\forall v, \ell.\ \ell \in init[v] \Rightarrow \exists u \in use[v]$ such that $u$ postdom $\ell$.

Proof.

Since $\ell \in init[v]$, $use[v] \not = \emptyset$ and, by the algorithm, no $u \in use[v]$ dominates $\ell$. Since $def(v)$ dom $\ell$ and $def(v)$ dominates all uses, if we take an arbitrary $u \in use[v]$ we have $def(v)$ dom $u$ and $def(v)$ dom $\ell$. So by DP2, $\exists w$ such that $w$ postdom $\ell$ and $w$ postdom $u$. Thus $w \in use[v]$ by construction of the use set (a post-dominator of a use is a use), and we have reached our goal.

$\square$.

L: And this is used in the proof of the next lemma…

Lemma (O2): $\forall v, \ell, S \subseteq L.\ \ell \in init[v] \land (init[v] \setminus \{\ell\}) \not \text{dom } \{\ell\} \land \ell \rightarrow_s S \Rightarrow \exists u \in use[v], P = (\ell, …, u)$ such that $P \cap S = \emptyset$.

L: This lemma basically states that, for a location $\ell$ in an initializing set produced by the algorithm, any initializing set consisting only of locations that strictly succeed $\ell$ will not suffice; there will be a path from $\ell$ to some use not containing any element of $S$.

Proof.

We proceed by contrapositive: suppose there exists a minimal initializing set $S \subseteq L$ such that every path from $\ell$ to a use contains an element of $S$; we aim to show that one of the lemma’s other hypotheses must fail. By O1, there must be a use $u$ such that $u$ postdom $\ell$ (as otherwise $\ell \not \in init[v]$). So every $\ell$-$f$ path must contain $u$. By assumption, every $\ell$-$u$ path contains a member of $S$. By the algorithm, since $\ell \rightarrow_s S$, every member of $S$ will be considered before $\ell$. So consider an arbitrary $s \in S$. Then we have the following cases, based on the conditions of how an element is added to $init[v]$ in the algorithm:

  1. $s \not \in init[v]$. Then based on cond1, there are three ways this could occur:
    1. Every $s$-$f$ path contains an element of $init[v]$. Let $S'$ be the minimal set of elements such that every $s$-$f$ path contains an element of $S'$. It follows then that $s \rightarrow S'$, which implies that $\ell \rightarrow_s S'$. So we can repeat the lemma with $S = (S \cup S') \setminus \{s\}$.
    2. There is a use, $u$ that dominates $s$. Since there is a $\ell$-$s$ path not containing a use (by assumption of minimality of $S$), then we can conclude that $u$ dom $\ell$. But, by definition of an initializing set $init[v]$ dom $\{u\}$ which implies that $init[v]$ dom $\{\ell\}$ reaching our goal by showing $\ell \not \in init[v]$.
    3. At the time $s$ is considered by the algorithm, $init[v]$ dom $use[v]$. Since $\ell \rightarrow_s s$, then $s$ follows $\ell$ in topological order. Thus the algorithm would never consider $\ell$ for $init[v]$ since it would have stopped at $s$. This again reaches our goal by showing $\ell \not \in init[v]$.
  2. $s \in init[v]$. Note that if $s = f$ and case 1b isn’t satisfied this necessarily implies $s \in init[v]$.

So the only case we have left to consider is case 2. So assume $S \subseteq init[v]$. Then every $\ell$-$f$ path contains a member of $init[v]$ at the time $\ell$ is considered. I claim, by a lemma I haven’t defined yet but will call O3, that this means that $\ell \not \in init[v]$, reaching our goal once again.

$\square$

A: So you aim to prove an equivalence between $\forall u$ exists_path_not_containing(init[v], path_src: l, path_dest: u) in the algorithm and some condition like exists_path_not_containing(init[v], path_src: l, path_dest: f)? I suppose this effectively removes the quantification over $use[v]$.

L: Yes. The need to do this came about as I found two different conditions for the algorithm, one that was well suited to proving the algorithm correct, and this one which was helpful in proving the “late as possible” claim about the algorithm. The intuition for their equivalence is that we defined a post-dominator of a use to also be a use. So based on this and the structure of a graph, if there is an $\ell$-$f$ path not containing $init[v]$, and $init[v]$ is an initializing set, then it must be that $f$ is not a use. But $f$ post-dominates every node, therefore there cannot be any uses.

L: Formally, we have …

Lemma (O3): Let $S$ be a minimal initialization set of a variable $v$ (so no use dominates a member of $S$). Then

$$\forall \ell \in S, \exists P = (\ell, …, f). P \cap (S \setminus \{\ell\}) = \emptyset \Leftrightarrow \exists u \in use[v], P' = (\ell, …, u). P' \cap (S \setminus \{\ell\}) = \emptyset$$

Proof.

$\Rightarrow$: Suppose $\exists$ an $\ell$-$f$ path $P$ such that $P \cap (S \setminus \{\ell\}) = \emptyset$. By O1, $\exists u \in use[v]$ such that $u$ postdom $\ell$. So $P = (\ell, …, u, …, f)$. Then we take the prefix of $P$, $P' = (\ell, …, u)$, which is a path to a use such that $P' \cap (S \setminus \{\ell\}) = \emptyset$.

$\Leftarrow$: For the other direction, suppose $\exists P = (\ell, …, u)$ such that $P \cap (S \setminus \{\ell\}) = \emptyset$. Then $\exists P' = (\ell, …, u, …, f)$ of which $P$ is a prefix, since all nodes reach $f$. Consider the suffix of $P'$, $P'' = (u = p_0, …, p_k = f)$. We will induct on $P''$ to show every $n \in P''$ is a use, dominated by a use, or not dominated by $v$.

Base Case: $p_0 = u$ which is a use.

IC: Assume the inductive hypothesis so $p_i$ is a use, dominated by a use, or not dominated by $v$. Then we have the following cases:

  • $p_i$ is a use. Then since $p_ip_{i + 1} \in E(G)$, by CG1 either $p_{i + 1}$ postdom $p_i$, in which case $p_{i + 1}$ is a use (a post-dominator of a use is a use), or $p_i$ dom $p_{i + 1}$, in which case $p_{i + 1}$ is dominated by a use.
  • $p_i \not \in dom_s(v)$. Then $p_{i + 1} \not \in dom_s(v)$.
  • $p_i$ is dominated by a use and $p_i$ dom $p_{i + 1}$. Then $p_{i + 1}$ is dominated by a use.
  • $p_i$ is dominated by a use and $p_i \not \text{dom } p_{i + 1}$. Then it must be that $p_{i + 1}$ postdom $p_i$. Let $u'$ be the last use in topological order that dominates $p_i$. Consider a $u'$-$f$ path containing $p_i$. Since $u' \text{ dom}_s\ p_i$, $\exists x \in dom_s(u')$ such that $x$ postdom $u'$. Observe $x \not \rightarrow_s p_i$: if this were the case, then $x$ would be a use later in topological order than $u'$ (as $x$ postdom $u'$, a use), and since $u'$ dom $p_i$ and $x$ postdom $u'$, $x$ dom $p_i$, violating our choice of $u'$. We further know that $x \not = p_i$, as otherwise $p_i$ would be a use, which is covered in another case. Thus we have two options: $x = p_{i + 1}$, in which case $p_{i + 1}$ is a use, as desired; or $p_{i + 1} \rightarrow x$, in which case $u'$ dom $p_{i + 1}$ by Scoping, so $p_{i + 1}$ is dominated by a use.

Finally, we conclude by noting that we constructed a $u$-$f$ path in which every node is a use, dominated by a use, or not dominated by $v$. The latter two cases show the node cannot be in $S$: every member of an initializing set must be dominated by $def(v)$, and by assumption no use dominates a member of $S$. The first case shows the node cannot be in $S$ by the assumption that our use set and hole set are distinct. So this path contains no elements of $S$, and joining $P$ with it yields an $\ell$-$f$ path whose intersection with $S \setminus \{\ell\}$ is empty, reaching our goal.

$\square$

Livia sits down at the table with Agrippa, visibly distressed

L: This is all well and good, but I can’t believe I spent all this time thinking I was proving something useful when I didn’t realize I was making a terrible assumption about value persistence.

A: Well, I may have an idea. We could start by using this first algorithm to create a “baseline” initialization set. Then, for each variable, we could try and hoist the initializations earlier so long as we put them at a strictly better location. This won’t be perfect; as a counterexample we can construct the following:

[Figure: counterexample CFG]

A: Suppose that x and z depend on t, h, k. The algorithm will initialize x and z at the holes right before x + z and x * k. But the globally optimal position would be at the hole right before bar(t). The hoist idea I have won’t be able to hoist x, because doing so would temporarily make things not better¹. We’d have to recalculate h and k now, even though we save t. Likewise for z. Now, if we hoisted z and x “at the same time”, this would avoid the issue, but a general solution that does this would require exponential time, as it would need to look at every subset.

A: I further don’t think it’s a good idea to arbitrarily decide that certain values are more or less worth it to recalculate. For example, I don’t think we should make choices that would save 100 recalculations but cost an extra recalculation of some value. It could be that one extra recalculation has a bigger impact on runtime than the saving of the 100 other recalculations.

Livia, regaining some of her optimism, stands up and paces around for a bit. Eventually, she gets an idea and writes the following on a wall

mark = {} # The set of "last best" locations
init[v] = old_init[v]
for v in mutables:
    for l in topo_order(L).rev():
        if l in old_init[v]:
            mark[l] = {l}
        if old_init[v] pdom {l} and def(v) dom l:
            match l:
                case Next(lp):
                    mark[l] = mark[lp]
                case None():
                    mark[l] = {}
                case Select(a, b):
                    if mark[a].empty() or mark[b].empty():
                        mark[l] = {}
                    elif mark[a] pdom mark[b]:
                        mark[l] = mark[a]
                    elif mark[b] pdom mark[a]:
                        mark[l] = mark[b]
                    else:
                        mark[l] = mark[a] + mark[b]
            if calc(v, l) < intersection([calc(v, lp) for lp in mark[l]]):
                mark[l] = {l}
            if init[v].any(lambda x: mark[l] dom {x}):
                init[v] = mark[l] + init[v] 
                    - [x for x in init[v].filter(lambda y: mark[l] dom y)]

L: Here, calc(v, l) is a multiset of nodes that would need to be recalculated given that v is initialized at l. old_init[v] pdom {l} is true if every path from l to the final node passes through an element of old_init[v]. This is somewhat analogous to saying an initialization of v is anticipated at l (in reference to the anticipated expressions of regular partial redundancy elimination). So basically, the algorithm starts at the previous initialization set and walks up the CFG towards the root. If it encounters a location where initializing v results in fewer recalculations than the intersection of the previous “best”, we mark it. If at any point the mark set dominates some subset of the current initialization set, we add the mark set to the initialization set and remove the dominated subset.
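(Another aside from your author: two details of this pseudocode are easy to misread, so here is how I’d sketch them in Python. The calc values are multisets, which collections.Counter models nicely, with < as strict multiset inclusion; and the pdom between mark sets on the Select arms is collective post-domination, i.e. collective domination on the reversed graph. `reverse_graph` and `collectively_dominates` are reused from the earlier sketches, and `calc`, a hypothetical mapping from location to Counter, stands in for calc(v, ·).)

```python
from collections import Counter
from functools import reduce

def strictly_fewer(a: Counter, b: Counter) -> bool:
    """a < b as multisets: a's counts are pointwise <= b's, and a != b.
    (& takes the pointwise minimum of counts.)"""
    return a != b and (a & b) == a

def marked_best(calc, mark_l):
    """Intersection of the recalculation multisets over the marked set."""
    return reduce(Counter.__and__, (calc[lp] for lp in mark_l))

def collectively_postdominates(succ, f, A, B):
    """Every path from each b in B to the exit f contains a member of A."""
    return collectively_dominates(reverse_graph(succ), f, A, B)

def merge_select_marks(succ, f, mark_a, mark_b):
    """The Select(a, b) arm of the match statement above."""
    if not mark_a or not mark_b:
        return set()
    if collectively_postdominates(succ, f, mark_a, mark_b):
        return set(mark_a)
    if collectively_postdominates(succ, f, mark_b, mark_a):
        return set(mark_b)
    return set(mark_a) | set(mark_b)
```

The chalkboard’s comparison is then `strictly_fewer(calc[l], marked_best(calc, mark[l]))`. (On Python 3.10+, Counter supports `<` directly with exactly this multiset meaning.)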

A: Isn’t it possible that hoisting a variable opens up opportunities for other variables to be hoisted? For example, if we hoist a value, it might become attractive to hoist the values it depends on and values that depend on it. So perhaps we could iterate this procedure until convergence? If we do that, and there are $m$ variables and $n$ nodes in the CFG, we get a rough complexity of $O(m^2n^3 \log n)$, because iterating until convergence can be at most $O(mn \log n)$ since we can only hoist a variable $n \log n$ times. This also assumes we can check calc(v, l) in linear time wrt $n$. This isn’t super stellar, but it is still pseudo-polynomial.

L: You’re right, then we can think of old_init[v] as init[i-1][v] and the current init as init[i][v]. I think the statement of “optimality” we can try and prove is $\forall S, v.\ \exists P = (r, …, f).\ {calc}_v^S(P) \not \subseteq {calc}_v^{{init}_f}(P)$, where ${init}_f$ is the last initializing set we compute. We’re going to assume for the rest of this that the initializing set $S$ is not equivalent to $init_f$; by that I mean that there exists a path $P'$ where $calc_v^S(P') \not = calc_v^{init_f}(P')$. So what this statement of “optimality” is saying is that, when examining the values we must recalculate for a given variable, any other initializing set that is not equivalent to $init_f$ will contain a path where it is potentially worse. This statement gives us “true”/“global” “optimality” when a schedule contains only one mutable variable.

L: Since the algorithm looks at one variable at a time, another way of thinking about this is our algorithm starts with a “baseline” and monotonically improves it until it reaches a point where it can no longer get better.

A: Maybe we can try and prove this by induction. If we say at each step, there isn’t a better choice of initialization points from subsets of $L$ that succeed the current initialization set, then at the final iteration all we’d need to show is that the algorithm considered initializations that precede the current init set.

Thm (Hoist Induct): $\forall v, S, \ell.\ \ell \in init_i[v] \land \ell \rightarrow_s S \Rightarrow \exists P = (\ell, …, f). \ calc_v^S(P) \not \subset calc_v^{init_f}(P)$

Proof.

By induction on $i$, the number of iterations of computing $init_i$.

Base Case: The initial initializing set is the result of the previous algorithm. By the “push-down” lemma (O2) this is true since any $S$ such that $\ell \rightarrow_s S$ must not be an initializing set.

IC: Consider $\ell_i \in init_i[v]$ and $\ell_{i + 1} \in init_{i + 1}[v]$ such that $\ell_{i + 1}$ dom $\ell_i$. Such $\ell_{i + 1}$ and $\ell_i$ exist, as this is a condition for adding any new elements to the initializing set. Assume $\ell_i \not = \ell_{i + 1}$, as if they were equal we’d be done by our inductive hypothesis. Let $P = (\ell_{i + 1} = p_0, …, p_n = \ell_i)$ be an $\ell_{i + 1}$-$\ell_i$ path. Observe that we may assume $P \cap S \not = \emptyset$: otherwise, by the hypothesis there exists a $P' = (\ell_i, …, f)$ such that $calc_v^S(P') \not \subset calc_v^{init_f}(P')$, and joining $P$ and $P'$ yields an $\ell_{i + 1}$-$f$ path for which our goal holds, since by the algorithm and the fact that $\ell_{i + 1}$ dom $\ell_i$, $calc_v(\ell_{i + 1}) < calc_v(\ell_i)$.

Observe that $mark^{i + 1}(\ell_{i + 1})$ does not contain any $s \in S \cap P$: since $\ell_{i + 1}$ was added to the initializing set, $mark^{i + 1}(\ell_{i + 1}) = \{\ell_{i + 1}\}$. Since $init_i[v]$ postdom $\{\ell_{i + 1}\}$ and $S$, I claim (and will refer to this as H3) that there exists a $z \in init_i[v]$ so that for $P' = (\ell_{i + 1}, …, z)$, $calc_v^S(P') \not \subset calc_v^{mark_{\ell_{i + 1}}}(P')$. I claim (and will refer to this as H2) that $P' \cap init_{i + 1}[v] = \{\ell_{i + 1}\}$. This claim is basically asserting that there is no path between any two locations in an initializing set. These two facts together give us $calc_v^S(P') \not \subset calc_v^{init_{i + 1}}(P')$.

  • First suppose that $z \not \in S$. Since we established (via H2) that $z \not \in init_{i + 1}[v]$, we apply the IH to obtain a path $P'' = (z, …, f)$ such that $calc_v^{S \cap P''}(P'') \not \subset calc_v^{init_i}(P'') \not \subset calc_v^{init_{i + 1}}(P'')$. Joining $P'$ and $P''$ yields a path $P = (\ell_{i + 1}, …, f) = (P', P'')$ where $calc_v^S(P) \not \subset calc_v^{init_{i + 1}}(P)$, since $calc_v^S(P') \not \subset calc_v^{init_{i + 1}}(P')$ and $calc_v^{S \cap P''}(P'') \not \subset calc_v^{init_{i + 1}}(P'')$.
  • In the case that $z \in S$, there is a $P'' = (z, …, f)$ such that $calc_v^{S \cap P''}(P'') = calc_v^{init_i}(P'')$, because recall that $z \in init_i[v]$. But then joining $P'$ and $P''$ yields a path $P = (\ell_{i + 1}, …, f) = (P', P'')$ where $calc_v^S(P) \not \subset calc_v^{init_{i + 1}}(P)$, since $calc_v^S(P') \not \subset calc_v^{init_{i + 1}}(P')$ and $calc_v^{S \cap P''}(P'') = calc_v^{init_{i + 1}}(P'')$.

$\square$

A: Wait, your H2 claim doesn’t work for the first inductive case ($i = 0$). That is, our phase 1 algorithm doesn’t guarantee that there isn’t a path from a location in $init_0[v]$ to another location in $init_0[v]$. For example, suppose that we have a graph like this:

A: Here, holes $h_1$ and $h_2$ would get selected in the initializing set. First $h_1$ would get chosen since no uses dominate it, and there exists a path from it to a use with no selected holes. But since $h_1$ alone does not dominate all uses, the algorithm will continue, eventually selecting $h_2$ since no uses dominate it and there exists a path from it to a use that doesn’t contain any selected holes.
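(Author’s note: this situation is easy to reproduce with the phase-1 sketch from earlier. The node names below are made up to match Agrippa’s description: $h_2$ sits above a select, one branch holds $h_1$ followed by a use $u_1$, and the other branch holds a use $u_2$.)

```python
succ = {"r": ["h2"], "h2": ["c"], "c": ["h1", "u2"], "h1": ["u1"],
        "u1": ["j"], "u2": ["j"], "j": ["f"], "f": []}
init = phase1(succ, "r", holes={"h1", "h2"},
              variables={"x": ("r", {"u1", "u2"})})
print(init["x"])  # {'h1', 'h2'} -- h2 dominates h1, yet both are selected
```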

L: Hmm. You’re right. It could be that we add a location to the initialization set, but then need to add a location that dominates it in order to cover a different use. I think we can fix this by removing all elements of the initialization set which are collectively dominated by other elements of the initialization set. In other words, before moving onto phase 2, we can just remove elements from the initialization set until it becomes minimal.
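(Author’s note: that minimization step is cheap to sketch; reusing `collectively_dominates`, we can repeatedly drop any element that the rest of the set already covers. Dropping such an element preserves the initializing-set property, since every path through it has already met another member first.)

```python
def minimize_init(succ, r, init_v):
    """Remove elements of init_v that are collectively dominated by the rest."""
    S = set(init_v)
    changed = True
    while changed:
        changed = False
        for s in list(S):
            if collectively_dominates(succ, r, S - {s}, {s}):
                S.remove(s)
                changed = True
                break
    return S
```

Running `minimize_init(succ, "r", {"h1", "h2"})` on the example graph above returns `{"h2"}`.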

L: So now, we have two holes we have left to fill. Before doing that, let me give you one more helper lemma.

Lemma (H1): $\forall n, k \in V(G).\ mark_n = mark_k \land k$ postdom $n \Rightarrow \exists P = (n, …, k).\ \forall s \in P. \ calc_v(s) \not \subset mark_k$ (here $calc_v(s) \not \subset mark_k$ abbreviates $calc_v(s) \not \subset \bigcap_{\ell \in mark_k} calc_v(\ell)$, matching the comparison in the algorithm).

Proof.

By structural induction on a Caiman CFG.

  • n = None(): Here $mark_n = \emptyset$ and $calc_v(n) \not \subset \emptyset$
  • n = Next(m): Suppose $mark_n = mark_m$. Then $calc_v(n) \not \subset mark_m$. By IH if $mark_m = mark_k$ where $k$ postdom $m$, then $\exists P = (m, …, k)$ such that $\forall s \in P.\ calc_v(s) \not \subset mark_k$. By appending $n$ to the front of the path, we get a desired $n$-$k$ path since $m$ postdom $n$. Otherwise, $m = k$.
  • n = Select(a, b): Suppose $mark_n = mark_a \cup mark_b$. Then we reach our goal vacuously, by the assumption that $mark_b \not \text{pdom } mark_a$ and vice versa. Now suppose, WLOG, that $mark_n = mark_a$. Then $mark_a = mark_k$. Here we finish the case by following the same logic as in the Next(m) case.

$\square$

Claim (H3): $\forall init, x, S, y \in S.\ x \rightarrow y \land y \not \in mark_x \Rightarrow \exists z \in init, P = (x, …, z).\ calc^{mark_x}(P) \not \supset calc^{S \cap P}(P)$

L: The idea here is basically that if a node can reach an initialization location that isn’t in its own mark set, then there is some path where we can potentially do better.

Proof.

By structural induction.

  • x = None(). Then $x = y = z = f$. The only place to initialize is $x$ itself, so this is impossible as $y = x$ and $y \not \in mark_x$.
  • x = Next(m) and $mark_x = \{x\}$. Then by the algorithm $calc_v(x) \subset calc_v(\ell)$ for all $\ell \in mark_m$. First suppose $y \in mark_m$ (recall $y \not \in mark_x$ and $x \rightarrow y$); some $z \in init$ postdominates $y$, so the path $P = (x, …, y, …, z)$ is a desired path where $calc^{mark_x}(P) \not \supset calc^{S \cap P}(P)$. If $y \not \in mark_m$, by the IH $\exists P' = (m, …, z')$ such that $calc^{mark_m}(P') \not \supset calc^{S \cap P'}(P')$. Consing $x$ onto $P'$ yields a path where $calc^{mark_x}(P) \not \supset calc^{S \cap P}(P)$.
  • x = Next(m) and $mark_x = mark_m$. By assumption $y \not \in mark_m$. So apply IH and cons $x$ as before.
  • x = Select(a, b) and $mark_x = mark_a \cup mark_b$. WLOG, $y \not \in mark_a$. Then by the IH $\exists P' = (a, …, z)$ such that $calc^{mark_a}(P') \not \supset calc^{S \cap P'}(P')$. Consing $x$ onto $P'$ yields a path $P = (x, …, z)$ such that $calc^{mark_x}(P) \not \supset calc^{S \cap P}(P)$ since $calc_v(x) \not \subset calc^{mark_x}(P)$.
  • x = Select(a, b) and $mark_x = mark_k$ for $k$ postdom $x, a, b$. By lemma H1, $\exists P' = (x, …, k)$ such that $\forall p \in P'.\ calc_v(p) \not \subset mark_k$. By assumption, $calc_v(x) \not \subset \bigcap_{\ell \in mark_k} calc_v(\ell)$. By the IH, $\exists P'' = (k, …, z)$ such that $calc^{mark_k}(P'') \not \supset calc^{S \cap P''}(P'')$. Thus, the path $P = (x, P', P'')$ is an $x$-$z$ path where $calc^{mark_x}(P) \not \supset calc^{S \cap P}(P)$, since any initialization at $x$ or on $P'$ is no better than the initialization by $mark_k$. Observe that if $\not \exists y' \in S$ such that $k \rightarrow y'$, then $P$ is still a desired path, as $S$ must then initialize at $x$ or on $P'$.
  • x = Select(a, b) and $mark_x = \{x\}$. By assumption $y \not = x$. We can reduce this case to an argument of x = Next(Select(a, b)), reusing the other cases.

$\square$

A: I foresee that we may have a slight problem. So far, these facts required using a strict subset. That is, we seem to be working towards a statement along the lines of $\exists P$ such that $calc_v^S(P) \not \subset calc_v^{init_f}(P)$, which doesn’t seem all that useful. It would say there is some path where our solution is not worse. So it could be that on one path, we match some other solution, but on all other paths, we are worse than other solutions.

L: Oh dear, not another oversight! I’ll have to think about how best to solve this. Intuitively, it feels like if we can narrow every path down to being $\supseteq$ the calculations of our solution, then we can just assume that the initialization set we’re examining is not equal to $init_f$. Again, this is with respect to a single variable, so we’re still dealing with a “local optimum” instead of a global one.

Claim (H2): $\forall \ell_{i + 1} \in init_{i + 1}, \ell_i \in init_i, P = (\ell_{i + 1}, …, \ell_i).\ P \cap init_{i + 1} = \{\ell_{i + 1}\}$

L: The point of this claim is to argue why an initialization set produced by the algorithm cannot have two members that can reach each other.

Proof.

By induction on $i$.

Base Case: Here $init_0[v]$ is the initialization set produced by the phase 1 algorithm. Let’s assume we process this set such that for all distinct $\ell_1, \ell_2 \in init_0[v]$, neither dominates the other. We want to show that $\ell_1 \not \rightarrow \ell_2 \land \ell_2 \not \rightarrow \ell_1$. Suppose FSOC, and WLOG, that there exists a path $P_1 = (\ell_1, …, \ell_2)$ (so $\ell_1 \rightarrow \ell_2$). Since $\ell_1 \in init_0[v]$, there exists $P_2 = (\ell_1, …, u)$ for $u \in use[v]$ such that $P_2 \cap init_0[v] = \{\ell_1\}$; this is true by definition of the algorithm (cond1). Consider the last shared node between $P_1$ and $P_2$ (which could be $\ell_1$ itself); call this node $w$. If $w$ is a select, then it must have a distinct post-dominator, and by assumption that $w$ is the last shared node in the paths, this post-dominator must succeed $\ell_2$ and $u$. So by Scoping, $w$ dom $\ell_2$. This means that $\ell_2 \not \in init_0[v]$, reaching our goal.

Otherwise, if $w$ is not a select, then it must be $u$ itself. So $u \rightarrow \ell_2$. By Lemma CG2, $\exists w'$ such that $w'$ postdom $u$ and $w'$ dom $\ell_2$. Therefore, $w'$ is a use and $\ell_2$ is dominated by a use. So by cond1 of the phase 1 algorithm, $\ell_2 \not \in init_0[v]$, reaching our goal again.

IC:

  • First, I claim that $\forall \ell \in init_{i + 1}.\ mark_\ell \not = \emptyset \Rightarrow \exists \ell_i \in init_i.\ \ell \rightarrow \ell_i$. We will prove this by structural induction.
    • Suppose $\ell$ is None(). This is vacuously true.
    • Suppose $\ell$ is Next(a) and $mark_\ell = mark_a$. If $mark_a = \emptyset$, we have vacuous truth, so assume this is not the case. Then applying the IH gives us that $a \rightarrow \ell_i$ for some $\ell_i \in init_i$, thus $\ell \rightarrow \ell_i$.
    • Suppose $\ell$ is Next(a) and $mark_\ell = \{\ell\}$. If $\ell \in init_i$, we’re done. So suppose this isn’t the case. Then $calc_v(\ell) \subset \bigcap_{\ell' \in mark_a} calc_v(\ell')$. Thus $mark_a \not = \emptyset$, and following the logic of the previous case we get $\ell \rightarrow a \rightarrow \ell_i$.
    • Suppose $\ell$ is Select(a, b). If $mark_\ell = \{\ell\}$, then $\ell \in init_i$ and we’ve reached our goal. So suppose WLOG that $mark_\ell = mark_a$. Following the logic of case 2, we get that $\ell \rightarrow a \rightarrow \ell_i$. Finally, it could be that $mark_\ell = mark_a \cup mark_b$. If $mark_\ell = \emptyset$, we’d have vacuous truth, so assume this isn’t the case. WLOG assume $mark_b \not = \emptyset$. Then, following the logic of case 2, we have $\ell \rightarrow b \rightarrow \ell_i$.

Now suppose there exists some $\ell_{i + 1}$ such that $\ell_{i + 1} \rightarrow \ell'$ for some $\ell' \in init_{i + 1}$. By the claim, it must be the case that $\ell' \rightarrow \ell_i$ for some $\ell_i \in init_i$. Let $A$ be the set of nodes added to $init_{i + 1}$ along with $\ell_{i + 1}$. Observe that $A$ dominates some $\ell^* \in init_i$. Thus $\ell_i \not = \ell^*$, since $A$ does not dominate $\ell'$.

  • Suppose that $\ell_{i + 1} \rightarrow \ell^*$. Consider paths $P_1 = (\ell_{i + 1}, …, \ell^*)$ and $P_2 = (\ell_{i + 1}, …, \ell_i)$. Let $w$ be the last node shared between the two paths. By the IH, $w \not = \ell_i$ and $w \not = \ell^*$ (otherwise two elements of $init_i$ would reach each other). So $w$ must be a select. Therefore, $\exists v$ such that $w$ dom $v$ and $v$ pdom $w$. Since $w$ is the last shared node in the paths, it must be that $\ell^* \rightarrow v$ and $\ell_i \rightarrow v$. But then, by Scoping, $w$ dom $\ell_i$. This implies that $A$ dominates $\ell'$ and thus $\ell' \not \in init_{i + 1}$.
  • Now suppose that $\ell_{i + 1} \not \rightarrow \ell^*$. Assume $\ell^* \not \rightarrow \ell_{i + 1}$, as if it did reach it, $\ell_{i + 1} \not \in init_{i + 1}$. Then there must be some select node, $w$, such that $mark_w = A$. Therefore $A$ pdom $w$. Let $P_1 = (w, …, \ell^*)$ and $P_2 = (w, …, \ell')$ be two paths such that $P_2$ contains no member of $A$ (as $A$ does not dominate $\ell'$). Let $x$ be the last shared node of $P_1$ and $P_2$. $x$ cannot be $\ell'$, otherwise $A$ would not dominate $\ell^*$. Further, $x$ cannot be $\ell^*$, since $\ell' \rightarrow \ell_i$ and $\ell^* \not \rightarrow \ell_i$ by the IH. So $x$ must be a select. Consider a distinct post-dominator, $v$, of $x$. Like the last case, it must be that $\ell' \rightarrow v$ and $\ell^* \rightarrow v$. So $w$ dom $\ell^*$ and $w$ dom $\ell'$. By the assumption that $A$ does not dominate $\ell'$, no member of $A$ precedes $\ell'$. This, however, is not possible: in order for $\ell'$ to be in $init_{i + 1}$, $mark_{\ell'}$ must be $\{\ell'\}$, but in order for $mark_w = A$ with no member of $A$ preceding $\ell'$, $mark_{\ell'} \subseteq A$. So we can ignore this case.

$\square$

L: Finally, we have our final result. I’m informally defining two initialization sets to be equivalent if they result in the same set of recalculations along all paths.

Thm (Hoist): $\forall S \not \equiv init_f, v.\ \exists P = (r, …, f). \ calc_v^S(P) \not \subset calc_v^{init_f}(P)$

Proof.

Observe that we may assume $\forall \ell \in init_f$, $S$ dom $\ell$: otherwise $\exists P' = (r, …, \ell)$ not containing any member of $S$. By Hoist Induct, for any $S' \subseteq S$ such that $\ell \rightarrow_s S'$, $\exists P'' = (\ell, …, f)$ such that $calc_v^{S'}(P'') \not \subset calc_v^{init_f}(P'')$. Joining $P'$ and $P''$, along with the assumption that $S \not \equiv init_f$, yields a desired path.

So consider an arbitrary $s \in S$. Note that $s \not \in mark_r$, otherwise $s \in init_f$. So by H3 (with $x = r$), $\exists \ell \in init_f, P' = (r, …, \ell)$ such that $calc_v^{mark_r}(P') \not \supset calc_v^{S \cap P'}(P')$. By Hoist Induct, there exists $P'' = (\ell, …, f)$ such that $calc_v^S(P'') \not \subset calc_v^{init_f}(P'')$. Joining $P'$ and $P''$, along with the assumption that $S \not \equiv init_f$, yields a desired path.

$\square$

L: Well that was quite exhausting. I initially started out thinking this problem would be at least NP-hard, and started trying to prove something to that effect, but I couldn’t make progress. Thanks to your help, I did end up realizing that a version based only on CFG structure would take exponential time if it were perfectly optimal. Thanks for helping me work through this, Agrippa!

A: No problem. This was fun.

Livia and Agrippa exchange goodbyes and Livia walks away


These two algorithms were implemented in Caiman. My previous approach wasn’t even correct (or rather, complete, as it could fail to find an initialization set when one exists), so already this is a big improvement. Admittedly, the condition that a hoist step must always improve can easily create situations that prevent the algorithm from reaching a global optimum. But the point of this requirement is to prevent any need for backtracking. For example, we could hoist if the new location is at least as good as the previous locations, but I wouldn’t want to hoist an initialization, and then, after movements of other variables, learn that it would have been better to keep it where it was. This is really an effort to keep the frontend polynomial time (even if it’s a polynomial that doesn’t really scale super well).

I think the overall result of this hoist algorithm is something that won’t really optimize your code a whole lot, but will “take the hint” if you write your code in a way that “pushes” something to be initialized at a given hole. For example, without the hoist algorithm, you might write a schedule that lays out all the needed dependencies of a variable, but if you don’t use said variable while those dependencies are live, it’s likely the compiler will end up recalculating them. With the hoist algorithm, it’s much more likely to initialize something “where you would expect”, but the algorithm isn’t too great at finding a better initialization location than you might have intended. All of this is very qualitative and anecdotal, but I hope I’m getting the fuzzy idea across.

I’ll be the first to admit I sort of got tired by the end of the Hoist phase proofs, as you can probably tell by the reduction in dialogue (not like it was dialogue-heavy in the first place). But I kind of liked this style a bit, at the very least it saved me from the slightly clunky expressions like “now you might have noticed …” or “you may be wondering …”.

Now there’s another use-case where the algorithms here could apply. Recall the focus here was purely on where to initialize variables that a user defines, but doesn’t initialize. There’s also a similar problem of where to create variables a user doesn’t even define. For example, in the following schedule:

fn foo(a: i64) -> i64 impls value, time, spatial {
    ???; // hole0
    let x = bar(a, b);
    ???; // hole1
    let r = if a < 0 {
        ???; // hole2
        x + z
    } else {
        ???; // hole3
        x * z
    };
    r + a
}

Here z is used, but never defined. The current implementation is to define and initialize it at the last hole that dominates all uses, so in this case, hole1. But that’s a somewhat arbitrary choice. We could define and initialize it at hole0, or maybe define it at hole1 and initialize it at hole2 and hole3. For example, we could choose to define and initialize the variable together, and choose the initializing locations based on the algorithms we just talked about. That would likely produce better code in quite a few situations. The reason I didn’t do this is that definitions and uses are needed fairly early in the frontend analysis pipeline, while computing the types of each variable needs to be done fairly late, after many other analyses are done. So having a circular dependence between these two would effectively require the entire frontend to iterate until convergence.
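For reference, the “last hole that dominates all uses” choice is easy to state with the earlier machinery. A sketch reusing `dominators` and `topo_sort` (the common dominators of all uses form a chain, so “last in topological order” is well defined):

```python
def last_dominating_hole(succ, r, holes, uses):
    """The hole that dominates every use and appears latest in topo order."""
    dom = dominators(succ, r)
    pos = {n: i for i, n in enumerate(topo_sort(succ, r))}
    candidates = [h for h in holes if all(h in dom[u] for u in uses)]
    return max(candidates, key=pos.get, default=None)
```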


  1. It does not necessarily make it “worse”, just “not better”. ↩︎