Tags: pl, projects, compilers, graph-theory
I recently added holes to the Caiman frontend. These enable a user to write only part of a schedule and have the compiler determine the details using user-provided specs. A hole is essentially an expression or statement that can stand in for anything the compiler might want to put there. The frontend is responsible for performing type deduction, and part of a type in Caiman is a linear representation of whether the type is "usable" (initialized and not used) or "dead" (uninitialized, or initialized but consumed). So the problem I faced when working on this was to determine which hole(s) should initialize a mutable variable if the user doesn't do so.
I’ll explain this further in an example. Consider the following excerpt of a schedule in Caiman:
var x;
???; // hole1
let z = if a < 0 {
let t = foo();
??? // hole2
} else {
let y = bar();
??? // hole3
};
???; // hole4
let k = z + x;
???; // hole5
Based on what we know so far, it would seem that `x` has to be usable by the addition of `x` and `z`. This would rule out initialization at `hole5`. Also, we can immediately know that `x` must be initialized at some point that is dominated by its definition. But which set of holes would be best?
Suppose we choose `hole1`. Well, if the value of `x` depends on the value of `z` (for example, if `x = z + 1`), then initializing `x` at `hole1` requires recomputing `z`, since it is not yet available. We could exclude all holes that are not dominated by definitions of values that `x` depends on, but we could easily write a schedule that forces something to be recalculated. Continuing with the example of `x = z + 1`, we could put a use of `x` before `z` is defined, which would mean that we have no choice but to recalculate `z`.
So perhaps we should choose `hole4`, which is "as late as possible". But if `x` depends on `t` or `y`, then it would be better for `x` to be initialized at `{hole2, hole3}` to avoid recalculation.
But as an added complication, we aren't just dealing with one variable; we are dealing with as many as the user puts in their schedule. These values potentially have dependencies between them. For example, if some variable `y` must be initialized at `hole1`, and the value of `y` depends on the value of `x`, then the value of `x` is already being computed at `hole1`, and we might as well assign this value to the variable `x`.
Note that any value that is recalculated at a hole (due to initialization of a variable that depends on it) should be thought of as existing only for that hole.
For example in the following:
var x;
???; // hole1
var z;
???; // hole2
If the values of `x` and `z` both depend on some value `y`, and we decide to initialize `z` at `hole2` and `x` at `hole1`, then `y` will be recalculated twice. By recalculation, I mean calculation at a hole where it's needed to initialize a variable.
In the rest of this post, I'll use the term "[mutable] variable" to refer to a location where a value is held; this corresponds to the source-level concept of a variable. I'll use the term "value" to refer to the actual value stored in the variable; the value is what is described in the value specifications in Caiman. For simplicity, I'll use these two somewhat interchangeably when a distinction isn't really necessary. But it may be useful to keep in mind that definitions, initializations, and locations are properties of a variable, while dependencies, recalculations, and true/usable uses are properties of a value.
Let's first define some terms we'll use throughout this post. All of the graphs we will consider are instruction-level CFGs; in particular, this means they are directed graphs with a single start node. All of our CFGs are also normalized to have a single exit node. In our discussion, let $r$ be the unique starting root node of the CFG and $f$ the unique final node.
The control flow graphs in Caiman are graphs of programs in continuation-passing style. Graph-theoretically, they are what I'm defining as dom-postdom graphs.
Def: A connected, directed acyclic graph $G = (V, E)$ is a dom-postdom (DP) graph if every node $v$ that strictly dominates at least one node also strictly dominates some node $w$ that post-dominates $v$. Writing $\text{dom}_s(v)$ for the set of nodes strictly dominated by $v$, this says $\text{dom}_s(v) \neq \emptyset \Rightarrow \exists w \in \text{dom}_s(v).\ w \text{ pdom } v$.
Def: A CFG in the Caiman frontend is called a caiman graph (CG) and it is a DP graph with the added condition that if a node has more than one immediate successor, it dominates those immediate successors.
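Since everything that follows leans on dominance and post-dominance, here is a minimal sketch of computing them with the textbook iterative fixpoint, just to make the vocabulary concrete; this is an illustration, not how Caiman actually computes them.

```python
# Minimal sketch: iterative dominator sets over an adjacency map.
def dominators(succs, root):
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            # n is dominated by itself plus whatever dominates all its preds.
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def postdominators(succs, final):
    # Post-dominators are dominators of the edge-reversed graph rooted at f.
    rev = {n: set() for n in succs}
    for n, ss in succs.items():
        for s in ss:
            rev[s].add(n)
    return dominators(rev, final)

# The shape of the first example: r -> h1 -> select -> {h2, h3} -> u -> f.
succs = {"r": {"h1"}, "h1": {"sel"}, "sel": {"h2", "h3"},
         "h2": {"u"}, "h3": {"u"}, "u": {"f"}, "f": set()}
dom, pdom = dominators(succs, "r"), postdominators(succs, "f")
assert "sel" in dom["h2"] and "sel" in dom["h3"]  # a select dominates its successors
assert "u" in pdom["sel"]  # the select's scope has a shared post-dominator
```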
Here are two useful facts about DP graphs:
Lemma (DP 2): $\forall v, u_1, u_2 \in V(G).\ v \text{ dom } u_1 \land v \text{ dom } u_2 \Rightarrow \exists w.\ v \text{ dom } w \land w \text{ pdom } u_1 \land w \text{ pdom } u_2$.
Proof.
By the definition of DP, either $\text{dom}_s(v) = \emptyset$, in which case $v = u_1 = u_2 = w$ and we're done, or $\exists w'.\ w' \text{ pdom } v$ and $v \text{ dom}_s\ w'$. WLOG and FSOC, suppose $w' \not\text{pdom } u_1$. Then $\exists P = (u_1, \dots, f)$ such that $w' \not\in P$. Suppose $w' \not\text{dom } u_1$. Then, since $v \text{ dom } u_1$, there exists a $P' = (v, \dots, u_1)$ not containing $w'$, so concatenating $P'$ and $P$ yields a $v$-$f$ path not containing $w'$. This contradicts the definition of post-domination. Otherwise, if $w' \text{ dom } u_1$, then repeat this process with a post-dominator of $w'$ chosen as the new $w'$.
$\square$
Lemma (Scoping): $\forall v, w, u$ st $v$ dom $w \land w$ postdom $v \land v \rightarrow u \land u \rightarrow w \Rightarrow v$ dom $u \land w$ postdom $u$.
Basically, this says the nodes "in between" $v$ and $w$ are dominated by $v$ and post-dominated by $w$.
The way I think of this is that these graphs have "scopes": every node within a scope (picture it as a gray box between $v$ and $w$) has a shared dominator and a shared post-dominator.
Proof.
Take arbitrary $v, w, u$ such that the hypothesis holds. Since $v$ dom $w$, every $r$-$w$ path must pass through $v$. As $v \rightarrow u$ and the graph is acyclic, $u \not\rightarrow v$. So every $r$-$u$ path must contain $v$: otherwise, extending such a path along $u \rightarrow w$ (a stretch that cannot contain $v$, since $u \not\rightarrow v$) would yield an $r$-$w$ path not containing $v$. So $v$ dom $u$. Now, since $w$ postdom $v$, every $v$-$f$ path must contain $w$. Again by acyclicity and $u \rightarrow w$, we have $w \not\rightarrow u$, so any $u$-$f$ path extends (via $v \rightarrow u$) to a $v$-$f$ path in which $w$ cannot appear before $u$. Thus every $u$-$f$ path must contain $w$, and so $w$ postdom $u$.
$\square$
For the graphs in Caiman, we also can say a little more.
Observation (CG1): $\forall v,w.\ vw \in E(G) \Rightarrow v$ dom $w$ or $w$ postdom $v$.
Proof. If the only successor of $v$ is $w$, then $w$ postdom $v$. If $v$ has more successors besides $w$, then by definition $v$ dom $w$. $\square$
Lemma (CG2): $\forall u, v.\ u \rightarrow v \Rightarrow \exists w. \ w$ pdom $u$ and $w$ dom $v$.
Proof.
By structural induction on $u$:

- `u = None()`: Then $u = v = w$.
- `u = Next(s)`: If $s = v$ or $v = u$, then let $w = u$. Otherwise, by the IH, there exists a $w'$ such that $w'$ postdom $s$ and $w'$ dom $v$. Since $s$ postdom $u$, let $w = w'$.
- `u = Select(a, b)`: We focus on the two-child case, which is all we need; however, this generalizes to an arbitrary number of children. If $v = u$, $v = a$, or $v = b$, then let $w = u$. Otherwise, let $z$ be a node such that $u$ dom $z$ and $z$ postdom $u$, which exists by the definition of DP graphs. Suppose $v \rightarrow z$. Then by scoping, $u$ dom $v$; thus, let $w = u$. Otherwise, $z \rightarrow v$. By the IH, $\exists w'$ such that $w'$ postdom $z$ and $w'$ dom $v$. As $z$ pdom $u$, $w'$ pdom $u$, so let $w = w'$.

$\square$
Let's also define some other terms and notation we'll use throughout this post:

- $L$ is the set of hole locations in the schedule, and $use[v]$ is the set of locations of uses of a variable $v$. A post-dominator of a use of $v$ is itself counted as a use of $v$.
- $u \rightarrow v$ means there is a path from $u$ to $v$, and $\ell \rightarrow_s S$ means every element of $S$ strictly succeeds $\ell$.
- $calc_v(\ell)$ is the multiset of values that must be recalculated if $v$ is initialized at $\ell$, and $calc_v^S(P)$ is the multiset of recalculations incurred along a path $P$ when initializing at the set $S$.
In the next sections, we're going to do something weird: I'm going to style most of the rest of the post as a dialogue. It seems to work well for philosophy, although admittedly I haven't read enough philosophical dialogues to claim I know what I'm doing here. The dialogue will be between Livia Drusilla (whom I will abbreviate as L) and Marcus Agrippa (whom I will abbreviate as A), since I'm sure this is what dead Romans would be doing if there is some kind of afterlife. The goal here is to try to provide a more natural development of ideas, rather than just presenting them to you. Of course, even this dialogue is cleaned up and reordered quite a bit compared to how the ideas actually evolved.
The algorithm will proceed in two phases, one phase which places initializations of variables “as late as possible”, and another hoist phase which tries to find earlier, more optimal initialization points. So let’s see how this goes…
Livia excitedly walks into the Roman forum, where she spots Agrippa, drinking espresso and admiring the ruins
L: Agrippa! I think I have an optimal, polynomial time solution to the initialization problem.
A: Optimal and polynomial time you say? Well I’d sure like to hear it.
Livia writes down the following on a nearby wall in chalk
init[v] = {}
use[v] = { locations of uses of variable v }
I = { mutables } - { mutables with a value returned dead }

for l in topo_order(L).rev():
    for v in I:
        if use[v].all(lambda u: u not dom l)
           and use[v].any(lambda u:
               exists_path_not_containing(init[v], path_src: l, path_dest: u))
           and def(v) dom l: #cond1
            init[v] = init[v] + {l}
            if init[v] dom use[v]: #cond2
                I.remove(v)
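Here's one way this pseudocode might be realized as runnable Python. The graph encoding and every helper here (`succs`, `dom`, `reaches_avoiding`, and friends) are my own illustrative assumptions, not the Caiman frontend's actual data structures. Note that I'm reading the path condition in cond1 as existential over uses, which is what the prose and the correctness proof below rely on.

```python
# A hypothetical sketch of the phase-1 ("as late as possible") placement.
def phase1(holes, uses, defn, dom, succs, topo, root):
    """holes: set of hole nodes; uses[v]: set of use nodes of variable v;
    defn[v]: definition node of v; dom[n]: set of nodes that dominate n;
    succs: adjacency map; topo: all nodes in topological order; root: r."""

    def reaches_avoiding(src, dst, banned):
        # Is there a path from src to dst avoiding every node in `banned`?
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(s for s in succs[n] if s not in banned)
        return False

    init = {v: set() for v in uses}
    todo = set(uses)  # I: mutables that still need initializing locations
    for l in reversed(topo):  # walk the holes bottom-up
        if l not in holes:
            continue
        for v in list(todo):
            # cond1: no use dominates l, def(v) dominates l, and some use is
            # reachable from l without crossing an already-chosen location.
            if (defn[v] in dom[l]
                    and all(u not in dom[l] for u in uses[v])
                    and any(reaches_avoiding(l, u, init[v]) for u in uses[v])):
                init[v].add(l)
                # cond2: init[v] now cuts every root-to-use path.
                if all(not reaches_avoiding(root, u, init[v])
                       for u in uses[v]):
                    todo.discard(v)
    return init
```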
A: So your algorithm looks at all holes going up from the final CFG node, “marking” holes which are not dominated by a use and stopping once you reach an initialization set?
L: Yes! It basically initializes the variables “as late as possible” since the only thing initializing them earlier can do is make it so we need to recalculate something that’s available later.
A: I see. Then I think your optimality claim assumes that once a variable is initialized or defined, it stays live until the end of the program. But this isn't true: implicitly initialized values are live only for the hole they are recalculated at, and variables can be consumed or go out of scope. For example, suppose we have a CFG where a hole $h_1$ is followed by a branch whose two arms contain holes $h_2$ and $h_3$, rejoining at a use $u$.
A: Then we could have a situation where the best initializing set is $\{h_1\}$, but your algorithm will choose $\{h_2, h_3\}$. I can construct a program matching this situation like so:
val spec() -> i64 {
    y :- foo() + foo()
    v :- y * y
    z :- y + y
    r :- 0 if z < 0 else 1
    returns r * v
}

fn sched() -> i64 impls spec {
    var v;
    let r = if ??? /*hole1*/ < 0 {
        ??? // hole2
    } else {
        ??? // hole3
    };
    r * v
}
A: Here, it would be best to initialize `v` at `hole1`, since `hole1` must compute `z`, and `z` and `v` both depend on `y`. So `hole1` must compute `y`, and since the user didn't create a variable for it, the value will not persist after the hole. The algorithm will instead initialize `v` at `hole2` and `hole3`, which will cause `y` to be recalculated twice along any given path instead of once.
A: We could avoid this problem by allowing ourselves to change the program and create variables to store these intermediate results to reuse them in later holes. But I don’t know if this will work in general, and regardless, part of Caiman is allowing the user to discover the cost of tradeoffs such as recalculation vs caching, so always caching results might be a problem if the user really wanted to recalculate them.
L: Minerva's owl! You're right. And considering this, I think an optimal solution must take exponential time if it's based only on the structure of the graph of holes. Take an arbitrary $S \subseteq L$ such that $S$ is a valid initializing set of $v$. Then take the spec and arbitrarily add $m$ dependencies to the value of $v$, for some very large $m$, which are never used anywhere else in the program. Then, for each $s \in S$, we create variables right before $s$ initializing each of the $m$ values and consume them right after $s$. We thereby create conditions such that the optimal initializing set is exactly $S$, so we'd miss it if we didn't check the subset $S$. Thus, an optimal solution must examine all subsets, which is exponential in the number of holes in the graph.
L: In any case, I think I still have a proof of correctness…
Thm: If there exists an initializing set of $v$, $init[v]$ will be an initializing set.
Proof.
By contrapositive, assume $init[v]$ is not an initializing set. Then let $P = (r = p_0, \dots, p_k = u)$ be a path from $r$ to a $u \in use[v]$ such that $P \cap init[v] = \emptyset$ and $P \cap use[v] = \{u\}$. Then $init[v] \not\text{dom } use[v]$, so all locations are considered by the algorithm, as `cond2` is always false. Since $(p_0, \dots, p_{k - 1}) \cap use[v] = \emptyset$, no use dominates any of $p_0, \dots, p_{k - 1}$: a use dominating some $p_i$ would have to appear on this very prefix. Note that for each $p_i$ with $i \le k - 1$, there is a path to a use, namely $(p_i, \dots, p_k = u)$. Thus, if any of $p_0, \dots, p_{k - 1}$ were in $L$, the algorithm would select them in $init[v]$, as `cond1` would hold. But by assumption, $init[v]$ is not an initializing set even though it contains every hole on any path from $r$ to a "first use". So we must conclude that there is no valid initializing set.
$\square$
A: I think I can buy that. I wonder, if this algorithm really does initialize the variables as late as possible, then perhaps we could follow it up with something like partial redundancy elimination to hoist the initializations to locations that reduce the dependencies that need to be recalculated.
L: I used the “as late as possible” fact in my bogus proof of optimality, so I’ll give you the proof. Starting with a helper lemma.
Lemma (O1): $\forall v, \ell.\ \ell \in init[v] \Rightarrow \exists u \in use[v]$ st $u$ postdom $\ell$.
Proof.
Since $\ell \in init[v]$, we know $use[v] \neq \emptyset$ and, by the algorithm, $\forall u \in use[v].\ u \not\text{dom } \ell$. Since $\text{def}(v)$ dom $\ell$ and $\text{def}(v)$ dominates all uses, taking an arbitrary $u \in use[v]$ gives us $\text{def}(v)$ dom $u$ and $\text{def}(v)$ dom $\ell$. So by DP2, $\exists w$ st $w$ postdom $\ell$ and $w$ postdom $u$. Thus $w \in use[v]$ by construction of the use set (a post-dominator of a use is itself a use), and we reach our goal.
$\square$.
L: And this is used in the proof of the next lemma…
Lemma (O2): $\forall v, \ell, S \subseteq L.\ \ell \in init[v] \land (init[v] \setminus \{\ell\}) \not\text{dom } \{\ell\} \land \ell \rightarrow_s S \Rightarrow \exists u \in use[v], P = (\ell, \dots, u)$ st $P \cap S = \emptyset$.
L: This lemma basically states that for a location $\ell$ in an initializing set produced by the algorithm, any initializing set consisting only of locations that strongly succeed $\ell$ will not suffice; there will be a path from $\ell$ to some use not containing any element of $S$.
Proof.
Suppose, FSOC, that there exists a minimal initializing set $S \subseteq L$ st every path from $\ell$ to a use contains an element of $S$. By the contrapositive of O1, there must be a use $u$ such that $u$ postdom $\ell$, as otherwise $\ell \not\in init[v]$. So every $\ell$-$f$ path must contain $u$. By the assumption that $S$ is an initializing set, every $\ell$-$u$ path contains a member of $S$. By the algorithm, since $\ell \rightarrow_s S$, every member of $S$ will be considered before $\ell$. So consider an arbitrary $s \in S$. Then we have the following cases based on the conditions under which an element is added to $init[v]$ in the algorithm:
Looking at `cond1`, there are three ways the algorithm could treat such an $s$; the cases in which $s$ fails `cond1` are ruled out by the minimality and validity of $S$, so the only case we have left to consider is case 2: $s$ is added to $init[v]$. So assume $S \subseteq init[v]$. Then every path from $\ell$ to a use contains a member of $init[v]$ at the time $\ell$ is considered, so `cond1` fails for $\ell$ and $\ell \not\in init[v]$, reaching our goal. Equivalently, every $\ell$-$f$ path contains a member of $init[v]$ at the time $\ell$ is considered; I claim, by a lemma I haven't defined yet but will call O3, that this also means that $\ell \not\in init[v]$, reaching our goal once again.
$\square$
A: So you aim to prove an equivalence between $\forall u.$ `exists_path_not_containing(init[v], path_src: l, path_dest: u)` in the algorithm and some condition like `exists_path_not_containing(init[v], path_src: l, path_dest: f)`? I suppose this effectively removes the quantification over $use[v]$.
L: Yes. The need to do this came about as I found two different conditions for the algorithm, one that was well suited to proving the algorithm correct, and this one which was helpful in proving the “late as possible” claim about the algorithm. The intuition for their equivalence is that we defined a post-dominator of a use to also be a use. So based on this and the structure of a graph, if there is an $\ell$-$f$ path not containing $init[v]$, and $init[v]$ is an initializing set, then it must be that $f$ is not a use. But $f$ post-dominates every node, therefore there cannot be any uses.
L: Formally, we have …
Lemma (O3): Let $S$ be a minimal initialization set of a variable $v$ (so no use dominates a member of $S$). Then
$$\forall \ell \in S.\ (\exists P = (\ell, \dots, f).\ P \cap (S \setminus \{\ell\}) = \emptyset) \Leftrightarrow (\exists u \in use[v], P' = (\ell, \dots, u).\ P' \cap (S \setminus \{\ell\}) = \emptyset)$$
Proof.
$\Rightarrow$: Suppose $\exists$ an $\ell$-$f$ path $P$ st $P \cap (S \setminus \{\ell\}) = \emptyset$. By O1, $\exists u \in use[v]$ st $u$ postdom $\ell$. So $P = (\ell, \dots, u, \dots, f)$. Then we take the prefix of $P$, $P' = (\ell, \dots, u)$, which is a path to a use such that $P' \cap (S \setminus \{\ell\}) = \emptyset$.
$\Leftarrow$: For the other direction, suppose $\exists P = (\ell, \dots, u)$ such that $P \cap (S \setminus \{\ell\}) = \emptyset$. Then $\exists P' = (\ell, \dots, u, \dots, f)$ for which $P$ is a prefix, since all nodes reach $f$. Consider the suffix of $P'$, $P'' = (u = p_0, \dots, p_k = f)$. We will induct on $P''$ to show every $n \in P''$ is a use, dominated by a use, or not dominated by $v$.
Base Case: $p_0 = u$ which is a use.
IC: Assume the inductive hypothesis, so $p_i$ is a use, dominated by a use, or not dominated by $v$. A case analysis on which of these holds for $p_i$ shows that the same is true of $p_{i + 1}$.
Finally, we conclude by noting that we constructed a $u$-$f$ path such that every node in it is a use, dominated by a use, or not dominated by $v$. The latter two cases show the node cannot be in $init[v]$ by violation of `cond1`. The first case shows the node cannot be in $init[v]$ by the assumption that our use set and hole set are distinct. So this path contains no elements of $init[v]$, reaching our goal.
$\square$
Livia sits down at the table with Agrippa, visibly distressed
L: This is all well and good, but I can’t believe I spent all this time thinking I was proving something useful when I didn’t realize I was making a terrible assumption about value persistence.
A: Well, I may have an idea. We could start by using this first algorithm to create a “baseline” initialization set. Then, for each variable, we could try and hoist the initializations earlier so long as we put them at a strictly better location. This won’t be perfect; as a counterexample we can construct the following:
A: Suppose that `x` and `z` depend on `t`, `h`, and `k`. The algorithm will initialize `x` and `z` at the holes right before `x + z` and `x * k`. But the globally optimal position would be at the hole right before `bar(t)`. The hoist idea I have won't be able to hoist `x`, because doing so would temporarily make things not better[^1]: we'd have to recalculate `h` and `k` now, even though we save `t`. Likewise for `z`. Now, if we hoisted `z` and `x` "at the same time", this would avoid the issue, but a general solution that does this would require exponential time, as it would need to look at every subset.
A: I further don’t think it’s a good idea to arbitrarily decide that certain values are more or less worth it to recalculate. For example, I don’t think we should make choices that would save 100 recalculations but cost an extra recalculation of some value. It could be that one extra recalculation has a bigger impact on runtime than the saving of the 100 other recalculations.
Livia, regaining some of her optimism, stands up and paces around for a bit. Eventually, she gets an idea and writes the following on a wall
for v in mutables:
    mark = {} # The set of "last best" locations for v
    init[v] = old_init[v]
    for l in topo_order(L).rev():
        if l in old_init[v]:
            mark[l] = {l}
        if old_init[v] pdom {l} and def(v) dom l:
            match l:
                case Next(lp):
                    mark[l] = mark[lp]
                case None:
                    mark[l] = {}
                case Select(a, b):
                    if mark[a].empty() or mark[b].empty():
                        mark[l] = {}
                    elif mark[a] pdom mark[b]:
                        mark[l] = mark[a]
                    elif mark[b] pdom mark[a]:
                        mark[l] = mark[b]
                    else:
                        mark[l] = mark[a] + mark[b]
            if calc(v, l) < intersection([calc(v, lp) for lp in mark[l]]):
                mark[l] = {l}
            if init[v].any(lambda x: mark[l] dom {x}):
                init[v] = mark[l] + init[v]
                          - [x for x in init[v].filter(lambda y: mark[l] dom y)]
L: Here, `calc(v, l)` is a multiset of nodes that would need to be recalculated given that `v` is initialized at `l`. `old_init[v] pdom {l}` is true if every path from `l` to the final node passes through an element of `old_init[v]`; this is somewhat analogous to saying an initialization of `v` is anticipated at `l` (in reference to the anticipated expressions of regular partial redundancy elimination). So basically, the algorithm starts at the previous initialization set and then walks up the CFG towards the root. If it encounters any location such that initializing `v` there results in fewer recalculations than the intersection of the previous "best", we mark it. If at any point the mark set dominates some subset of the current initialization set, we add the mark set to the initialization set and remove the dominated subset.
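To make the moving parts concrete, here's a hypothetical transcription of one pass for a single variable. The node encoding, the `set_pdom`/`set_dom` predicates, `dominates_loc`, and `calc` are all assumed helpers (the path-based predicates could be implemented with a DFS like `reaches_avoiding` in the phase-1 sketch); none of this is the real frontend's API.

```python
from collections import Counter

def strict_sub(a, b):
    # Strict multiset inclusion a < b on Counters.
    return a != b and all(a[k] <= b[k] for k in a)

def hoist_pass(topo, kind, holes, old_init, init, dominates_loc,
               set_pdom, set_dom, calc):
    """kind[n]: ("none",) | ("next", m) | ("select", a, b) for CPS node n;
    dominates_loc(l): does def(v) dominate l; set_pdom(A, B): does every
    path from B to the final node pass through A; set_dom(A, B): does
    every path from the root to B pass through A; calc(l): Counter of the
    values recalculated if v is initialized at l."""
    mark = {}
    for l in reversed(topo):
        if l in old_init:
            mark[l] = {l}
        if set_pdom(old_init, {l}) and dominates_loc(l):
            match kind[l]:
                case ("none",):
                    mark[l] = set()
                case ("next", m):
                    mark[l] = set(mark.get(m, set()))
                case ("select", a, b):
                    ma, mb = mark.get(a, set()), mark.get(b, set())
                    if not ma or not mb:
                        mark[l] = set()
                    elif set_pdom(ma, mb):
                        mark[l] = ma
                    elif set_pdom(mb, ma):
                        mark[l] = mb
                    else:
                        mark[l] = ma | mb
            # Prefer l itself when it strictly beats the common cost of the
            # best locations below it (assuming only holes can host an
            # initialization).
            if l in holes and mark[l]:
                best = None
                for lp in mark[l]:
                    best = calc(lp) if best is None else best & calc(lp)
                if strict_sub(calc(l), best):
                    mark[l] = {l}
            # If the mark set dominates part of init, hoist: swap the
            # dominated locations for the marked ones.
            dominated = {x for x in init if set_dom(mark[l], {x})}
            if dominated:
                init = (init - dominated) | mark[l]
    return init
```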
A: Isn't it possible that hoisting a variable opens up opportunities for other variables to be hoisted? For example, if we hoist a value, it might become attractive to hoist the values it depends on and the values that depend on it. So perhaps we could iterate this procedure until convergence? If we do that, and there are $m$ variables and $n$ nodes in the CFG, we get a rough complexity of $O(m^2n^3 \log n)$, because iterating until convergence can take at most $O(mn \log n)$ iterations, since we can only hoist a variable $n \log n$ times. This also assumes we can check `calc(v, l)` in linear time wrt $n$. This isn't super stellar, but it is still pseudo-polynomial.
L: You're right. Then we can think of `old_init[v]` as `init[i-1][v]` and the current init as `init[i][v]`. I think the statement of "optimality" we can try and prove is $\forall S, v.\ \exists P = (r, \dots, f).\ calc_v^S(P) \not\subseteq calc_v^{init_f}(P)$, where $init_f$ is the last initializing set we compute. We're going to assume for the rest of this that the initializing set $S$ is not equivalent to $init_f$; by that I mean that there exists a path $P'$ where $calc_v^S(P') \neq calc_v^{init_f}(P')$. So what this statement of "optimality" is saying is that, when examining the values we must recalculate for a given variable, any other initializing set that is not equivalent to $init_f$ will contain a path where it is potentially worse. This statement gives us "true"/"global" "optimality" when a schedule contains only one mutable variable.
L: Since the algorithm looks at one variable at a time, another way of thinking about this is our algorithm starts with a “baseline” and monotonically improves it until it reaches a point where it can no longer get better.
A: Maybe we can try and prove this by induction. If we say at each step, there isn’t a better choice of initialization points from subsets of $L$ that succeed the current initialization set, then at the final iteration all we’d need to show is that the algorithm considered initializations that precede the current init set.
Thm (Hoist Induct): $\forall v, S, \ell.\ \ell \in init_i[v] \land \ell \rightarrow_s S \Rightarrow \exists P = (\ell, \dots, f).\ calc_v^S(P) \not\subset calc_v^{init_f}(P)$
Proof.
By induction on $i$, the number of iterations of computing $init_i$.
Base Case: The initial initializing set is the result of the previous algorithm. By the “push-down” lemma (O2) this is true since any $S$ such that $\ell \rightarrow_s S$ must not be an initializing set.
IC: Consider $\ell_i \in init_i[v]$ and $\ell_{i + 1} \in init_{i + 1}[v]$ such that $\ell_{i + 1}$ dom $\ell_i$. Such an $\ell_{i + 1}$ and $\ell_i$ exist, as this is a condition for adding any new elements to the initializing set. Assume $\ell_i \neq \ell_{i + 1}$; if they were equal, we'd be done by our inductive hypothesis. Let $P = (\ell_{i + 1} = p_0, \dots, p_n = \ell_i)$ be an $\ell_{i + 1}$-$\ell_i$ path. Observe that $P \cap S \neq \emptyset$: otherwise, we know by the hypothesis that there exists a $P' = (\ell_i, \dots, f)$ such that $calc_v^S(P') \not\subset calc_v^{init_f}(P')$, and joining $P$ and $P'$ yields an $\ell_{i + 1}$-$f$ path for which our goal holds, since by the algorithm and the fact that $\ell_{i + 1}$ dom $\ell_i$, we have $calc_v(\ell_{i + 1}) < calc_v(\ell_i)$.
Observe that $mark^{i + 1}(\ell_{i + 1})$ does not contain such an $s \in P \cap S$: since $\ell_{i + 1}$ was added to the initializing set, $mark^{i + 1}(\ell_{i + 1}) = \{\ell_{i + 1}\}$. Since $init_i[v]$ postdom $\{\ell_{i + 1}\}$ and $S$, I claim (and will refer to this as H3) that there exists a $z \in init_i[v]$ so that for $P' = (\ell_{i + 1}, \dots, z)$, $calc_v^S(P') \not\subset calc_v^{mark_{\ell_{i + 1}}}(P')$. I claim (and will refer to this as H2) that $P' \cap init_{i + 1}[v] = \{\ell_{i + 1}\}$; this claim is basically asserting that there is no path between any two locations in an initializing set. These two facts together give us $calc_v^S(P') \not\subset calc_v^{init_{i + 1}}(P')$.
$\square$
A: Wait, your H2 claim doesn’t work for the first inductive case ($i = 0$). That is, our phase 1 algorithm doesn’t guarantee that there isn’t a path from a location in $init_0[v]$ to another location in $init_0[v]$. For example, suppose that we have a graph like this:
A: Here, holes $h_1$ and $h_2$ would get selected in the initializing set. First $h_1$ would get chosen since no uses dominate it, and there exists a path from it to a use with no selected holes. But since $h_1$ alone does not dominate all uses, the algorithm will continue, eventually selecting $h_2$ since no uses dominate it and there exists a path from it to a use that doesn’t contain any selected holes.
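Using the hypothetical `dominators` and `phase1` sketches from earlier, we can reproduce the situation Agrippa describes (again, the encoding is illustrative):

```python
# h2 sits above a branch; one arm holds h1 followed by a use u1, the
# other arm holds a second use u2.
succs = {"r": {"h2"}, "h2": {"sel"}, "sel": {"a", "b"},
         "a": {"h1"}, "h1": {"u1"}, "b": {"u2"},
         "u1": {"f"}, "u2": {"f"}, "f": set()}
topo = ["r", "h2", "sel", "a", "h1", "u1", "b", "u2", "f"]
dom = dominators(succs, "r")

init = phase1(holes={"h1", "h2"}, uses={"v": {"u1", "u2"}}, defn={"v": "r"},
              dom=dom, succs=succs, topo=topo, root="r")
# h1 is picked first (no use dominates it, and u1 is reachable from it);
# h2 is then picked to cover u2, even though h2 alone dominates both uses
# and there is a path from h2 to h1.
assert init == {"v": {"h1", "h2"}}
```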
L: Hmm. You’re right. It could be that we add a location to the initialization set, but then need to add a location that dominates it in order to cover a different use. I think we can fix this by removing all elements of the initialization set which are collectively dominated by other elements of the initialization set. In other words, before moving onto phase 2, we can just remove elements from the initialization set until it becomes minimal.
L: So now we have two holes left to fill. Before doing that, let me give you one more helper lemma.
Lemma (H1): $\forall n \in V(G).\ mark_n = mark_k \land k \text{ postdom } n \Rightarrow \exists P = (n, \dots, k).\ \forall s \in P.\ calc_v(s) \not\subset mark_k$.
Proof.
By structural induction on a Caiman CFG.
- `n = None()`: Here $mark_n = \emptyset$ and $calc_v(n) \not\subset \emptyset$.
- `n = Next(m)`: Suppose $mark_n = mark_m$. Then $calc_v(n) \not\subset mark_m$. By the IH, if $mark_m = mark_k$ where $k$ postdom $m$, then $\exists P = (m, \dots, k)$ such that $\forall s \in P.\ calc_v(s) \not\subset mark_k$. By appending $n$ to the front of the path, we get the desired $n$-$k$ path, since $m$ postdom $n$. Otherwise, $m = k$.
- `n = Select(a, b)`: Suppose $mark_n = mark_a \cup mark_b$. Then we reach our goal vacuously, by the assumption that $mark_b \not\text{pdom } mark_a$ and vice versa. Now suppose, WLOG, that $mark_n = mark_a$. Then $mark_a = mark_k$, and we finish the case by following the same logic as in the `Next(m)` case.

$\square$
Claim (H3): $\forall init, x, S, y \in S.\ x \rightarrow y \land y \not\in mark_x \Rightarrow \exists z \in init, P = (x, \dots, z).\ calc^{mark_x}(P) \not\supset calc^{S \cap P}(P)$
L: The idea here is basically if there is some parent node that doesn’t have a child node in its mark set, then there is some path where we can potentially do better.
Proof.
By structural induction.
- `x = None()`: Then $x = y = z = f$. The only place to initialize is $x$ itself, so this case is impossible, as $y = x$ and $y \not\in mark_x$.
- `x = Next(m)` and $mark_x = \{x\}$: Then by the algorithm, $calc_x \subset calc_\ell$ for all $\ell \in mark_m$. If $y \in mark_m$, then since $x \rightarrow y$ and some $z \in init$ postdom $y$, the path $P = (x, \dots, y, \dots, z)$ is a desired path where $calc^{mark_x}(P) \not\supset calc^{S \cap P}(P)$. If $y \not\in mark_m$, by the IH $\exists P' = (m, \dots, z')$ such that $calc^{mark_m}(P') \not\supset calc^{S \cap P'}(P')$. Consing $x$ onto $P'$ yields a path where $calc^{mark_x}(P) \not\supset calc^{S \cap P}(P)$.
- `x = Next(m)` and $mark_x = mark_m$: By assumption, $y \not\in mark_m$. So apply the IH and cons $x$ as before.
- `x = Select(a, b)` and $mark_x = mark_a \cup mark_b$: WLOG, $y \not\in mark_a$. Then by the IH, $\exists P' = (a, \dots, z)$ st $calc^{mark_a}(P') \not\supset calc^{S \cap P'}(P')$. Consing $x$ onto $P'$ yields a path $P = (x, \dots, z)$ st $calc^{mark_x}(P) \not\supset calc^{S \cap P}(P)$, since $calc_v(x) \not\subset calc^{mark_x}(P)$.
- `x = Select(a, b)` and $mark_x = mark_k$ for $k$ postdom $x, a, b$: By lemma H1, $\exists P' = (x, \dots, k)$ st $\forall p \in P'.\ calc_v(p) \not\subset mark_k$. By assumption, $calc_v(x) \not\subset \bigcap mark_k$. By the IH, $\exists P'' = (k, \dots, z)$ st $calc^{mark_k}(P'') \not\supset calc^{S \cap P''}(P'')$. Thus, the path $P = (x, P', P'')$ is an $x$-$z$ path where $calc^{mark_x}(P) \not\supset calc^{S \cap P}(P)$, since any initialization at $x$ or on $P'$ is no better than the initialization by $mark_k$. Observe that if $\not\exists y' \in S$ st $k \rightarrow y'$, then $P$ is still a desired path, as $S$ must then initialize at $x$ or on $P'$.
- `x = Select(a, b)` and $mark_x = \{x\}$: By assumption, $y \neq x$. We can reduce this case to an argument for `x = Next(Select(a, b))`, reusing the other cases.

$\square$
A: I foresee that we may have a slight problem. So far, these facts required using a strict subset. That is, we seem to be working towards a statement along the lines of $\exists P$ such that $calc_v^S(P) \not \subset calc_v^{init_f}(P)$, which doesn’t seem all that useful. It would say there is some path where our solution is not worse. So it could be that on one path, we match some other solution, but on all other paths, we are worse than other solutions.
L: Oh dear, not another oversight! I'll have to think about how best to solve this. Intuitively, it feels like if we can narrow every path down to being $\supseteq$ the calculations of our solution, then we can just assume that the initialization set we're examining is not equal to $init_f$. Again, this is with respect to a single variable, so we're still dealing with a "local optimum" instead of a global one.
Claim (H2): $\forall \ell_{i + 1} \in init_{i + 1}, \ell_i \in init_i, P = (\ell_{i + 1}, \dots, \ell_i).\ P \cap init_{i + 1} = \{\ell_{i + 1}\}$
L: The point of this claim is to argue why an initialization set produced by the algorithm cannot have two members that can reach each other.
Proof.
By induction on $i$.
Base Case: Here, $init_0[v]$ is the initialization set produced by the phase 1 algorithm. Let's assume we process this set such that $\forall \ell_1, \ell_2 \in init_0[v].\ \ell_1 \not\text{dom } \ell_2$ and $\ell_2 \not\text{dom } \ell_1$. We WTS that $\ell_1 \not\rightarrow \ell_2 \land \ell_2 \not\rightarrow \ell_1$. Suppose FSOC, and WLOG, that there exists a path $P_1 = (\ell_1, \dots, \ell_2)$ (so $\ell_1 \rightarrow \ell_2$). Since $\ell_1 \in init_0[v]$, there exists $P_2 = (\ell_1, \dots, u)$ for $u \in use[v]$ such that $P_2 \cap init_0[v] = \{\ell_1\}$. This is true by definition of the algorithm (`cond1`). Consider the last shared node between $P_1$ and $P_2$ (which could be $\ell_1$ itself); call this node $w$. If $w$ is a select, then it must have a distinct post-dominator, and by the assumption that $w$ is the last shared node in the paths, this post-dominator must succeed $\ell_2$ and $u$. So by scoping, $w$ dom $\ell_2$. This means that $\ell_2 \not\in init_0[v]$, reaching our goal. Otherwise, if $w$ is not a select, then it must be $u$ itself, so $u \rightarrow \ell_2$. By Lemma CG2, $\exists w'$ such that $w'$ postdom $u$ and $w'$ dom $\ell_2$. Therefore, $w'$ is a use and $\ell_2$ is dominated by a use. So by `cond1` of the phase 1 algorithm, $\ell_2 \not\in init_0[v]$, reaching our goal again.
IC:

- `None()`: This is vacuously true.
- `Next(a)` and $mark_\ell = mark_a$: If $mark_a = \emptyset$, we have vacuous truth, so assume this is not the case. Then applying the IH gives us that $a \rightarrow \ell_i$ for some $\ell_i \in init_i$; thus $\ell \rightarrow init_i$.
- `Next(a)` and $mark_\ell = \{\ell\}$: If $\ell \in init_i$, we're done. So suppose this isn't the case. Then $calc_v(\ell) \subset \bigcap_{\ell' \in mark_a} calc_v(\ell')$. Thus $mark_a \neq \emptyset$, and following the logic of the previous case we get $\ell \rightarrow a \rightarrow init_i$.
- `Select(a, b)`: If $mark_\ell = \{\ell\}$, then $\ell \in init_i$ and we've reached our goal. So suppose WLOG that $mark_\ell = mark_a$. Following the logic of case 2, we get that $\ell \rightarrow a \rightarrow \ell_i$. Finally, it could be that $mark_\ell = mark_a \cup mark_b$. If $mark_\ell = \emptyset$, we'd have vacuous truth, so assume this isn't the case. WLOG, assume $mark_b \neq \emptyset$. Then, following the logic of case 2, we have $\ell \rightarrow b \rightarrow \ell_i$.

Now suppose there exists some $\ell_{i + 1}$ such that $\ell_{i + 1} \rightarrow \ell'$ for $\ell' \in init_{i + 1}$. By the claim, it must be the case that $\ell' \rightarrow \ell_i$ for some $\ell_i \in init_i$. Let $A$ be the set of nodes added to $init_{i + 1}$ along with $\ell_{i + 1}$. Observe that $A$ dominates some $\ell^* \in init_i$. Thus $\ell_i \neq \ell^*$, since $A$ does not dominate $\ell'$.
$\square$
L: Finally, we have our final result. I’m informally defining two initialization sets to be equivalent if they result in the same set of recalculations along all paths.
Thm (Hoist): $\forall S \not \equiv init_f, v.\ \exists P = (r, …, f). \ calc_v^S(P) \not \subset calc_v^{init_f}(P)$
Proof.
Observe that $\forall \ell \in init_f$, $S$ dom $\ell$: otherwise, $\exists P' = (r, \dots, \ell)$ not containing any member of $S$. By Hoist Induct, $\forall S' \subseteq S$ such that $\ell \rightarrow_s S'$, $\exists P'' = (\ell, \dots, f)$ such that $calc_v^{S'}(P'') \not\subset calc_v^{init_f}(P'')$. Joining $P'$ and $P''$, along with the assumption that $S \not\equiv init_f$, yields a desired path.
So consider an arbitrary $s \in S$. Note that $s \not\in mark_r$, as otherwise $s \in init_f$. So by H3, $\exists \ell \in init_f, P' = (r, \dots, \ell)$ such that $calc_v^{mark_\ell}(P') \not\supset calc_v^{S \cap P'}(P')$. By Hoist Induct, there exists $P'' = (\ell, \dots, f)$ such that $calc_v^S(P'') \not\subset calc_v^{init_f}(P'')$. Joining $P'$ and $P''$, along with the assumption that $S \not\equiv init_f$, yields a desired path.
$\square$
L: Well, that was quite exhausting. I initially started out thinking this problem would be at least NP-hard, and I started trying to prove something to that effect, but I couldn't make progress. Thanks to your help, I did end up realizing that a version based only on CFG structure would be exponential if perfectly optimal. Thanks for helping me work through this, Agrippa!
A: No problem. This was fun.
Livia and Agrippa exchange goodbyes and Livia walks away
These two algorithms were implemented in Caiman. My previous approach wasn’t even correct (or rather, complete, as it could fail to find an initialization set when one exists), so already this is a big improvement. Admittedly, the condition that a hoist step must always improve can easily create situations that prevent the algorithm from reaching a global optimum. But the point of this requirement is to prevent any need for backtracking. For example, we could hoist if the new location is at least as good as the previous locations, but I wouldn’t want to hoist an initialization, and then, after movements of other variables, learn that it would have been better to keep it where it was. This is really an effort to keep the frontend polynomial time (even if it’s a polynomial that doesn’t really scale super well).
I think the overall result of this hoist algorithm is something that won’t really optimize your code a whole lot, but will “take the hint” if you write your code in a way that “pushes” something to be initialized at a given hole. For example, without the hoist algorithm, you might write a schedule that lays out all the needed dependencies of a variable, but if you don’t use said variable while those dependencies are live, it’s likely the compiler will end up recalculating them. With the hoist algorithm, it’s much more likely to initialize something “where you would expect”, but the algorithm isn’t too great at finding a better initialization location than you might have intended. All of this is very qualitative and anecdotal, but I hope I’m getting the fuzzy idea across.
I’ll be the first to admit I sort of got tired by the end of the Hoist phase proofs, as you can probably tell by the reduction in dialogue (not like it was dialogue-heavy in the first place). But I kind of liked this style a bit, at the very least it saved me from the slightly clunky expressions like “now you might have noticed …” or “you may be wondering …”.
Now there’s another use-case where the algorithms here could apply. Recall the focus here was purely on where to initialize variables that a user defines, but doesn’t initialize. There’s also a similar problem of where to create variables a user doesn’t even define. For example, in the following schedule:
fn foo(a: i64) -> i64 impls value, time, spatial {
    ???; // hole0
    let x = bar(a, b);
    ???; // hole1
    let r = if a < 0 {
        ???; // hole2
        x + z
    } else {
        ???; // hole3
        x * z
    };
    r + a
}
Here, `z` is used but never defined. The current implementation is to define and initialize it at the last hole that dominates all uses; in this case, `hole1`. But that's a somewhat arbitrary choice. We could define and initialize it at `hole0`, or maybe we define it at `hole1` and initialize it at holes `hole2` and `hole3`.
For example, we could choose to define and initialize the variable together,
and choose the initializing locations based on the algorithms we just talked about.
It will likely produce better code in quite a few situations. The reason I didn’t do
this is because definitions and uses are needed fairly early in the frontend analysis
pipeline while computing types of each variable needs to be done fairly late,
after many other analyses are done. So having a circular dependence between these two
would effectively require the entire frontend to iterate until convergence.
[^1]: It does not necessarily make it "worse", just "not better".