In a previous post, we discussed implementing an analysis to detect potential
uses of null
pointers. The workhorse of this analysis was an interval analysis
that could (conservatively) determine all possible values an integer could
take on. We used this information to prove whether an array access is always in-bounds.
This worked, but it could not handle cases such as a dynamically sized buffer. Suppose we know that a buffer has $a$ elements. Suppose further that we know $b < a$. If we read from the buffer at all indices between $0$ and $b$, we know that this is safe. However, our interval analysis cannot prove this. For $a$, it would say its range of possible values is $(-\infty, \infty)$. While we know that $b < a$, the fact that we know nothing about $a$ leads the analysis to conclude that the range of $b$ is also $(-\infty, \infty)$. Thus it cannot prove that accessing indices between $0$ and $b$ is safe because, as far as the analysis knows, $a$ could be 1 and $b$ could be 10.
This particular example could be solved with a form of logic programming. We can construct a knowledge graph: a directed graph whose nodes are variables and whose edges are given interval weights representing the range of values that could be added to an instance of the start node to get to an instance of the end node. For example, knowing that $b < a$, we can encode this like so:
Since $b < a$, we must add at least $1$ to $b$ to get $a$, so the edge from $b$ to $a$ gets the weight $[1, \infty)$. Symmetrically, we must add at most $-1$ to $a$ to get the value of $b$, so the edge from $a$ to $b$ gets the weight $(-\infty, -1]$.
If we know that $b = x + y$, that the range of $x$ is $[1, 5]$, and that the range of $y$ is $[-10, 0]$, we can encode this like so:
In this second example, we see that $a$ is always greater than $y$ since the only path from $y$ to $a$ has a total weight of $[2, \infty)$, indicating that we must add at least 2 to $y$ to get to $a$.
With this method, for two variables $s$ and $t$, we can conclude that $s$ is always less than $t$ if there is a path from $s$ to $t$ whose total weight (the sum of all edge intervals along the path) contains only positive numbers.
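To make this concrete, here is a minimal sketch (not code from the actual analysis) of how interval weights could be summed along a path and checked; the Interval type here is hypothetical:
#include <optional>
#include <vector>
// Hypothetical interval edge weight: a missing bound means -infinity or +infinity.
struct Interval {
    std::optional<long long> Lo;
    std::optional<long long> Hi;
};
// Sum the edge weights along a path; an infinite bound absorbs finite ones.
Interval pathWeight(const std::vector<Interval>& Edges)
{
    Interval Total{0, 0};
    for (const auto& E : Edges) {
        Total.Lo = (Total.Lo && E.Lo)
                       ? std::optional<long long>(*Total.Lo + *E.Lo)
                       : std::nullopt;
        Total.Hi = (Total.Hi && E.Hi)
                       ? std::optional<long long>(*Total.Hi + *E.Hi)
                       : std::nullopt;
    }
    return Total;
}
// A path from s to t proves s < t iff its total weight contains only
// positive numbers, i.e. its lower bound is at least 1.
bool provesLessThan(const Interval& Total)
{
    return Total.Lo.has_value() && *Total.Lo >= 1;
}
For the second example above, the path from $y$ to $a$ sums $[1, 5]$ and $[1, \infty)$ into $[2, \infty)$, whose lower bound is positive, so the path proves $y < a$.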
This is a good idea, but it starts to break down when we deal with variables with unbounded ranges. Suppose now that $x$ and $y$ are unbounded and all we know is that $a = x + z$, $b = x + y$, and $b < a$. Then surely $y < z$? Unfortunately, this method cannot show this: once a path contains an edge with weight $(-\infty, \infty)$, the entire path has that unbounded total weight.
So it seems that what we want is a more precise method of determining when an inequality holds.
This sounds like an
integer linear programming problem.
Except, is it always linear?
Suppose we have an index like so, where num_elems and elem_size are function arguments and masked is some other variable:
for (int i = 0; i < num_elems; ++i) {
for (int j = 0; j < masked; ++j) {
data[i * elem_size + j] = 0;
}
}
We essentially want to solve 0 <= i * elem_size + j < data_size for the variables i, elem_size, j, and data_size (the size of the buffer data).
ILP (integer linear programming) can’t solve this, and in fact, I’m told that solving integer inequalities involving multiplication/division like this is “the hardest problem in modern mathematics.” So there goes that approach.
Instead what we’re going to do is use the Z3 SMT Solver. We’re going to use a dataflow pass to collect constraints which will then be passed off the Z3. To make it easier, we want to collect the constraints with the IR in SSA form. This allows us to just map LLLVM values to Z3 variables one-to-one without worrying about how to deal with mutation. This does mean that our analysis won’t support memory operations, but that’s OK.
We’re also going to have Z3 reason about everything as integers, which isn’t quite right since we’re actually dealing with bit vectors. Z3 can reason about bit vectors too, but for simplicity, I’m not going to worry about that.
The constraints we’re going to collect include inequalities that are known to hold
based on conditional branches, the definitions of values (such as %4 = add %3, %2
),
and the overly conservative ranges of values as determined by our interval analysis.
The reason we make use of our interval analysis is that it's how we're going to handle things like phi nodes and memory loads and stores. In my previous post on this topic, we discussed an interval analysis that can track intervals through memory loads and stores of pure functions. The way we did this was essentially to take the meet of all intervals being stored to the same memory location.
The first thing we need to do is update our interval analysis to work with phi nodes. I actually discussed this in the previous post; however, it wasn't quite right.
As a refresher, what I do when I meet two values is to simply make a new interval that contains both intervals being met. However, to support loops, I identify any values that have been mutated. When meeting a mutated value, if the mutated value’s upper bound is greater than the other value’s upper bound, the upper bound is set to infinity. Symmetrically if the mutated value’s lower bound is less than the other value’s lower bound, the new interval’s lower bound is set to negative infinity. More detail on this and the cases which led to this decision are discussed in the previous post.
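As a rough sketch (simplified from the real analysis), the meet might look something like the following, where IntRange is a stripped-down, hypothetical version of the analysis' interval type:
#include <algorithm>
#include <optional>
// Hypothetical simplified interval: a missing bound means -infinity or +infinity.
struct IntRange {
    std::optional<long long> Lower;
    std::optional<long long> Upper;
    bool Mutated = false;
    // The smallest range containing both A and B, except that a bound of a
    // mutated range which exceeds the other range's bound is widened to
    // infinity so that loops converge.
    static IntRange meet(const IntRange& A, const IntRange& B)
    {
        IntRange Res;
        if (A.Lower && B.Lower) { Res.Lower = std::min(*A.Lower, *B.Lower); }
        if (A.Upper && B.Upper) { Res.Upper = std::max(*A.Upper, *B.Upper); }
        const auto Widen = [&Res](const IntRange& Mut, const IntRange& Other) {
            if (!Mut.Upper || (Other.Upper && *Mut.Upper > *Other.Upper)) {
                Res.Upper = std::nullopt;
            }
            if (!Mut.Lower || (Other.Lower && *Mut.Lower < *Other.Lower)) {
                Res.Lower = std::nullopt;
            }
        };
        if (A.Mutated) { Widen(A, B); }
        if (B.Mutated) { Widen(B, A); }
        return Res;
    }
};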
So to reuse this code, we need to identify which incoming values of a phi node have been “mutated”. Let’s examine a typical example of what I mean by “mutated”.
Consider the following loop where i
is mutated:
for (int i = 0; i < 10; ++i) {
foo(i);
}
After running mem2reg, this becomes the following LLVM instructions (see this post for information on how to precisely schedule LLVM passes):
3: ; preds = %7, %2
%.0 = phi i32 [ 0, %2 ], [ %8, %7 ]
%4 = icmp slt i32 %.0, 10
br i1 %4, label %6, label %5
5: ; preds = %3
br label %9
6: ; preds = %3
call void @_Z1fi(i32 noundef %.0)
br label %7
7: ; preds = %6
%8 = add nsw i32 %.0, 1
br label %3, !llvm.loop !5
The variable i maps to the values %.0 and %8. We want the transfer function of a phi node to essentially perform a meet of two dataflow facts. So, as we discussed in the previous post, the range of %.0 before the conditional branch should be $[0, \infty)$. Based on our previous discussion, this means we will think of %8 as the "mutated" version of the same variable. This matches the intuitive description of what the source code is doing by incrementing i.
So, given a set of incoming values to a phi node, we will say that an incoming value represents a source-level mutation if that incoming value (transitively) depends on the phi node itself and is not simply equal to the phi node's result. In other words, a source-level mutation can be identified by a cycle in the dependency graph that contains a node which transforms its arguments in some way.
Constructing a dependency graph for the previous program, we have:
Since we have a cycle between %8 and %.0 containing an instruction that changes the value of its arguments (the add), we will call the incoming value %8 a "mutation" when computing the result of the phi node's transfer function.
In code, this looks like the following, which is just a BFS through the dependency graph:
/**
 * Determines if the instruction produces a new value which is not equivalent
 * to its arguments. Phi nodes and casts just forward or reinterpret an
 * existing value, so they are not considered mutating.
 */
bool isMutatingInstruction(const llvm::User* I)
{
    return !isa<PHINode>(I) && !isa<CastInst>(I);
}
/**
 * @brief Determines if `V` is dependent on `Other` and the dependency chain
 * between them contains a mutating instruction.
 *
 * @param V The value to look for among the transitive users of `Other`
 * @param Other The value whose users are searched
 * @return true if `V` depends on `Other` through a mutating instruction,
 * false otherwise
 */
bool isDependentOn(const llvm::Value* V, const llvm::Value* Other)
{
// queue of nodes and whether we passed a mutating instruction to produce
// that node
std::queue<std::pair<const llvm::Value*, bool>> Worklist;
std::unordered_set<const llvm::Value*> Visited;
Worklist.push(std::make_pair(Other, false));
while (!Worklist.empty()) {
const auto [Curr, CurrIsMut] = Worklist.front();
if (Curr == V) {
return CurrIsMut;
}
Worklist.pop();
Visited.emplace(Curr);
for (const auto& Op : Curr->uses()) {
const auto U = Op.getUser();
if (Visited.contains(U)) {
continue;
}
Worklist.push(
std::make_pair(U, CurrIsMut || isMutatingInstruction(U)));
}
}
return false;
}
Then the transfer function for a phi node just amounts to identifying these so-called “mutations” and performing the meet operator on all incoming values:
auto PhiRange = Res.getRange(Phi->getIncomingValue(0));
if (PhiRange.hasValue() && isDependentOn(Phi->getIncomingValue(0), Phi)) {
PhiRange.value().Mutated = true;
}
for (unsigned int Idx = 1; Idx < NumIncoming; ++Idx) {
auto& IncomingRangeI = Res.getRange(Phi->getIncomingValue(Idx));
if (IncomingRangeI.hasValue() &&
isDependentOn(Phi->getIncomingValue(Idx), Phi)) {
IncomingRangeI.value().Mutated = true;
}
PhiRange = SingleFact::meet(PhiRange, IncomingRangeI, IntRange::meet);
}
Res.putRange(Phi, PhiRange);
We are interested in two kinds of constraints: those imposed by the current path through a series of conditional branches and those imposed by the operations that produce a value.
For the former, we can use a forward dataflow that just propagates any inequalities being tested in conditional branches following a similar procedure to what we did to handle conditional branches for the null dataflow analysis.
A fact for this analysis will just be a mapping from $(LHS, RHS)$ pairs of
values to a predicate that evaluates to true on those values. So for the
branch if x < y
the blocks in the true branch will contain the mapping
$(x, y) \rightarrow ‘<’$
and in the false branch we’ll have the mapping $(x, y) \rightarrow ‘>=’$.
The top value will be the set of all relations, indicated with a special Top_
flag
in the source code. The meet operator is set intersection. Note that if $\alpha \neq \beta$, then
$$
\{(x, y) \rightarrow \alpha\} \sqcap \{(x, y) \rightarrow \beta\} = \emptyset
$$
Finally, the transfer function will be the identity for all instructions except branches, which will add the condition they are testing to the aforementioned mapping.
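A sketch of what the branch case of that transfer function could look like (RelationFact is a hypothetical stand-in for the actual fact type):
#include <map>
#include <utility>
#include "llvm/IR/Instructions.h"
// Hypothetical fact: (LHS, RHS) value pairs mapped to a predicate known to
// hold on them along the current path.
using RelationFact =
    std::map<std::pair<const llvm::Value*, const llvm::Value*>,
             llvm::CmpInst::Predicate>;
void transferBranch(const llvm::BranchInst& Br, RelationFact& TrueFact,
                    RelationFact& FalseFact)
{
    if (!Br.isConditional()) {
        return;  // an unconditional branch tells us nothing
    }
    if (const auto* Cmp = llvm::dyn_cast<llvm::ICmpInst>(Br.getCondition())) {
        const auto Key = std::make_pair(Cmp->getOperand(0), Cmp->getOperand(1));
        // The true successor learns the tested predicate...
        TrueFact[Key] = Cmp->getPredicate();
        // ...and the false successor learns its negation.
        FalseFact[Key] =
            llvm::CmpInst::getInversePredicate(Cmp->getPredicate());
    }
}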
For the second set of constraints, what we’ll do is simply walk backward through the dependency graph, collecting all nodes until we hit a phi node or a memory operation. The constraints of a value that is the result of one of these instructions will just be the interval we computed with the interval analysis.
As an example, consider the following pseudocode:
y := phi q, w
x := y * y
a := x + y
k := a * a
i := load 0x10
j := i + 1
b := j * 2
// is a < b here?
We want to determine if a < b is always true. Given this expression, we will collect all constraints on the variables used in the expression (a and b). Walking backward from a, we get the constraints a = x + y and x = y * y, and y will be constrained by its range of possible values determined by the interval analysis (since it is the result of a phi node). Doing the same for b, we get b = j * 2 and j = i + 1, and i will be constrained by its range (since it is the result of a load).
This has the effect of collecting only the constraints that mention potentially relevant values. All other constraints could only serve to slow down Z3 and are omitted.
In code, we have something that looks like this:
/**
 * @brief Collects all variables which are depended on
* (transitive closure of all operands) for the definition of
* the given instruction.
*
* @param I The instruction to collect the definitions for
* @param Intervals The result of the interval analysis
* @param Vars The map of variables to z3 variables
* @param Constraints The vector of constraints to add to
* @param Ctx The z3 context
*/
void collectDefinitions(const llvm::Instruction* I,
const DataFlowFacts<IntervalAnalysis>& Intervals,
std::unordered_map<const llvm::Value*, z3::expr>& Vars,
std::vector<z3::expr>& Constraints, z3::context& Ctx)
{
std::queue<const llvm::Instruction*> Worklist;
std::unordered_set<const llvm::Instruction*> Visited;
Worklist.push(I);
while (!Worklist.empty()) {
const auto Inst = Worklist.front();
Worklist.pop();
if (Visited.find(Inst) != Visited.end()) {
continue;
}
Visited.emplace(Inst);
if (Inst->getType()->isIntegerTy()) {
if (const auto* BinOp = dyn_cast<llvm::BinaryOperator>(Inst);
BinOp != nullptr) {
if (const auto Constraint = genBopConstraint(
getZ3Var(BinOp->getOperand(0), Vars, Ctx),
getZ3Var(BinOp->getOperand(1), Vars, Ctx),
BinOp->getOpcode());
Constraint.has_value())
{
// add constraint that BinOp =
// BinOp.operand(0) OP BinOp.operand(1)
Constraints.emplace_back(getZ3Var(BinOp, Vars, Ctx) ==
*Constraint);
}
} else if (const auto* Uop = dyn_cast<llvm::UnaryOperator>(Inst);
Uop != nullptr) {
// same idea
} else if (const auto* Cast = dyn_cast<llvm::CastInst>(Inst);
Cast != nullptr) {
// add constraint that Cast = Cast.operand(0)
Constraints.emplace_back(
getZ3Var(Cast, Vars, Ctx) ==
getZ3Var(Cast->getOperand(0), Vars, Ctx));
} else if (const auto* NextInst = Inst->getNextNode();
NextInst != nullptr) {
// use input facts for the next instruction because
// that will contain the output fact of the current instruction
const auto Range =
Intervals.InstructionInFacts.at(NextInst).getValRange(Inst);
addRangeConstraints(Range, Inst, Vars, Constraints, Ctx,
DebugConstraints);
} else {
// no terminators produce values
llvm_unreachable("Instr is terminator");
}
for (const auto& Operand : Inst->operands()) {
if (const auto* OpI = dyn_cast<llvm::Instruction>(Operand);
OpI != nullptr) {
Worklist.push(OpI);
} else if (auto Range = Intervals.InstructionInFacts.at(Inst)
.getValRange(Operand);
Range.has_value() && !Vars.contains(Operand)) {
addRangeConstraints(Range, Operand, Vars, Constraints, Ctx,
DebugConstraints);
}
}
}
}
}
Once we have our constraints, we can just hand them off to Z3. I was pleasantly surprised by how simple the C++ API is; it's about as straightforward as the Python bindings.
The first thing to do is to install Z3. I just followed the directions to build it from here. Then I just needed to add the following CMake commands to link my pass with the Z3 libraries:
target_link_directories(NullCheck PUBLIC /usr/lib)
target_link_libraries(NullCheck PUBLIC libz3.so)
The first command tells CMake to look in /usr/lib
(the default location for
z3 on Linux) for libraries, and the second command tells CMake to link
with libz3
.
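Depending on how Z3 was installed, it may also ship CMake package files; in that case, something like the following might work instead (I haven't verified this for every install method):
find_package(Z3 CONFIG REQUIRED)
target_link_libraries(NullCheck PUBLIC z3::libz3)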
Then, in our code, we just need to construct a z3::context for creating expressions and a z3::solver for solving a set of constraints. z3::expr overloads the standard operators to construct larger expressions.
Here’s a simple example:
#include <z3++.h>
#include <iostream>
int main()
{
    z3::context ctx;
    const auto a = ctx.int_const("a");
    const auto b = ctx.int_const("b");
    const auto ten = ctx.int_val(10);
    // constraint: a == b + 10
    const auto e1 = a == b + ten;
    z3::solver s{ctx};
    s.add(e1);
    const auto ret = s.check();
    if (ret == z3::sat) {
        // satisfiable: print one satisfying assignment
        const auto m = s.model();
        std::cout << "a: " << m.eval(a).to_string() << std::endl;
        std::cout << "b: " << m.eval(b).to_string() << std::endl;
    } else if (ret == z3::unknown) {
        // the solver gave up (e.g. a timeout)
    } else {
        // unsat: no assignment satisfies the constraints
    }
}
You can read more about using Z3 here (Python bindings)
or here (C++ bindings).
One thing to note is that, internally, Z3 uses exceptions during normal operation. So if you're debugging your program and have a breakpoint set on all thrown exceptions, it'll probably stop the debugger somewhere deep in the Z3 implementation even when nothing is actually wrong. You can check for actual errors with the z3::solver::check_error method.
Now it’s important to know that Z3 solves satisfiability problems. It answers the question of whether there exists an assignment of variables such that all the constraints hold. This isn’t quite the question we want to answer, we want to know if for all values of variables that respect the constraints, is the predicate true.
But this is equivalent to the question of whether there exists an assignment of variables such that the predicate doesn’t hold. In other words, if we want to know whether $P(a, b)$ is always true, we can add the constraint $\lnot P(a, b)$ and check if the set of constraints is unsatisfiable. Z3 solves the negation of the question we are asking. Instead of checking that $\forall x, P(x)$ holds, we will check that $\exists x, \lnot P(x)$ doesn’t hold.
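As a tiny example of this pattern with the same API, here is how one could check that $x + 1 > x$ holds for every integer $x$:
#include <cassert>
#include <z3++.h>
int main()
{
    z3::context Ctx;
    z3::solver S{Ctx};
    const auto X = Ctx.int_const("x");
    // To show "forall x, x + 1 > x", assert the negation and expect unsat.
    S.add(!(X + 1 > X));
    assert(S.check() == z3::unsat);
}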
Concretely, if we want to know whether 0 <= LHS < RHS, we will collect all constraints as previously discussed, then add the constraint LHS < 0 || LHS >= RHS. If this set of constraints is satisfiable, then there is an assignment of variables that respects all the constraints in which LHS is negative or greater than or equal to RHS (i.e., LHS is out of bounds).
If the constraints are satisfiable, Z3 produces a satisfying assignment of variables. We can use this as a counterexample to show something can be out of bounds and display this information to the user.
/**
* @brief Determines if `LHS < RHS && LHS >= 0` is always true at the given
* instruction. If this function returns false, then given the constraints
 * of the values as determined by the definitions and static interval
* analysis, there exists an assignment of values such that `LHS < RHS` is
* false.
*
* @param I The instruction where we want to check if the inequality is true
* @param LHS The left hand side of the inequality
* @param RHS The right hand side of the inequality
* @return true if `0 <= LHS < RHS` is always true at `I`
*/
QueryResult InequalitySolver::isAlwaysInRange(const llvm::Instruction* I,
const LinExpr& LHS,
const LinExpr& RHS) const
{
z3::context Ctx;
const auto& Relations = RelationFacts_.get().InstructionInFacts.at(I);
std::unordered_map<const llvm::Value*, z3::expr> Vars;
std::vector<z3::expr> Constraints;
    // collect all definitions LHS depends on and any constraints pertaining
// to LHS or things it depends on
collectLinExprDefinitions(I, LHS, IntervalFacts_, Vars, Constraints, Ctx);
// collect all definitions RHS depends on
collectLinExprDefinitions(I, RHS, IntervalFacts_, Vars, Constraints, Ctx);
const auto [LHSZ3, LHSName] = linExprToZ3Var(LHS, Vars, Ctx);
const auto [RHSZ3, RHSName] = linExprToZ3Var(RHS, Vars, Ctx);
for (auto [Args, Pred] : Relations.getRelations()) {
const auto [LHSArg, RHSArg] = Args;
if (Vars.contains(LHSArg) || Vars.contains(RHSArg)) {
// collects all definitions LHSArg depends on
addConstraintsForVal(I, LHSArg, IntervalFacts_, Vars,
Constraints, Ctx);
// collects all definitions RHSArg depends on
addConstraintsForVal(I, RHSArg, IntervalFacts_, Vars,
Constraints, Ctx);
Constraints.emplace_back(
genConstraint(Pred, getZ3Var(LHSArg, Vars, Ctx),
getZ3Var(RHSArg, Vars, Ctx)));
}
}
z3::solver Solver(Ctx);
for (const auto& Constraint : Constraints) {
Solver.add(Constraint);
}
    // forall A, P(A) holds <=> (exists B, !P(B)) is unsatisfiable
Solver.add(LHSZ3 >= RHSZ3 || LHSZ3 < 0);
const auto Err = Solver.check_error();
const auto Ret = Solver.check();
if (Ret == z3::sat) {
const auto M = Solver.get_model();
std::unordered_map<const llvm::Value*, std::string> Assignments;
for (const auto& [K, V] : Vars) {
Assignments.emplace(K, M.eval(V).to_string());
}
// return not in range, with counterexample
return {false, Assignments};
}
// in range if Ret is z3::unsat.
return {Ret == z3::unsat, {}};
}
Note that we accept queries of whether one linear expression is less than another. Linear expressions here are linear in LLVM values, but an LLVM value could be produced by something nonlinear, like a multiplication of two variables or a remainder.
The reason we use linear expressions is so that we can handle pointer arithmetic, which we’ll discuss in the next section.
When some memory is allocated (by alloca, malloc, new, etc.), our null analysis will keep track of the number of bytes that each pointer may point to. Then, when we load from or store to a pointer, we can check whether the number of bytes we are attempting to load or store exceeds the size of the buffer the pointer is known to refer to.
In LLVM, all pointer arithmetic (including indexing into arrays) is done via the GetElementPtr instruction, or GEP. This instruction simply performs the pointer arithmetic and doesn't load or store anything. If you're familiar with x86, you can think of it like a souped-up lea. GEP is somewhat complicated, but the general structure is that we have a pointer argument followed by one or more indices. Each index represents an offset to be added to the pointer operand, but each index is scaled by a different factor. So %1 = getelementptr %ptr, %0, %2 would roughly represent the pointer arithmetic %1 = %ptr + a * %0 + b * %2, where a and b are (not necessarily distinct) compile-time constants. In reality, it gets slightly more complicated, as sometimes these indices indicate field numbers of a struct, which can't be computed with just one multiplication.
The first index is always an index into the pointer itself. In other words, the factor the first index is multiplied by when computing the offset is the size of an element the pointer refers to. The second index will be an index into an element of the pointer. Consider the following example:
int nums[10];
nums[2] = 10;
which becomes
%3 = alloca [10 x i32], align 16
%4 = getelementptr inbounds [10 x i32], ptr %3, i64 0, i64 2
store i32 10, ptr %4, align 8, !tbaa !5
The first index represents how many chunks of [10 x i32]
to offset the
pointer by. The second index represents how many i32
to offset relative to the
first index. Thus, assuming sizeof(int) == 4
, we have that %4 = %3 + 0 * 40 + 2 * 4
.
The following diagram represents this example: the center depicts an area of memory divided into 4-byte words, the left side depicts the offsets of the first index, and the right side depicts the offsets of the second index.
Note that in this example, getelementptr %3, 0, 11 is the same as getelementptr %3, 1, 1.
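Both compute the same byte offset: $0 \cdot 40 + 11 \cdot 4 = 44 = 1 \cdot 40 + 1 \cdot 4$.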
Let’s consider another example:
struct Bar {
int a;
char c;
Foo fs[10];
};
// ...
Bar b;
b.a = 20;
b.c = 'a';
which becomes:
%3 = alloca %struct.Bar, align 4
%4 = getelementptr inbounds %struct.Bar, ptr %3, i32 0, i32 0
store i32 20, ptr %4, align 4, !tbaa !5
%5 = getelementptr inbounds %struct.Bar, ptr %3, i32 0, i32 1
store i8 97, ptr %5, align 4, !tbaa !10
Here, the first index is indexing into chunks of sizeof(Bar)
, and
the second index is indexing into the fields of Bar
. The 1 in the
second GEP
means field index 1 (the second field).
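Assuming a typical layout (a 4-byte int followed by the char and padding), %4 points to byte offset 0 of the struct and %5 to byte offset 4; unlike an array index, a field index is translated using the struct's layout rather than being multiplied by a single element size.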
Now consider this example:
int a = g();
int* t = g2();
t[a] = 2;
which becomes:
%3 = call noundef i32 @_Z1gv()
%4 = call noundef ptr @_Z2g2v()
%5 = sext i32 %3 to i64
%6 = getelementptr inbounds i32, ptr %4, i64 %5
Note the lack of a second index. In this case, we only offset by the size
of the data the pointer is directly pointing to (i32
s). This represents
%6 = %4 + 4 * %5
.
GEPs can handle more than two indices, which you can read about here (note that this was written before the switch to opaque pointers in LLVM), but at least my version of Clang seems to break large nested structures into a series of GEP instructions with one or two indices if I don't apply further optimizations. For example, code like this:
struct Baz {
struct {
int j;
struct {
int a;
int b;
} c;
} d;
long p;
};
// ...
Baz b;
b.d.j = 10;
b.d.c.a = 20;
becomes
%3 = alloca %struct.Baz, align 8
%4 = getelementptr inbounds %struct.Baz, ptr %3, i32 0, i32 0
%5 = getelementptr inbounds %struct.anon, ptr %4, i32 0, i32 0
store i32 10, ptr %5, align 8, !tbaa !5
%6 = getelementptr inbounds %struct.Baz, ptr %3, i32 0, i32 0
%7 = getelementptr inbounds %struct.anon, ptr %6, i32 0, i32 1
%8 = getelementptr inbounds %struct.anon.0, ptr %7, i32 0, i32 0
store i32 20, ptr %8, align 4, !tbaa !13
and code like this:
struct Foo {
int a;
int nums[10];
};
struct Bar {
int a;
char c;
Foo fs[10];
};
// ..
Bar bars[10];
bars[2].fs[3].nums[4] = 10;
becomes:
%3 = alloca [10 x %struct.Bar], align 16
%4 = getelementptr inbounds [10 x %struct.Bar], ptr %3, i64 0, i64 2
%5 = getelementptr inbounds %struct.Bar, ptr %4, i32 0, i32 2
%6 = getelementptr inbounds [10 x %struct.Foo], ptr %5, i64 0, i64 3
%7 = getelementptr inbounds %struct.Foo, ptr %6, i32 0, i32 1
%8 = getelementptr inbounds [10 x i32], ptr %7, i64 0, i64 4
store i32 10, ptr %8, align 4, !tbaa !5
but it could also become:
%arrayidx2 = getelementptr inbounds [10 x %struct.Bar], ptr %bars,
i64 0, i64 2, i32 2, i64 3, i32 1, i64 4
store i32 10, ptr %arrayidx2, align 16
For simplicity, I schedule my analysis right after mem2reg but before anything else so that I don't have to worry about these nested forms.
The whole point of telling you all this is the following. Suppose we have code like so:
int nums[10];
int *k = &nums[5];
k[5] = 10;
which becomes:
%3 = alloca [10 x i32], align 16
%4 = getelementptr inbounds [10 x i32], ptr %3, i64 0, i64 5
%5 = getelementptr inbounds i32, ptr %4, i64 5
store i32 10, ptr %5, align 4, !tbaa !5
That store writes to illegal memory! And we can reason about this as follows:
- %3 is allocated and points to sizeof(i32) * 10 = 40 bytes
- %4 has a size of 40 - 0 * 40 - 5 * 4 = 20 bytes
- %5 has a size of 20 - 5 * 4 = 0 bytes
bytesFor every GEP instruction, we’re going to compute the size of the resultant pointer’s data as the size of the argument pointer’s data minus the linear index expression. Note by “size” here I mean how many bytes of memory can be accessed by a non-negative offset from the pointer.
Then we can feed this directly into the SMT solver. So for the previous example, to check the store we’d ask if
$$0 \le 4 \le 40 - 0 \cdot 40 - 5 \cdot 4 - 5 \cdot 4 $$
Of course, this example is trivial, but once we replace hard-coded sizes and indices with variables satisfying some constraints, we start to employ the power of Z3.
Note that, because I couldn’t think of a nice way to assert the size of memory
a pointer can legally access, the size analysis is conservative in the sense
that it allows things that it can’t prove are illegal. Otherwise, there’d be no way
to use a function argument that didn’t have a dereferenceable
attribute because
we’d have no way to know how many bytes the pointer could refer to.
This is in contrast to
the part of the analysis determining just if a nullptr
is used, as discussed
in the previous post.
With all this work, we can now prove the safety of programs that the previous version of the analysis could not show were safe. For example, this program:
const auto one = argv + 1;
const auto two = argv + 2;
assert(one != nullptr && two != nullptr);
const auto elem_size = atoi(*one);
const auto num_elems = atoi(*two);
auto data = (char*)malloc(elem_size * num_elems);
for (int i = 0; i < num_elems * elem_size; ++i) {
data[i] = 0;
}
int masked = foo();
assert(masked <= elem_size);
for (int i = 0; i < num_elems; ++i) {
for (int j = 0; j < masked; ++j) {
data[i * elem_size + j] = 0;
}
}
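Roughly speaking, at the store in the inner loop the solver knows 0 <= i <= num_elems - 1, 0 <= j <= masked - 1, masked <= elem_size, and (since the inner loop body runs at all) masked >= 1. Together these imply i * elem_size + j <= num_elems * elem_size - 1, which stays within the elem_size * num_elems bytes allocated for data, so the negated bounds check is unsatisfiable.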
Furthermore, we can also use Z3 to give more useful error messages when it finds a counterexample. This program is not safe:
int main(int argc, const char** argv)
{
int data_args = argc - 1;
int* vec = (int*)malloc(sizeof(int) * data_args);
assert(vec != nullptr);
for (int i = 0; i < data_args; ++i) {
auto in_arg = argv + i + 1;
assert(in_arg != nullptr);
vec[i] = atoi(*in_arg);
}
for (int i = 0; i < argc; ++i) {
printf("%d\n", vec[i]);
}
}
Safety Violation: Use of potentially null pointer in load at MakeVecBad.cc:20
Because an index could not be proven to be in range, the following counterexample was found:
argc = 1
data_args = 0
i = 1
Line 20 is the printf in the loop bound by argc instead of data_args.
Here’s another example:
int main(int argc, char** argv)
{
if (argc >= 128) {
return 1;
}
char words[128][128];
for (int i = 0; i < argc; ++i) {
const auto word = (argv + i);
assert(word != nullptr);
const auto word_data = *word;
const auto word_len = strlen(word_data);
for (int j = 0; j < word_len; ++j) {
const auto c = word_data + j;
assert(c != nullptr);
words[i][j] = *c;
}
}
}
Safety Violation: Use of potentially null pointer in store at Bad2DArray.cc:19
Because an index could not be proven to be in range, the following counterexample was found:
argc = 1
i = 1
j = 16385
SAT problems are NP-complete, but Z3 employs many good heuristics to be reasonably quick for many problems, especially small ones. The longest running time for the simple set of tests I have is 0.837 seconds. But the main bottleneck is the interval analysis which is built in debug mode (without any optimizations).
It’s amazing what Z3 can do, and this static analysis doesn’t really do it justice since most queries don’t even warrant the full power of Z3.
One thing to note is that, currently, the analysis doesn't handle global variables or function calls correctly. It assumes that function calls don't mutate or free any non-local data. This, of course, is a bad assumption, but it's fine for a proof of concept.
The easiest thing to do is to conservatively identify when a function could affect non-local data, and then when such a function is called, invalidate all facts about non-local variables and all facts about pointers passed to the function.
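A sketch of how that could look at a call site (Facts and its invalidation helpers are hypothetical names, not the analysis' real types):
#include "llvm/IR/InstrTypes.h"
// Hypothetical interface to the dataflow fact being transferred.
struct Facts {
    // Stubs for illustration; the real fact type would drop the relevant info.
    void invalidateNonLocals() {}
    void invalidatePointer(const llvm::Value*) {}
};
void transferCall(const llvm::CallBase& Call, Facts& F)
{
    // Conservatively assume any call that may write memory can mutate or
    // free anything we can't prove is local.
    if (!Call.onlyReadsMemory()) {
        F.invalidateNonLocals();
        // Any pointer handed to the callee may itself be mutated or freed.
        for (const auto& Arg : Call.args()) {
            if (Arg->getType()->isPointerTy()) {
                F.invalidatePointer(Arg);
            }
        }
    }
}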
Currently, the only thing I do is invalidate information about pointers passed to free or delete, which isn't correct in general. Furthermore, the analysis does not support concurrency. So, for future work that I may or may not ever get to, we have: