Last time we saw how to use pre-defined MLIR traits to enable upstream MLIR passes like loop-invariant-code-motion to apply to poly
programs. We left out -sccp
(sparse conditional constant propagation), and so this time we’ll add what is needed to make that pass work. It requires the concept of folding.
The code for this article is in this pull request, and as usual the commits are organized to be read in order.
Constant Propagation vs Canonicalization
-sccp
is sparse conditional constant propagation, which attempts to infer when an operation has a constant output, and then replaces the operation with that constant value. Repeating this, it “propagates” the constants as far as possible through the program. You can think of it as eagerly computing whatever values it can at compile time, and then sticking them into the compiled program as constants.
Here’s what it looks like for arith, where all the needed tools are implemented. For an input like:
func.func @test_arith_sccp() -> i32 {
%0 = arith.constant 7 : i32
%1 = arith.constant 8 : i32
%2 = arith.addi %0, %0 : i32
%3 = arith.muli %0, %0 : i32
%4 = arith.addi %2, %3 : i32
return %2 : i32
}
The output of tutorial-opt --sccp
is
func.func @test_arith_sccp() -> i32 {
%c63_i32 = arith.constant 63 : i32
%c49_i32 = arith.constant 49 : i32
%c14_i32 = arith.constant 14 : i32
%c8_i32 = arith.constant 8 : i32
%c7_i32 = arith.constant 7 : i32
return %c14_i32 : i32
}
Note two additional facts: sccp doesn’t delete dead code, and the main novelty of sccp, not shown here, is that it can propagate constants through control flow (ifs and loops).
A related concept is the idea of canonicalization, which gets its own --canonicalize
pass, and which hides a lot of the heavy lifting in MLIR. Canonicalize overlaps a little bit with sccp
, in that it also computes constants and materializes them in the IR. Take, for example, the --canonicalize
pass on the same IR:
func.func @test_arith_sccp() -> i32 {
%c14_i32 = arith.constant 14 : i32
return %c14_i32 : i32
}
The intermediate constants are all pruned, and all that remains is the folded constant and the return. Canonicalize cannot propagate constants through control flow, and as such should be thought of as more “local” than sccp.
Both of these, however, are supported via folding, which is the process of taking a series of ops and merging them together into simpler ops. It also requires that our dialect have some sort of constant operation, which is inserted (“materialized”) with the results of a fold. Folding and canonicalization are more general than what I’m showing here, so we’ll come back to what else they can do in a future article.
The rough outline of what is needed to support folding in this way is:
- Adding a constant operation
- Adding a materialization hook
- Adding folders for each op
This would result in the following behavior (test case commit). Starting from
%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%1 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>
%2 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%3 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%4 = poly.add %2, %3 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
We would get
%0 = poly.constant dense<[2, 8, 20, 24, 18]> : tensor<5xi32> : <10>
%1 = poly.constant dense<[1, 4, 10, 12, 9]> : tensor<5xi32> : <10>
%2 = poly.constant dense<[1, 2, 3]> : tensor<3xi32> : <10>
Making a constant op
Currently we’re imitating a constant polynomial construction by combining a from_tensor op with arith.constant, like this:
%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%p0 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>
A dedicated constant operation would combine them into one op:
%0 = poly.constant dense<[1, 2, 3]> : !poly.poly<10>
The from_tensor
op can also be used to build a polynomial from data, not just constants, so it’s worth having around even after we implement poly.constant
.
Having a dedicated constant operation has benefits explained in the MLIR documentation on folding. What’s relevant here is that fold
can be used to signal to passes like sccp
that the result of an op is constant (statically known), or it can be used to say that the result of an op is equivalent to a pre-existing value created by a different op. For the constant case, a materializeConstant
hook is also needed to tell MLIR how to take the constant result and turn it into a proper IR op.
The constant op itself, in this commit, comes with two new concepts, the ConstantLike
trait and an argument that is an attribute constraint.
def Poly_ConstantOp : Op<Poly_Dialect, "constant", [Pure, ConstantLike]> { // new
let summary = "Define a constant polynomial via an attribute.";
let arguments = (ins AnyIntElementsAttr:$coefficients); // new
let results = (outs Polynomial:$output);
let assemblyFormat = "$coefficients attr-dict `:` type($output)";
}
The ConstantLike
trait is checked during folding via the constant op matcher, essentially as an assertion. [Aside: I’m not sure why the trait is specifically required, so long as the materialization function is present on the dialect; it just seems like this check is used for assertions. Perhaps it’s just a safeguard.]
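To make the matcher concrete, here is a small sketch of my own (not code from the tutorial’s pull request) showing how the constant op matcher from mlir/IR/Matchers.h is typically used. m_Constant only matches ops that carry the ConstantLike trait, and when it matches it folds the op to recover the underlying attribute; matchPattern and m_Constant are existing MLIR helpers, while the function name below is purely illustrative.
#include "mlir/IR/Matchers.h"

// Sketch: detect whether `value` is produced by a ConstantLike op
// (such as poly.constant) and, if so, bind its folded attribute.
bool tryGetConstantCoefficients(mlir::Value value, mlir::Attribute &attr) {
  return mlir::matchPattern(value, mlir::m_Constant(&attr));
}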
Next we have the line let arguments = (ins AnyIntElementsAttr:$coefficients);
This defines the input to the op as an attribute (statically defined data) rather than a previous SSA value. The AnyIntElementsAttr
is itself an attribute constraint, allowing any attribute that has IntElementsAttrBase as a base class to be used (e.g., 32-bit or 64-bit integer element attributes). This means we could use all of the following syntax forms:
%10 = poly.constant dense<[2, 3, 4]> : tensor<3xi32> : !poly.poly<10>
%11 = poly.constant dense<[2, 3, 4]> : tensor<3xi8> : !poly.poly<10>
%12 = poly.constant dense<"0x020304"> : tensor<3xi8> : !poly.poly<10>
%13 = poly.constant dense<4> : tensor<100xi32> : !poly.poly<10>
Adding folders
We add folders for each op in a series of commits in the pull request.
Each has the same structure: add let hasFolder = 1;
to the tablegen for the op, which adds a header declaration of the following form (noting that the signature would be different if the op has more than one result value, see docs).
OpFoldResult <OpName>::fold(<OpName>::FoldAdaptor adaptor);
Then we implement it in PolyOps.cpp
. The semantics of this function are such that, if the fold decides the op should be replaced with a constant, it must return an attribute representing that constant, which can be given as the input to a poly.constant
. The FoldAdaptor
is a shim that has the same method names as an instance of the op’s C++ class, but arguments that have been folded themselves are replaced with Attribute
instances representing the constants they were folded with. This will be relevant for folding add and mul, since the body needs to actually compute the result eagerly, and needs access to the actual values to do so.
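To illustrate what that means in practice, here is a hedged sketch for a hypothetical binary op (SomeBinaryOp is a stand-in name and computeConstantResult is a hypothetical helper; neither is in the tutorial’s code). A fold body typically starts by checking that the adaptor actually handed it constants, because operands that were not themselves folded show up as null Attributes:
OpFoldResult SomeBinaryOp::fold(SomeBinaryOp::FoldAdaptor adaptor) {
  // The adaptor's operand getters return Attribute values, which are null
  // for operands that have not been folded to constants.
  Attribute lhs = adaptor.getOperands()[0];
  Attribute rhs = adaptor.getOperands()[1];
  if (!lhs || !rhs)
    return nullptr;  // a null OpFoldResult signals "this op did not fold"
  // Otherwise compute the constant result eagerly and return it as an
  // Attribute (e.g., a DenseIntElementsAttr), as the real folders below do.
  return computeConstantResult(lhs, rhs);  // hypothetical helper
}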
For poly.constant
the implementation is trivial: you just return the input attribute.
OpFoldResult ConstantOp::fold(ConstantOp::FoldAdaptor adaptor) {
return adaptor.getCoefficients();
}
The from_tensor
op is similar, but has an extra cast that acts as an assertion, since the tensor might have been constructed with weird types we don’t want as input. If the dyn_cast
fails, the result is nullptr
, which is cast by MLIR to a failed OpFoldResult
.
OpFoldResult FromTensorOp::fold(FromTensorOp::FoldAdaptor adaptor) {
// Returns null if the cast failed, which corresponds to a failed fold.
return dyn_cast<DenseIntElementsAttr>(adaptor.getInput());
}
The poly binary ops are slightly more complicated since they are actually doing work. Each of these fold
methods effectively takes as input two DenseIntElementsAttr
for each operand, and expects us to return another DenseIntElementsAttr
for the result.
For add/sub, which are elementwise operations on the coefficients, we get to use an existing upstream helper, constFoldBinaryOp, which, through some template metaprogramming wizardry, lets us specify only the elementwise operation itself.
OpFoldResult AddOp::fold(AddOp::FoldAdaptor adaptor) {
return constFoldBinaryOp<IntegerAttr, APInt>(
adaptor.getOperands(), [&](APInt a, APInt b) { return a + b; });
}
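The sub folder is not reproduced here, but it presumably mirrors the add folder exactly, swapping only the lambda’s operation; a sketch:
// Presumed analogue for poly.sub: identical to AddOp::fold except for
// the elementwise lambda.
OpFoldResult SubOp::fold(SubOp::FoldAdaptor adaptor) {
  return constFoldBinaryOp<IntegerAttr, APInt>(
      adaptor.getOperands(), [&](APInt a, APInt b) { return a - b; });
}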
For mul, we have to write out the multiplication routine manually. In what’s below, I’m implementing the naive textbook polymul algorithm, which could be optimized if one expects people to start compiling programs with large, static polynomials in them.
OpFoldResult MulOp::fold(MulOp::FoldAdaptor adaptor) {
auto lhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[0]);
auto rhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[1]);
auto degree = getResult().getType().cast<PolynomialType>().getDegreeBound();
auto maxIndex = lhs.size() + rhs.size() - 1;
SmallVector<APInt, 8> result;
result.reserve(maxIndex);
for (int i = 0; i < maxIndex; ++i) {
result.push_back(APInt((*lhs.begin()).getBitWidth(), 0));
}
int i = 0;
for (auto lhsIt = lhs.value_begin<APInt>(); lhsIt != lhs.value_end<APInt>();
++lhsIt) {
int j = 0;
for (auto rhsIt = rhs.value_begin<APInt>(); rhsIt != rhs.value_end<APInt>();
++rhsIt) {
// index is modulo degree because poly's semantics are defined modulo x^N = 1.
result[(i + j) % degree] += *rhsIt * (*lhsIt);
++j;
}
++i;
}
return DenseIntElementsAttr::get(
RankedTensorType::get(static_cast<int64_t>(result.size()),
IntegerType::get(getContext(), 32)),
result);
}
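As a sanity check on the routine above: squaring 1 + 2x + 3x^2 gives 1 + 4x + 10x^2 + 12x^3 + 9x^4, which matches the dense<[1, 4, 10, 12, 9]> constant in the folded test output shown earlier.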
Adding a constant materializer
Finally, we add a constant materializer. This is a dialect-level feature, so we start by adding let hasConstantMaterializer = 1;
to the dialect tablegen, and observing the newly generated header signature:
Operation *PolyDialect::materializeConstant(
OpBuilder &builder, Attribute value, Type type, Location loc);
The Attribute
input represents the result of each folding step above. The Type
is the desired result type of the op, which is needed in cases like arith.constant
where the same attribute can generate multiple different types via different interpretations of a hex string or splatting with a result tensor that has different dimensions.
In our case the implementation is trivial: just construct a constant op from the attribute.
Operation *PolyDialect::materializeConstant(
OpBuilder &builder, Attribute value, Type type, Location loc) {
auto coeffs = dyn_cast<DenseIntElementsAttr>(value);
if (!coeffs)
return nullptr;
return builder.create<ConstantOp>(loc, type, coeffs);
}
Other kinds of folding
While this has demonstrated a generic kind of folding with respect to static constants, many folding functions in MLIR use simple matches to determine when an op can be replaced with a value from a previously computed op.
Take, for example, the complex dialect (for complex numbers). A complex.create
op constructs a complex number from real and imaginary parts. A folder in that dialect checks for a pattern like complex.create(complex.re(%z), complex.im(%z))
, and replaces it with %z
directly. The arith
dialect similarly has folds for things like (a - b) + b -> a and a + 0 -> a.
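To give a flavor of what such a match-based fold looks like in C++, here is a sketch loosely modeled on the complex.create example (my paraphrase of the idea, not the upstream implementation verbatim):
// Sketch: if both operands of complex.create come from complex.re and
// complex.im of the same value, return that value instead of an attribute.
OpFoldResult CreateOp::fold(CreateOp::FoldAdaptor adaptor) {
  auto reOp = getOperand(0).getDefiningOp<complex::ReOp>();
  auto imOp = getOperand(1).getDefiningOp<complex::ImOp>();
  if (reOp && imOp && reOp.getOperand() == imOp.getOperand())
    return reOp.getOperand();  // reuse an existing SSA value, not a constant
  return nullptr;
}
Note that this kind of fold returns a Value rather than an Attribute, which is the “equivalent to a pre-existing value” case mentioned earlier.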
However, most work on simplifying an IR according to algebraic rules belongs in the canonicalization pass: while canonicalization supports folding, it also supports general rewrite patterns that may delete and create ops as needed to simplify the IR. We’ll cover canonicalization in more detail in a future article. But remember: a fold may only modify the single operation being folded, may only reuse existing SSA values, and may not create new ops. So folds are limited in power and decidedly local.