MLIR — Folders and Constant Propagation || Math ∩ Programming

Table of Contents

Last time we saw how to use pre-defined MLIR traits to enable upstream MLIR passes like loop-invariant-code-motion to apply to poly programs. We left out -sccp (sparse conditional constant propagation), and so this time we’ll add what is needed to make that pass work. It requires the concept of folding.

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

Constant Propagation vs Canonicalization

-sccp is sparse conditional constant propagation, which attempts to infer when an operation has a constant output, and then replaces the operation with the constant value. Repeating this, it “propagates” the constants as far as possible through the program. You can think of it like eagerly computing values it can during compile time, and then sticking them into the compiled program as constants.

Here’s what it looks like for arith, where all the needed tools are implemented. For an input like:

func.func @test_arith_sccp() -> i32 {
  %0 = arith.constant 7 : i32
  %1 = arith.constant 8 : i32
  %2 = arith.addi %0, %0 : i32
  %3 = arith.muli %0, %0 : i32
  %4 = arith.addi %2, %3 : i32
  return %2 : i32
}

The output of tutorial-opt --sccp is

func.func @test_arith_sccp() -> i32 {
  %c63_i32 = arith.constant 63 : i32
  %c49_i32 = arith.constant 49 : i32
  %c14_i32 = arith.constant 14 : i32
  %c8_i32 = arith.constant 8 : i32
  %c7_i32 = arith.constant 7 : i32
  return %c14_i32 : i32
}

Note two additional facts: sccp doesn’t delete dead code, and what is not shown here is the main novelty in sccp, which is that it can propagate constants through control flow (ifs and loops).

A related concept is the idea of canonicalization, which gets its own --canonicalize pass, and which hides a lot of the heavy lifting in MLIR. Canonicalize overlaps a little bit with sccp, in that it also computes constants and materializes them in the IR. Take, for example, the --canonicalize pass on the same IR:

func.func @test_arith_sccp() -> i32 {
  %c14_i32 = arith.constant 14 : i32
  return %c14_i32 : i32
}

The intermediate constants are all pruned, and all that remains is the return value and no operations. Canonicalize cannot propagate constants through control flow, and as such should be thought of as more “local” than sccp.

Both of these, however, are supported via folding, which is the process of taking series of ops and merging them together into simpler ops. It also requires our dialect has some sort of constant operation, which is inserted (“materialized”) with the results of a fold. Folding and canonicalization are more general than what I’m showing here, so we’ll come back to what else they can do in a future article.

The rough outline of what is needed to support folding in this way is:

Adding a constant operation
Adding a materialization hook
Adding folders for each op

This would result in a situation (test case commit) as follows. Starting from

%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%1 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>
%2 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%3 = poly.mul %1, %1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
%4 = poly.add %2, %3 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>

We would get

%0 = poly.constant dense<[2, 8, 20, 24, 18]> : tensor<5xi32> : <10>
%1 = poly.constant dense<[1, 4, 10, 12, 9]> : tensor<5xi32> : <10>
%2 = poly.constant dense<[1, 2, 3]> : tensor<3xi32> : <10>

Making a constant op

Currently we’re imitating a constant polynomial construction by combing a from_tensor op with arith.constant. Like this:

%0 = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
%p0 = poly.from_tensor %0 : tensor<3xi32> -> !poly.poly<10>

While a constant operation might combine them into one op.

%0 = poly.constant dense<[2, 8, 20, 24, 18]> : !poly.poly<10>

The from_tensor op can also be used to build a polynomial from data, not just constants, so it’s worth having around even after we implement poly.constant.

Having a dedicated constant operation has benefits explained in the MLIR documentation on folding. What’s relevant here is that fold can be used to signal to passes like sccp that the result of an op is constant (statically known), or it can be used to say that the result of an op is equivalent to a pre-existing value created by a different op. For the constant case, a materializeConstant hook is also needed to tell MLIR how to take the constant result and turn it into a proper IR op.

The constant op itself, in this commit, comes with two new concepts, the ConstantLike trait and an argument that is an attribute constraint.

def Poly_ConstantOp : Op<Poly_Dialect, "constant", [Pure, ConstantLike]> {   // new
  let summary = "Define a constant polynomial via an attribute.";
  let arguments = (ins AnyIntElementsAttr:$coefficients);    // new
  let results = (outs Polynomial:$output);
  let assemblyFormat = "$coefficients attr-dict `:` type($output)";
}

The ConstantLike attribute is checked here during folding via the constant op matcher as an assertion. [Aside: I’m not sure why the trait is specifically required, so long as the materialization function is present on the dialect; it just seems like this check is used for assertions. Perhaps it’s just a safeguard.]

Next we have the line let arguments = (ins AnyIntElementsAttr:$coefficients); This defines the input to the op as an attribute (statically defined data) rather than a previous SSA value. The AnyIntElementsAttr is itself an attribute constraint, allowing any attribute that is has the IntElementsAttrBase as a base class to be used (e.g., 32-bit or 64-bit integer attributes). This means that we could use all of the following syntax forms:

%10 = poly.constant dense<[2, 3, 4]> : tensor<3xi32> : !poly.poly<10>
%11 = poly.constant dense<[2, 3, 4]> : tensor<3xi8> : !poly.poly<10>
%12 = poly.constant dense<"0x020304"> : tensor<3xi8> : !poly.poly<10>
%13 = poly.constant dense<4> : tensor<100xi32> : !poly.poly<10>

Adding folders

We add folders in these commits:

Each has the same structure: add let hasFolder = 1; to the tablegen for the op, which adds a header declaration of the following form (noting that the signature would be different if the op has more than one result value, see docs).

OpFoldResult <OpName>::fold(<OpName>::FoldAdaptor adaptor);

Then we implement it in PolyOps.cpp. The semantics of this function are such that, if the fold decides the op should be replaced with a constant, it must return an attribute representing that constant, which can be given as the input to a poly.constant. The FoldAdaptor is a shim that has the same method names as an instance of the op’s C++ class, but arguments that have been folded themselves are replaced with Attribute instances representing the constants they were folded with. This will be relevant for folding add and mul, since the body needs to actually compute the result eagerly, and needs access to the actual values to do so.

For poly.constant the timplementation is trivial: you just return the input attribute.

OpFoldResult ConstantOp::fold(ConstantOp::FoldAdaptor adaptor) {
  return adaptor.getCoefficients();
}

The from_tensor op is similar, but has an extra cast that acts as an assertion, since the tensor might have been constructed with weird types we don’t want as input. If the dyn_cast fails, the result is nullptr, which is cast by MLIR to a failed OpFoldResult.

OpFoldResult FromTensorOp::fold(FromTensorOp::FoldAdaptor adaptor) {
  // Returns null if the cast failed, which corresponds to a failed fold.
  return dyn_cast<DenseIntElementsAttr>(adaptor.getInput());
}

The poly binary ops are slightly more complicated since they are actually doing work. Each of these fold methods effectively takes as input two DenseIntElementsAttr for each operand, and expects us to return another DenseIntElementsAttr for the result.

For add/sub which are elementwise operations on the coefficients, we get to use an existing upstream helper method, constFoldBinaryOp, which through some template metaprogramming wizardry, allows us to specify only the elementwise operation itself.

OpFoldResult AddOp::fold(AddOp::FoldAdaptor adaptor) {
  return constFoldBinaryOp<IntegerAttr, APInt>(
      adaptor.getOperands(), [&](APInt a, APInt b) { return a + b; });
}

For mul, we have to write out the multiplication routine manually. In what’s below, I’m implementing the naive textbook polymul algorithm, which could be optimized if one expects people to start compiling programs with large, static polynomials in them.

OpFoldResult MulOp::fold(MulOp::FoldAdaptor adaptor) {
  auto lhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[0]);
  auto rhs = cast<DenseIntElementsAttr>(adaptor.getOperands()[1]);
  auto degree = getResult().getType().cast<PolynomialType>().getDegreeBound();
  auto maxIndex = lhs.size() + rhs.size() - 1;

  SmallVector<APInt, 8> result;
  result.reserve(maxIndex);
  for (int i = 0; i < maxIndex; ++i) {
    result.push_back(APInt((*lhs.begin()).getBitWidth(), 0));
  }

  int i = 0;
  for (auto lhsIt = lhs.value_begin<APInt>(); lhsIt != lhs.value_end<APInt>();
       ++lhsIt) {
    int j = 0;
    for (auto rhsIt = rhs.value_begin<APInt>(); rhsIt != rhs.value_end<APInt>();
         ++rhsIt) {
      // index is modulo degree because poly's semantics are defined modulo x^N = 1.
      result[(i + j) % degree] += *rhsIt * (*lhsIt);
      ++j;
    }
    ++i;
  }

  return DenseIntElementsAttr::get(
      RankedTensorType::get(static_cast<int64_t>(result.size()),
                            IntegerType::get(getContext(), 32)),
      result);
}

Adding a constant materializer

Finally, we add a constant materializer. This is a dialect-level feature, so we start by adding let hasConstantMaterializer = 1; to the dialect tablegen, and observing the newly generated header signature:

Operation *PolyDialect::materializeConstant(
    OpBuilder &builder, Attribute value, Type type, Location loc);

The Attribute input represents the result of each folding step above. The Type is the desired result type of the op, which is needed in cases like arith.constant where the same attribute can generate multiple different types via different interpretations of a hex string or splatting with a result tensor that has different dimensions.

In our case the implementation is trivial: just construct a constant op from the attribute.

Operation *PolyDialect::materializeConstant(
    OpBuilder &builder, Attribute value, Type type, Location loc) {
  auto coeffs = dyn_cast<DenseIntElementsAttr>(value);
  if (!coeffs)
    return nullptr;
  return builder.create<ConstantOp>(loc, type, coeffs);
}

Other kinds of folding

While this has demonstrated a generic kind of folding with respect to static constants, many folding functions in MLIR use simple matches to determine when an op can be replaced with a value from a previously computed op.

Take, for example, the complex dialect (for complex numbers). A complex.create op constructs a complex number from real and imaginary parts. A folder in that dialect checks for a pattern like complex.create(complex.re(%z), complex.im(%z)), and replaces it with %z directly. The arith dialect similarly has folds for things like a-b + b -> a and a + 0 -> a.

However, most work on simplifying an IR according to algebraic rules belongs in the canonicalization pass, since while it supports folding, it also supports general rewrite patterns that are allowed to delete and create ops as needed to simplify the IR. We’ll cover canonicalization in more detail in a future article. But just remember, folds may only modify the single operation being folded, use existing SSA values, and may not create new ops. So they are limited in power and decidedly local operations.