Last time we defined a new dialect, poly, for polynomial arithmetic. This time we'll spruce up the dialect by adding some of MLIR's pre-defined traits, and see how applying those traits enables general-purpose passes to optimize poly programs.

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

Traits and Loop Invariant Code Motion

As a compiler toolchain, MLIR heavily emphasizes code reuse. This is accomplished largely through two constructs: traits and interfaces. An interface is an interface in the normal programming sense of the word: a set of function signatures implemented by a type and providing some limited-scope behavior. Applied to MLIR, you can implement interfaces on operations and types, and then passes can operate at the interface level. Traits are closely related: a trait is an interface with no methods. Traits can just be “slapped on” an operation and passes can magically start working with them. They can also serve as mixins for common op verification routines, type inference, and more.

In this article I’ll show how adding traits to the operations in the poly dialect (defined last time) allows us to reuse existing MLIR passes on poly constructs. In future articles we’ll see how to define new traits and interfaces. But for now, existing traits are an extremely simple way to start using the batteries included in MLIR on new dialects.

As mentioned, one applies traits primarily to enable passes to do things with custom operations, so let's start from the passes. The general transformation passes list includes a pass called loop invariant code motion, which checks loop bodies for operations that don't need to be inside the loop and moves them out of the loop body. Using it requires us to add two traits to our ops to express that they are safe to move around. The first is NoMemoryEffect (technically an empty implementation of an interface), which asserts that the operation has no memory-related side effects. The second is AlwaysSpeculatable (technically a list of two traits), which says that the operation may be "speculatively executed," i.e., computed earlier than its original position. If an op is speculatable, the compiler can move it to another location. If not (say, it reads from a memory location that may be written to), there is an earliest point before which it's not safe to move the op.
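To make that constraint concrete, here is a generic sketch (using standard affine, memref, and arith ops, not code from the tutorial) of an op that loop invariant code motion cannot hoist: the load reads a location that the store in the same loop body may have just written, so it cannot safely be executed earlier.

    func.func @cannot_hoist(%buf: memref<1xi32>, %c: i32) -> i32 {
      %init = arith.constant 0 : i32
      %ret = affine.for %i = 0 to 100 iter_args(%acc = %init) -> i32 {
        // The store may change the value the load reads on each iteration, so
        // the load is neither speculatable nor memory-effect free, and LICM
        // leaves it inside the loop. A Pure op has no such constraint.
        affine.store %c, %buf[0] : memref<1xi32>
        %v = affine.load %buf[0] : memref<1xi32>
        %next = arith.addi %acc, %v : i32
        affine.yield %next : i32
      }
      return %ret : i32
    }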

Loop invariant code motion takes ops with these two traits and hoists them out of loops when the operation's operands are unchanged by the loop body. Conveniently, MLIR also defines a single named list of traits called Pure, which combines NoMemoryEffect and AlwaysSpeculatable. So we can just add that one name to our tablegen op definition (via a template parameter that defaults to an empty list of traits), as in this commit.

//lib/Dialect/Poly/PolyOps.td
@@ -3,9 +3,10 @@

 include "PolyDialect.td"
 include "PolyTypes.td"
+include "mlir/Interfaces/SideEffectInterfaces.td"


-class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic> {
+class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic, [Pure]> {
   let arguments = (ins Polynomial:$lhs, Polynomial:$rhs);

This commit adds the boilerplate to register all the default MLIR passes in tutorial-opt, and adds an example test asserting that a poorly placed poly.mul is hoisted out of the loop body.

// RUN: tutorial-opt %s --loop-invariant-code-motion > %t
// RUN: FileCheck %s < %t
... <setup> ...
// CHECK: poly.mul
// CHECK: affine.for
%ret_val = affine.for %i = 0 to 100 iter_args(%sum_iter = %p0) -> !poly.poly<10> {
  // The poly.mul should be hoisted out of the loop.
  // CHECK-NOT: poly.mul
  %2 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
  %sum_next = poly.add %sum_iter, %2 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
  affine.yield %sum_next : !poly.poly<10>
}
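For reference, the expected output after running the pass looks roughly like the following (a sketch, not the test's literal output; SSA names may differ):

    // The poly.mul now appears before the loop, as the CHECK lines assert.
    %2 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
    %ret_val = affine.for %i = 0 to 100 iter_args(%sum_iter = %p0) -> !poly.poly<10> {
      %sum_next = poly.add %sum_iter, %2 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
      affine.yield %sum_next : !poly.poly<10>
    }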

In the generated C++ code, adding new traits or interfaces adds new template arguments to the op's class definition:

// PolyOps.h.inc
class SubOp : public ::mlir::Op<SubOp,
::mlir::OpTrait::ZeroRegions,
::mlir::OpTrait::OneResult,
::mlir::OpTrait::OneTypedResult<::mlir::tutorial::poly::PolynomialType>::Impl,
::mlir::OpTrait::ZeroSuccessors,
::mlir::OpTrait::NOperands<2>::Impl,
::mlir::OpTrait::OpInvariants,
::mlir::ConditionallySpeculatable::Trait,        // <-- new
::mlir::OpTrait::AlwaysSpeculatableImplTrait,    // <-- new
::mlir::MemoryEffectOpInterface::Trait>          // <-- new
{ ... }

And NoMemoryEffect adds a trivial implementation of the memory effects interface:

// PolyOps.h.inc
void SubOp::getEffects(
  ::llvm::SmallVectorImpl<::mlir::SideEffects::EffectInstance<::mlir::MemoryEffects::Effect>> &effects) {
}

As far as using traits goes, this is it. However, to figure out what each trait does, you have to dig through the pass implementations a bit. On top of that, the "helper" definitions like Pure that combine multiple traits are not documented, nor is the complete list of available traits and interfaces (the traits list is missing quite a few, like ConstantLike, Involution, Idempotent, etc.). When I first drafted this article, I had only applied the AlwaysSpeculatable trait list to Poly_BinOp, and I was confused when --loop-invariant-code-motion was a no-op. I had to dig into the pass's implementation here to see that it also requires isMemoryEffectFree, which uses MemoryEffectOpInterface.

So next we’ll explore the general passes and traits, and add relevant traits to the poly dialect.

Passes already handled by Pure, or not relevant to poly

--control-flow-sink moves ops that are only used in one branch of a conditional into the relevant branch. It requires the op to be memory-effect free, which Pure already covers. To demonstrate, I added the Pure trait to poly.from_tensor and added a test in this commit.
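As a sketch of the rewrite this enables (illustrative IR, not the test from the commit; it assumes the poly.from_tensor syntax defined in the previous article):

    func.func @sink_example(%cond: i1, %other: !poly.poly<10>) -> !poly.poly<10> {
      // %p is only used in the "then" branch, and Pure marks it as
      // memory-effect free, so --control-flow-sink may move it (and %coeffs)
      // into that branch.
      %coeffs = arith.constant dense<[1, 2, 3]> : tensor<3xi32>
      %p = poly.from_tensor %coeffs : tensor<3xi32> -> !poly.poly<10>
      %res = scf.if %cond -> (!poly.poly<10>) {
        %sq = poly.mul %p, %p : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
        scf.yield %sq : !poly.poly<10>
      } else {
        scf.yield %other : !poly.poly<10>
      }
      return %res : !poly.poly<10>
    }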

--cse is common subexpression elimination, which removes unnecessarily repeated computations when possible. Again, a lack of memory effects suffices. Demonstrated in this commit.
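For instance (a sketch of the kind of duplication --cse removes; %p0 and %p1 are polynomials defined earlier):

    // Before --cse: the second poly.mul recomputes the same value.
    %2 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
    %3 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
    %4 = poly.add %2, %3 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>

    // After --cse: the duplicate is gone and both operands of the add are %2.
    %2 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
    %4 = poly.add %2, %2 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>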

--inline inlines function calls, which does not apply to poly.

--mem2reg replaces memory stores and loads with direct uses of the underlying values where possible. It should not require any changes, and it isn't interesting enough to demo here.

--remove-dead-values does things like removing function arguments that are unused, or return values that no caller uses. It should not require any changes.

--sroa is "scalar replacement of aggregates," which breaks aggregate memory allocations up into their individual scalar pieces, mainly so that other memory-related passes can work on them. Not relevant to poly.

--symbol-dce eliminates dead private symbols, such as private functions that are never called, which does not apply to poly.

Punting one pass to next time

--sccp is sparse conditional constant propagation, which attempts to infer when an operation has a constant output and then replaces the operation with that constant value. Repeating this, it "propagates" constants as far as possible through the program. Supporting it requires a bit of extra work and a few new concepts, so I'll do that next time.

Elementwise mappings

Now that we’ve gone through the most generic passes, I’ll cover some remaining traits I’m aware of.

There are a number of traits that extend scalar operations to tensor operations and vice versa. Elementwise, Scalarizable, Tensorizable, and Vectorizable, whose docs you can read in detail here, essentially allow ops that work on scalars to be applied elementwise to tensors and vectors in the natural way. The trait list ElementwiseMappable combines them all. This commit demonstrates how adding the trait allows the poly.add op to magically work on tensor arguments. It also requires relaxing the ins arguments to the op in the tablegen, which we do by using a so-called type constraint that permits polynomials as well as containers of them:

// PolyOps.td
def PolyOrContainer : TypeOrContainer<Polynomial, "poly-or-container">;

class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic, [Pure, ElementwiseMappable]> {
  let arguments = (ins PolyOrContainer:$lhs, PolyOrContainer:$rhs);
  let results = (outs PolyOrContainer:$output);
  ...
}

Under the hood, the generated code gets a new routine for checking this type constraint on operands and results:

// PolyOps.cpp.inc
static ::mlir::LogicalResult __mlir_ods_local_type_constraint_PolyOps0(
    ::mlir::Operation *op, ::mlir::Type type, ::llvm::StringRef valueKind,
    unsigned valueIndex) {
  if (!(((::llvm::isa<::mlir::tutorial::poly::PolynomialType>(type))) ||
        ((((::llvm::isa<::mlir::VectorType>(type))) &&
          ((::llvm::cast<::mlir::VectorType>(type).getRank() > 0))) &&
         ([](::mlir::Type elementType) {
            return (::llvm::isa<::mlir::tutorial::poly::PolynomialType>(elementType));
          }(::llvm::cast<::mlir::ShapedType>(type).getElementType()))) ||
        (((::llvm::isa<::mlir::TensorType>(type))) &&
         ([](::mlir::Type elementType) {
            return (::llvm::isa<::mlir::tutorial::poly::PolynomialType>(elementType));
          }(::llvm::cast<::mlir::ShapedType>(type).getElementType()))))) {
    return op->emitOpError(valueKind) << " #" << valueIndex
        << " must be poly-or-container, but got " << type;
  }
  return ::mlir::success();
}

Moreover, the accessors that previously returned a PolynomialType or ::mlir::TypedValue<PolynomialType> now must return a more generic ::mlir::Type or ::mlir::Value because the values could be tensors or vectors as well. It’s left to the caller to type-switch or dyn_cast these manually during passes.

Note that after adding this, we now get the following legal syntax:

    %0 = ... -> !poly.poly<10>
    %1 = ... -> !poly.poly<10>

    %2 = tensor.from_elements %0, %1 : tensor<2x!poly.poly<10>>
    %3 = poly.add %2, %0 : (tensor<2x!poly.poly<10>>, !poly.poly<10>) -> tensor<2x!poly.poly<10>>

I would say the implied semantics is that the scalar (non-tensor) operand is held constant across the elementwise mapping, i.e., it is added to each entry of the tensor.

Verifier traits

Some traits add extra verification checks to the operation as mixins. While we’ll cover custom verifiers in a future article, for now we can notice that the following is legal (with or without ElementwiseMappable):

    %0 = ... -> !poly.poly<10>
    %1 = ... -> !poly.poly<9>
    %2 = poly.add %0, %1 : (!poly.poly<10>, !poly.poly<9>) -> !poly.poly<10>

While one could make this legal by defining the semantics of add to embed the smaller-degree polynomial ring into the larger one, we're demonstrating traits, so instead we'll add the SameOperandsAndResultElementType trait (a vectorized cousin of SameOperandsAndResultType), which asserts that the poly type of all the operands and results (including the elements of any containers) is the same. This commit does it.
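Concretely, continuing the example above (a sketch that also assumes the ElementwiseMappable changes from the previous section, since it mixes tensor and scalar operands), the trait compares element types through containers:

    // Still fine: the tensor's element type matches the scalar operand's type.
    %t = tensor.from_elements %0, %0 : tensor<2x!poly.poly<10>>
    %ok = poly.add %t, %0 : (tensor<2x!poly.poly<10>>, !poly.poly<10>) -> tensor<2x!poly.poly<10>>

    // Rejected by the verifier: !poly.poly<10> elements vs. a !poly.poly<9> operand.
    %bad = poly.add %t, %1 : (tensor<2x!poly.poly<10>>, !poly.poly<9>) -> tensor<2x!poly.poly<10>>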

Last few unneeded traits

Involution is for operations that are their own inverse, $f(f(x)) = x$. This is a common math concept, and if we had something like a poly.neg op, it would be a perfect fit. Adding it would enable a free canonicalization that removes pairs of such ops.
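For instance, with a hypothetical poly.neg op (not part of the dialect, shown only to illustrate the trait), --canonicalize would fold a double negation for free:

    // Hypothetical op marked with the Involution trait.
    %1 = poly.neg %0 : !poly.poly<10>
    %2 = poly.neg %1 : !poly.poly<10>
    // --canonicalize replaces uses of %2 with %0; the two negs then become dead.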

Idempotent is for operations $f$ for which $f(f(x)) = f(x)$. This is also a common math concept, but none of the poly ops satisfy it. A rounding op like ceiling or floor would. If it applied, adding this trait would enable a free canonicalization, just like Involution.

Broadcastable handles broadcast semantics for tensor/vector ops, which is not relevant.

Commutative is for ops whose arguments can be reordered (including across multiple ops of the same kind). This is used by a pass here for the purpose of simplifying pattern matching, but as far as I can tell that pass is never registered or manually applied, so this trait is effectively a no-op.

AffineScope, IsolatedFromAbove, Terminator, SingleBlock, and SingleBlockImplicitTerminator are for ops that have regions and scoping rules, which poly's ops don't.

SymbolTable is for ops that define a symbol table, which I think is mainly module (in which the symbols are functions), and it is mainly used in the --inline and --symbol-dce passes.