MLIR — Using Traits

Table of Contents

Last time we defined a new dialect poly for polynomial arithmetic. This time we’ll spruce up the dialect by adding some pre-defined MLIR traits, and see how the application of traits enables some general purpose passes to optimize poly programs.

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

Traits and Loop Invariant Code Motion

As a compiler toolchain, MLIR heavily emphasizes code reuse. This is accomplished largely through two constructs: traits and interfaces. An interface is an interface in the normal programming sense of the word: a set of function signatures implemented by a type and providing some limited-scope behavior. Applied to MLIR, you can implement interfaces on operations and types, and then passes can operate at the interface level. Traits are closely related: a trait is an interface with no methods. Traits can just be “slapped on” an operation and passes can magically start working with them. They can also serve as mixins for common op verification routines, type inference, and more.

In this article I’ll show how adding traits to the operations in the poly dialect (defined last time) allows us to reuse existing MLIR passes on poly constructs. In future articles we’ll see how to define new traits and interfaces. But for now, existing traits are an extremely simple way to start using the batteries included in MLIR on new dialects.

As mentioned, one applies traits primarily to enable passes to do things with custom operations. So let’s start from the passes. The general transformation passes list includes a pass called loop invariant code motion. This checks loop bodies for any operations that don’t need to be in the loop, and moves them out of the loop body. Using this requires us to add two traits to ops to express that they are safe to move around. The first is called NoMemoryEffect (which is technically an empty implementation of an interface) that asserts the operation does not have any side effects related to writing to memory. The second is AlwaysSpeculatable (technically a list of two traits), which says that an operation is allowed to be “speculatively executed,” i.e., computed early. If it is speculatable, then the compiler can move the op to another location. If not, say it reads from a memory location that can be written to, there is an earliest point before which it’s not safe to move the op.

Loop invariant code motion takes ops with these two traits, and hoists them outside of loops when the operation’s operands are unchanged by the body of the loop. Conveniently, MLIR also defines a single named list of traits called Pure, which is NoMemoryEffect and AlwaysSpeculatable. So we can just add the trait name to our tablegen op definition (via a template parameter that defaults to an empty list of traits) as in this commit.

@@ -3,9 +3,10 @@

 include ""
 include ""
+include "mlir/Interfaces/"

-class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic> {
+class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic, [Pure]> {
   let arguments = (ins Polynomial:$lhs, Polynomial:$rhs);

This commit adds the boilerplate to register all default MLIR passes in tutorial-opt, and adds an example test asserting a poorly-placed poly.mul is hoisted out of the loop body.

// RUN: tutorial-opt %s --loop-invariant-code-motion > %t
// RUN: FileCheck %s < %t
... <setup> ...
// CHECK: poly.mul
// CHECK: affine.for
%ret_val = affine.for %i = 0 to 100 iter_args(%sum_iter = %p0) -> !poly.poly<10> {
  // The poly.mul should be hoisted out of the loop.
  // CHECK-NOT: poly.mul
  %2 = poly.mul %p0, %p1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
  %sum_next = poly.add %sum_iter, %2 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
  affine.yield %sum_next : !poly.poly<10>

In the generated C++ code, adding new traits or interfaces adds new template arguments in the op’s class definition:

class SubOp : public ::mlir::Op<SubOp, 
::mlir::ConditionallySpeculatable::Trait,            // <-- new
::mlir::OpTrait::AlwaysSpeculatableImplTrait,   // <-- new
::mlir::MemoryEffectOpInterface::Trait>          // <--- new
{ ... }

And NoMemoryEffect adds a trivial implementation of the memory effects interface:

void SubOp::getEffects(
  ::llvm::SmallVectorImpl<::mlir::SideEffects::EffectInstance<::mlir::MemoryEffects::Effect>> &effects) {

As far as using traits goes, this is it. However, to figure out what each trait does, you have to dig through the pass implementations a bit. That, and all the “helper” definitions like Pure that combine multiple traits are not documented, nor is the complete list of available traits and interfaces (the traits list is missing quite a few, like ConstantLike, Involution, Idempotent, etc.). When I first drafted this article, I had only applied the AlwaysSpeculatable trait list to the Poly_BinOp, and I was confused when --loop-invariant-code-motion was a no-op. I had to dig into the passes implementation here to actually see that it also needs isMemoryEffectFree which uses MemoryEffectOpInterface.

So next we’ll explore the general passes and traits, and add relevant traits to the poly dialect.

Passes already handled by Pure, or not relevant to poly

control-flow-sink moves ops that are only used in one branch of a conditional into the relevant branch. Requires the op to be memory-effect free, which Pure already covers. To demonstrate, I added the Pure trait to poly.from_tensor and added a test in this commit.

cse is constant subexpression elimination, which removes unnecessarily repeated computations when possible. Again, a lack of memory-effects suffices. Demonstrated in this commit.

-inline inlines function calls, which does not apply to poly.

-mem2reg replaces memory store/loads with direct usage of the underlying value, when possible. Should not require any changes and not interesting enough for me to demo here.

-remove-dead-values does things like remove function arguments that are unused, or return values that are not used by any caller. Should not require any changes.

-sroa is “scalar replacement of aggregates,” which seems like it is about reshuffling memory allocations around. Not really sure why this is useful.

-symbol-dce eliminates dead private functions, which does not apply to poly.

Punting one pass to next time

-sccp is sparse conditional constant propagation, which attempts to infer when an operation has a constant output, and then replaces the operation with the constant value. Repeating this, it “propagates” the constants as far as possible through the program. To support this requires a bit of extra work and requires me to introduces some more concepts, so I’ll do that next time.

Elementwise mappings

Now that we’ve gone through the most generic passes, I’ll cover some remaining traits I’m aware of.

There are a number of traits that extend scalar operations to tensor operations and vice versa. Elementwise, Scalarizable, Tensorizable, and Vectorizable, whose docs you can read in detail here, essentially allow you to use ops that work on scalars in tensors in the natural way. The trait list ElementwiseMappable combines them into a single trait. This commit demonstrates how adding the trait allows the poly.add op to magically work on tensor arguments. It also requires relaxing the ins arguments to the op in the tablegen, and we do this by using a so-called type constraint that permits polynomials and tensors containing them

def PolyOrContainer : TypeOrContainer<Polynomial, "poly-or-container">;

class Poly_BinOp<string mnemonic> : Op<Poly_Dialect, mnemonic, [Pure, ElementwiseMappable]> {
  let arguments = (ins PolyOrContainer:$lhs, PolyOrContainer:$rhs);
  let results = (outs PolyOrContainer:$output);

Behind the hood, the generated code has a new type checking routine used in parsing:

static ::mlir::LogicalResult __mlir_ods_local_type_constraint_PolyOps0(
    ::mlir::Operation *op, ::mlir::Type type, ::llvm::StringRef valueKind,
    unsigned valueIndex) {
  if (!(((::llvm::isa<::mlir::tutorial::poly::PolynomialType>(type))) || ((((::llvm::isa<::mlir::VectorType>(type))) && ((::llvm::cast<::mlir::VectorType>(type).getRank() > 0))) && ([](::mlir
::Type elementType) { return (::llvm::isa<::mlir::tutorial::poly::PolynomialType>(elementType)); }(::llvm::cast<::mlir::ShapedType>(type).getElementType()))) || (((::llvm::isa<::mlir::TensorT
ype>(type))) && ([](::mlir::Type elementType) { return (::llvm::isa<::mlir::tutorial::poly::PolynomialType>(elementType)); }(::llvm::cast<::mlir::ShapedType>(type).getElementType()))))) {
    return op->emitOpError(valueKind) << " #" << valueIndex
        << " must be poly-or-container, but got " << type;
  return ::mlir::success();

Moreover, the accessors that previously returned a PolynomialType or ::mlir::TypedValue<PolynomialType> now must return a more generic ::mlir::Type or ::mlir::Value because the values could be tensors or vectors as well. It’s left to the caller to type-switch or dyn_cast these manually during passes.

Note that after adding this, we now get the following legal syntax:

    %0 = ... -> !poly.poly<10>
    %1 = ... -> !poly.poly<10>

    %2 = tensor.from_elements %0, %1 : tensor<2x!poly.poly<10>>
    %3 = poly.add %2, %0 : (tensor<2x!poly.poly<10>>, !poly.poly<10>) -> tensor<2x!poly.poly<10>>

I would say the implied semantics is that the second poly is constant across the mapping.

Verifier traits

Some traits add extra verification checks to the operation as mixins. While we’ll cover custom verifiers in a future article, for now we can notice that the following is legal (with or without ElementwiseMappable):

    %0 = ... -> !poly.poly<10>
    %1 = ... -> !poly.poly<9>
    %2 = poly.add %0, %1 : (!poly.poly<10>, !poly.poly<9>) -> !poly.poly<10>

While one could make this legal by defining the semantics of add to embed the smaller-degree polynomial ring into the larger, we’re demonstrating traits, so we’ll add the SameOperandsAndResultElementType trait (a vectorized cousin of SameOperandsAndResultType), which asserts that the poly type in all the arguments (and elements of containers) are the same. This commit does it.

Last few unneeded traits

Involution is for operations that are their own opposite, $f(f(x)) = x$. This is a common math concept, and if we had something like a poly.neg op, it would be perfect for it. Adding it would enable a free canonicalization to remove the repeated ops.

Idempotent is for operations $f(x)$ for which $f(f(x)) = f(x)$. This is a common math concept, but none of the poly ops have it. A rounding op like ceiling or floor would. If it did apply, adding this trait would enable a free canonicalization like involution.

Broadcastable handles broadcast semantics for tensor/vector ops, which is not relevant.

Commutative is for ops whose arguments can be reordered, (including across multiple ops of the same kind). This is used by a pass here for the purpose of simplifying pattern matching, but as far as I can tell the pass is never registered or manually applied so this trait is a no-op.

AffineScope, IsolatedFromAbove, Terminator, SingleBlock, and SingleBlockImplicitTerminator are for ops that have regions and scoping rules, which we don’t.

SymbolTable is for ops that define a symbol table, which I think is mainly module (in which the symbols are functions), and this is mainly used in the -inline and -symbol-dce passes.

Computing Percentages Easier

Problem: Compute 16% of 25 in your head.

Solution: 16% of 25 is equivalent to 25% of 16, which is clearly 4. This is true for all numbers: $x\%$ of $y$ is always equal to $y\%$ of $x$. The first one is $\frac{x}{100} y$ and the second is $\frac{y}{100}x$, and because multiplication is commutative and associative, both are equal to $(x \cdot y) / 100$. You can pick the version that is easiest.

Discussion: While this doesn’t work for every problem, it gives you three quick options to try. The first two options are to see if either of the two numbers makes it an easy problem (like 25% or 10%). The second, which “falls out” from the proof, is that you can first just multiply the two numbers together and then divide by 100. This often works well when one of the numbers is a single digit (e.g., 4% of 41 = 1.64).

Viewing the mental arithmetic as a math puzzle inspires all kinds of creativity. You can compute “$x\%$ of $y$” in their head by first computing 1% of $y$ and then scaling it up by $x$. Or you can split the denominator 100 into two pieces, such as $((x / 10) \cdot (y / 10))$, which is the same as “compute 10% of x and y separately, then multiply them together,” making a problem like 40% of 70 = 28 seem less intimidating than the “multiply then divide by 100” approach. The common 20% gratuity calculation is “move the decimal place over by 1 and double it,” i.e., $\frac{x}{100} \cdot 20 = \frac{2 \cdot 10 \cdot x}{10 \cdot 10} = \frac{2 \cdot x}{10}$.

MLIR — Defining a New Dialect

Table of Contents

In the last article in the series, we migrated the passes we had written to use the tablegen code generation framework. That was a preface to using tablegen to define dialects.

In this article we’ll define a dialect that represents arithmetic on single-variable polynomials, with coefficients in $\mathbb{Z} / 2^{32} \mathbb{Z}$ (32-bit unsigned integers).

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

Sketching out a design

The basic dialect will define a new polynomial type, and provide operations to define polynomials by specifying their coefficients from standard MLIR types, extract data about a polynomial to store the results in standard MLIR types, and to do arithmetic operations on polynomials.

There is quite a large surface area of design options when writing a dialect. This talk by Jeff Niu and Mehdi Amini gives some indication of how one might start to think about dialect design. But in brief, a polynomial dialect would fit into the “computation” bucket (a dialect to represent number crunching).

A slide from “MLIR Dialect Design and Composition for Front-End Compilers” (timestamped link), describing a taxonomy of dialects.

Another idea relevant to starting a dialect design is to ask how a dialect will enable easy optimizations. That is the Optimizer dialect class above. But since this tutorial series is focusing on the low-level, day to day, nitty gritty details of working with MLIR, we’re going to focus on getting some custom dialect defined, and then come back to what makes a good dialect design later.

An empty dialect

We’ll start by defining an empty dialect and just look at the tablegen-generated code, which is done in this commit. The tablegen file looks like

include "mlir/IR/"

def Poly_Dialect : Dialect {
  let name = "poly";
  let summary = "A dialect for polynomial math";
  let description = [{
    The poly dialect defines types and operations for single-variable
    polynomials over integers.

  let cppNamespace = "::mlir::tutorial::poly";

It is almost indistinguishable from the tablegen for a Pass, except for the Dialect base class. The build rule has different flags too.

    name = "dialect_inc_gen",
    tbl_outs = [
        (["-gen-dialect-decls"], ""),
        (["-gen-dialect-defs"], ""),
    tblgen = "@llvm-project//mlir:mlir-tblgen",
    td_file = "",
    deps = [

Unlike the Pass codegen, here the dialect has us specify a codegen’ed header and implementation file, which are short enough to include directly in the article:

// bazel-bin/lib/Dialect/Poly/
namespace mlir {
namespace tutorial {

class PolyDialect : public ::mlir::Dialect {
  explicit PolyDialect(::mlir::MLIRContext *context);

  void initialize();
  friend class ::mlir::MLIRContext;
  ~PolyDialect() override;
  static constexpr ::llvm::StringLiteral getDialectNamespace() {
    return ::llvm::StringLiteral("poly");
} // namespace tutorial
} // namespace mlir

And the cpp:

// bazel-bin/lib/Dialect/Poly/
namespace mlir {
namespace tutorial {

PolyDialect::PolyDialect(::mlir::MLIRContext *context)
    : ::mlir::Dialect(getDialectNamespace(), context, ::mlir::TypeID::get<PolyDialect>()) {

PolyDialect::~PolyDialect() = default;

} // namespace tutorial
} // namespace mlir

Basically, they are empty containers that will hold the types and ops that we define next.

In this commit we register the dialect with the tutorial-opt main program, which is handled by a single API call. Now the dialect shows up in the help text of the tutorial-opt binary, though nothing else is there because we have no passes associated with it.

$ bazel run tools:tutorial-opt -- --help
Available Dialects: ..., pdl_interp, poly, quant, ...

Adding a trivial type

Next we’ll define a poly.poly type with no semantics, and in the next section we’ll focus on the semantics.

This commit defines a simple test that ensures we can parse and print our new type.

// RUN: tutorial-opt %s

module {
  func.func @main(%arg0: !poly.poly) -> !poly.poly {
    return %arg0 : !poly.poly

Note that the exclamation mark ! sigil prefix is required for out-of-tree MLIR types.

Next, in this commit we add the tablegen for the poly type. The tablegen looks like

include ""
include "mlir/IR/"

// A base class for all types in this dialect
class Poly_Type<string name, string typeMnemonic> : TypeDef<Poly_Dialect, name> {
  let mnemonic = typeMnemonic;

def Poly : Poly_Type<"Polynomial", "poly"> {
  let summary = "A polynomial with i32 coefficients";

  let description = [{
    A type for polynomials with integer coefficients in a single-variable polynomial ring.

This is effectively a boilerplate shell since nothing here is specific to polynomial arithmetic. But it shows a few new things about tablegen worth mentioning.

First and most trivially, tablegen has include statements so that you can split definitions across files. I like to put each conceptual type of thing in its own tablegen file (types, ops, attributes, etc.), but conventions differ across projects.

Second, a common pattern when defining types (and ops) is to have a base class for each dialect that all concrete type definitions inherit from. This shows first of all that there is a difference between class and def in tablegen. def defines actual types, where by “actual” I mean it generates C++ code, whereas class is only an inheritance base and disappears after tablegen is done. Think of class as allowing us to refactor out common shared code among type definitions in one file. I make a stink of this because I have mixed up class and def many times and been confused by the error messages and generated code. For example, if you change the class to a def above, you’ll see the following unhelpful error message.

lib/Dialect/Poly/ error: Expected a class name, got 'Poly_Type'
def Poly_Type<string name, string typeMnemonic> : TypeDef<Poly_Dialect, name> {

Moreover, TypeDef itself is a class that takes as template arguments the dialect the type should belong to and a name field (related to the type’s eventual C++ class name), and results in Tablegen associating the generated C++ classes with the same namespace as we told the dialect to use, among other things.

Third, the new field is the mnemonic declaration. This determines the name of the type in the textual representation of the IR.

The generated code again has separate and files:


namespace mlir {
class AsmParser;
class AsmPrinter;
} // namespace mlir
namespace mlir {
namespace tutorial {
namespace poly {
class PolynomialType;
class PolynomialType : public ::mlir::Type::TypeBase<PolynomialType, ::mlir::Type, ::mlir::TypeStorage> {
  using Base::Base;
  static constexpr ::llvm::StringLiteral getMnemonic() {
    return {"poly"};

} // namespace poly
} // namespace tutorial
} // namespace mlir





static ::mlir::OptionalParseResult generatedTypeParser(::mlir::AsmParser &parser, ::llvm::StringRef *mnemonic, ::mlir::Type &value) {
  return ::mlir::AsmParser::KeywordSwitch<::mlir::OptionalParseResult>(parser)
    .Case(::mlir::tutorial::poly::PolynomialType::getMnemonic(), [&](llvm::StringRef, llvm::SMLoc) {
      value = ::mlir::tutorial::poly::PolynomialType::get(parser.getContext());
      return ::mlir::success(!!value);
    .Default([&](llvm::StringRef keyword, llvm::SMLoc) {
      *mnemonic = keyword;
      return std::nullopt;

static ::mlir::LogicalResult generatedTypePrinter(::mlir::Type def, ::mlir::AsmPrinter &printer) {
  return ::llvm::TypeSwitch<::mlir::Type, ::mlir::LogicalResult>(def)
   .Case<::mlir::tutorial::poly::PolynomialType>([&](auto t) {
      printer << ::mlir::tutorial::poly::PolynomialType::getMnemonic();
      return ::mlir::success();
    .Default([](auto) { return ::mlir::failure(); });

namespace mlir {
namespace tutorial {
namespace poly {
} // namespace poly
} // namespace tutorial
} // namespace mlir


The name PolynomialType is generated by adding Type to the "Polynomial" template argument we passed in the tablegen file. The name of the def itself is used to refer to the class elsewhere in tablegen files, and the two can be different.

One thing to pay attention to here is that tablegen is attempting to generate a type parser and printer for us. But it’s not usable yet—we’ll come back to this. If you build at this commit you’ll see a compiler warning like generatedTypePrinter defined but not used and a hard failure if you try running the test.

A second thing to notice is that it uses the header-guards-as-function-arguments style again, here to separate the cpp file into two include-guarded sections, one that just has a list of the types defined, and the other that has implementations of functions. The first one, GET_TYPEDEF_LIST is curious because it just includes a comma-separated list of class names. This is because the PolyDialect.cpp from this commit is responsible for registering the types with the dialect, and that happens by using this include to add the C++ class names for the types as template arguments in the Dialect’s initialization function.

// PolyDialect.cpp
#include "lib/Dialect/Poly/PolyDialect.h"

#include "lib/Dialect/Poly/PolyTypes.h"
#include "mlir/include/mlir/IR/Builders.h"
#include "llvm/include/llvm/ADT/TypeSwitch.h"

#include "lib/Dialect/Poly/"
#include "lib/Dialect/Poly/"

namespace mlir {
namespace tutorial {
namespace poly {

void PolyDialect::initialize() {
#include "lib/Dialect/Poly/"

} // namespace poly
} // namespace tutorial
} // namespace mlir

We’ll do the same registration dance for ops, attributes, etc., later.

The expected way to set up the C++ interface to the tablegen files is:

  • Create a header file PolyTypes.h which is the only file allowed to include,
  • Include inside PolyDialect.cpp with any additional #includes needed for the auto-generated implementations in In our case, the default parser/printer uses the type switch function from llvm/include/llvm/ADT/TypeSwitch.h.
  • If needed, add a PolyTypes.cpp with any additional implementations needed for functions declared by tablegen that can’t be automatically generated.

I don’t think I’ve found the best arrangement of these files and build targets yet. What I really want is to have each header with a corresponding, a corresponding .cpp file that includes the relevant, and to connect them all with PolyDialect.cpp. However, PolyDialect::initialize does some introspection on the classes declared in to ensure they’re valid (specifically looking for details of Storage types like is_trivially_destructible, which are generated for us in the next section), and that requires the implementations to exist in the same compilation unit (I believe). From what I’ve read of other projects, people just tend to cram things into the same cpp files, leading to multi-thousand line implementation files, which I dislike.

As a final note, the build file uses a new rule called td_files to group all the tablegen files into one build target.

The last ingredient required to get this to compile and run is to make the type parser and printer usable. Thankfully it’s one line: adding let useDefaultTypePrinterParser = 1; to the dialect tablegen (see this commit). This adds the following declarations to, and modifies the generated implementations to be member functions on PolyDialect.

  /// Parse a type registered to this dialect.
  ::mlir::Type parseType(::mlir::DialectAsmParser &parser) const override;

  /// Print a type registered to this dialect.
  void printType(::mlir::Type type,
                 ::mlir::DialectAsmPrinter &os) const override;

Adding a poly type parameter

When designing the poly type, we have to think about what we want to express, and how the type will be lowered to types in lower level dialects. For the sake of this tutorial we’ll restrict to single-variable polynomials, so the indeterminate can be arbitrary and implicit (we don’t need to keep track of the variable name x or y or t in the IR).

The next question is the polynomial analogue of integer bit width: degree. The difficulty here is that tracking the exact degree of a polynomial is impossible in general. The degree of a sum of two polynomials depends on the coefficient values (the highest degree terms may cancel), which may not be known statically because they are read from an external source at runtime. So at best we could track an upper bound on the degree.

The simplest approach is to require a polynomial type to declare its “degree upper bound” statically, and then define “overflow” semantics just like integers have. A natural option is to implicitly treat polynomials as members of a ring $R[x]/ (x^D-1)$, where $D$ is the degree upper bound and $R$ is the ring of 32-bit unsigned integers. In less mathematical terms, this means that whenever a term $x^n$ in a polynomial would exceed degree $D \leq n$, we would replace $x^n$ with $x^{n \mod D}$, which is the same thing as “declaring” $x^D=1$ and reducing polynomials via that substitution until all terms have degree less than $D$.

This simple approach has the benefit of making the dialect design easy: all binary polynomial ops have the same input and output type, and lowering a poly type requires replacing poly.poly<D> with a tensor of D coefficients. Since the shapes are static and most polynomials would have the same bound, it’s probably the most performant option. The downsides are that now the programmer has to worry about overflow semantics, and if you want polynomials of larger degree you have to insert bookkeeping operations to extend the degree bound.

On the other hand, we could allow a poly to have a growing degree. So that, e.g., a poly.mul operation with two poly.poly<7> input polynomials would have a poly.poly<14> as output. This would require more work in the dialect definition to ensure the IR is valid, and the lowering would be more complex in that you’d have to manage tensors of different sizes. It would probably also have an impact on performance, since there would be more memory allocations and copying involved (but maybe the compiler could be smart enough to avoid that).

I would like to do both so as to show how these differences materialize in lowerings and optimization passes. But for now I will start with the one that I think is easier: wrapping overflow semantics.

To make that work, we need to add an attribute to our polynomial type representing its degree upper bound. The official docs on how to do this are here.

This is where tablegen starts to be quite useful, because the changes only require two lines added to the type definition’s tablegen.

  let parameters = (ins "int":$degreeBound);
  let assemblyFormat = "`<` $degreeBound `>`";

The first line associates each instance of the type with an integer parameter (the "int" can be any string containing a literal C++ type), and a name $degreeBound that is used both for the generated C++ and to refer to it elsewhere in the tablegen file.

The second line is required because now that a poly type has this associated data, we need to be able to print it to and parse it from the textual IR representation. This simplest option is to let tablegen auto-generate the parser and printer for us, but we could have also used the line let hasCustomAssemblyFormat = 1; which would generate some headers that it expects us to fill in the implementation for. In the syntax of that line, tokens in backticks are literally printed/parsed.

After adding these two lines in this commit, the generated code gets quite a bit more complicated. Here’s the entirety of both files. Some things to note:

  • PolynomialType gets a new int getDegreeBound() method, as well as a static get(MLIRContext, int) factory method.
  • The parser and printer are upgraded to the new format.
  • There is a new class called PolynomialTypeStorage that holds the int parameter and is hidden in an inner detail namespace.

The storage class is autogenerated for us now because integers have simple construction/destruction semantics. If we had a more complicated argument like an array that needed allocation, we’d have to implement special classes to define those semantics. And at the most extreme end, if we had a fully custom type parameter, we’d have to implement a storage class manually, implement things like hash_code, and register it with the dialect. For the curious, I had to do this in heir/pull/74 to implement a custom Polynomial type parameter.

The same commit updates the syntax test to include the type parameter.

Adding some simple operations

Moving a bit faster now, we want to add some polynomial operations. This commit adds a polynomial addition op. The tablegen:

include ""
include ""

def Poly_AddOp : Op<Poly_Dialect, "add"> {
  let summary = "Addition operation between polynomials.";
  let arguments = (ins Polynomial:$lhs, Polynomial:$rhs);
  let results = (outs Polynomial:$output);
  let assemblyFormat = "$lhs `,` $rhs attr-dict `:` `(` type($lhs) `,` type($rhs) `)` `->` type($output)";

It looks very similar to a type, but the base class is Op, the arguments correspond to the operation’s inputs, and the assembly format is more complicated. We’ll enhance it in a future article, but the reason is that without inserting some special hooks, tablegen isn’t able to generate a parser that can infer what the types of the inputs and outputs should be when constructing it from the textual representation. [Aside: I feel like it should be able to with the simple example above, but for whatever reason the MLIR devs appear to have made auto-generated type inference opt-in via the trait infrastructure, see next article.]

Still, we can add a test and it passes

  // CHECK-LABEL: test_add_syntax
  func.func @test_add_syntax(%arg0: !poly.poly<10>, %arg1: !poly.poly<10>) -> !poly.poly<10> {
    // CHECK: poly.add
    %0 = poly.add %arg0, %arg1 : (!poly.poly<10>, !poly.poly<10>) -> !poly.poly<10>
    return %0 : !poly.poly<10>

The generated C++ has quite a bit more going on. Here’s the full generated code. Mainly the generated header defines AddOp which has getters for the arguments and results, “mutable” getters for modfying the op in place, parse/print, and generated builder methods. It also defines an AddOpAdaptor class which is used during lowering. The cpp file contains mostly rote implementations of these, converting from a generic internal representation to specific types and named objects that client code would use.

Next, in this commit I added a sub and mul operation, with a slight refactoring to make a base tablegen class for a binary operation.

Next, in this commit I added a from_tensor operation that converts a list of coefficients to a polynomial, and an eval op that evaluates a polynomial at a given input value.

In the next few articles we’ll expand on this dialect’s capabilities. First, we’ll study what other “batteries” we can include in the dialect itself, and how we can run optimizations on poly programs. Then we’ll study the nuances of lowering poly to existing MLIR dialects, and eventually through to LLVM and then machine code.

MLIR — Using Tablegen for Passes

Table of Contents

In the last article in this series, we defined some custom lowering passes that modified an MLIR program. Notably, we accomplished that by implementing the required interfaces of the MLIR API directly.

This is not the way that most MLIR developers work. Instead, they use a code generation tool called tablegen to generate boilerplate for them, and then only add the implementation methods that are custom to their work. In this article, we’ll convert the custom passes we wrote previously to use this infrastructure, and going forward we will use tablegen for defining new dialects as well.

The code relevant to this article is contained in this pull request, and as usual the commits are organized so that they can be read in order.

How to think about tablegen

The basic idea of tablegen is that you can define a pass—or a dialect, as we’ll do next time—in the tablegen DSL (as specialized for MLIR) and then run the mlir-tblgen binary with appropriate flags, and it will generate headers for the functions you need to implement your pass, along with default implementations of some common functions, documentation, and hooks to register the pass/dialect with the mlir-opt-like entry point tool.

It sounds nice, but I have a love hate relationship with tablegen. I personally find it to be unpleasant to use, primarily because it provides poor diagnostic information when you do things wrong. Today, though, I realize that my part of my frustration came from having the wrong initial mindset around tablegen. I thought, incorrectly, that tablegen was an abstraction layer. That is, I could write my tablegen files, build them, and only think about the parts of the generated code that I needed to implement.

Instead, I’ve found that I still need to understand the details of the generated code, enough that I may as well read the generated code closely until it becomes familiar. This materializes in a few ways:

  • mlir-tablegen doesn’t clearly tell you what functions are left unimplemented or explain the contract of the functions you have to write. For that you have to read the docs (which are often incomplete), or often the source code of the base classes that the generated boilerplate subclasses. Or, sadly, sometimes the Discourse threads or Phabricator commit messages are the only places to find the answers to some questions.
  • The main way to determine what is missing is to try to build the generated code with some code that uses it, and then sift through hundreds of lines of C++ compiler errors, which in turn requires understanding the various template gymnastics in the generated code.
  • The generated code will make use of symbols that you have to know to import or forward-declare in the right places, and it expects you to manage the namespaces in which the generated code lives.

With the expectation that tablegen is a totally see-through code generator, and then just sitting down and learning the MLIR APIs, the result is not so bad. Indeed, that was the motivation for building the passes in the last article from “scratch” (without tablegen), because it makes the transition to using tablegen more transparent. Pass generation is also a good starting point for tablegen because, by comparison, using tablegen for dialects results in many more moving parts and configurable options.

Tablegen files and the mlir-tblgen binary

We’ll start by migrating the AffineFullUnroll pass from last time. The starting point is to write some tablegen code. This commit implements it, which I’ll abbreviate below. If you want the official docs on how to do this, see Declarative Pass Specification.

include "mlir/Pass/"

def AffineFullUnroll : Pass<"affine-full-unroll"> {
  let summary = "Fully unroll all affine loops";
  let description = [{
    Fully unroll all affine loops. (could add more docs here like code examples)
  let dependentDialects = ["mlir::affine::AffineDialect"];

[Aside: In the actual commit, there are two definitions, one for the AffineFullUnroll that walks the IR, and the other for AffineFullUnrollPatternRewrite which uses the pattern rewrite engine. In the article I’ll just show the generated code for the first.]

As you can see, tablegen has concepts like classes and class inheritance (the : Pass<...> is subclassing the Pass base class defined in, an upstream MLIR file. But the def keyword here specifically implies we’re instantiating this thing in a way that the codegen tool should see and generate real code for (as opposed to the class keyword, which is just for tablegen templating and code reuse).

So tablegen lets you let-define string variables and lists, but one thing that won’t be apparent in this article, but will be apparent in the next article, is that tablegen lets you define variables and use them across definitions, as well as define snippets of C++ code that should be put into the generated classes (which can use the defined variables). This is visible in the present context in the class which defines a code constructor variable. If you have a special constructor for your pass, you can write the C++ code for it there.

Next we define a build rule using gentbl_cc_library to run the mlir-tblgen binary (in the same commit) with the right options. This gentbl_cc_library bazel rule is provided by the MLIR devs, and it basically just assembles the CLI flags to mlir-tblgen and ensures the code-genned files end up in the right places on the filesystem and are compatible with the bazel dependency system. The build rule invocation looks like

    name = "pass_inc_gen",
    tbl_outs = [
    tblgen = "@llvm-project//mlir:mlir-tblgen",
    td_file = "",
    deps = [

The important part here is that td_file specifies our input file, and tbl_outs defines the generated file,, which is at $GIT_ROOT/bazel-bin/lib/Transform/Affine/

The main quirk with gentbl_cc_library is that the name of the bazel rule is not the target that actually generates the code. That is, if you run bazel build pass_inc_gen (or from the git root, bazel build lib/Transform/Affine:pass_inc_gen), it won’t create the files but the build will be successful. Instead, under the hood gentbl_cc_library is a bazel macro that generates the rule pass_inc_gen_filegroup, which is what you have to bazel build to see the actual files.

I’ve pasted the generated code (with both version of the AffineFullUnroll) into a gist and will highlight the important parts here. The first quirky thing the generated code does is use #ifdef as a sort of function interface for what code is produced. For example, you will see:

std::unique_ptr<::mlir::Pass> createAffineFullUnroll();

... <lots of C++ code> ...

This means that to use this file, we will need to define the appropriate symbol in a #define macro before including this header. You can see it happening in this commit, but in brief it will look like this

// in file AffineFullUnroll.h

#include "lib/Transform/Affine/"

// in file AffineFullUnroll.cpp

#include "lib/Transform/Affine/"

... <implement the missing functions from the generated code> ...

I’m no C++ expert, and this was the first time I’d seen this pattern of using #include as a function with #define as the argument. It was a little unsettling to me, until I landed on that mindset that it’s meant to be a white-box codegen, not an abstraction. So read the generated code. Inside the GEN_PASS_DECL_... guard, it defines a single function std::unique_ptr<::mlir::Pass> createAffineFullUnroll(); that is a very limited sole entry point for code that wants to use the pass. We don’t need to implement it unless our Pass has a custom constructor. Then in the GEN_PASS_DEF_... guard it defines a base class, whose functions I’ll summarize, but you should recognize many of them because we implemented them by hand last time.

template <typename DerivedT>
class AffineFullUnrollBase : public ::mlir::OperationPass<> {
  AffineFullUnrollBase() : ::mlir::OperationPass<>(::mlir::TypeID::get<DerivedT>()) {}
  AffineFullUnrollBase(const AffineFullUnrollBase &other) : ::mlir::OperationPass<>(other) {}

  static ::llvm::StringLiteral getArgumentName() {...}
  static ::llvm::StringRef getArgument() { ... }
  static ::llvm::StringRef getDescription() { ... }
  static ::llvm::StringLiteral getPassName() { ... }
  static ::llvm::StringRef getName() { ... }

  /// Support isa/dyn_cast functionality for the derived pass class.
  static bool classof(const ::mlir::Pass *pass) { ... }

  /// A clone method to create a copy of this pass.
  std::unique_ptr<::mlir::Pass> clonePass() const override { ... }

  /// Return the dialect that must be loaded in the context before this pass.
  void getDependentDialects(::mlir::DialectRegistry &registry) const override {

  ... <type_id stuff> ...

Notably, this doesn’t tell us what functions are left for us to implement. For that we have to either build it and read compiler error messages, or compare it to the base class (OperationPass) and it’s base class (Pass) to see that the only function left to implement is runOnOperation() Or, since we did this last time from the raw API, we can observe that the boilerplate functions we implemented before like getArgument are here, but runOnOperation is not.

Another notable aspect of the generated code is that it uses the curiously recurring template pattern (CRTP), so that the base class can know the eventual name of its subclass, and use that name to hook the concrete subclass into the rest of the framework.

Lower in the generated file you’ll see another #define-guarded block for GEN_PASS_REGISTRATION, which implements hooks for tutorial-opt to register the passes without having to depend on each internal Pass class directly.


inline void registerAffineFullUnroll() {
  ::mlir::registerPass([]() -> std::unique_ptr<::mlir::Pass> {
    return createAffineFullUnroll();

inline void registerAffineFullUnrollPatternRewrite() {
  ::mlir::registerPass([]() -> std::unique_ptr<::mlir::Pass> {
    return createAffineFullUnrollPatternRewrite();

inline void registerAffinePasses() {

This implies that, once we link everything properly, the changes to tutorial-opt (in this commit) simplify to calling registerAffinePasses. This registration macro is intended to go into a Passes.h file that includes all the individual pass header files, as done in this commit. And we can use that header file as an anchor for a bazel build target that includes all the passes defined in lib/Transform/Affine at once.

#include "lib/Transform/Affine/AffineFullUnroll.h"
#include "lib/Transform/Affine/AffineFullUnrollPatternRewrite.h"

namespace mlir {
namespace tutorial {

#include "lib/Transform/Affine/"

}  // namespace tutorial
}  // namespace mlir

Finally, after all this (abbreviated from this commit), the actual content of the pass reduces to the following subclass (with CRTP) and implementation of runOnOperation, the body of which is identical to the last article except for a change from reference to pointer for the return value of getOperation.

#include "lib/Transform/Affine/"

struct AffineFullUnroll : impl::AffineFullUnrollBase<AffineFullUnroll> {
  using AffineFullUnrollBase::AffineFullUnrollBase;

  void runOnOperation() {
    getOperation()->walk([&](AffineForOp op) {
      if (failed(loopUnrollFull(op))) {
        op.emitError("unrolling failed");

I split the AffineFullUnroll migration into multiple commits to highlight the tablegen vs C++ code changes. For MulToAdd, I did it all in one commit. The tests are unchanged, because the entry point is still the tutorial-opt binary with the appropriate CLI flags, and those names are unchanged in the tablegen’ed code.

Bonus: mlir-tblgen also has an option -gen-pass-doc, which you’ll see in the commits, which generates a markdown file containing auto-generated documentation for the pass. A CI workflow can copy these to a website directory, as we do in HEIR, and you get free docs. See this example from HEIR.

Addendum: hermetic Python

When I first set up this tutorial project, I didn’t realize that bazel’s Python rules use the system Python by default. Some early readers found an error that Python couldn’t find the lit module when running tests. While pip install lit in your system Python would work, I also migrated in this commit to a hermetic python runtime and explicit dependency on lit. It should all be handled automatically by bazel now.