In this article I’ll show how to use PDLL, a tool for defining MLIR patterns, which itself is built with MLIR. PDLL is intended to be a replacement for defining patterns in tablegen, though there are few public examples of its use. In fact, the main impetus for PDLL is that tablegen makes it difficult to express things like:
- Operations that return multiple results
- Operations with regions
- Operations with variadic operands
- Arithmetic on static values
While not all these features are fully supported in PDLL yet, they are within scope of the language and tooling.
The code for this article is in this pull request, and as usual the commits are organized to be read in order.
Interlude: updating the LLVM commit hash
Before getting into it, there were some recent features added to PDLL since the last article in this tutorial. So this commit updates the LLVM commit hash, and this commit makes minor changes to fix build failures and warnings.
The only surprising change is that when a tablegen-defined operation implements an interface, rather than just adding DeclareOpInterfaceMethods, you now have to explicitly list the subset of interface methods that will be implemented, or their headers are never generated. This is surprising because the way I had done this before was to add the DeclareOpInterfaceMethods trait and let the linker complain, which told me which methods to add and their expected signatures.
PDL and PDLL
PDL stands for Pattern Descriptor Language. This is not to be confused with PDLL, which humorously stands for Pattern Descriptor Language Language; it is the syntactically-rich frontend users interact with, while PDL is just an MLIR dialect.
As a dialect, pdl models core MLIR abstractions like operations, operands, and results. pdl also has operations that represent transformations like replace and erase. A user of PDLL may ignore pdl and work with PDLL syntax directly. The mlir-pdll tool parses PDLL files and generates pdl IR, which can then be lowered to C++ code that can be compiled into a pass. This is the path we’ll take in this article. The last section of the article will show the internal process in a bit more detail.
Since PDLL defines patterns, it replaces the implementation of a C++ RewritePattern. In code, you define structure-like syntax forms via the language keyword Pattern, within which there is a “match” section and a “rewrite” section.
Pattern {
// match section
let root = ...
// ... apply constraints ...
rewrite root with {
//rewrite section
}
}
Within the match section you can express IR matches similarly to tablegen/DRR-style patterns (see MLIR — Canonicalizers and Declarative Rewrite Patterns), doing things like writing an operation name as a function, and providing named arguments that implicitly capture references to the operands, with nesting.
For a concrete example, whereas DRR expressed these captures like
// From lib/Dialect/Poly/PolyPatterns.td
def DifferenceOfSquares : Pattern<
(Poly_SubOp (Poly_MulOp:$lhs $x, $x), (Poly_MulOp:$rhs $y, $y)),
...
>;
In PDLL you might write this as
Pattern DifferenceOfSquares {
let root = op<poly.sub>(
lhs: op<poly.mul>(x: Value, x: Value),
rhs: op<poly.mul>(y: Value, y: Value));
...
}
PDLL also supports defining constraints. For example, to enforce that an SSA value is defined as an arith.constant of a specific value and type, you can write:
let ten: attr<"10 : i32">;
let constant = op<arith.constant> {value = ten};
The argument to attr<...> is a string that is parsed as an attribute by MLIR, so it can be any valid MLIR attribute’s textual representation. The curly braces after op<arith.constant> show how one captures static operation attributes (as opposed to operation operands, which are SSA values), and the name value in this case is the tablegen-given name of the attribute in the upstream ArithOps.td definition file. Also note that PDLL supports importing tablegen definitions, without which the above would not compile.
#include "mlir/Dialect/Arith/IR/ArithOps.td"
Pattern {
let ten: attr<"10 : i32">;
let constant = op<arith.constant> {value = ten};
...
}
The constraints section of the PDLL docs has more details about what can be constrained.
Finally, the rewrite section is where you define what to do with the matched IR. Here you use the same syntax as the match section, but the interpretation is that the ops are being created instead of constrained. Below, we replace an arith.constant 10 : i32 with an arith.constant 11 : i32, for no discernible reason.
Pattern ReplaceTenWithEleven {
let ten: attr<"10 : i32">;
let constant = op<arith.constant> {value = ten};
rewrite constant with {
let newConst = op<arith.constant> {value = attr<"11 : i32">};
replace constant with newConst;
};
}
The allowed rewrite statements are erase and replace. The expression after with in a replace may be another operation, or some value or tuple of values (ValueRange) matched on previously.
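As a quick illustration of replacing with a previously matched value, here is a hypothetical pattern (not from the pull request) that rewrites a multiplication by the constant 1 to just its other operand:
#include "mlir/Dialect/Arith/IR/ArithOps.td"

Pattern ReplaceMulByOne {
  let one: attr<"1 : i32">;
  let root = op<arith.muli>(x: Value, op<arith.constant> {value = one});

  rewrite root with {
    // Replace the multiplication with the already-matched value x.
    replace root with x;
  };
}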
Finally, one can define constraints in their own separate functions to facilitate reuse:
Constraint MyConstraint(value: Value) {
...
}
The main use I have for this is actually the main thing that makes PDLL usable while it’s still being built: the ability to escape back to C++ for anything it doesn’t support.
// Expects the user to register a function via the C++ API.
Constraint MyConstraint(value: Value);
For some simple such functions, you can inject the C++ directly in the PDLL file with the same [{ }] syntax as tablegen uses to define a multiline string.
Constraint HasOneUse(value: Value) [{
return success(value.hasOneUse());
}];
For this and the previous (register) example, there is an explicit mapping from PDLL types (like Value) to C++ types (like ::mlir::Value). So hasOneUse is just a member function of ::mlir::Value, and success is the LogicalResult function used throughout this tutorial.
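The same mapping applies to the other PDLL types; for example, a constraint over an Op receives an ::mlir::Operation * in the generated C++. A hypothetical example (not from the pull request):
Constraint HasNoResults(op: Op) [{
  // In the generated C++, `op` has type ::mlir::Operation *.
  return success(op->getNumResults() == 0);
}];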
Porting MulToAdd to PDLL
With that fuzzy outline of what PDLL can do, let’s move on to the main event: porting the MulToAdd patterns (from MLIR — Writing Our First Pass). Recall the pass converts arithmetic multiplication by constants to repeated addition, using two patterns:
- PowerOfTwoExpand: splits a multiplication $x \cdot 2^n$ into $x \cdot 2^{n-1} + x \cdot 2^{n-1}$.
- PeelFromMul: splits other multiplications $x \cdot C$ into $x + x \cdot (C-1)$.
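Concretely, PowerOfTwoExpand turns $x \cdot 8$ into $x \cdot 4 + x \cdot 4$, then each $x \cdot 4$ into $x \cdot 2 + x \cdot 2$, and so on, while PeelFromMul turns $x \cdot 7$ into $x + x \cdot 6$ and keeps peeling until the remaining constant is a power of two.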
We start by cloning the test file from mul_to_add.mlir to use a new pass --mul-to-add-pdll in this commit. The test is otherwise identical. The next commit adds the pass boilerplate and the PDLL for PowerOfTwoExpand.
First, the parts of the boilerplate that are specific to PDLL.
We define a new pass MulToAddPdll, and we must add the PDL and PDLInterp dialects as dependent dialects. We will explain why these are required when discussing the internals of PDL later.
include "mlir/Dialect/PDL/IR/PDLDialect.td"
include "mlir/Dialect/PDLInterp/IR/PDLInterpOps.td"
def MulToAddPdll : Pass<"mul-to-add-pdll"> {
let summary = "Convert multiplications to repeated additions using pdll";
let description = [{
Convert multiplications to repeated additions (using pdll).
}];
let dependentDialects = [
"mlir::pdl::PDLDialect",
"mlir::pdl_interp::PDLInterpDialect",
];
}
The bazel build rule used to process the pdll code is a bit curious: it is gentbl_cc_library, the same bazel rule as for processing tablegen files!
gentbl_cc_library(
name = "MulToAddPdllIncGen",
tbl_outs = [
(
["-x=cpp"],
"MulToAddPdll.h.inc",
),
],
tblgen = "@llvm-project//mlir:mlir-pdll",
td_file = "MulToAdd.pdll",
deps = [
"@llvm-project//mlir:ArithDialect",
"@llvm-project//mlir:FuncDialect",
"@llvm-project//mlir:ArithOpsTdFiles",
],
)
What changes here is that the “tblgen” argument, which specifies the binary to run, is now mlir-pdll instead of the default mlir-tblgen. The td_file argument doesn’t actually force anything to be tablegen, so it is given a pdll file. And the tbl_outs argument specifies which flags to pass to the “tblgen” binary; in this case we need to pass -x=cpp to generate C++ code. We’ll see what this C++ code looks like in the last section of this article.
Finally, we have the actual PDLL and the C++ harness to load the patterns.
#include "mlir/Dialect/Arith/IR/ArithOps.td"
Constraint IsPowerOfTwo(attr: Attr) [{
int64_t value = cast<::mlir::IntegerAttr>(attr).getValue().getSExtValue();
return success((value & (value - 1)) == 0);
}];
// Currently, constraints that return values must be defined in C++
Constraint Halve(attr: Attr) -> Attr;
Pattern PowerOfTwoExpand with benefit(2) {
let root = op<arith.muli>(op<arith.constant> {value = const: Attr}, rhs: Value);
IsPowerOfTwo(const);
let halved: Attr = Halve(const);
rewrite root with {
let newConst = op<arith.constant> {value = halved};
let newMul = op<arith.muli>(newConst, rhs);
let newAdd = op<arith.addi>(newMul, newMul);
replace root with newAdd;
};
}
Before discussing it, here is the C++ version of this pattern for comparison:
struct PowerOfTwoExpand : public OpRewritePattern<MulIOp> {
PowerOfTwoExpand(mlir::MLIRContext *context)
: OpRewritePattern<MulIOp>(context, /*benefit=*/2) {}
LogicalResult matchAndRewrite(MulIOp op,
PatternRewriter &rewriter) const override {
Value lhs = op.getOperand(0);
// canonicalization patterns ensure the constant is on the right, if there
// is a constant. See
// https://mlir.llvm.org/docs/Canonicalization/#globally-applied-rules
Value rhs = op.getOperand(1);
auto rhsDefiningOp = rhs.getDefiningOp<arith::ConstantIntOp>();
if (!rhsDefiningOp) {
return failure();
}
int64_t value = rhsDefiningOp.value();
bool is_power_of_two = (value & (value - 1)) == 0;
if (!is_power_of_two) {
return failure();
}
ConstantOp newConstant = rewriter.create<ConstantOp>(
rhsDefiningOp.getLoc(),
rewriter.getIntegerAttr(rhs.getType(), value / 2));
MulIOp newMul = rewriter.create<MulIOp>(op.getLoc(), lhs, newConstant);
AddIOp newAdd = rewriter.create<AddIOp>(op.getLoc(), newMul, newMul);
rewriter.replaceOp(op, newAdd);
return success();
}
};
The PDLL is relatively straightforward, except for this bit:
// Currently, constraints that return values must be defined in C++
Constraint Halve(attr: Attr) -> Attr;
Pattern PowerOfTwoExpand with benefit(2) {
...
let halved: Attr = Halve(const);
...
}
This is not only a constraint that is defined in C++, but a constraint that actually returns a result that can be used in the IR. This is a recently added feature, but it’s the only way I am aware of (without shelling out the entire rewrite section to C++) to do arithmetic on a statically-known numeric value in PDLL.
The C++ implementation of the pass otherwise has the following usual boilerplate:
struct MulToAddPdll : impl::MulToAddPdllBase<MulToAddPdll> {
using MulToAddPdllBase::MulToAddPdllBase;
void runOnOperation() {
mlir::RewritePatternSet patterns(&getContext());
populateGeneratedPDLLPatterns(patterns); // Auto-generated by mlir-pdll
registerNativeConstraints(patterns); // Not boilerplate
(void)applyPatternsAndFoldGreedily(getOperation(), std::move(patterns));
}
};
But we must also implement registerNativeConstraints, which maps PDLL-defined constraints (by name) to appropriate C++ functions. For Halve, that is:
LogicalResult halveImpl(PatternRewriter &rewriter, PDLResultList &results,
ArrayRef<PDLValue> args) {
Attribute attr = args[0].cast<Attribute>();
IntegerAttr cAttr = cast<IntegerAttr>(attr);
int64_t value = cAttr.getValue().getSExtValue();
results.push_back(rewriter.getIntegerAttr(cAttr.getType(), value / 2));
return success();
}
void registerNativeConstraints(RewritePatternSet &patterns) {
patterns.getPDLPatterns().registerConstraintFunction("Halve", halveImpl);
}
registerConstraintFunction performs the mapping, and halveImpl is required to have a very specific signature. Note, the required signature of this C++ function is not clearly documented. I had to ask the guy who implemented it, who pointed me to upstream MLIR unit tests and this code comment.
For external constraints that don’t return values, the expected signature is a bit more obvious: just LogicalResult myConstraintImpl(PatternRewriter &rewriter, ...), where the ellipsis stands for the PDLL arguments of the constraint with the PDLL-to-MLIR type conversions applied.
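For example, if IsPowerOfTwo (defined inline above) were instead declared in the PDLL file without a body, registering it by hand would look something like this sketch, which follows the same convention:
// Hypothetical: only needed if the PDLL declared
// `Constraint IsPowerOfTwo(attr: Attr);` with no inline body.
LogicalResult isPowerOfTwoImpl(PatternRewriter &rewriter, Attribute attr) {
  int64_t value = cast<IntegerAttr>(attr).getValue().getSExtValue();
  return success((value & (value - 1)) == 0);
}

void registerNativeConstraints(RewritePatternSet &patterns) {
  patterns.getPDLPatterns().registerConstraintFunction("Halve", halveImpl);
  patterns.getPDLPatterns().registerConstraintFunction("IsPowerOfTwo",
                                                       isPowerOfTwoImpl);
}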
Once that is done, everything compiles and runs, but the tests don’t quite pass yet. PowerOfTwoExpand, as currently defined, requires the constant to be the first argument, whereas the test has it in the second. Canonicalization patterns ensure the constant is on the right, but because I forgot that when originally writing this tutorial, I wrote a second version of the pattern with the constant swapped to the other argument. That is in this commit.
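The swapped version differs only in the match; roughly, the root becomes something like the following (the exact code is in the linked commit):
let root = op<arith.muli>(lhs: Value, op<arith.constant> {value = const: Attr});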
Finally, the next commit implements PeelFromMul, along with a new native constraint MinusOne. The patterns also use the with benefit(...) syntax to ensure that PowerOfTwoExpand is applied first when both patterns could succeed.
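Based on the description above and the C++ version from the earlier article, PeelFromMul in PDLL looks roughly like the following sketch (the exact code is in the linked commit):
// Like Halve, MinusOne returns a value and so must be implemented in C++.
Constraint MinusOne(attr: Attr) -> Attr;

Pattern PeelFromMul with benefit(1) {
  let root = op<arith.muli>(op<arith.constant> {value = const: Attr}, rhs: Value);
  let minusOne: Attr = MinusOne(const);

  rewrite root with {
    // x * C  ->  x * (C-1) + x
    let newConst = op<arith.constant> {value = minusOne};
    let newMul = op<arith.muli>(newConst, rhs);
    let newAdd = op<arith.addi>(newMul, rhs);
    replace root with newAdd;
  };
}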
After this, the tests pass, and the last commit adds the CMake build.
A peek inside: mlir-pdll
We’ll start investigating the internals of PDL and PDLL by taking a closer look at the mlir-pdll tool. In the bazel build, you can run it as bazel run @llvm-project//mlir:mlir-pdll, but passing the arguments correctly takes some care and bazel knowhow. The actual command looks like this:
bazel run @llvm-project//mlir:mlir-pdll -- \
-x=mlir \
-I $PWD/external/llvm-project/mlir/include \
$PWD/lib/Transform/Arith/MulToAdd.pdll
Note the -- separates flags to pass to bazel run from flags to pass to mlir-pdll, and the paths must all be absolute because bazel run runs the binary in a hermetic filepath, not the current directory. Because MulToAdd.pdll includes the upstream ArithOps.td, we have to add an -I include flag, and the location external/llvm-project is a sym-link to where bazel stores the LLVM source code.
And finally, the -x flag tells mlir-pdll what code to generate. The options are mlir, cpp, and ast. The ast option parses the PDLL and shows the abstract syntax tree (AST), which is not useful for us. The mlir option generates pdl IR, and the cpp option generates the C++ code to include in the pass.
Here is an abbreviated version of the pdl IR generated for PowerOfTwoExpandRhs when passing -x=mlir:
pdl.pattern @PowerOfTwoExpandRhs : benefit(2) {
%0 = operands
%1 = attribute
%2 = types
%3 = operation "arith.constant"(%0 : !pdl.range<value>) {"value" = %1} -> (%2 : !pdl.range<type>)
%4 = result 0 of %3
%5 = operand
%6 = types
%7 = operation "arith.muli"(%4, %5 : !pdl.value, !pdl.value) -> (%6 : !pdl.range<type>)
apply_native_constraint "IsPowerOfTwo"(%1 : !pdl.attribute)
%8 = apply_native_constraint "Halve"(%1 : !pdl.attribute) : !pdl.attribute
rewrite %7 {
%9 = operation "arith.constant" {"value" = %8}
%10 = result 0 of %9
%11 = operation "arith.muli"(%10, %5 : !pdl.value, !pdl.value)
%12 = result 0 of %11
%13 = result 0 of %11
%14 = operation "arith.addi"(%12, %13 : !pdl.value, !pdl.value)
replace %7 with %14
}
}
And here is the corresponding C++:
static ::llvm::LogicalResult IsPowerOfTwoPDLFn(::mlir::PatternRewriter &rewriter, ::mlir::Attribute attr) {
int64_t value = cast<::mlir::IntegerAttr>(attr).getValue().getSExtValue();
return success((value & (value - 1)) == 0);
}
namespace {
struct PowerOfTwoExpandRhs : ::mlir::PDLPatternModule {
template <typename... ConfigsT>
PowerOfTwoExpandRhs(::mlir::MLIRContext *context, ConfigsT &&...configs)
: ::mlir::PDLPatternModule(::mlir::parseSourceString<::mlir::ModuleOp>(
R"mlir(pdl.pattern @PowerOfTwoExpandRhs : benefit(2) {
%0 = operands
%1 = attribute
%2 = types
%3 = operation "arith.constant"(%0 : !pdl.range<value>) {"value" = %1} -> (%2 : !pdl.range<type>)
%4 = result 0 of %3
%5 = operand
%6 = types
%7 = operation "arith.muli"(%4, %5 : !pdl.value, !pdl.value) -> (%6 : !pdl.range<type>)
apply_native_constraint "IsPowerOfTwo"(%1 : !pdl.attribute)
%8 = apply_native_constraint "Halve"(%1 : !pdl.attribute) : !pdl.attribute
rewrite %7 {
%9 = operation "arith.constant" {"value" = %8}
%10 = result 0 of %9
%11 = operation "arith.muli"(%10, %5 : !pdl.value, !pdl.value)
%12 = result 0 of %11
%13 = result 0 of %11
%14 = operation "arith.addi"(%12, %13 : !pdl.value, !pdl.value)
replace %7 with %14
}
}
)mlir", context), std::forward<ConfigsT>(configs)...) {
registerConstraintFunction("IsPowerOfTwo", IsPowerOfTwoPDLFn);
}
};
} // end namespace
template <typename... ConfigsT>
static void LLVM_ATTRIBUTE_UNUSED populateGeneratedPDLLPatterns(::mlir::RewritePatternSet &patterns, ConfigsT &&...configs) {
patterns.add<PowerOfTwoExpandRhs>(patterns.getContext(), configs...);
}
As you can see, the PDL MLIR code is inlined as a string and parsed, while the native constraint is filled in as a C++ function and registered. This also explains why pdl is required as a dependent dialect of the pass; the pass will create new pdl IR from this string, so the dialect needs to be loaded.
The above shows a design choice of the PDL/PDLL system: instead of generating C++ code directly from pdl IR, they defer to a runtime interpreter to run the patterns and apply constraints. Only the native constraints are implemented in C++. In other words, this is where the scope of mlir-pdll ends and the scope of the PDL interpreter begins.
A peek inside: pdl-interp
The above code adds an ::mlir::PDLPatternModule to a RewritePatternSet, and indeed RewritePatternSet has a special hook to handle these pattern module classes. When the RewritePatternSet is converted to a FrozenRewritePatternSet (which happens implicitly on calling applyPatternsAndFoldGreedily), MLIR will convert the pdl to pdl_interp, a lower-level dialect that represents the individual operations required to execute a pattern match and rewrite. Then this pdl_interp IR is converted to PDLByteCode and stored on the pattern set. This all happens here.
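If you want to see the pdl_interp form for yourself, you can save the -x=mlir output from earlier to a file and run the conversion pass manually; assuming the upstream pass flag --convert-pdl-to-pdl-interp and a saved file patterns.mlir, the bazel invocation would look like:
bazel run @llvm-project//mlir:mlir-opt -- \
--convert-pdl-to-pdl-interp \
$PWD/patterns.mlir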
The PDL bytecode class doubles as the bytecode interpreter. It exposes match and rewrite hooks used in the pattern applicator. And its implementation is itself quite a beast, starting from this ByteCodeExecutor::execute function. I won’t go into detail here, mainly because I haven’t studied much of this code, but I will note that this is all wrapped up in a Pattern and handled just like any other MLIR pattern.
Why to use PDLL and what’s next for it
I asked the MLIR maintainers about the justifications for PDL and PDLL. They gave me a few reasons.
The first reason they gave is extensibility. By interpreting patterns, you can let the user pass their own patterns to the compiler without having to rebuild it.
I don’t believe mlir-opt has a built-in option for this, but given the pipeline we saw above, it’s clear how one would support it: parse PDLL, convert to PDL and then pdl_interp, then run the interpreter on the bytecode. Given how slow the MLIR build process is, this seems like a serious benefit.
The second reason they gave is the compiler’s binary size bloat. The more patterns you compile, the larger the compiler’s binary gets. My group’s MLIR-based HEIR project has a nightly build that is already 160 MiB, and most of that is upstream MLIR stuff. According to the maintainers, the PDL bytecode is roughly 10x smaller than the corresponding compiled C++ object file.
The third reason is eliminating duplicate work by combining common work across match phases in different patterns. For example, attribute lookups are slow in MLIR because they involve a linear scan across all of an op’s attributes and use string comparisons to find the attribute’s name. If lots of patterns match on the same attribute, that work can be performed jointly. Even accessing operands and results can be slow, if you are doing it across hundreds of patterns for the same op. This “joint computation” is achieved by merging the pattern match clauses into a single finite state machine (described in some more detail in a 2019 talk). In that talk, the author claims that the time to do all this pattern merging work is roughly 15x faster than the time to compile the analogous C++ code.
And finally, they suggested that using the interpreter for pattern matching is faster than the same patterns written in C++. This seems to ignore the cost of constructing the finite state machine mentioned above, but the 2019 talk was before the code was upstreamed and I think it could use some additional confirmation.
PDLL is impressive, but not yet feature complete. Here are some of the features I have noticed are not yet in the language.
- Support for arithmetic, boolean logic, and comparisons (has an RFC).
- Support for regions (no RFC yet).
- Support for use in dialect conversion (no RFC yet).
For more information about PDLL, see the official documentation, the 2021 MLIR open meeting recording on PDLL, the 2021 meeting on PDL, and the 2019 meeting on the interpreter.
Thanks to Matthias Gehre for answering my questions about PDL, and the folks in this thread for providing links to the talks mentioned above.