Getting started with writing LLVM passes

Posted on November 28, 2017

Tags: LLVM, compilers, c++

The LLVM compiler infrastructure is probably the most popular compiler framework at the moment (at least for research and teaching purposes). Because it is more modular than GCC, it can be used in more creative ways. Adding passes is relatively simple, which makes it a good entry point for anyone getting started with compilers.

Passes are how LLVM analyzes and transforms its intermediate representation (IR). One example of a transformation pass is Dead Code Elimination, which removes instructions that have no side effects and whose results are never used.

For example, consider the LLVM IR of the function myfunc:

define i32 @myfunc() #0 {
  %1 = alloca i32, align 4
  %2 = load i32, i32* %1, align 4
  ret i32 0
}

myfunc contains an unnecessary load on line 3, as the loaded value is never used. Once the load is gone, the alloca on line 2 is unused as well. Using dead code elimination, this function can be transformed into the following snippet, which preserves its semantics:

define i32 @myfunc() #0 {
  ret i32 0
}
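To make the idea concrete outside of LLVM, here is a minimal, self-contained sketch of dead code elimination over a toy IR. Everything in it (ToyInst, eliminateDeadCode, makeMyFunc, the hasSideEffects flag) is hypothetical and made up for illustration; LLVM's real pass operates on llvm::Instruction and is considerably more careful.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; this is not an LLVM type.
struct ToyInst {
    std::string name;                  // result name, e.g. "%1" (empty if none)
    std::string opcode;                // e.g. "alloca", "load", "ret"
    std::vector<std::string> operands; // names of the values this instruction reads
    bool hasSideEffects;               // true for ret, store, call, ...
};

// Returns true if any instruction in `body` reads the value called `name`.
static bool isUsed(const std::vector<ToyInst> &body, const std::string &name) {
    for (const ToyInst &inst : body)
        for (const std::string &op : inst.operands)
            if (op == name)
                return true;
    return false;
}

// Repeatedly removes side-effect-free instructions whose result is unused.
// Returns the number of removed instructions.
std::size_t eliminateDeadCode(std::vector<ToyInst> &body) {
    const std::size_t originalSize = body.size();
    std::size_t removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < body.size(); ++i) {
            if (body[i].hasSideEffects || isUsed(body, body[i].name))
                continue;
            body.erase(body.begin() + static_cast<std::ptrdiff_t>(i));
            ++removed;
            changed = true;
            break; // restart the scan: removing a user may make its operands dead
        }
    }
    assert(removed + body.size() == originalSize);
    return removed;
}

// Builds a toy equivalent of myfunc: an alloca, an unused load, and a ret.
std::vector<ToyInst> makeMyFunc() {
    return {
        {"%1", "alloca", {}, false},
        {"%2", "load", {"%1"}, false},
        {"", "ret", {"0"}, true},
    };
}
```

Calling eliminateDeadCode on the result of makeMyFunc() first drops the load (nothing reads %2) and then the alloca (nothing reads %1 any more), leaving only the ret. The restart-after-removal loop is deliberately naive; a worklist would avoid rescanning.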

In general, an LLVM pass takes some IR as input (be it a function, a module, or a basic block) and either analyzes it or transforms it into something else. Naturally, many passes can be chained, each one working on the output of the previous one.
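Chaining can be pictured with a small sketch that does not use LLVM at all. The names below (ToyFunction, ToyPass, ToyPassPipeline, removeNops, makeInstructionCounter) are hypothetical; the only thing borrowed from LLVM is the convention that a pass returns true if it changed the IR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function's IR: just a list of opcode names.
using ToyFunction = std::vector<std::string>;

// A pass receives the IR and reports whether it changed anything.
using ToyPass = std::function<bool(ToyFunction &)>;

class ToyPassPipeline {
public:
    void add(ToyPass pass) {
        assert(pass && "a pass must be callable");
        passes.push_back(std::move(pass));
    }

    // Runs every pass in registration order.
    // Returns true if at least one pass changed the IR.
    bool run(ToyFunction &f) const {
        bool changed = false;
        for (const ToyPass &pass : passes)
            changed |= pass(f);
        return changed;
    }

private:
    std::vector<ToyPass> passes;
};

// Transformation-style pass: drops every "nop" instruction.
bool removeNops(ToyFunction &f) {
    const std::size_t before = f.size();
    f.erase(std::remove(f.begin(), f.end(), "nop"), f.end());
    return f.size() != before;
}

// Analysis-style pass: records the instruction count in `out`, modifies nothing.
ToyPass makeInstructionCounter(std::size_t &out) {
    return [&out](ToyFunction &f) {
        out = f.size();
        return false; // nothing was modified
    };
}
```

Adding removeNops followed by makeInstructionCounter(count) to a pipeline makes the counter observe the already cleaned-up function, which is the point of chaining: later passes see the results of earlier ones.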

We will now have a look at how to set up LLVM and write simple compiler passes. This guide is tailored for macOS, but roughly the same steps apply to Linux.

Setting up LLVM

Since compiling LLVM from source takes a long time (and is not really necessary for getting started with writing passes), we will install some pre-compiled binaries.

Install brew.
Install LLVM using brew: brew install llvm (the llvm formula ships clang along with the other LLVM tools).
Export the LLVM binary path: export PATH="/usr/local/opt/llvm/bin:$PATH" (if brew uses a different prefix on your machine, brew --prefix llvm prints the right directory).

You should now be able to use the LLVM tools like opt, llc and llvm-dis.

Writing your first pass

First, create a new directory (e.g., llvm-test). In the directory, create a new file called DummyPass.cpp. You should include the minimal required headers:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

and create a class named DummyPass, extending FunctionPass. This class will have one public method, runOnFunction, which will be called for every function in the program to be compiled:

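using namespace llvm; // FunctionPass, Function and errs() live in the llvm namespace

namespace {
    struct DummyPass : public FunctionPass {
        static char ID; // LLVM uses the address of ID to identify the pass
        DummyPass() : FunctionPass(ID) {}
        bool runOnFunction(Function &F) override;
    };
}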


bool DummyPass::runOnFunction(Function &F) {
    return false; // We did not alter the IR
}

Finally, we have to register our pass with LLVM:


// Register the pass with LLVM, so that we can invoke it with opt's -dummypass flag
char DummyPass::ID = 0;
static RegisterPass<DummyPass> X("dummypass", "Example LLVM pass printing each function it visits");

The contents of the file should now look like this:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm; // FunctionPass, Function and errs() live in the llvm namespace

namespace {
    struct DummyPass : public FunctionPass {
        static char ID; // LLVM uses the address of ID to identify the pass
        DummyPass() : FunctionPass(ID) {}
        bool runOnFunction(Function &F) override;
    };
}

bool DummyPass::runOnFunction(Function &F) {
    return false; // We did not alter the IR
}

// Register the pass with LLVM, so that we can invoke it with opt's -dummypass flag
char DummyPass::ID = 0;
static RegisterPass<DummyPass> X("dummypass", "Example LLVM pass printing each function it visits");

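Compile your pass by running clang++ -g3 -shared -fPIC $(llvm-config --cxxflags) -o libdummypass.so DummyPass.cpp. Here llvm-config --cxxflags expands to the include paths and compiler flags matching your LLVM installation. If the linker complains about missing symbols, add the following flags: -Wl,-headerpad_max_install_names -undefined dynamic_lookup.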

You should now have created a shared library libdummypass.so, containing your pass.

Now create a dummy program hello.c:


void a(){
  int c = 5;
}

int main()
{
  int d = 1;
  a();
  return 5;
}

Using clang, transform this program into LLVM bitcode, like so: clang -O0 -emit-llvm -c hello.c (the -c flag is required, because -emit-llvm cannot be combined with linking). Your directory should now have a file called hello.bc. This file contains LLVM bitcode, and as such is not human-readable. You can disassemble this bitcode file into LLVM IR using the llvm-dis tool: llvm-dis hello.bc.

You now have a file called hello.ll. This file contains human-readable LLVM IR. Open it with a text-editor and have a look at it.

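We now want to run our newly created LLVM pass on our program. Running opt -load libdummypass.so -dummypass hello.ll runs the DummyPass on your program (add -disable-output if you do not want opt to emit the resulting bitcode). So far your pass does not do anything. Note that this guide uses the legacy pass manager interface (FunctionPass, RegisterPass); newer LLVM releases default to a different pass manager, so the exact opt invocation may differ there.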

Let’s add some code that does something! In your runOnFunction add the following and re-compile your pass.

errs() << "Visiting function " << F.getName() << "\n";

Now, when you run your pass you should see the following output from opt:

Visiting function a
Visiting function main

Congratulations! You just wrote your first LLVM analysis pass!

Doing more

Of course, this pass is not that useful; you probably want to work at a finer granularity than whole functions.

Luckily it is quite simple to follow the structure in your program using features of LLVM. You can loop over all basic blocks of a function using the following code:

for (BasicBlock &BB : F) {
  errs() << "Visiting BB: " << BB;
}

Or even loop over each instruction:

for (BasicBlock &BB : F) {
  for (Instruction &II : BB) {
    // Do something with II
  }
}
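The nested loops work because a function behaves like a container of basic blocks, and a basic block like a container of instructions. The following self-contained sketch mimics that shape with hypothetical toy types (ToyInstruction, ToyBasicBlock, ToyFunction, countOpcodes, makeSample are all made up and not part of LLVM) and uses the inner loop to build an opcode histogram.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy types; they only imitate the nesting of LLVM's IR.
struct ToyInstruction {
    std::string opcode;
};

struct ToyBasicBlock {
    std::string label;
    std::vector<ToyInstruction> instructions;
    // begin()/end() make the block usable in a range-based for loop.
    std::vector<ToyInstruction>::const_iterator begin() const { return instructions.begin(); }
    std::vector<ToyInstruction>::const_iterator end() const { return instructions.end(); }
};

struct ToyFunction {
    std::string name;
    std::vector<ToyBasicBlock> blocks;
    std::vector<ToyBasicBlock>::const_iterator begin() const { return blocks.begin(); }
    std::vector<ToyBasicBlock>::const_iterator end() const { return blocks.end(); }
};

// Walks function -> basic block -> instruction and counts each opcode.
std::map<std::string, unsigned> countOpcodes(const ToyFunction &f) {
    std::map<std::string, unsigned> counts;
    for (const ToyBasicBlock &bb : f) {
        for (const ToyInstruction &inst : bb) {
            assert(!inst.opcode.empty() && "every toy instruction needs an opcode");
            ++counts[inst.opcode];
        }
    }
    return counts;
}

// Builds a small sample function with two blocks and five instructions.
ToyFunction makeSample() {
    ToyBasicBlock entry{"entry", {ToyInstruction{"alloca"}, ToyInstruction{"store"}, ToyInstruction{"call"}}};
    ToyBasicBlock tail{"tail", {ToyInstruction{"call"}, ToyInstruction{"ret"}}};
    return ToyFunction{"main", {entry, tail}};
}
```

In a real pass the same loop body would operate on llvm::Instruction objects instead of the toy struct.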

Of course, an instruction can be one of various instruction types (e.g., CallInst, LoadInst, etc.). Typically, you try to cast the Instruction to a certain type using dyn_cast, which returns a null pointer if the instruction is not of that type:

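Instruction *I = &II;
// CallInst is declared in llvm/IR/Instructions.h
if (CallInst *CI = dyn_cast<CallInst>(I)) {
  // Dereference CI to print the instruction itself rather than its address
  errs() << "Encountered a call instruction " << *CI << "\n";
}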
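LLVM does not rely on C++ RTTI for these casts; classes opt in by providing a static classof function that inspects a kind tag. The sketch below imitates that mechanism in a self-contained way. The Toy-prefixed classes and toyDynCast are hypothetical stand-ins written for this post, not LLVM's actual implementation, which handles many more cases.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical miniature instruction hierarchy with a kind tag.
class ToyInstruction {
public:
    enum Kind { CallKind, LoadKind };
    explicit ToyInstruction(Kind k) : kind(k) {}
    virtual ~ToyInstruction() = default;
    Kind getKind() const { return kind; }

private:
    const Kind kind;
};

class ToyCallInst : public ToyInstruction {
public:
    explicit ToyCallInst(std::string calleeName)
        : ToyInstruction(CallKind), callee(std::move(calleeName)) {}
    const std::string &getCallee() const { return callee; }
    // Answers "is this instruction a call?" by looking at the tag.
    static bool classof(const ToyInstruction *i) { return i->getKind() == CallKind; }

private:
    std::string callee;
};

class ToyLoadInst : public ToyInstruction {
public:
    ToyLoadInst() : ToyInstruction(LoadKind) {}
    static bool classof(const ToyInstruction *i) { return i->getKind() == LoadKind; }
};

// Checked downcast: returns the instruction as a To* if To::classof accepts
// it, and a null pointer otherwise.
template <typename To>
To *toyDynCast(ToyInstruction *i) {
    assert(i && "toyDynCast called on a null pointer");
    return To::classof(i) ? static_cast<To *>(i) : nullptr;
}
```

With this in place, if (ToyCallInst *call = toyDynCast<ToyCallInst>(inst)) reads just like the dyn_cast idiom above: the body only runs when the cast succeeds.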

Using debug info

Additional information can be obtained using LLVM debug info, which is only present if the program was compiled with -g (e.g., clang -g -O0 -emit-llvm -c hello.c). This is really nice if you want to do some analysis which points back to your source file.

For example, the following code visits each call instruction, and prints where the function is called in your source file.

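// Needs llvm/IR/Instructions.h (CallInst) and
// llvm/IR/DebugInfoMetadata.h (DILocation) in addition to the earlier headers.
bool DummyPass::runOnFunction(Function &F) {
    errs() << "Visiting function " << F.getName() << "\n";

    for (BasicBlock &BB : F) {
        for (Instruction &I : BB) {
            if (!isa<CallInst>(I))
                continue; // only call instructions are of interest here

            // The debug location is null when no debug info is attached
            if (DILocation *Loc = I.getDebugLoc()) {
                unsigned Line = Loc->getLine();
                StringRef File = Loc->getFilename();
                StringRef Dir = Loc->getDirectory();
                errs() << Dir << "/" << File << ":" << Line << "\n";
            }
        }
    }

    return false; // analysis only: the IR is unchanged
}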

Kinda neat right?! So far we have only done some simple analysis operations. Of course, LLVM allows you to modify the IR as well as doing a bunch of other things. In the next post we will have a look at the IRBuilder.