Getting started with writing LLVM passes

Posted on November 28, 2017
Tags: LLVM, compilers, c++

The LLVM compiler infrastructure is probably the most popular compiler framework at the moment, at least for research and teaching. Because it is more modular than GCC, it can be used in more creative ways. Adding a pass is relatively simple, which makes LLVM a good way for anyone to get started with compilers.

By adding passes, LLVM can analyze and transform its intermediate representation (IR). One example of such a transformation pass is Dead Code Elimination, which removes instructions whose results are never used and that have no side effects.

For example, take the LLVM IR of the function myfunc:

define i32 @myfunc() #0 {
  %1 = alloca i32, align 4
  %2 = load i32, i32* %1, align 4
  ret i32 0
}

myfunc contains an unnecessary load on line 3, as the loaded value is never used. Once the load is gone, the alloca on line 2 is unused as well. Using dead code elimination, this function can be transformed into the following snippet, which has the same semantics:

define i32 @myfunc() #0 {
  ret i32 0
}
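
To make the idea concrete, here is a minimal sketch of dead code elimination on a toy instruction list. It is not LLVM's implementation: ToyInst, eliminateDeadCode and makeMyfunc are hypothetical names invented for this illustration, and the "IR" is just a vector of structs. The point is the fixpoint loop: erasing the load makes the alloca dead too, so the scan has to be repeated until nothing changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical, not an LLVM type).
struct ToyInst {
    std::string result;                 // name of the value it defines, empty if none
    std::vector<std::string> operands;  // names of the values it reads
    bool hasSideEffects;                // true for e.g. ret, store, call
};

// Erase every instruction that has no side effects and whose result is
// never read. Repeat until nothing changes, because erasing one
// instruction can make the instructions feeding it dead as well.
// Returns true if anything was erased.
bool eliminateDeadCode(std::vector<ToyInst> &body) {
    bool changedAny = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < body.size(); ++i) {
            if (body[i].hasSideEffects)
                continue;
            bool used = false;
            for (const ToyInst &other : body)
                for (const std::string &op : other.operands)
                    if (op == body[i].result)
                        used = true;
            if (!used) {
                body.erase(body.begin() + static_cast<std::ptrdiff_t>(i));
                changed = true;
                changedAny = true;
                break;  // restart the scan on the shrunken body
            }
        }
    }
    return changedAny;
}

// Toy model of myfunc from above.
std::vector<ToyInst> makeMyfunc() {
    return {
        {"%1", {}, false},      // %1 = alloca i32
        {"%2", {"%1"}, false},  // %2 = load i32, i32* %1
        {"", {}, true},         // ret i32 0
    };
}
```

Calling eliminateDeadCode on the result of makeMyfunc() first drops the load, then the alloca, and leaves only the ret.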

In general, an LLVM pass takes some IR as input (be it a function, a module, or a basic block) and either analyzes it or transforms it into something else. Naturally, many passes can be chained.
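
Chaining can be pictured with a small sketch. This is a toy model, not LLVM's pass manager: ToyFunction, ToyPass, runPipeline, inspectOnly and removeNops are all hypothetical names. Each pass receives the IR left behind by the previous one and reports whether it changed anything, which is the same convention as the bool a real pass returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a unit of IR: a list of instruction strings.
using ToyFunction = std::vector<std::string>;

// A pass takes the IR and returns true if it modified it.
using ToyPass = std::function<bool(ToyFunction &)>;

// Run the passes in order. Each pass sees the IR as the previous pass left it.
// Returns true if any pass reported a change.
bool runPipeline(const std::vector<ToyPass> &passes, ToyFunction &f) {
    bool changed = false;
    for (const ToyPass &pass : passes)
        changed |= pass(f);
    return changed;
}

// Analysis-style pass: looks at the IR but never modifies it.
bool inspectOnly(ToyFunction &f) {
    (void)f;
    return false;
}

// Transformation-style pass: removes every "nop" instruction.
bool removeNops(ToyFunction &f) {
    std::size_t before = f.size();
    f.erase(std::remove(f.begin(), f.end(), std::string("nop")), f.end());
    return f.size() != before;
}
```

Running the pipeline {inspectOnly, removeNops} on {"nop", "ret", "nop"} leaves {"ret"} and reports a change; running it a second time reports none.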

We will now have a look at how to set up LLVM and write simple compiler passes. This guide is tailored for macOS; however, roughly the same steps apply to Linux.

Setting up LLVM

Since compiling LLVM from source takes a long time (and is not really necessary for getting started with writing passes), we will install some pre-compiled binaries.

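  1. Install brew.
  2. Install llvm and clang using brew: brew install llvm (the llvm formula ships clang as well)
  3. Export the llvm binary path: export PATH="/usr/local/opt/llvm/bin:$PATH" (if Homebrew lives elsewhere on your machine, brew --prefix llvm prints the right directory)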

You should now be able to use the LLVM tools like opt, llc and llvm-dis.

Writing your first pass

First, create a new directory (e.g., llvm-test). In the directory, create a new file called DummyPass.cpp. Include the minimal required headers and pull in the llvm namespace:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm; // FunctionPass, Function and errs() live in namespace llvm

and create a class named DummyPass, extending FunctionPass. This class will have one public method, runOnFunction, which will be called for every function in the program to be compiled:

namespace {
    struct DummyPass : public FunctionPass {
        public:
            static char ID; // LLVM uses the address of ID to identify the pass
            DummyPass() : FunctionPass(ID) {}
            virtual bool runOnFunction(Function &F) override;
    };
}


bool DummyPass::runOnFunction(Function &F) {
    return false; // We did not alter the IR
}

Finally, we have to register our pass with LLVM:

// Register the pass with llvm, so that we can call it with dummypass
char DummyPass::ID = 0;
static RegisterPass<DummyPass> X("dummypass", "Example LLVM pass printing each function it visits");

The contents of the file should now look like this:

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm; // FunctionPass, Function and errs() live in namespace llvm

namespace {
    struct DummyPass : public FunctionPass {
        public:
            static char ID; // LLVM uses the address of ID to identify the pass
            DummyPass() : FunctionPass(ID) {}
            virtual bool runOnFunction(Function &F) override;
    };
}


bool DummyPass::runOnFunction(Function &F) {
    return false; // We did not alter the IR
}

// Register the pass with llvm, so that we can call it with dummypass
char DummyPass::ID = 0;
static RegisterPass<DummyPass> X("dummypass", "Example LLVM pass printing each function it visits");

Compile your pass by running clang++ -g3 -shared -fPIC $(llvm-config --cxxflags) -o libdummypass.so DummyPass.cpp. The llvm-config tool is installed alongside opt and prints the include paths and compiler flags that match your LLVM installation. If the linker complains about missing symbols, add the following flags: -Wl,-headerpad_max_install_names -undefined dynamic_lookup.

You should now have created a shared library libdummypass.so, containing your pass.

Now create a dummy program hello.c:

void a(){
  int c = 5;
}

int main()
{
  int d = 1;
  a();
  return 5;
}

Using clang, transform this program into LLVM bitcode, like so: clang -O0 -emit-llvm -c hello.c. Your directory should now have a file called hello.bc. This file contains LLVM bitcode, and as such is not human-readable. You can disassemble this bitcode file into LLVM IR using the llvm-dis tool: llvm-dis hello.bc.

You now have a file called hello.ll. This file contains human-readable LLVM IR. Open it with a text editor and have a look at it.

We now want to run our newly created LLVM pass on our program. By running opt -load libdummypass.so -dummypass hello.ll, you will run the DummyPass on your program. opt will warn that it does not print bitcode to a terminal; adding -disable-output avoids that. Note that -load and RegisterPass belong to LLVM's legacy pass manager, so on recent LLVM releases, where opt uses the new pass manager, this invocation may not work as written. So far your pass does not do anything.

Let’s add some code that does something! In your runOnFunction, add the following line and re-compile your pass.

errs() << "Visiting function " << F.getName() << "\n";

Now, when you run your pass you should see the following output from opt:

Visiting function a
Visiting function main

Congratulations! You just wrote your first LLVM analysis pass!

Doing more

Of course this pass is not that useful; you probably want to do something at a finer granularity than whole functions.

Luckily, it is quite simple to follow the structure of your program using features of LLVM. You can loop over all basic blocks of a function using the following code:

for (BasicBlock &BB : F) {
  errs() << "Visiting BB: " << BB;
}

Or even loop over each instruction:

for (BasicBlock &BB : F) {
  for (Instruction &II : BB) {
    // Do something with II
  }
}

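Of course the instruction can be one of the various instruction types (e.g., CallInst, LoadInst, etc), which are declared in llvm/IR/Instructions.h. Typically, you try to cast the Instruction to a certain type using dyn_cast, which returns a null pointer if the instruction is not of that type. If the cast succeeds, it is of that type:

Instruction *I = &II;
if (CallInst *CI = dyn_cast<CallInst>(I)) {
  // Dereference CI to print the instruction itself rather than its address
  errs() << "Encountered a call instruction " << *CI << "\n";
}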
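
If you wonder how such a cast can work without C++ RTTI: LLVM classes carry a kind tag and expose a static classof function that the cast consults. The following is a heavily simplified sketch of that mechanism, not LLVM's actual code. All Toy names, including toyDynCast, are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Hypothetical miniature of an instruction hierarchy with a kind tag.
struct ToyInstruction {
    enum Kind { Call, Load };
    explicit ToyInstruction(Kind k) : kind(k) {}
    Kind getKind() const { return kind; }

private:
    Kind kind;
};

struct ToyCallInst : ToyInstruction {
    ToyCallInst() : ToyInstruction(Call) {}
    // Answers: "is this instruction really a ToyCallInst?"
    static bool classof(const ToyInstruction *i) { return i->getKind() == Call; }
};

struct ToyLoadInst : ToyInstruction {
    ToyLoadInst() : ToyInstruction(Load) {}
    static bool classof(const ToyInstruction *i) { return i->getKind() == Load; }
};

// Simplified dyn_cast: returns the downcast pointer if the kind matches,
// and nullptr otherwise, so it can be used directly in an if condition.
template <typename To>
To *toyDynCast(ToyInstruction *i) {
    return To::classof(i) ? static_cast<To *>(i) : nullptr;
}
```

With a ToyInstruction pointer to a ToyCallInst, toyDynCast<ToyCallInst> hands back the same object, while toyDynCast<ToyLoadInst> yields nullptr.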

Using debug info

Additional information can be obtained using LLVM debug info. This is really nice if you want to do some analysis which points back to your source file. Debug info is only present if the program was compiled with -g (e.g., clang -g -O0 -emit-llvm -c hello.c).

For example, the following code visits each call instruction and prints where the function is called in your source file.

// Needs llvm/IR/Instructions.h and llvm/IR/DebugInfoMetadata.h in addition to the headers above
bool DummyPass::runOnFunction(Function &F) {
    errs() << "Visiting function " << F.getName() << "\n";

    for (BasicBlock &BB : F) {
        for (Instruction &II : BB) {
            Instruction *I = &II;
            if (isa<CallInst>(I)) {
                // The location is null when no debug info is attached
                if (DILocation *Loc = I->getDebugLoc()) {
                  unsigned Line = Loc->getLine();
                  StringRef File = Loc->getFilename();
                  StringRef Dir = Loc->getDirectory();
                  errs() << Dir << "/" << File << ":" << Line << "\n";
                }
            }
        }
    }

    return false; // Analysis only: the IR is unchanged
}

Kinda neat, right?! So far we have only done some simple analysis operations. Of course, LLVM allows you to modify the IR as well as do a bunch of other things. In the next post we will have a look at the IRBuilder.
