The Clang AST

A good way to understand the Clang AST is to look at it. You can get a dump of the AST by running:

clang -Xclang -ast-dump -fsyntax-only <your-file>

You should try this on a few examples to get a feel for it.

The -fsyntax-only option tells Clang to stop after parsing and semantic analysis, that is, once the AST has been built and before any code is generated. The -Xclang flag specifies that the option following it should be passed to the front-end (clang -cc1, which is confusingly also called Clang, just like the command-line driver). Without -Xclang, the driver does not recognize -ast-dump.
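As a starting point, a small file such as the one below can be fed to the command above. The file name sample.cpp and the function add are only illustrations, not part of any existing project.

```cpp
// sample.cpp (hypothetical file name)
// Dump its AST with: clang -Xclang -ast-dump -fsyntax-only sample.cpp
int add(int a, int b) {
  return a + b;
}
```

In the dump you should find a FunctionDecl for add with two ParmVarDecl children and a CompoundStmt body. The body contains a ReturnStmt whose operand is a BinaryOperator.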

There are several ways to obtain information from the AST, for instance by using Matchers or Visitors. We will use the most powerful visitor: RecursiveASTVisitor. Its strength lies in the fact that it not only provides access to the nodes of the AST, but also allows us to change the order in which the AST is traversed.

An interesting thing to note about the implementation of RecursiveASTVisitor is that it uses the CRTP (Curiously Recurring Template Pattern) idiom in order to achieve static polymorphism.
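To make the CRTP idea concrete, here is a minimal sketch that does not depend on Clang at all. MiniVisitor, NodeCounter and the string "nodes" are hypothetical stand-ins. RecursiveASTVisitor is far more elaborate, but it relies on the same trick: the base class casts itself to the derived type and calls its methods without virtual dispatch.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of the pattern; this is not Clang's actual code.
template <typename Derived>
class MiniVisitor {
public:
  // Walks the nodes and dispatches to the derived class statically.
  bool traverse(const std::vector<std::string> &nodes) {
    for (const std::string &node : nodes)
      if (!derived().visitNode(node))
        return false; // a false result stops the traversal
    return true;
  }

  // Default hook, used when Derived does not define its own visitNode.
  bool visitNode(const std::string &) { return true; }

private:
  Derived &derived() { return *static_cast<Derived *>(this); }
};

// The derived class passes itself as the template argument.
class NodeCounter : public MiniVisitor<NodeCounter> {
public:
  bool visitNode(const std::string &node) {
    if (node == "FunctionDecl")
      ++functions;
    return true;
  }

  int functions = 0;
};
```

Calling traverse on a NodeCounter with a list of node names increments functions once per "FunctionDecl" entry. The call to visitNode is resolved at compile time, so no vtable is involved.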

We will subclass RecursiveASTVisitor and invoke its TraverseTranslationUnitDecl from the HandleTranslationUnit method of our ASTConsumer. This will make the visitor traverse the entire translation unit (including, for function definitions, the body of the function), invoking the Visit* methods on every node. Most of the logic of our tools will live in these Visit* methods.

You can get a good understanding of the RecursiveASTVisitor by reading the comments in its header file (clang/AST/RecursiveASTVisitor.h).

The ASTContext

The ASTContext stores a wealth of valuable information regarding the AST. For instance, it holds information about types (size, alignment, etc.).

We won’t interact with it much, except to get the SourceManager and LangOptions, which are needed by many other objects that will be discussed later (such as the Rewriter).

The rewriting mechanism

Clang maintains a close correspondence between the source code and its internal structures. For example, most AST elements have at least two methods for retrieving SourceLocation objects: getLocStart and getLocEnd (named getBeginLoc and getEndLoc in newer Clang releases), which indicate the range in the source file where that particular element is located. SourceLocation objects don’t provide much information in and of themselves; they usually need to be translated with the help of a SourceManager (you should use the one from the ASTContext or from the Rewriter).

These links to the original source file are useful for many purposes (e.g. for providing expressive diagnostics), but we will focus on how they can be used for rewriting code. Clang provides a Rewriter class, with methods for adding, removing or replacing text (among others). These methods take SourceLocations (or SourceRanges) as inputs and perform all the modifications in memory, without affecting the original file. We can then decide what to do with the rewritten text: we can overwrite the original file by calling overwriteChangedFiles, or we can get a representation of the modifications made to a given file by calling getRewriteBufferFor.

You should note that Rewriter methods usually return true if something has failed and false otherwise. Also, if several modifications are performed on the same source range, the order of these modifications matters.

sesiuni/compiler/ast.txt · Last modified: 2014/07/13 22:10 by freescale