Syntax Highlighting

As a case study, we will use the infrastructure provided by Clang for syntax highlighting. We will start from this project, which is based on the LibTooling interface that we described earlier. It has its own FrontendAction, ASTConsumer and RecursiveASTVisitor, all of which have access to a Rewriter object (created by the FrontendAction for each input file). We will use the RecursiveASTVisitor to get information about the input and the Rewriter to prepare our output. We will use the HandleTranslationUnit method of our ASTConsumer to invoke the RecursiveASTVisitor and to wrap things up (e.g. dump the output).

Our project will take as input a set of source files and produce, for each of them, an HTML file containing the highlighted source code. An example can be seen below. Note that this does not necessarily showcase the best HTML practices.

<html>
	<head>
		<title>Some title (can be anything you want)</title>
	</head>
	<body>
		<pre>
<span style="color: blue">int</span> foo;
		</pre>
	</body>
</html>

The source code will be placed inside the <pre> tag, which indicates preformatted text (this guarantees that the indentation of the original code will be preserved). The elements that we wish to highlight will be placed inside a <span> tag, which allows us to customize, among other things, the color of the text.

HTMLSupport.h declares a few functions for dealing with the gory HTML details:

  • std::string Prologue(const std::string &title) - generates everything up to (and including) the <pre> tag; its output will be inserted before the text of the source file;
  • std::string Epilogue(void) - returns the closing tags starting from </pre>;
  • std::string HighlightingBeginTag(const std::string &color) - generates a <span> tag with the color given as argument (you can choose colors from this list); its output will be inserted at appropriate places within the text of the source code;
  • std::string HighlightingEndTag(void) - returns the </span> tag.
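To make the roles of these four functions concrete, here is a minimal sketch of what they might look like. The function names and signatures come from the list above; the bodies are an assumption based on the sample HTML shown earlier, and the implementation shipped with the project may well differ.

```cpp
#include <string>

// Hypothetical bodies: only the signatures are taken from HTMLSupport.h.

// Everything up to (and including) the opening <pre> tag.
std::string Prologue(const std::string &title) {
  return "<html>\n\t<head>\n\t\t<title>" + title +
         "</title>\n\t</head>\n\t<body>\n\t\t<pre>\n";
}

// The closing tags, starting from </pre>.
std::string Epilogue() { return "\t\t</pre>\n\t</body>\n</html>\n"; }

// Opening <span> tag that sets the text color.
std::string HighlightingBeginTag(const std::string &color) {
  return "<span style=\"color: " + color + "\">";
}

// Matching closing tag.
std::string HighlightingEndTag() { return "</span>"; }
```

With these definitions, HighlightingBeginTag("blue") + "int" + HighlightingEndTag() yields the highlighted keyword from the sample page.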

Download this project and follow the comments.

Task 0

Build and run the project on a few sample code files. You should obtain HTML files that can be viewed in the browser. You may also wish to inspect the generated HTML code.

Task 1

Time to do some highlighting. For this we are going to use the RecursiveASTVisitor - we will define Visit* methods for the AST nodes that we wish to highlight.

Note that these methods do not override any methods of RecursiveASTVisitor in the traditional sense (i.e., they are not virtual). You should still make sure that they have the same names and signatures as their counterparts in RecursiveASTVisitor, or else they won't get called. Furthermore, these methods should return true if everything went well - if they return false, the traversal is aborted.
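The non-virtual dispatch described above can be illustrated without Clang. The sketch below is a toy model, not the real RecursiveASTVisitor: ToyVisitor, IntegerNode, StringNode and the two derived classes are all hypothetical names invented for this example. It shows a base class template calling hooks on the derived class through a static_cast, a mistyped hook that is silently ignored, and a hook that stops the traversal by returning false.

```cpp
#include <string>
#include <vector>

// Toy "AST nodes" (hypothetical stand-ins for real Clang node classes).
struct IntegerNode { int value; };
struct StringNode { std::string value; };

// Toy base class: the derived class is passed in as a template argument.
template <typename Derived>
class ToyVisitor {
public:
  // Default hooks: do nothing and continue.
  bool VisitIntegerNode(const IntegerNode &) { return true; }
  bool VisitStringNode(const StringNode &) { return true; }

  // Calls the hooks on the derived class; stops at the first false.
  bool Traverse(const std::vector<IntegerNode> &ints,
                const std::vector<StringNode> &strs) {
    for (const IntegerNode &n : ints)
      if (!derived().VisitIntegerNode(n)) return false;
    for (const StringNode &s : strs)
      if (!derived().VisitStringNode(s)) return false;
    return true;
  }

private:
  Derived &derived() { return static_cast<Derived &>(*this); }
};

class CountingVisitor : public ToyVisitor<CountingVisitor> {
public:
  int integersSeen = 0;
  int stringsSeen = 0;

  // Same name and signature as the base hook: this one gets called.
  bool VisitIntegerNode(const IntegerNode &) { ++integersSeen; return true; }

  // Mistyped name: compiles without complaint, but is never called.
  bool VisitStrNode(const StringNode &) { ++stringsSeen; return true; }
};

class StoppingVisitor : public ToyVisitor<StoppingVisitor> {
public:
  int integersSeen = 0;

  // Returning false aborts the traversal after the first node.
  bool VisitIntegerNode(const IntegerNode &) { ++integersSeen; return false; }
};
```

Traversing three integer nodes and one string node with CountingVisitor leaves integersSeen at 3 and stringsSeen at 0. The compiler gives no hint that VisitStrNode is dead code, which is why it pays to check the names and signatures carefully.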

The project currently contains a stub for VisitIntegerLiteral, to be used for highlighting integer constants. You can also add the corresponding methods for floating point constants or string constants.
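Whatever the node kind, the highlighting step boils down to inserting a begin tag before the node's text and an end tag after it, with both positions expressed relative to the original source. The sketch below models that idea with plain character offsets. ToyRewriter is a hypothetical stand-in written for this example and is not Clang's Rewriter, which works with SourceLocations and offers methods such as InsertText and InsertTextAfterToken.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model: records insertions by offset into the ORIGINAL text and
// applies them all at once when the result is requested.
class ToyRewriter {
public:
  explicit ToyRewriter(std::string source) : source_(std::move(source)) {}

  // Wraps the half-open range [begin, end) of the original text in a <span>.
  void Highlight(std::size_t begin, std::size_t end, const std::string &color) {
    assert(begin < end && end <= source_.size());
    insertions_.push_back({begin, true, "<span style=\"color: " + color + "\">"});
    insertions_.push_back({end, false, "</span>"});
  }

  // Builds the output: original text with all tags spliced in.
  std::string Result() const {
    std::vector<Insertion> sorted = insertions_;
    // Order by offset; at equal offsets, closing tags go before opening
    // tags so that adjacent highlights do not nest by accident.
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const Insertion &a, const Insertion &b) {
                       if (a.offset != b.offset) return a.offset < b.offset;
                       return !a.opens && b.opens;
                     });
    std::string out;
    std::size_t pos = 0;
    for (const Insertion &ins : sorted) {
      out.append(source_, pos, ins.offset - pos);  // untouched original text
      out += ins.text;                             // inserted tag
      pos = ins.offset;
    }
    out.append(source_, pos, std::string::npos);   // remainder of the source
    return out;
  }

private:
  struct Insertion {
    std::size_t offset;  // position in the original text
    bool opens;          // true for a begin tag, false for an end tag
    std::string text;
  };

  std::string source_;
  std::vector<Insertion> insertions_;
};
```

For the text "int x = 42;", highlighting the range [8, 10) in red wraps only the literal 42. In the real visitor, keep in mind that the end of a Clang source range usually points at the start of the last token rather than just past it, so the end tag has to be placed after that token.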

Task 2

Obviously, highlighting constants is not very spectacular - it can be done using only the lexer. The true strength of our AST-based highlighter becomes apparent when trying to highlight type names. The lexer is only capable of identifying built-in types (int, short etc.), whereas having a fully annotated AST gives us the possibility to correctly identify types defined by the user.
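A small input file along the following lines may be handy for trying this out. It is only a suggested test case and all of its names (Counter, Point, CountOnAxis) are made up for the example. A purely keyword-based highlighter would color unsigned, long and int here, but it would have no way of knowing that Counter and Point are type names too.

```cpp
#include <cstddef>

typedef unsigned long Counter;  // user-defined alias for a built-in type

struct Point {                  // user-defined record type
  int x;
  int y;
};

// To a lexer, Counter and Point are ordinary identifiers; only the AST
// records that they name types.
Counter CountOnAxis(const Point *points, std::size_t n) {
  Counter hits = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (points[i].x == 0 || points[i].y == 0)
      ++hits;
  return hits;
}
```

The same file also contains uses of the parameters points and n inside the function body, so it can be reused for the later tasks.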

Task 3

You probably noticed during the previous task that function declarations are highlighted in their entirety. Since this is likely not the behaviour we were after, we can skip highlighting TypeLocs that represent function types.

Task 4

Another thing that we may wish to do is to highlight uses of the parameters within functions. We can do this by looking for DeclRefExprs - references to declared values. We can isolate the parameters by checking the kind of declaration that the expression is referencing.

With the same method we can also highlight function calls or uses of global variables.

Task 5

Now that you are a bit familiar with how our highlighter should work, you should try to add your own Visit* methods for other elements of the AST. Keep in mind that the signatures of the methods that you define must match the ones in RecursiveASTVisitor, or else they won’t get called. Also beware that the definition of RecursiveASTVisitor is laden with macros, so if you’re having difficulty figuring out method signatures you may want to try to identify them on the preprocessed file instead.

Task 6 (bonus)

One of the reasons why it has traditionally been difficult to write good refactoring tools for C/C++ is the existence of the preprocessor. Try to find examples where the use of macros and preprocessing directives is causing trouble for our syntax highlighter.

sesiuni/llvm/highlighting.txt · Last modified: 2015/09/10 12:03 by freescale