Syntax highlighting

As a case study, we will use the infrastructure provided by Clang to implement syntax highlighting. The ~/Workshop/Highlighter directory contains a project based on the LibTooling interface that we described earlier. It has its own FrontendAction, ASTConsumer and RecursiveASTVisitor, which all have access to a Rewriter object (created by the FrontendAction for each input file). We will use the RecursiveASTVisitor to get information about the input and the Rewriter to prepare our output. We will use the HandleTranslationUnit method of our ASTConsumer to invoke the RecursiveASTVisitor and to wrap things up (e.g. dump the output).

Our project will take as input a set of source files and produce, for each of them, an HTML file containing the highlighted source code. An example can be seen below. Note that this does not showcase the best HTML practices.

<html>
	<head>
		<title>Some title (can be anything you want)</title>
	</head>
	<body>
		<pre>
<span style="color: blue">int</span> foo;
		</pre>
	</body>
</html>

The source code will be placed inside the <pre> tag, which indicates preformatted text (this guarantees that the indentation of the original code will be preserved). The elements that we wish to highlight will be placed inside a <span> tag, which allows us to customize, among other things, the color of the text.

HTMLSupport.h declares a few functions for dealing with the gory HTML details:

  • std::string Prologue(const std::string &title) - generates everything up to (and including) the <pre> tag; its output will be inserted before the text of the source file;
  • std::string Epilogue(void) - returns the closing tags starting from </pre>;
  • std::string HighlightingBeginTag(const std::string &color) - generates a <span> tag with the color given as argument (you can choose colors from this list); its output will be inserted at appropriate places within the text of the source code;
  • std::string HighlightingEndTag(void) - returns the </span> tag.
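To make the role of these helpers concrete, here is a minimal sketch of how they could be implemented. The bodies below are assumptions based only on the declarations and on the sample HTML above; the real HTMLSupport.h may differ in whitespace and markup details.

```cpp
#include <string>

// Hypothetical implementations, reconstructed from the declarations only.

// Everything up to and including the opening <pre> tag.
std::string Prologue(const std::string &title) {
  return "<html>\n\t<head>\n\t\t<title>" + title +
         "</title>\n\t</head>\n\t<body>\n\t\t<pre>\n";
}

// The closing tags, starting from </pre>.
std::string Epilogue() {
  return "\t\t</pre>\n\t</body>\n</html>\n";
}

// Opening <span> tag that sets the text color.
std::string HighlightingBeginTag(const std::string &color) {
  return "<span style=\"color: " + color + "\">";
}

// Closing tag matching HighlightingBeginTag.
std::string HighlightingEndTag() { return "</span>"; }
```

With these definitions, HighlightingBeginTag("blue") + "int" + HighlightingEndTag() yields the highlighted keyword from the sample HTML.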

Task 1

Build and run the project on a few sample code files. You should obtain HTML files that can be viewed in the browser. You may however notice some problems (hint: try to include a system header in your test file).

If you want to run on your own machine, you can find the project here.

Task 2

The first thing that we have to do in order to get our highlighter to work correctly is to escape special HTML characters such as angle brackets. HTMLSupport.h offers a function that escapes several HTML characters that could cause problems: std::string EscapeText(const std::string &text, bool &isDifferent).

We will escape the entire text of the original file and then replace it in the Rewriter.
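To illustrate what escaping involves, here is a minimal stand-in with the same signature as EscapeText. Both the set of escaped characters and the meaning of isDifferent (set to true when at least one character was replaced) are assumptions; check HTMLSupport.h for the actual behaviour.

```cpp
#include <string>

// Hypothetical stand-in for EscapeText from HTMLSupport.h.
// Assumption: isDifferent reports whether the result differs from the input.
std::string EscapeText(const std::string &text, bool &isDifferent) {
  std::string escaped;
  escaped.reserve(text.size());
  isDifferent = false;
  for (char c : text) {
    switch (c) {
    case '<': escaped += "&lt;";   isDifferent = true; break;
    case '>': escaped += "&gt;";   isDifferent = true; break;
    case '&': escaped += "&amp;";  isDifferent = true; break;
    case '"': escaped += "&quot;"; isDifferent = true; break;
    default:  escaped += c;        break;
    }
  }
  return escaped;
}
```

For example, the line #include <stdio.h> becomes #include &lt;stdio.h&gt;, which the browser displays as text instead of treating it as an unknown tag. Under the assumption above, the flag lets the caller skip the replacement when nothing changed.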

Task 3

Now that we can see our code in the browser, it’s time to do some highlighting. For this we are going to use the RecursiveASTVisitor - we will define Visit* methods for the AST nodes that we wish to highlight.

Note that these methods do not override any methods from RecursiveASTVisitor in the traditional sense (i.e., they are not virtual), but you should still make sure that they have the same signature as their counterparts in RecursiveASTVisitor (or else they won't get called). Furthermore, these methods should return true if everything went well; if they return false, the traversal is aborted.
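The mechanism behind this is static dispatch through the derived class, which a small toy model can show. The sketch below does not use Clang at all: ToyVisitor and the IntegerLiteral struct are hypothetical stand-ins that only mimic the way a base class template calls hooks on its derived class.

```cpp
#include <vector>

// Toy stand-in for an AST node (not the Clang class of the same name).
struct IntegerLiteral { int value; };

// Toy base class that dispatches to hooks of the derived class without
// virtual functions, in the spirit of RecursiveASTVisitor.
template <typename Derived>
class ToyVisitor {
public:
  // Default hook: does nothing and lets the traversal continue.
  bool VisitIntegerLiteral(IntegerLiteral *) { return true; }

  // Visits the nodes in order and stops as soon as a hook returns false.
  bool Traverse(std::vector<IntegerLiteral> &nodes) {
    for (IntegerLiteral &node : nodes)
      if (!static_cast<Derived *>(this)->VisitIntegerLiteral(&node))
        return false;
    return true;
  }
};

// Matching name and signature: hides the default hook, so it gets called.
struct Counting : ToyVisitor<Counting> {
  int seen = 0;
  bool VisitIntegerLiteral(IntegerLiteral *) { ++seen; return true; }
};

// Mismatched name: compiles fine, but the base never calls it.
struct Misspelled : ToyVisitor<Misspelled> {
  int seen = 0;
  bool VisitIntLiteral(IntegerLiteral *) { ++seen; return true; }
};

// Returns false on the first node, which aborts the traversal.
struct Aborting : ToyVisitor<Aborting> {
  int seen = 0;
  bool VisitIntegerLiteral(IntegerLiteral *) { ++seen; return false; }
};
```

On a list of three literals, Counting sees all three, Misspelled sees none (without any compiler diagnostic), and Aborting sees only the first one before Traverse returns false.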

The project currently contains a stub for VisitIntegerLiteral, to be used for highlighting integer constants. You can also add the corresponding methods for floating point constants or string constants.
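Highlighting a constant boils down to inserting a begin tag before the first character of the token and an end tag right after its last character. In the project the Rewriter does the bookkeeping for us; the sketch below is only a plain-string model of that idea, and the Highlight type and ApplyHighlights function are hypothetical names that do not exist in the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A half-open range [begin, end) of offsets into the original text, plus
// the color to use. Hypothetical type, for illustration only.
struct Highlight {
  std::size_t begin;
  std::size_t end;
  std::string color;
};

// Wraps each range in a <span> tag. The ranges are assumed to lie within
// the text and not to overlap.
std::string ApplyHighlights(std::string text, std::vector<Highlight> ranges) {
  // Process the ranges from the end of the text backwards, so that
  // inserting tags never shifts the offsets of ranges still to be handled.
  std::sort(ranges.begin(), ranges.end(),
            [](const Highlight &a, const Highlight &b) {
              return a.begin > b.begin;
            });
  for (const Highlight &h : ranges) {
    text.insert(h.end, "</span>");
    text.insert(h.begin, "<span style=\"color: " + h.color + "\">");
  }
  return text;
}
```

For the text int x = 42; and the range [8, 10) with the color red, the result is int x = <span style="color: red">42</span>;. In the visitor, the two offsets would come from the source range of the literal instead of being hard-coded.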

Task 4

Obviously, highlighting constants is not very spectacular - it can be done using only the lexer. The true strength of our AST-based highlighter becomes apparent when trying to highlight type names. The lexer is only capable of identifying built-in types (int, short etc), whereas having a fully annotated AST gives us the possibility to correctly identify types defined by the user.

Task 5

You probably noticed during the previous task that function declarations are highlighted in their entirety. Since this is likely not the behaviour we were after, we can skip highlighting TypeLocs that represent function types.

Task 6

Another thing that we may wish to do is to highlight uses of the parameters within functions. We can do this by looking for DeclRefExprs, which are references to declared values. We can isolate the parameters by checking the kind of declaration that the expression is referencing.

With the same method we can also highlight function calls or variable uses.

Task 7

Now that you are a bit familiar with how our highlighter should work, you should try to add your own Visit* methods for other elements of the AST. Keep in mind that the signatures of the methods that you define must match the ones in RecursiveASTVisitor, or else they won’t get called. Also beware that the definition of RecursiveASTVisitor is laden with macros, so if you’re having difficulty figuring out method signatures it is highly recommended that you make use of the autocomplete feature available in Eclipse (or if you don’t like IDEs in general or Eclipse in particular, you can try to preprocess the RecursiveASTVisitor header and identify the signatures on the preprocessed file).

sesiuni/compiler/highlighting.txt · Last modified: 2014/07/16 11:02 by apicus