Lab 2 (en): lexer generators

Get to know a lexer generator for the programming language you are using, for example Flex (C/C++), Alex (Haskell) or JFlex (Java); other generators are also fine.

In the program written in the previous session, replace the handwritten lexer with a generated one.

Flex

Discuss an example of a Flex analyser, e.g.

%{
#include <stdlib.h>   /* atoi */
#include "exp.tab.h" /* Defines token codes such as NUM, and yylval (generated by bison -d) */
%}
%option noyywrap
%%
[[:digit:]]+  {
         yylval = atoi( yytext ) ;
         return (int)NUM;
       }
.  { return (int)(yytext[0]); }
\n  { return (int)'\n'; }
%%

Running Flex:

flex -o lexer.c lexer.l

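The generated file contains a function yylex() that returns the next token each time it is called, and 0 at the end of input. The value of a NUM token is passed to the caller in the variable yylval.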
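To see what the generated scanner offers its caller, it may help to compare it with a hand-written lexer that exposes the same interface. The sketch below assumes a standalone build without Bison: it defines its own NUM (numbered 258, following Bison's convention that named tokens start above the single-character codes) and its own yylval, and it reads from a string through a hypothetical lex_input pointer instead of Flex's yyin. The helper dump_tokens is likewise hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed stand-in for the Bison-generated exp.tab.h: named tokens are
   numbered from 258 so they cannot collide with single-character codes. */
enum { NUM = 258 };
int yylval;                     /* value of the most recent NUM token */

/* Hypothetical input source: a NUL-terminated string, not stdin/yyin. */
const char *lex_input = "";

/* Hand-written yylex() with the same contract as the Flex rules above:
   a run of digits -> NUM (value in yylval), any other single character
   (including '\n') -> its own character code, end of input -> 0. */
int yylex(void) {
    if (*lex_input == '\0')
        return 0;                               /* end of input */
    if (isdigit((unsigned char)*lex_input)) {
        char *end;
        yylval = (int)strtol(lex_input, &end, 10);
        lex_input = end;                        /* skip the digits */
        return NUM;
    }
    return (unsigned char)*lex_input++;         /* single-character token */
}

/* Hypothetical caller loop: ask for tokens until yylex() returns 0. */
void dump_tokens(const char *text) {
    int tok;
    lex_input = text;
    while ((tok = yylex()) != 0) {
        if (tok == NUM)
            printf("NUM(%d)\n", yylval);
        else if (tok == '\n')
            printf("'\\n'\n");
        else
            printf("'%c'\n", tok);
    }
}
```

For example, dump_tokens("12+3\n") prints NUM(12), '+', NUM(3) and '\n' on separate lines. Swapping this file for the Flex-generated lexer.c should leave code that calls yylex() and reads yylval unchanged; only the input source differs, since the generated scanner reads from yyin (stdin by default).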

JFlex

Discuss an example, e.g.

import java_cup.runtime.*;   // Symbol

%%

%class Scanner
%unicode
%cup
%line

%{
  StringBuffer string = new StringBuffer();
  // Lexemes are of class Symbol, defined outside this file
  private Symbol symbol(int type) {
    return new Symbol(type, yyline, -1);
  }
  private Symbol symbol(int type, Object value) {
    return new Symbol(type, yyline, -1, value);
  }
%}

LineTerminator = \r|\n|\r\n
WhiteSpace     = {LineTerminator} | [ \t\f]
DecIntLiteral  = 0 | [1-9][0-9]*

%%
{DecIntLiteral} 	{ return symbol(sym.NUM, Integer.valueOf(yytext())); /* sym is generated by CUP */ }
{WhiteSpace}		{ /* skip whitespace */ }
.		{ System.err.println("Unexpected character: " + yytext()); }

From this specification JFlex generates a .java file containing one class (here: Scanner) that implements the scanner. The class has a constructor taking a java.io.Reader from which the input is read, and a method that returns the next token each time it is called. By default this method is called yylex(); with the %cup option used above it is named next_token() and returns a java_cup.runtime.Symbol, which is what a CUP-generated parser expects.

Alex

Discuss an example, e.g.

{
module CalcLex(alexScanTokens) where
import CalcTokens(Token(..))
}
%wrapper "basic"
$digit = 0-9
$alpha = [a-zA-Z]
tokens :-
  $white+                        ; -- whitespace
  "--".*                         ; -- comment
  let                            { \s -> Let }
  in                             { \s -> In }
  $digit+                        {\s -> Int (read s)}
  [\=\+\-\*\/\(\)]               {\s -> Sym (head s)}
  $alpha [$alpha $digit \_ \']*  { \s -> Var s }

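Running Alex generates CalcLex.hs; with the "basic" wrapper it provides alexScanTokens :: String -> [Token], which turns a whole input string into a list of tokens: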

ben@sowa$ alex CalcLex.x
ben@sowa$ ghci CalcLex.hs
GHCi, version 6.12.1...
Ok, modules loaded: CalcLex, CalcTokens.
*CalcLex> alexScanTokens "let x = 1 in x +x"
Loading package array-0.3.0.0 ... linking ... done.
[Let,Var "x",Sym '=',Int 1,In,Var "x",Sym '+',Var "x"]