Sunday, December 20, 2009

Yacc, JavaCC and Racc

To compare the three parser generators, here I'll show an easy sample written in them.

Target Syntax

The following codes are accepted.

  • "1+2"
  • "23423423432 + 923401"
  • "23432 + 2"

The compiler will calculate the single addition and shows the value.

The following codes aren't accepted.

  • "1+2+3"
  • "1-2"
  • "1+"

JavaCC

The following code is from Standard Compiler by Minero Aoki.

The filename is A.jj.

PARSER_BEGIN(A)
import java.io.*;

class A {
  static public void main(String[] args) {
    for (String arg : args) {
      try {
        System.out.println(evaluate(arg));
      }
      catch (ParseException ex) {
        System.err.println(ex.getMessage());
      }
    }
  }

  static public long evaluate(String src) throws ParseException {
    Reader reader = new StringReader(src);
    return new A(reader).expr();
  }
}
PARSER_END(A)

SKIP: { <[" ", "\t", "\r", "\n"]> }

TOKEN: {
  <INTEGER: (["0"-"9"])+>
}

long expr():
{
  Token x, y;
}
{
  x=<INTEGER> "+" y=<INTEGER> <EOF>
    {
      return Long.parseLong(x.image) + Long.parseLong(y.image);
    }
}

To run this, type the following commends

$ javacc A.jj
$ javac A.java
$ java A '1 +  3'
4

This build process produces the following files automatically.

  • A.class
  • A.java
  • AConstants.class
  • AConstants.java
  • ATokenManager.class
  • ATokenManager.java
  • ParseException.class
  • ParseException.java
  • SimpleCharStream.class
  • SimpleCharStream.java
  • Token.class
  • Token.java
  • TokenMgrError.class
  • TokenMgrError.java

Yacc and Lex

a.y

%token NUMBER

%%

expr : NUMBER '+' NUMBER { printf("%d", $1 + $3); }

%%
#include <stdio.h>
#include "lex.yy.c"

a.l

%{
#include "a.tab.h"
%}
%%
[0-9]+    {sscanf(yytext,"%d",&yylval); return(NUMBER);}
[ \r\n\t]   ;
.         return(yytext[0]);
%%
#ifndef yywrap
yywrap() { return 1; }
#endif

run

$ bison -d a.y && lex a.l
$ gcc a.tab.c -ly -ll
$ ./a.out

files automatically generated

  • a.out
  • a.tab.c
  • a.tab.h
  • lex.yy.c

Racc

I referred this blog entry http://d.hatena.ne.jp/morphine57/20090824/1251129740

a.racc

class A
  token NUM
rule
   expr : NUM '+' NUM { result = val[0] + val[2] }
end
---- header
require 'strscan'
---- inner
  def parse(str)
    @tokens = []
    s = StringScanner.new(str)
    until s.eos?
      case
      when s.scan(/[0-9]+/)
        @tokens << [:NUM, s[0].to_i]
      when s.skip(/[ \t\r\n]/)
      else
        @tokens << [s.getch, nil]
      end
    end
    do_parse()
  end

  def next_token
    @tokens.shift
  end

And runs

$ racc a.racc
$ ruby192 -r ./a.tab.rb -e 'p A.new.parse "1+ 2"'
3

Files automatically generated

  • a.tab.rb

Statistics

The lines of code which I have to write by myself:

  • JavaCC: 39
  • Yacc: 9 + 11
  • Racc: 26

The lines of code which are automatically generated:

  • JavaCC: 1446
  • Yacc: 3280
  • Racc: 117 (assuming the gem library racc is already installed)

3 comments:

  1. For java, it's too producing files.i think so.

    ReplyDelete
  2. The only thing I would suggest is that if you don't mind a static parser, you can eliminate the "options" part of the grammar completely. That only saves 3 lines, though...

    ReplyDelete
  3. Thanks!

    (I updated this blog entry now to eliminate the "options" and added the lists of automagically generated by yacc and racc)

    ReplyDelete