Post Job Free
Sign in

It Code

Location:
United States
Posted:
January 15, 2013

Contact this candidate

Resume:

Co oking ex with Perl

Alberto Manuel Sim es, *****@**********.**.******.**

o

Abstract

Perl has a lot of tools for parser generation using Perl. Perl has exible data structures which makes

it easy to generate generic trees. While it is easy to write a grammar and a lexical analyzer using modules

like Parse::Yapp and Parse::Lex, these tools are not as fast as I would like. I use the Parse::Yapp syntax

parser with the ex lexical analyzer to solve this.

1 Intro duction

Some time ago, I needed to make a dictionary programming language parser faster. The original programmer

had written the parser with a patched Berkeley Yacc (byacc), which can generate Perl from a grammar

speci cation like common yacc. The only di erence resides in the semantic actions, written with Perl. He

wrote the lexical analyzer with simple Perl regular expressions. This combination works, but loading the

complete dictionary took about 27 seconds, which was too long for me. I combined Perl with ex to cut the

parsing time in half.

2 Parser structure

When I write a program in any language and a compiler interprets source code, a parser is involved. It

takes the code I write, splits it out as strings of characters with a special meaning tokens and then tries

to match these token sequences with a grammar. The process has two parts lexing and parsing.

A lexical analyzer, or lexer, splits a program into pieces called tokens. Regular expressions match each token,

which the lexer returns one by one to the caller program. Commonly, it returns the token type with the

string which corresponds to the matched token.

Another program, a syntactic analyzer, or parser, puts these pieces together to check if their sequence makes

any sense. The parser uses a grammar, or a set of rules, to construct a sentence and check if all the necessary

tokens are used, and that no more are needed. It builds a tree of the program structure to generate or

interpret code.

Traditional tools for generating parsers in C are lex and yacc, or their GNU variants, ex and bison. When

using Perl, people implement parsers in many ways.

4

Cooking Perl with ex

3 Why ex

On my rst approach I converted the grammar to a full Perl parser. I could have chosen tools like

Parse::RecDescent, Parse::YALALR, or others, but I chose Parse::Yapp because its syntax and function-

ality is very similar to the old yacc. In a half an hour I had a new, working version of the parser, but

had only reduced the time for the parsing task by one or two seconds. This small speed-up occurs because

Parse::Yapp creates full Perl programs, taking advantage of Perl facilities.

Next, since I could not make the syntax analyzer quicker, I tried to change the lexical one. I wanted to use

Parse::Lex, but since it used Perl regular expressions I decided to nd another option. I really like Perl, but

started thinking about a C implementation for the Parser. That would take a long time but I heard about

the XSUB concept and started working on a ex speci cation to glue with Parse::Yapp.

I could have implemented this glue using di erent techniques. I could write the lexical analyzer returning

integers, where each one represents a di erent grammar terminal symbol, or I could return a string with the

name of the symbol. While the second method can be slower than returning integers, the grammar is more

legible, so I went with it.

I implemented my ex analyzer and glued it with Parse::Yapp. It ran in about half the original parsing time.

Perl is very exible, and ex is very fast. I can create simple and fast parsers using both of them together.

4 The Recip e

To demonstrate what I did I use something a lot more simple than my original problem, although I illustrate

all of the ma jor points in the process.

I want to create a parser for simple arithmetic problems like 1 + 2 . The lexer will break the string into

tokens ( 1, +, 2 ), perform the arithmetic operation, and return the result.

4.1 Writing the Grammar

I need to write two things the grammar and the lexical analyzer. I like to start with the grammar, although

that is personal preference.

The Parse::Yapp syntax is like yacc. I can write a grammar for arithmetic expressions, which I put in a le

I name myGrammar.yp shown in code listing 1.

Code Listing 1: myGrammar.yp

%token NUMBER NL

command: exp NL { return $_[1] }

7

;

8

9

exp: exp + exp { return $_[1]+$_[3]}

10

The Perl Review (0, 3 ) 5

Cooking Perl with ex

exp - exp { return $_[1]-$_[3]}

11

exp * exp { return $_[1]*$_[3]}

12

exp / exp { return $_[1]/$_[3]} #forget x/0

13

number { return $_[1]}

14

;

15

16

number: NUMBER { return $_[1] }

17

;

18

%%

19

4.2 Writing the Lexical Analyser

Next, I write the lexical analyser in C. I create the le named myLex.l in which I put the instructions for

lexing the source, shown in code listing 2.

Code Listing 2: myLex.l

%{

1

#define YY_DECL char* yylex void;

2

3

char buffer[15];

4

5

%}

6

7

%%

8

[0-9]+ { return strcpy(buffer, "NUMBER"); }

9

10

\n { return strcpy(buffer, "NL"); }

11

12

. { return strcpy(buffer, yytext); }

13

14

%%

15

int perl_yywrap(void) {

16

return 1;

17

}

18

char* perl_yylextext(void) {

19

return perl_yytext;

20

}

21

The rst three lines de ne the prototype for the lexical analyzer. I return strings instead of the integers

returned by default. I have to put these strings somewhere. In this example, I allocate a char array named

bu er where I will put the token information.

The next section is a normal lexical analyzer, returning the name of the token or the character found.

The perl yywrap and perl yylextext glue the parser to the lexical analyzer. The perl yywrap function

restarts the parsing task while perl yylextext accesses the text which matched the regular expression through

perl yytext, which ex creates for me.

The Perl Review (0, 3 ) 6

Cooking Perl with ex

4.3 Writing the glue

Now I have the two main tools and I am only missing the glue. The easiest way I can do this is creating a

module for my parser. I start with h2xs which creates most of the module structure and les for me.

h2xs -n myParser

I have to edit some of the les that h2xs creates, and add the les that I just created. I need to add some

XSUB code to myParser.xs to the lexer in myLex.l to Perl, add a line to the typemaps le, and adjust the

Make le.PL to correctly compile everything.

I copy the ex source to myParser/myLex.l and the Parse::Yapp grammar to myParser/myGrammar.yp to

the module directory.

Parse::Yapp expects a yylex function that returns a pair the name of the token and the matched text so

I add a perl lex function to myParser.pm, shown in code listing 3. I wrote perl yylextext in myLex.l (code

listing 2), and ex generates perl yylex for me.

Code Listing 3: The perl lex subroutine in myParser.pm

sub perl_lex {

1

my $token = perl_yylex ;

2

if ($token) {

3

return ($token, perl_yylextext 4

} else {

5

return (undef,

6

}

7

}

8

I also need an error-handling function in myParser.pm. My error function in code listing 4 will simply print

the token read and the token it expected if it encounters an error.

Code Listing 4: An error message for unrecognized tokens

sub perl_error {

1

my $self = shift;

2

warn "Error: found ",$self->YYCurtok,

3

" and expecting one of ",join(" or ",$self->YYExpect);

4

}

5

Once I have that done, I edit myParser.xs le to map the C functions into Perl ones. I leave the code that

h2xs generated alone, and add another header le with the others.

#include "myLex.h"

At the end of the le I add the parts that connect my lex functions with the functions that I call from Perl,

shown in code listing 5. The spaces and newlines are signi cant. The rst line of each function is the return

type, followed by a line with the name of the function. Then I have two lines showing Perl where to get the

return value from the C functions.

The Perl Review (0, 3 ) 7

Cooking Perl with ex

Code Listing 5: Function glue in myParser.xs

I create the myLex.h le, shown in code listing 6, which holds the prototypes for the functions in myLex.l.

Code Listing 6: Lexer function prototypes in myLex.h

char* perl_yylex(void);

1

char* perl_yylextext(void);

2

I have to map data types from C to Perl. Integers are trivial and handled directly by perl, but I also have

pointers to characters. I create a le named typemap, shown in code listing 7 that translates pointers to

characters to the built-in T PV Perl type.

Code Listing 7: The typemap file

char* T_PV

1

4.4 Putting it all together

The code is now complete, but I need to modify Make le.PL so it can compile it. The yapp command from

the Parser::Yapp distribution creates myGrammar.pm from myGrammar.yp, and ex creates lex.perl yy.c. I

add an additional target to the Make le through the MY::postamble subroutine. I also add the ex library,, to the library list, and name the produced library myLexer.so. Code listing 8 shows my nal Make le.PL.

Code Listing 8: My modified Makefile.PL

use ExtUtils::MakeMaker;

1

2

$YACC_COMMAND = "( yapp -o myGrammar.pm -m myGrammar myGrammar.yp)";

3

4

# This is needed before WriteMakefile

VERSION_FROM => myParser.pm,

10

LIBS => [ -lfl ], # This is for flex

11

MYEXTLIB => myLexer.so, # Our lexer

12

);

13

14

The Perl Review (0, 3 ) 8

Cooking Perl with ex

sub MY::postamble {

15

"

16

\$(MYEXTLIB): lex.perl_yy.c myParser.pm myGrammar.pm

17

\t\$(CC) -c lex.perl_yy.c

18

\t\$(AR) cr \$(MYEXTLIB) lex.perl_yy.o

19

\tranlib \$(MYEXTLIB)

20

21

myGrammar.pm: myGrammar.yp

22

\t$YACC_COMMAND

23

24

lex.perl_yy.c: myLex.l

25

\tflex -Pperl_yy myLex.l

26

";

27

}

28

I compile my new module with the standard Perl module make sequence.

perl Makefile.PL

make

4.5 Testing

I can also test my module with the standard Perl test harness, although I have not added any real tests to

test.pl.

make test

This does not really test if my module really works only that it compiles and loads without error. I add a

test, shown in code listing 9, which takes a string from standard input, parses it, and returns the result of

the arithmetic operation.

Code Listing 9: My Grammar test

use myGrammar;

1

my $parser = new myGrammar;

2

my $result = $parser->YYParse(yylex => \&myParser::perl_lex,

3

yyerror => \&myParser::perl_error);

4

print "The result is $result\n";

5

Now, I run make test again and enter 1+4*2+3, press enter, and end standard input with D (or Z on

Windows), and I get the answer 25.

5 Conclusion

While Perl is really exible, some tools are quicker than their Perl equivalents. I can use these tools from

Perl with a little bit of work.

The Perl Review (0, 3 ) 9

Cooking Perl with ex

6 References

Manual pages:

ex(1), perlguts(1), perlxs(1), perlxstut(1)

Perl Module documentation:

Parse::Yapp, ExtUtils::MakeMaker

The Perl Review (0, 3 ) 10



Contact this candidate