Sunday, August 20, 2006

[General] Parsing a grammar (parse)

Validating a string with the appropiate format can be difficult job, just imagine this scenario:

String s1 = "1+2+4+A"
The idea is to go through each character in the string and look for an invalid character, in this case 'A'. So far this is an easy thing to do, lets see something harder:

String s2 = "1+sum(1,2,3,sum(4,5,6,average(4,4,4,6)))"
In this case the string represents an arithmetic function, it contains calls to two different functions, sum and average, each case with different number of parameters. Here we have to look for appropiate usage of '(' ')' characters, ',' commas, etc.

As you can see, it can get harder if you need to use a richer grammar, lets get into the idea for performing this task:

Lets say you need a grammar for arithmetic operations (just '+' and '-'), something like this:

Definition of expression:
expression can be an expression '+' a factor
expression can be an expression '-' a factor
expression can be a factor

Definition of factor:
factor can be a NUMeric value
factor can be an expression

A program (code) that performs the verification of a stream (group of strings) it's called a parser, there are tools that generate this parser's, the hard work there is to define a grammar that support what we want to do, in linux you can use 'yacc' to generate the code for a parser in C/C++, you just have to write a file with the grammar, something like this:


The grammar above support expressions like: 4+(4+(8*2)*5).

But that's just jacc, there's also a tool that generates java code, it's called javacc.

You can use a parser on many tasks, arithmetic operations, language definition, compiler verification, etc. Write me if you want some examples on this subject, have fun.

0 Comments:

Post a Comment

<< Home