Original PDF Flash format list-resolution-techniques-for-grammar-development  


List Resolution Techniques For Grammar Development

List Resolution Techniques for Grammar Development
Crutcher Dunnavant
University of Alabama
crutcher@gmail.com
ABSTRACT
(or java cup)[6](Java), and JavaCC[8](Java). These tools
Most popular parser generation tools work in terms of gram-
work on input files structured to associate, and partially re-
mar rules which are not directly capable of providing list
write, action code which is attached to the various rules of
semantics. While the basic techniques for realizing list se-
the grammar. For example, the rule:
mantics are frequently re-invented, the literature has suf-
F oo = A B C
fered from a lack of a solid collection of these techniques.
This paper presents a collection of techniques for realizing
would require a description for most of the yacc-alikes akin
various forms of lists as grammatically re-written forms us-
to:
ing AST re-writes compatible with most parser generation
A B C {
environments. Also provided is a set of additional EBNF
ASTNode Foo = new ASTNode(’Foo’);
extension operators for specifying AST re-write actions.
Foo.addChild($1); /* For A */
Foo.addChild($2); /* For B */
Foo.addChild($3); /* For C */
1.
INTRODUCTION
$$ = Foo;
return Foo_symbol;
Using most parser generation tools, matching a list of ele-
}
ments requires the construction of a recursive non-terminal
(left-recursive for LALR(1) grammars) which matches the
which would be re-written by the parser generator to define
various productions of the list, and has an epsilon produc-
$1, $2, $3, and $$. Working within these constraints, it
tion. Thus, in order to describe a list containing some arbi-
is often difficult to provide the list semantics desired for a
trary number of As and Bs, representable in ISO EBNF[10]
given grammar’s AST.
as:
Some of the more advanced parser-generation environ-
ments support lists directly. The meta-DSL tool smgn[4]
L = {A | B}
flattens recursive lists, but does not handle delimited lists,
and provides no means of controlling the nature of the col-
most parser generation tools would require the re-structured
lapse.
SDF[1], which provides parser generation for full
rule:
context-free languages, supports list constructs; as does GDK[2],
L = L A | L B |
which exists for debugging and/or reverse engineering gram-
mars and provides a rich framework of grammar re-write
Unfortunately, given a symbol stream A B A A, this rule
algorithms.
would yield the containment (L (L (L (L (L) A) B) A) A),
However, most computer programming and configuration
rather than the more useful (L A B A A). It would likely be
languages are not developed in robust grammar transforma-
necessary to restructure this tree before applying seman-
tion environments, but are instead developed with vanilla
tic actions; and since matching lists of elements is virtually
yacc-alike parser generation tools. In such environments, it
guaranteed to come up while writing a language evaluator,
is necessary to perform custom grammar tweaks to achieve
there is some real value in having a higher level model for
the list semantics desired.
dealing with this.
This paper defines transformations upon grammatical lists
3.
AST RE-WRITE ACTIONS
of various kinds which permit their realization in most parser
Provided with a sufficiently abstract AST implementa-
generation environments. These transformations are them-
tion, there are many useful tree re-writes which can be per-
selves defined in terms of 4 simple parse-time AST re-write
formed at production time. The list resolution transforms
actions, which should be simple to implement in any parsing
will be expressed in terms of four such actions. These ac-
environment.
tions are here defined as extensions to EBNF, in order to
permit an abstract description of the list transforms. Fig-
2.
BACKGROUND
ure 1 provides a graphical account of the functioning of the
The yacc[3] family of parser generators won the parser
preserve, token, and content actions.
generation space. This family is what is taught in compiler
classes, and what books are written about. Many projects
3.1
AST Action: Preserve
exist to re-implement the yacc generator in various environ-
The default action taken for a symbol in a production is
ments. These include bison[5](C), Yacc++[9](C++), CUP
preserve. When a symbol is preserved in a production rule,

4.
LIST TRANSFORMATIONS
Note: {P }− is defined in EBNF[10] as one or more oc-
currences of P (an arbitrary number of P except the empty
set). This is confusing, and the equivalent form P, {P } will
be used here instead.
4.1
Non-Delimited Lists
The simplest form of lists are repetitions of the same sym-
bol over and over again without delimiters. There are two
variations on this form, lists which may be empty, and lists
which may not be. Given I, a single grammar symbol, these
forms may be handled quite adequately with the following
transformations (though for empty lists, I cannot itself have
Figure 1: AST Re-Write Actions
an
production without producing a shift-reduce conflict).
L = {I}

L =
| L I
the AST sub-tree rooted at that symbol is made a child of
L = I, {I}

L = I | L I
the sub-tree produced for the rule’s non-term. For example,
Upon matching I I I, both of these rules would produce
the rule:
the sub-tree (L I I I).
F oo = Bar Baz
However, if it is necessary for the list to match multiple
symbols, or collections of symbols, there are several distinct
would, upon matching Bar Baz, produce a F oo node which
cases to consider.
contained the trees for Bar and Baz as it’s children.
4.1.1
Grouped Lists
3.2
AST Action: Drop
Often there are several possible matches for a list item, as
The drop action, represented by [ Symbol] , is very simple.
in:
When a symbol is dropped in a production rule, the AST
sub-tree rooted at that symbol is omitted from the sub-tree
L = { Keyword equals Expression | Expression }
produced for the rule’s non-term. For example, the rule:
and we wish to collect each item up, so that each group
F oo = [ Bar] Baz
which makes up the list is collected together. This is a
simple matter of re-writing the list to create an item non-
would, upon matching Bar Baz, produce a F oo node which
terminal I which produces the various options. Letting Φ
contained only the tree for Baz as a child.
be a series of production alternatives lacking :
3.3
AST Action: Token
L
=
{Φ}
The token action, represented by Symbol (as though

Symbol is to be cut below), is more complex. When a sym-
L
=
{I}
I
=
Φ
bol is tokenized in a production rule, only the root of the

AST sub-tree rooted at that symbol is included in the sub-
L
=
| L I
tree produced for the rule’s non-term, while the symbol’s
I
=
Φ
sub-tree’s root’s children are omitted. For example, the rule:
The non-empty variant of this transform is straightfor-
F oo = Bar Baz
ward:
would, upon matching
L
=
Φ, {Φ}
Bar Baz, produce a F oo node which

contained as children the root of the tree for Bar, and the
L
=
I, {I}
complete tree for Baz. Alternatively, if this rule matched
I
=
Φ
the subtrees (Bar a b c) and (Baz e f g), it would pro-

L
=
I | L I
duce (Foo Bar (Baz e f g)).
I
=
Φ
3.4
AST Action: Content
If Φ were Keyword equals Expression | Expression,
The content action, represented by Symbol (as though
upon matching (I Expression) (I Expression) (I Keyword
Symbol is to be cut above), is the complement of token.
equals Expression), these rules would produce the sub-
When the content action is performed on a symbol in a
tree (L (I Expression) (I Expression) (I Keyword equals
production rule, the children of the symbol’s sub-tree’s root
Expression)).
are included as children of the sub-tree produced for the
rule’s non-term, while the symbol’s sub-tree root itself is
4.1.2
Ungrouped Lists
omitted. For example, the rule:
If there are several possible matches for a list item, as in:
F oo = Bar Baz
L = { A B | C }
would, upon matching Bar Baz, produce a F oo node which
but the items are NOT to be grouped, then there are two
contained as children the root of the tree for Bar, and the
transformations available, both which produce the same re-
complete tree for Baz. Alternatively, matching (Bar a b
sult. Either we re-write the tree at production time, using
c) (Baz e f g) would produce (Foo a b c (Baz e f g)).
this transform:

L
=
{Φ}
4.2.3
Delimiters on Ungrouped Non-Empty Lists

L
=
{ I }
When working with ungrouped lists, it is sometimes useful
I
=
Φ
to maintain the delimiters in order to separate the groups

L
=
| L
I
at a later time. The following transform provides this:
I
=
Φ
L
=
Φ, {D, Φ}
or, letting φ

1...φ
be the various individual productions of
n
L
=
I, {D, I}
Φ, we re-write the grammar, using this transform:
I
=
Φ

L
=
{Φ}
L
=
I
| L D I

I
=
Φ
L
=
| L φ1 | ... | L φn
If Φ were A B | C, upon matching A B D C, these rules
These transforms are roughly equivalent; but the non-
would produce the sub-tree (L A B D C).
empty variant of the first transform:
4.2.4
Variable Delimiters
L
=
Φ, {Φ}

The same collection technique used for grouping list items
L
=
I , { I }
should be used in the case that multiple delimiters are per-
I
=
Φ

mitted for a given list. The various delimiter options should
L
=
I
| L
I
be extracted into their own non-terminal, and appropri-
I
=
Φ
ate re-write action should be taken on the resultant non-
is so much more efficient than that of the second:
terminal if necessary. Letting Φ be a series of production
alternatives for the delimiter lacking :
L
=
Φ, {Φ}

L
=
I, {∆, I}
L
=
| φ1 | ... | φ | L φ

n
1 | ... |
L φn
L
=
I, {D, I}
that it is usually wisest to use the first form of both the
D
=

empty and non-empty list transforms, unless a pressing rea-

L
=
I | L D I
son exists not too. If Φ were A B | C, upon matching A B
D
=

C, all of these rules would produce the sub-tree (L A B C).
4.2
Non-Empty Delimited Lists
4.3
Empty Delimited Lists
A more complex form of lists are delimited lists. Delim-
Delimited lists which can be empty require the most com-
ited lists have explicit separators between items. Beginning
plex list transformations to realize. For starters, neither
again with simple cases, let us take I and D, singular gram-
their items nor their delimiters can yield productions with-
mar symbols, and describe a transformation which is suffi-
out conflicting with the
productions of the list itself. Also,
cient to handle a list of I delimited by D:
if the list is given an
production and a recursive produc-
tion containing the delimiter D, then it would be possible
L = I, {DI}

L = I | L D I
to match D I, which is not what is intended. As a result, we
Note that a shift-reduce conflict will exist if I and D both
will introduce an additional non-terminal, ˆ
L, which will be
contain
productions. Upon matching I D I D I, this rule
processed with the content action wherever it occurs. The
would produce the sub-tree (L I D I D I).
following transformation provides for empty delimited lists
There are several distinct variants which we may consider
matching against the item symbol I and the delimiter sym-
for delimited lists.
bol D.
L
=
(I, {D I}) |
4.2.1
Dropped Delimiters

Suppose that we only wanted to keep the I symbols? In
L
=
|
ˆ
L
ˆ
L
=
I |
ˆ
L D I
this case, we would describe the initial list, and the trans-
form, as:
More complex item and delimiter groups can be realized
using techniques from 4.1.1. Grouped Lists and 4.2.4. Vari-
L = I, {[ D] I}

L = I | L [ D] I
able Delimiters. All of the usual AST re-writes remain ap-
Which, upon matching I D I D I, would produce the sub-
plicable with this transformation situation, and a complete
tree (L I I I). Using I as a list item non-terminal with
enumeration of the various possibilities is not needed here.
dropped delimiters is the preferred way to parse most lists
4.4
A General Form
in most environments, though it is necessary to examine the
empty version of this transform if empty lists are needed.
There is a general target form which, with the application
of different AST re-write actions on I and D, is capable
4.2.2
Token Delimiters
of realizing most list semantics. If Φ and ∆ be a series of
Suppose that we wished to keep a marker that a delimiter
production alternatives lacking :
had been matched, but not it’s content. In this case, we
L
=
|
ˆ
L
would describe the initial list, and the transform, as:
ˆ
L
=
I |
ˆ
L D I
I
=
Φ
L = I, { D I}

L = I | L
D
I
D
=

Which, upon matching I (D a) I (D b) I, would produce
Because of its flexibility, this form is a good target form
the sub-tree (L I D I D I).
for automated tools which provide list semantics.

5.
REFERENCES
[1] Joost Visser, and Jeroen Scheerder A Quick
Introduction to SDF. 2000.
http://homepages.cwi.nl/~jvisser/papers/sdfintro.pdf
[2] Jan Kort, Ralf L¨
ammel, and Chris Verhoef The
Grammar Deployment Kit. In Electronic Notes in
Theoretical Computer Science 65 No. 3, 2002.
[3] John Levine, Tony Mason, and Doug Brown lex &
yacc. O’Reilly, 2nd Edition, 1992. ISBN: 1-56592-000-7
[4] Holger M. Kienle Using smgn for Rapid Prototyping of
Small Domain-Specific Languages. In ACM SIGPLAN
Notices V.36(9), September 2001.
[5] The GNU Project Bison.
www.gnu.org/software/bison/
[6] Scott E. Hudson CUP Parser Generator for Java.
www.cs.princeton.edu/~appel/modern/java/CUP/
[7] Don Yessick, and Joel Jones Reinventing the Wheel Or
Not Yet Another Compiler Compiler Compiler. In
Southeast ACM Conference, 2002.
[8] Java Compiler Compiler [tm] (JavaCC [tm]) - The
Java Parser Generator.
javacc.dev.java.net
[9] Yacc++ and the Language Objects Library.
world.std.com/~compres/
[10] International Organization for Standards ISO/IEC
14977 - Extended BNF. 1996.