The Metaprogramming System - Reference Manual

2.1 Structured Context-Free Grammars

The grammar formalism used in the metaprogramming system is a variant of context-free grammars. The main reason for introducing this formalism is to make it possible automatically to generate pattern definitions from a grammar.

A structured context-free grammar is a context-free grammar where the rules (productions) satisfy a certain structure.

Each nonterminal must be defined by exactly one of the following rules:

An alternation rule has the following form:
<A0>::| <A1> | <A2> | ... | <An>
where <A0>, <A1>,..., <An> are nonterminal symbols. The rule specifies that <A0> derives one of <A1>, <A2>,..., or <An>.
A constructor rule has the following form:
<A0>::= w0 <t1:A1> w1 ... <tn:An> wn
where <A0>, <t1:A1>, ..., <tn:An> are nonterminal symbols and w0, w1, ..., wn are possibly empty strings of terminal symbols. This rule describes that <A0> derives the string w0 <A1> w1... <An>. A nonterminal on the right side of the rule has the form <t:A> where t is a tag-name and A is the syntactic category. Tag-names are used to distinguish between nonterminals belonging to the same syntactic category. Consequently all tag-names in a rule must be different. If no tag-name is provided the name of the syntactic category is used as a tag-name.
A list rule has one of the following forms:
<A>::+ w <A>::* w
where is a nonterminal and w is a possibly empty string of terminal symbols. The nonterminal <A> generates a list of 's separated by w's: w w... w . The +-rule specifies that at least one element is generated; the *-rule specifies that the list may be empty.
An optional rule has the following form:
<A>::? 
where is a nonterminal. The nonterminal <A> may generate either the same strings as may generate, or the empty string (i.e. nothing).

There exists four predefined nonterminal symbols named <NameDecl>, <NameAppl>, <String> and <Const>. These nonterminals are called lexem-symbols. They derive identifiers, character-strings and integer constants. A lexem-symbol may also have a tag-name, like <Title:NameAppl>.

A nonterminal may only appear once on the left-hand side of a rule, and the complete grammar must be LALR(1) ^[2]. The limitations on the rules which can be used in a structured context-free grammar do not restrict the class of languages that can be described. Any context-free language may be generated by a structured context-free grammar. It may perhaps be awkward to be forced to follow the rules. On the other hand being forced to structure a grammar using the rules often results in a more readable grammar.

There is one important representation for sentential forms and sentences of any context-free grammar, the abstract syntax tree. An abstract syntax tree (for short AST) represents how a sentential form (or sentence) has been derived from the grammar (or how it has been constructed by the parser). The AST does not represent the terminals of the grammar, only the involved nonterminals. The nodes in an AST are productions and the branches in the AST signifies derivations of the nonterminals involved in the production in that node. The leaves of the AST are lexems, if the AST represents sentences. If the AST represents a sentential form, leaves may also be nonterminals. ASTs are a very convenient representation of programs and many manipulations of programs may be specified as manipulations on the underlying AST.

A context-free grammar for a language induces an abstract syntax that may be used to give an AST-definition. In the Mjølner BETA System, the representation of a program as an AST is defined by means of a context-free grammar for the language. In addition there is a set of rules that specify how the context-free grammar is mapped into a set of data types. The context-free grammar is then part of the specification of the environment.

The metaprogramming system is an object-oriented model of the ASTs. An AST is modelled as an instance of a pattern. There is a pattern corresponding to each syntactic category (nonterminal) of the grammar. ASTs derived from a syntactic category are then modelled as instances of the corresponding pattern. The grammar hierarchy is modelled by a corresponding pattern hierarchy.

Special grammar symbols are shown enclosed by << and >>. These symbols are called placeholders. Placeholders are always associated with a nonterminal of the grammar. A placeholder may have a tag-name in order to be able to distinguish between several instances of the same nonterminal in a program. I.e. <<PopLib:attributes>> is a placeholder, where attributes specifies the syntactic category and PopLib is the tag-name. As illustrated below, some placeholders are also equipped with the symbols SLOT or "...". These will be explained below

(# Private: @<<SLOT PrivateBody:descriptor>>;
   Push: (# e: @integer                            
         enter e                                  
         do <<SLOT PushBody:descriptor>>     
         #);                                       
   Pop: (# <<PopLib:attributes>> ... #); 
   <<attributes>>                     
#)

The above is a BETA pattern for a Stack. The pattern is not fully specified in the sense that the pattern contains placeholders that have not been expanded.

Generally, programs may contain placeholders of three different types: nonterminals, slots or contractions. Nonterminals and slots denote unexpanded nonterminals of the underlying grammar, whereas contractions denote expanded nonterminals of the underlying grammar. I.e. a program containing nonterminals and slots are sentential forms of the BETA language, whereas programs only containing contractions are sentences of the BETA language.

Nonterminals are indications that these parts of the program have not been specified yet (e.g. <<PopLib:attributes>>). The ability to leave nonterminals in the program is the means for allowing syntax directed editing.

Slots are indications of parts of the program that deliberately have been kept open (e.g. <<SLOT PushBody:descriptor>>). Slots are a means for specifying a program in which parts are separately specified (these parts are indicated by the slots). A program with slots may be separately compiled, and other programs may contain the slot definition to be applied later.

Contractions are placeholders indicating that this part of the program, derived from the nonterminal, is not shown (e.g. ...). I.e. contractions are a means for suppressing details that are otherwise present in the program. Note that contractions may suppress other placeholders, i.e. nonterminals, slots and other contractions may appear within a contraction. Contractions are merely a means for presenting an overview of programs etc., and the handlig of contractions are therefore totally controlled by the different tools (i.e. not managed by the metaprogramming system)

In terms of the underlying AST of the program, nonterminals are leaves in the tree, slots are references from one AST to another AST (possibly not yet derived), and contractions indicates a sub-AST that is not shown (i.e. only represented by the syntactic category of the root of the sub-AST.

The super-category of a given syntactic category A is defined as follows:

If <A> appears on the right side of an alternation rule of the form:
::| ... | <A> | ... | ...
then the super-category of A is B.
If <A> appears in a list rule in one of the forms:
<A>::+ ... <A>::* ...
then the super-category of A is List.
If <A> appears in an optional rule:
<A> ::? 
then no category A is defined (to be discussed later).
Otherwise the super-category of A is Cons.

The inheritance hierarchy of the generated patterns of the context-free level is the same as the classification hierarchy of the syntactic categories. In general a syntactic category may have more than one super-category. This corresponds to multiple inheritance in object-oriented languages. Since BETA currently does not support multiple inheritance, there is the additional restriction that the hierarchy must be tree structured. That is, the following grammar will not be a legal grammar:

<A> ::| <B> | <C>
<B> ::+ <D>
<C> ::= <E> terminal <F>

since will have both List and A as super-category.

2.1.1 Example of Structured Context-Free Grammar

Below an example of a structured context-free grammar is given ^[3]:

Grammar Small:
        <Block>::= begin <DclPart:DclLst>
                  do <ImpPart:ImpLst>
                  end
        <Dcl>::| <VarDcl> | <ProcDcl>
        <VarDcl>::= var <Name:NameDecl>: <VarType:Type>
        <ProcDcl>::= proc <Name:NameDecl> <Body:Block>
        <Imp>::| <IfImp> | <AssignmentImp> | <ProcCall>
        <IfImp>::= if <Condition:Exp> 
                  then <ThenPart: ImpLst>
                  else <ElsePart: ImpLst> endif
        <AssignmentImp>::= <Var:NameAppl> := <Value:Exp>
        <ProcCall>::= <Proc:NameAppl>
        <DclLst>::* <Dcl>;
        <ImpLst>::* <Imp>;

The nonterminals <Type> and <Exp> will not be defined.

The syntactic categories of a structured context-free grammar may be organized into a classification hierarchy according to the set of strings being generated. The hierarchy mainly derives from the alternation rules of the grammar. The hierarchy for the example grammar is:

The categories Cons and List generalize all categories according to the rule type that defines the category. <Imp> is a super-category of <IfImp> since any string generated by <IfImp> may be generated by <Imp>.

[2] However, this restriction is only necessary if the grammar is to be used for parsing purposes (see chapter 7 for more details)
[3] Please note, that this grammar specification (as the previous grammar) is not fully consistent with the actual grammar specification syntax of the metaprogramming system (see chapter 7)

The Metaprogramming System - Reference Manual

[Modified: Wednesday July 3^rd 2002 at 16:09]