2 Introduction to Context-Free Grammers

The theory of formal grammars is very extensive, and we will only describe here the very basics of formal grammars. Several textbooks dealing with formal grammars exists and their application as a basis for describing programming languages.

Formal grammars may be divided into several classes, of which the most important classes are the right-linear grammars, the context-free grammars and the context-sensitive grammars. Right-linear grammars are identical to regular expressions.

The most important result is that context-free grammars can be generated and recognized effectively by automated tools such as parsers, whereas context-sensitive grammars are undecidable in general.

Context-free grammars play an important role in the definition of programming languages and is therefore also the foundation for the grammar-based tools of the metaprogramming system. Before going into details with the particular grammar formalism used in the metaprogramming system, we will give a quick introduction to context-free grammars.

Any context-free grammar is defined by means of a set of terminals, a set of nonterminals, a set of productions(also called rules), and one startsymbol. A terminal is a string of characters (e.g. BEGIN in a Pascal grammar). A nonterminal is a special symbol that may derive other symbols. Productions are the means for specifying the rules for deriving sentences and sentential forms from the grammar. We say that the grammar is able to derive a set (possible indefinite) of sentences. A sentence consists solely of terminals (e.g. a Pascal program is a sentence derived from a Pascal grammar). A sentential form is like a sentence, except that nonterminals may be present in a sentential form. I.e. a sentential form is not fully derived (the remaining nonterminals have not been expanded).

Terminals are denoted by wi, nonterminals by <Ai>, productions by <A> ::= w0 <B0> w1 <B1> ..... The startsymbol is a nonterminal, from which all derivations according to the grammar are defined to constitute the language, defined by the grammar. There is one special symbol, empty, which stands for the empty terminal (string).

To illustrate these concepts, a small part of the Pascal grammar is given [1]:

<program>      ::= program <name> 
                     <declarations>
                   begin  
                     <statements> 
                   end.
<statements>   ::= <statement> ; <statements>
<statements>   ::= empty
<statement>    ::= if <condition>
                   then <statement>
                   else <statement>
<statement>    ::= <name> := <value>
<condition>    ::= <name> 
<declarations> ::= <declaration> ; <declarations>
<declarations> ::= empty
<declaration>  ::= <name> : <name>

<name> may be any identifier, <value> may be true or false

In this grammar, program, begin, end, ".", ";", ":=", ":", if, then, and else are terminals, whereas <program>, <name>, <declarations>, <declaration>, <statements>, <statement> and <condition> are all nonterminals. The nonterminal <program> is the startsymbol.

This grammar may e.g. generate the sentence:

program P
V:T;
begin V := true; end.

and the sentential form:

program P
V:T;
begin if <condition> then <statement> else V := false; end.

From this sentential form, several derivations are possible. Derivations are defined by the productions of the grammar. A derivation consists of substituting a nonterminal in the sentential form with the right-side of one of the productions having this nonterminal on the left-side of the "::=" symbol of the production. This implies that <condition> may be substituted with any legal name and <statement> with either an assignment statement or an if statement.

It is important to note, that if one takes another nonterminal as the startsymbol, the language that can be derived from that nonterminal, will be a sublanguage of the original language. That is, if we choose <statements> as the startsymbol, we will get the sublanguage of legal statements in Pascal.

The above discussion is taken in terms of derivations (i.e. the grammar generating the possible sentences defined by the grammar). The observations are equally important when we are talking about parsing a string of characters, and evaluate whether or not the string is a legal sentence according to the grammar.


[1] Please note, that this grammar specification is not fully consistent with the actual grammar specification syntax of the metaprogramming system (see chapter 7). The primary difference is, that terminals in the actual grammars to be used by the metaprogramming system must be enclosed in single quotes (e.g. 'begin' instead of begin


The Metaprogramming System - Reference Manual
© 1991-2004 Mjølner Informatics
[Modified: Friday April 6th 2001 at 12:43]