For computer programming languages, the reference grammar is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are generally resolved by adding precedence rules or other context-sensitive parsing rules, so the overall phrase grammar is unambiguous.[citation needed] Some parsing algorithms (such as Earley[2] or GLR parsers) can generate sets of parse trees (or "parse forests") from strings that are syntactically ambiguous.[3]
Examples
Trivial language
The simplest example is the following ambiguous grammar (with start symbol A) for the trivial language that consists of only the empty string:
A → A | ε
…meaning that the nonterminal A can be derived to either itself again, or to the empty string. Thus the empty string has leftmost derivations of length 1, 2, 3, and indeed of any length, depending on how many times the rule A → A is used.
This language also has an unambiguous grammar, consisting of a single production rule:
A → ε
…meaning that the unique production can produce only the empty string, which is the unique string in the language.
In the same way, any grammar for a non-empty language can be made ambiguous by adding duplicates.
Unary string
The regular language of unary strings of a given character, say 'a' (the regular expression a*), has the unambiguous grammar:
A → aA | ε
…but also has the ambiguous grammar:
A → aA | Aa | ε
These correspond to producing a right-associative tree (for the unambiguous grammar) or allowing both left- and right- association. This is elaborated below.
A common example of ambiguity in computer programming languages is the dangling else problem. In many languages, the else in an If–then(–else) statement is optional, which results in nested conditionals having multiple ways of being recognized in terms of the context-free grammar.
Concretely, in many languages one may write conditionals in two valid forms: the if-then form, and the if-then-else form – in effect, making the else clause optional:[note 1]
In a grammar containing the rules
Statement → if Condition then Statement |
if Condition then Statement else Statement |
...
Condition → ...
some ambiguous phrase structures can appear. The expression
if a thenif b then s else s2
can be parsed as either
if a thenbeginif b then s endelse s2
or as
if a thenbeginif b then s else s2 end
depending on whether the else is associated with the first if or second if.
This is resolved in various ways in different languages. Sometimes the grammar is modified so that it is unambiguous, such as by requiring an endif statement or making else mandatory. In other cases the grammar is left ambiguous, but the ambiguity is resolved by making the overall phrase grammar context-sensitive, such as by associating an else with the nearest if. In this latter case the grammar is unambiguous, but the context-free grammar is ambiguous.[clarification needed]
An unambiguous grammar with multiple derivations
The existence of multiple derivations of the same string does not suffice to indicate that the grammar is ambiguous; only multiple leftmost derivations (or, equivalently, multiple parse trees) indicate ambiguity.
For example, the simple grammar
S → A + A
A → 0 | 1
is an unambiguous grammar for the language { 0+0, 0+1, 1+0, 1+1 }. While each of these four strings has only one leftmost derivation, it has two different derivations, for example
Unambiguous context-free grammars can be nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its symbols first, which means that a pushdown automaton has to try alternative state transitions to accommodate for the different possible lengths of a semi-parsed string.[7]
Nevertheless, removing grammar ambiguity may produce a deterministic context-free grammar and thus allow for more efficient parsing. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as by using the precedence and associativity constraints.
Inherently ambiguous languages
While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. Such languages are called inherently ambiguous.
There are no inherently ambiguous regular languages.[8][9]
The existence of inherently ambiguous context-free languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report.[10]
Ogden's lemma[12] can be used to prove that certain context-free languages, such as , are inherently ambiguous. See this page for a proof.
The union of with is inherently ambiguous. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that any context-free grammar for this union language cannot unambiguously parse strings of form .[13]
More examples, and a general review of techniques for proving inherent ambiguity of context-free languages, are found given by Bassino and Nicaud (2011).[14]
See also
GLR parser, a type of parser for ambiguous and nondeterministic grammars
Chart parser, another type of parser for ambiguous grammars