The shift-reduce conflicts are easily discovered by looking at the tiger.output
file produced with the -v
flag.
Here's an example (I edited out the repetition):
State 88
11 exp: exp . PLUS exp
12 | exp . MINUS exp
# ...
29 | WHILE exp DO exp .
PLUS shift, and go to state 34
MINUS shift, and go to state 35
# ...
PLUS [reduce using rule 29 (exp)]
MINUS [reduce using rule 29 (exp)]
# ...
$default reduce using rule 29 (exp)
We can see that State 88 occurs when a WHILE
expression can be reduced (that's evident from the position of the .
in the state description:
29 | WHILE exp DO exp .
If the lookahead token at this point is a binary operator, the parser doesn't know whether to shift the operator, making the trailing exp
in the WHILE
expression longer, or to immediately reduce the WHILE
. Obviously (to us, not to bison
), the solution is to shift. bison
doesn't know that because the production exp: WHILE exp DO exp
doesn't have a precedence. The precedence of that production would be the precedence of its last terminal, which is DO
, so the simple solution is to define a precedence for DO
. Unsurprisingly, it should be the same as the precedence of ELSE
, as shown by the fact that IF exp THEN exp ELSE exp .
does not produce a shift/reduce conflict.
A similar problem occurs in states 112 and 129.
The shift/reduce conflict in state 1 is also clear from the output
file:
State 1
9 exp: ID . LPAREN RPAREN
10 | ID . LPAREN arglist RPAREN
23 | ID . LBRACE RBRACE
24 | ID . LBRACE idlist RBRACE
25 | ID . LBRACK exp RBRACK OF exp
34 lvalue: ID .
LPAREN shift, and go to state 15
LBRACK shift, and go to state 16
LBRACE shift, and go to state 17
LBRACK [reduce using rule 34 (lvalue)]
$default reduce using rule 34 (lvalue)
Here, the parser has just found an ID
in a context in which an exp
might be reduced, and it faces two possibilities:
shift. The exp
is ID [exp] OF exp
, so that in the end the result will be:
ID '[' exp ']' OF exp --> exp (rule 25)
reduce. The exp
is the lvalue ID[exp]
, using the following productions:
ID --> lvalue (rule 34)
lvalue '[' exp ']' --> lvalue (rule 36)
lvalue --> exp (rule 2)
In order to use the second alternative, the parser must immediately reduce ID
to lvalue
, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees the OF
following the matching ], but that's far in the future -- in fact, it could be an arbitrary number of tokens away.
The solution here is to avoid forcing the parser to make a decision at this point. There are a couple of possibilities.
Since an expression can only be ID [ exp ] OF
(and not anything more complicated), we can factor ID
out of the conflict:
exp : ID
| lvalue_not_id
| ...
lvalue: ID
| lvalue_not_id
lvalue_not_ID
: lvalue DOT ID
| ID LBRACK exp RBRACK
| lvalue_not_ID LBRACK exp RBRACK
Comparing the current state machine with the state machine after this change should make it clear how that works (and is a useful exercise in learning about bottom-up parsing).
If you don't want to go to all that work, you can simply add an "apparently redundant" production, as suggested by Appel in his textbook:
lvalue: ID
| lvalue DOT ID
| lvalue LBRACK exp RBRACK
| ID LBRACK exp RBRACK
The added production to lvalue
is obviously going to create a shift-reduce conflict; indeed, it is exactly the same shift-reduce conflict as in the original grammar. But this time, the conflict is between two different productions for lvalue
, and the default shift action is definitely the one you want to take in the case of a bare ID
followed by a [. After the shift, both the lvalue
production and the exp
production are still available, so the parser doesn't have to make a decision until it finds the token after the ].
The downside with this solution is that the parser generator will continue to report a shift-reduce conflict, since there clearly is one. Since shift-reduce conflicts are generally considered a sign that the grammar may be ambiguous, leaving a shift-reduce conflict in the code will be a long-term maintenance issue: after every grammar change, it will be necessary to verify that the shift-reduce conflict is benign.
Another solution, which also unfortunately leaves the warning in place, is to use bison's %glr-parser
directive to generate a GLR parser. The GLR algorithm is capable of delaying reduction decisions by effectively maintaining two (or more) different possible parser stacks at the same time. For unambiguous grammars, this turns out to still be O(n) in the length of the input, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.)
Finally, you could get rid of lvalue
by just adding its productions to exp
. You would then need to generalize the lvalue [ exp ]
to be exp [ exp ]
, which means the grammar would recognize a a superset of the original language: it will now accept certain inputs which are not valid. However, it is easy to check in the semantic actions for the relevant productions to see whether an exp
has the form of an lvalue
or not; if it is not, you can generate a syntax error in the semantic action.