Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - ANTLR: beginner's mismatched input expecting ID

As a beginner learning ANTLR4 from The Definitive ANTLR 4 Reference, I tried to run my modified version of the exercise from Chapter 7:

/**
 * to parse properties file
 * this example demonstrates using embedded actions in code
 */
grammar PropFile;

@header  {
    import java.util.Properties;
}
@members {
    Properties props = new Properties();
}
file
    : 
    {
        System.out.println("Loading file...");
    }
        prop+
    {
        System.out.println("finished:\n"+props);
    }
    ;

prop
    : ID '=' STRING NEWLINE 
    {
        props.setProperty($ID.getText(),$STRING.getText());//add one property
    }
    ;

ID  : [a-zA-Z]+ ;
STRING  : (~[\r\n])+ ; // if I use  STRING : '"' .*? '"'  everything is fine
NEWLINE : '\r'? '\n' ;

Since Java properties are just key-value pairs, I use STRING to match everything except NEWLINE (I don't want it to support only strings in double quotes). When I ran the following input, I got:

D:\AntlrEx\PropFile\Prop1>grun PropFile prop -tokens
driver=mysql
^Z
[@0,0:11='driver=mysql',<3>,1:0]
[@1,12:13='\r\n',<4>,1:12]
[@2,14:13='<EOF>',<-1>,2:14]
line 1:0 mismatched input 'driver=mysql' expecting ID

When I use STRING : '"' .*? '"' instead, it works.

I would like to know where I was wrong so that I can avoid similar mistakes in the future.

Please give me some suggestions, thank you!


1 Answer


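Both ID and STRING can match input text starting with "driver", but they match different lengths: ID stops after the six letters of "driver", while STRING, which accepts any character except a line break, consumes all twelve characters of "driver=mysql". The lexer always chooses the longest possible match, so it emits a single STRING token even though the ID rule comes first. Rule order only breaks ties between matches of equal length. You can see this in your token dump: the first token covers characters 0:11 and has type <3>, which should be STRING given the order of your rules.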
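
To make the longest-match behaviour concrete, here is a small toy simulation in plain Java. It is only a sketch and not how ANTLR is implemented: the class LongestMatchDemo and the method nextToken are hypothetical names, the three rules are approximated with java.util.regex patterns, and the implicit '=' token from the parser rule is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Token rules in declaration order, approximating the lexer rules from the question.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ID", Pattern.compile("[a-zA-Z]+"));
        RULES.put("STRING", Pattern.compile("[^\\r\\n]+"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
    }

    /** Returns "RULE:text" for the token chosen at position pos, or null if no rule matches. */
    static String nextToken(String input, int pos) {
        String bestRule = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(pos, input.length());
            // lookingAt() only succeeds if the match starts exactly at pos
            if (m.lookingAt()) {
                int len = m.end() - pos;
                // strictly longer wins, so on equal length the earlier rule is kept
                if (len > bestLen) {
                    bestLen = len;
                    bestRule = rule.getKey();
                }
            }
        }
        return bestRule == null ? null : bestRule + ":" + input.substring(pos, pos + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(nextToken("driver=mysql\r\n", 0)); // STRING:driver=mysql
        System.out.println(nextToken("driver\r\n", 0));       // ID:driver
    }
}
```

The first call shows the situation from the question: STRING wins because it matches twelve characters against six for ID. The second call shows the tie case, where both rules match "driver" and the rule declared first is chosen.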

So, you have several choices here. The most direct is to remove the ambiguity between ID and STRING (which is how your alternative works) by requiring the string to start with the equals sign.

file : prop+ EOF ;
prop : ID STRING NEWLINE ;

ID      : [a-zA-Z]+ ;
STRING  : '=' (~[\r\n])+ ;
NEWLINE : '\r'? '\n' ;

You can then use an action to trim the equals sign from the text of the string token.
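
One way to do the trimming is a small Java helper that the parser action calls. This is only a sketch: the class PropValueDemo and the method stripEquals are hypothetical names, and I am assuming you would place the helper in your members section and call it from the prop action as props.setProperty($ID.getText(), stripEquals($STRING.getText())).

```java
import java.util.Properties;

class PropValueDemo {
    /** Removes the leading '=' that the STRING token carries when the rule starts with '='. */
    static String stripEquals(String tokenText) {
        if (tokenText.startsWith("=")) {
            return tokenText.substring(1);
        }
        return tokenText; // nothing to strip; return the text unchanged
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // "=mysql" stands in for the text of the STRING token
        props.setProperty("driver", stripEquals("=mysql"));
        System.out.println(props.getProperty("driver")); // mysql
    }
}
```

Only the first equals sign is removed, so a value that itself contains '=' is kept intact.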

Alternatively, you can use a predicate to disambiguate the rules.

file : prop+ EOF ;
prop : ID '=' STRING NEWLINE ;

// STRING is listed before ID so that it also wins equal-length matches such as 'mysql'
STRING  : { isValue() }? (~[\r\n])+ ;
ID      : [a-zA-Z]+ ;
NEWLINE : '\r'? '\n' ;

Here the isValue method looks backwards on the character stream to verify that the token being matched follows an equals sign. Something like:

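@lexer::members {
// Lexer-only: _tokenStartCharIndex and the character stream _input exist on the lexer, not the parser.
// Returns true if the nearest non-whitespace character before the current token is '='.
public boolean isValue() {
    int offset = _tokenStartCharIndex;
    for (int idx = offset-1; idx >=0; idx--) {
        String s = _input.getText(Interval.of(idx, idx));
        if (Character.isWhitespace(s.charAt(0))) {
            continue;   // skip whitespace between '=' and the value
        } else if (s.charAt(0) == '=') {
            return true;
        } else {
            break;      // some other character precedes the token
        }
    }
    return false;
}
}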
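
If you want to try the look-back logic without generating a lexer, the same idea can be sketched against a plain String. This is an assumption-laden stand-in, not ANTLR code: the class IsValueDemo is a hypothetical name, and the tokenStart parameter plays the role of _tokenStartCharIndex.

```java
class IsValueDemo {
    /** Returns true if the nearest non-whitespace character before tokenStart is '='. */
    static boolean isValue(String input, int tokenStart) {
        for (int idx = tokenStart - 1; idx >= 0; idx--) {
            char c = input.charAt(idx);
            if (Character.isWhitespace(c)) {
                continue; // skip whitespace between '=' and the value
            }
            return c == '='; // first non-whitespace character decides
        }
        return false; // reached the start of the input without finding '='
    }

    public static void main(String[] args) {
        System.out.println(isValue("driver=mysql", 7));   // true: 'm' follows '='
        System.out.println(isValue("driver=mysql", 0));   // false: nothing precedes 'd'
        System.out.println(isValue("driver = mysql", 9)); // true: whitespace is skipped
    }
}
```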
