Parse a string using ANTLR4
Example: (CHGA/B234A/B231
String:
a) Designator: 3 LETTERS
b) Message number (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
c) Reference data (OPTIONAL): 1 to 4 LETTERS, followed by A SLASH (/) followed by 1 to 4 LETTERS, followed by 3 NUMBERS indicating the serial number.
Result:
CHG
A/B234
A/B231
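As a cross-check of the format described above, independent of ANTLR, the three parts can be pulled apart with a regular expression. This is only a minimal sketch using java.util.regex, not the ANTLR solution being asked for; the class name Tipo3RegexCheck and the method split are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the grammar: checks the format with a plain regex.
public class Tipo3RegexCheck {
    // One "message number" / "reference data" group:
    // 1 to 4 letters, a slash, 1 to 4 letters, then a 3-digit serial number.
    private static final String ID = "[A-Za-z]{1,4}/[A-Za-z]{1,4}[0-9]{3}";

    // '(' + 3-letter designator + optional message number + optional reference data.
    private static final Pattern TIPO3 =
            Pattern.compile("\\(([A-Za-z]{3})(" + ID + ")?(" + ID + ")?");

    /** Returns {designator, messageNumber, referenceData}; absent optional parts are null. */
    static String[] split(String input) {
        Matcher m = TIPO3.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a tipo3 string: " + input);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Prints the three parts of the example string, one per line.
        for (String part : split("(CHGA/B234A/B231")) {
            System.out.println(part);
        }
    }
}
```

For the example string this yields CHG, A/B234 and A/B231, matching the expected result listed above.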
In grammar file:
/*
* Parser Rules
*/
tipo3: designador idmensaje? idmensaje?;
designador: PARENTHESIS CHG;
idmensaje: LETTER4 SLASH LETTER4 DIGIT3;
/*
* Lexer Rules
*/
CHG : 'CHG' ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
SLASH : '/' ;
PARENTHESIS : '(' ;
DIGIT3 : DIGIT DIGIT DIGIT ;
LETTER4 : LETTER LETTER? LETTER? LETTER? ;
But when testing the tipo3 rule, it's giving me the following message:
line 1:1 missing 'CHG' at 'CHGA'
How can I parse that string in ANTLR4?
Solution 1:[1]
When you're confused about why a certain parser rule is not being matched, always start with the lexer. Dump the tokens your lexer produces to stdout. Here's how you can do that:
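import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

// I've placed your grammar in a file called T.g4 (hence the name `TLexer`)
String source = "(CHGA/B234A/B231";
TLexer lexer = new TLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill(); // force the lexer to tokenize the entire input

// Print each token's symbolic name next to the text it matched
for (Token t : stream.getTokens()) {
    System.out.printf("%-20s `%s`%n",
            TLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}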
If you run the Java code above, this will be printed:
PARENTHESIS          `(`
LETTER4              `CHGA`
SLASH                `/`
LETTER4              `B`
DIGIT3               `234`
LETTER4              `A`
SLASH                `/`
LETTER4              `B`
DIGIT3               `231`
EOF                  `<EOF>`
As you can see, CHGA becomes a single LETTER4 token, not a CHG token followed by a LETTER4 token. Try changing the rule to LETTER4 : LETTER; and re-test: now you'll get the expected result.
In your current grammar, CHGA will always become a single LETTER4 token. This is just how ANTLR works: the lexer runs independently of the parser and tries to consume as many characters as possible for a single token, and only when two rules match the same number of characters does the rule defined first win. You cannot change this.
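The longest-match behaviour can be illustrated without ANTLR. The following is a simplified simulation, not ANTLR's actual implementation: it assumes each lexer rule from the question can be written as a plain regex, and the names LongestMatchDemo, tokenize and questionRules are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: mimics the "longest match wins" rule with regexes.
public class LongestMatchDemo {
    /** At each position the longest match wins; ties go to the rule listed first. */
    static List<String> tokenize(Map<String, Pattern> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // lookingAt() anchors at the region start; the strict '>' keeps
                // the earlier rule when two rules match the same length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + " " + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    /** The non-fragment lexer rules from the question, in the same order, as regexes. */
    static Map<String, Pattern> questionRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("CHG", Pattern.compile("CHG"));
        rules.put("SLASH", Pattern.compile("/"));
        rules.put("PARENTHESIS", Pattern.compile("\\("));
        rules.put("DIGIT3", Pattern.compile("[0-9]{3}"));
        rules.put("LETTER4", Pattern.compile("[a-zA-Z]{1,4}"));
        return rules;
    }

    public static void main(String[] args) {
        tokenize(questionRules(), "(CHGA/B234A/B231").forEach(System.out::println);
    }
}
```

At the position of CHGA, the CHG rule matches three characters while LETTER4 matches four, so LETTER4 wins even though CHG is defined first. Only for the bare input CHG, where both rules match three characters, does the earlier CHG rule win.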
What you could do is move the construction of the multi-letter rule to the parser instead of the lexer:
tipo3 : designador idmensaje? idmensaje?;
designador : PARENTHESIS CHG;
idmensaje : letter4 SLASH letter4 DIGIT3;
letter4 : LETTER LETTER? LETTER? LETTER?
        | CHG
        ;
CHG : 'CHG' ;
LETTER : [a-zA-Z] ;
SLASH : '/';
PARENTHESIS : '(';
DIGIT3 : DIGIT DIGIT DIGIT;
fragment DIGIT : [0-9];
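With this grammar, the input (CHGA/B234A/B231 parses into a designador followed by two idmensaje nodes, A/B234 and A/B231.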
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bart Kiers |

