ANTLR string token without a certain character sequence
I'm trying to define a lexer grammar that matches string tokens that don't contain a certain sequence of characters, for instance "AB".
Examples of strings I want to capture:
""
"asda A rewr A"
"asda A"
"asdas B ad"
but not:
"asdas AB fdsdf"
I tried a few things, but I always seem to miss some case.
Solution 1:[1]
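This can be done with a little mode magic: when you're in the first string mode and you encounter an AB, you push into the second string mode. A string that ends in the first mode becomes a STR_1 token (no AB), while one that ends in the second mode becomes a STR_2 token (contains AB):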
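lexer grammar MyLexer;

// Opening quote: keep it as part of the token text and enter the first string mode.
QUOTE : '"' -> more, pushMode(MODE_1);
SPACES : [ \t\r\n]+ -> skip;

// First string mode: no AB has been seen so far.
mode MODE_1;
STR_1 : '"' -> popMode;
AB : 'AB' -> more, pushMode(MODE_2);
CONTENTS_1 : ~["] -> more;

// Second string mode: an AB has been seen, so the closing quote pops both modes.
mode MODE_2;
STR_2 : '"' -> popMode, popMode;
CONTENTS_2 : ~["]+ -> more;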
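To see why two modes are enough, here is a plain-Java sketch that mimics the mode switching by hand, without the ANTLR runtime. The class `ModeSketch` and its `tokenize` method are hypothetical names made up for illustration, not part of the answer or of ANTLR. Unlike the grammar, the sketch silently skips every character outside a string (the grammar only skips whitespace) and simply stops at an unterminated string.

```java
import java.util.ArrayList;
import java.util.List;

class ModeSketch {

    /** Returns one entry per string literal, e.g. STR_1 `"asda A"`. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (source.charAt(i) != '"') {
                i++;                      // default mode: skip (simplification, see lead-in)
                continue;
            }
            int start = i++;              // QUOTE: opening quote, enter "MODE_1"
            boolean inMode2 = false;      // becomes true once AB has been seen
            while (i < source.length() && source.charAt(i) != '"') {
                if (!inMode2 && source.startsWith("AB", i)) {
                    inMode2 = true;       // AB rule: switch to "MODE_2"
                    i += 2;
                } else {
                    i++;                  // CONTENTS_1 / CONTENTS_2: any non-quote character
                }
            }
            if (i >= source.length()) {
                break;                    // unterminated string: the sketch just stops
            }
            i++;                          // closing quote: STR_1 or STR_2, back to default mode
            String type = inMode2 ? "STR_2" : "STR_1";
            tokens.add(type + " `" + source.substring(start, i) + "`");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "\"\"\n\"asda A rewr A\"\n\"asdas AB fdsdf\"\n\"asda A\"\n\"asdas B ad\"\n";
        for (String token : tokenize(source)) {
            System.out.println(token);
        }
    }
}
```

Run on the five example strings from the question, only "asdas AB fdsdf" is reported as STR_2; the other four are STR_1.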
A Java demo that runs the generated lexer:
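// Needs the ANTLR 4 runtime (CharStreams, CommonTokenStream, Lexer, Token from
// org.antlr.v4.runtime) and the MyLexer class generated from the grammar above.
String source = "\"\"\n" +
        "\"asda A rewr A\"\n" +
        "\"asdas AB fdsdf\"\n" +
        "\"asda A\"\n" +
        "\"asdas B ad\"\n";

Lexer lexer = new MyLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();

// Echo the input, then print each token's symbolic name and text.
System.out.println(source);

for (Token t : stream.getTokens()) {
    System.out.printf("%-20s `%s`%n",
            MyLexer.VOCABULARY.getSymbolicName(t.getType()),
            t.getText().replace("\n", "\\n"));
}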
will print the following:
""
"asda A rewr A"
"asdas AB fdsdf"
"asda A"
"asdas B ad"
STR_1                `""`
STR_1                `"asda A rewr A"`
STR_2                `"asdas AB fdsdf"`
STR_1                `"asda A"`
STR_1                `"asdas B ad"`
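As a cross-check outside ANTLR, the same accept/reject decision can be written as a `java.util.regex` pattern with a negative lookahead. This is only a sketch for comparing results, not something the answer proposes: the class name `NoAbStringCheck` and the method `isStringWithoutAB` are hypothetical. As far as I know, ANTLR lexer rules have no equivalent lookahead operator, which is presumably why the answer relies on modes instead.

```java
import java.util.regex.Pattern;

class NoAbStringCheck {

    // A quoted string whose body is made of non-quote characters,
    // none of which starts the two-character sequence AB.
    static final Pattern NO_AB = Pattern.compile("\"(?:(?!AB)[^\"])*\"");

    /** True if s is a complete quoted string that does not contain AB. */
    static boolean isStringWithoutAB(String s) {
        return NO_AB.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"\"",
            "\"asda A rewr A\"",
            "\"asdas AB fdsdf\"",
            "\"asda A\"",
            "\"asdas B ad\""
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isStringWithoutAB(s));
        }
    }
}
```

The strings accepted here should be exactly the ones the lexer reports as STR_1, and the rejected one the string it reports as STR_2.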
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bart Kiers |
