ANTLR string token without a certain character sequence

I'm trying to define a lexer grammar that matches string tokens that don't contain a certain sequence of characters, for instance "AB".

Examples of strings I want to capture:

""

"asda A rewr A"

"asda A"

"asdas B ad"

but not

"asdas AB fdsdf"

I tried a few things, but I always seem to miss some case.



Solution 1:[1]

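This can be done with a little mode magic: when you're in the first string mode and you encounter an AB, you push into the second string mode. Strings without AB then come out as STR_1 tokens, and strings containing AB come out as STR_2 tokens: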

lexer grammar MyLexer;

QUOTE      : '"'        -> more, pushMode(MODE_1);
SPACES     : [ \t\r\n]+ -> skip;

mode MODE_1;
STR_1      : '"'        -> popMode;
AB         : 'AB'       -> more, pushMode(MODE_2);
CONTENTS_1 : ~["]       -> more;

mode MODE_2;
STR_2      : '"'        -> popMode, popMode;
CONTENTS_2 : ~["]+      -> more;

The Java demo:

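// Needs: import org.antlr.v4.runtime.*;
// One quoted string per line; only the third one contains "AB".
String source = "\"\"\n" +
    "\"asda A rewr A\"\n" +
    "\"asdas AB fdsdf\"\n" +
    "\"asda A\"\n" +
    "\"asdas B ad\"\n";

// Tokenize the whole input up front so every token can be listed below.
Lexer lexer = new MyLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();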

System.out.println(source);

for (Token t : stream.getTokens()) {
  System.out.printf("%-20s `%s`%n",
      MyLexer.VOCABULARY.getSymbolicName(t.getType()),
      t.getText().replace("\n", "\\n"));
}

will print the following:

""
"asda A rewr A"
"asdas AB fdsdf"
"asda A"
"asdas B ad"

STR_1                `""`
STR_1                `"asda A rewr A"`
STR_2                `"asdas AB fdsdf"`
STR_1                `"asda A"`
STR_1                `"asdas B ad"`
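
To see why a string ends up as STR_1 or STR_2, it can help to trace the mode stack by hand. The following is a minimal plain-Java sketch of that trace, not ANTLR code: the class ModeStackSketch and the method classify are hypothetical names introduced only for this illustration, and the sketch assumes the input is exactly one double-quoted string with no escape sequences.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: mimics the mode pushes and pops of the grammar
// above for a single quoted string. It does not use the ANTLR runtime.
class ModeStackSketch {

    // Returns "STR_1" or "STR_2", or null if the input is not exactly one
    // complete double-quoted string.
    static String classify(String input) {
        if (input.isEmpty() || input.charAt(0) != '"') {
            return null;
        }
        Deque<String> modes = new ArrayDeque<>();
        modes.push("DEFAULT");
        modes.push("MODE_1");                     // QUOTE -> more, pushMode(MODE_1)

        int i = 1;
        while (i < input.length()) {
            String mode = modes.peek();
            if (input.charAt(i) == '"') {
                if (i != input.length() - 1) {
                    return null;                  // text after the closing quote
                }
                if (mode.equals("MODE_1")) {
                    modes.pop();                  // STR_1 -> popMode
                    return "STR_1";
                }
                modes.pop();                      // STR_2 -> popMode, popMode
                modes.pop();
                return "STR_2";
            }
            if (mode.equals("MODE_1") && input.startsWith("AB", i)) {
                modes.push("MODE_2");             // AB -> more, pushMode(MODE_2)
                i += 2;                           // 'AB' beats the one-char CONTENTS_1
            } else {
                i++;                              // CONTENTS_1 / CONTENTS_2 -> more
            }
        }
        return null;                              // closing quote never found
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"\"", "\"asda A rewr A\"", "\"asdas AB fdsdf\"",
            "\"asda A\"", "\"asdas B ad\""
        };
        for (String s : samples) {
            System.out.println(classify(s) + " " + s);
        }
    }
}
```

In this sketch, once MODE_2 has been entered there is no way back to MODE_1 before the closing quote, which is why a single AB anywhere in the string is enough to turn the whole token into STR_2. A parser that wants only the AB-free strings would then accept STR_1 and reject or ignore STR_2.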

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   Bart Kiers