ANTLR4 - How to turn off "longest match wins" and use the first matching rule?

Original question:

My input to parse: N100G1M4. What I expected: N100 G1 M4. ANTLR cannot identify these tokens, because it always matches the longest substring. How can I handle this case?

Update

What I am going to do:

I am trying to parse CNC G-code text from a file stream and extract its keywords. G-code is typically used to control a machine and drive its motors.

The G-code grammar is:

// Grammar for CNC G-code programs
grammar GCode;

script  : blocks+ EOF;

blocks: 
      assign_stat
    | ncblock 
    | NEWLINE
    ;

ncblock : 
     ncelements  NEWLINE  // 
    ;
ncelements :
        ncelement+
    ;

ncelement 
    :   
        LINENUMEXPR    // linenumber N100 
    |   GCODEEXPR   // G10 G54.1
    |   MCODEEXPR   // M30
    |   coordexpr   // X100 Y100 Z[A+b*c]
    |   FeedExpr    // F10.12
    |   AccExpr     // E2.0
    // |   callSubroutine 
    ;

assign_stat: 
        VARNAME '=' expression NEWLINE
    ;

expression: 
       multiplyingExpression (('+' | '-') multiplyingExpression)*
    ;

multiplyingExpression
   : powExpression (('*' | '/') powExpression)*
   ;

powExpression
   : signedAtom ('^' signedAtom)*
   ;

signedAtom
   : '+' signedAtom
   | '-' signedAtom
   | atom
   ;

atom
   : scientific
   | variable
   | '(' expression ')'
   ;

LINENUMEXPR: 'N' Digit+ ;
GCODEEXPR : 'G' GPOSTFIX;
MCODEEXPR : 'M' INT;
coordexpr: 
        CoordExpr
    |   ParameterKeyword getValueExpr
    ;

getValueExpr: 
        '[' expression ']'
    ;

CoordExpr 
        : 
         ParameterKeyword SCIENTIFIC_NUMBER
        ;
ParameterKeyword: [XYZABCUVWIJKR];
FeedExpr: 'F' SCIENTIFIC_NUMBER;
AccExpr: 'E' SCIENTIFIC_NUMBER;



fragment
GPOSTFIX
    : Digit+ ('.' Digit+)*
    ;

variable
   : VARNAME
   ;

scientific
   : SCIENTIFIC_NUMBER
   ;

SCIENTIFIC_NUMBER
   : SIGN? NUMBER (('E' | 'e') SIGN? NUMBER)?
   ;

fragment NUMBER
   : ('0' .. '9') + ('.' ('0' .. '9') +)?
   ;

HEX_INTEGER
 : '0' [xX] HEX_DIGIT+
 ;

fragment HEX_DIGIT
 : [0-9a-fA-F]
 ;
 
INT : Digit+;

fragment
Digit : [0-9];

fragment 
SIGN
   : ('+' | '-')
   ;

VARNAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NEWLINE 
    : '\r'? '\n'
    ;

WS : [ \t]+ -> skip ; // skip spaces and tabs (newlines are significant, see NEWLINE)


Sample program (it parses correctly except for the last line):

N200 G54.1
a = 100
b = 10
c = a + b 
Z[a + b*c]
N002 G2 X30.1 Y20.1 I20.1 J0.1 K0.2 R20

N100 G1X100.5Z[VAR1+100]M3H3 // it works well except the last line

I want to parse N100G1X100.5YE5Z[VAR1+100]M3H3 into:

  • -> N100 G1 X100 Z[VAR1+100]
  • -> or, even better, with the node X100 split into two subnodes, X and 100

I am trying to use ANTLR, but ANTLR always applies the "longest match wins" rule, so N100G1X100 is recognized as a single token.

Additional question: what is the best tool for this task?



Solution 1:[1]

ANTLR has a strict separation between parser and lexer, and therefore the lexer operates in a predictable way (longest match wins). So if you have some sort of identifier rule that matches N100G1M4 but sometimes want to match N100, G1 and M4 separately, you're out of luck.
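
To make the "longest match wins" behaviour concrete, here is a small Python sketch that imitates it. It is not ANTLR code and not how ANTLR is implemented; the regular expressions are simplified stand-ins (an assumption) for four lexer rules from the question's grammar, kept in the same order, and the function name is hypothetical. At each position every rule is tried, the longest match is kept, and a tie goes to the rule listed first.

```python
import re

# Simplified stand-ins for four lexer rules from the question's grammar,
# in grammar order. These regexes are assumptions made for illustration.
RULES = [
    ("LINENUMEXPR", re.compile(r"N[0-9]+")),
    ("GCODEEXPR", re.compile(r"G[0-9]+(?:\.[0-9]+)*")),
    ("MCODEEXPR", re.compile(r"M[0-9]+")),
    ("VARNAME", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),
]


def longest_match_tokens(text):
    """Tokenize text: the longest match wins, ties go to the earliest rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t":  # skip whitespace, like the WS rule
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(longest_match_tokens("N100 G1 M4"))
print(longest_match_tokens("N100G1M4"))
```

With spaces, each word ties with VARNAME in length and the earlier, more specific rule wins. Without spaces, VARNAME matches all eight characters of N100G1M4, which is longer than anything the other rules can match, so the whole input becomes a single VARNAME token.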

How to handle the case?

The only answer one can give (with the amount of detail given) is: remove the rule that matches N100G1M4 as one token. If that is something you cannot do, then don't use ANTLR, but use a "scannerless" parser.
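
As a rough illustration of what removing that rule buys you, the following Python sketch tokenizes an NC block when no catch-all identifier rule exists. It is a hand-written stand-in, not ANTLR output, and it rests on a loud assumption: every NC word is one uppercase letter followed by a plain decimal number. Bracketed expressions, variables and exponent notation are deliberately out of scope, and the names WORD and split_nc_words are hypothetical.

```python
import re

# Assumption: an NC word is a single uppercase address letter followed by
# an unsigned decimal number. No identifier rule exists that could swallow
# several words at once.
WORD = re.compile(r"([A-Z])([0-9]+(?:\.[0-9]+)?)")


def split_nc_words(block):
    """Split one NC block into (address letter, number text) pairs."""
    words = []
    pos = 0
    while pos < len(block):
        if block[pos] in " \t":  # whitespace between words is optional
            pos += 1
            continue
        match = WORD.match(block, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        words.append((match.group(1), match.group(2)))
        pos = match.end()
    return words


print(split_nc_words("N100G1M4"))
print(split_nc_words("N100 G1X100.5M3H3"))
```

Because no rule can match more than one letter-plus-number word, the longest match at each position is exactly one word, and N100G1M4 comes out as N 100, G 1 and M 4. Returning the letter and the number separately also corresponds to the wish in the question to split X100 into X and 100.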

Scannerless Parser Generators

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bart Kiers