Regex for capturing a block of variable assignments
I have tokens that I need to parse, and I would like a regex that captures them. Here is how the tokens look:
EVAL
INPUT A = 5;
INPUT B = 6;
...
INPUT LongVariableName = 10;
I want to validate that every EVAL block is formatted correctly for parsing. A naive approach I have is to take these tokens and build a string out of them like so:
"EVAL:INPUT A = 5;#...#INPUT EXAMPLE = 10;#", where the : separates EVAL from the inputs and each input is followed by a # delimiter.
Or, getting into the spirit of regex, EVAL:(INPUT ID = NUM;#)+, where
- ID begins with an alphabetic character but can contain any number of alphanumeric characters after it, and NUM is any nonnegative integer.
- there must always be at least one INPUT.
I want this regex to make sure that within the string that we build, each input is instantiated (with INPUT), named (with ID), assigned a value and terminated with = NUM;, and delimited correctly (with #).
However, I am having trouble designing the regex so that it can capture variable names longer than one character.
Solution 1:[1]
> However, I am having trouble designing the regex so that it can capture variable names longer than one character.
I suggest building a state machine for this. Define the states your lexer can be in, and process the string according to the current state.
You could use a loop containing several if statements, each of which detects one lexer state and performs the corresponding action.
Initially the lexer's state would be set to read a # followed by the string INPUT; after reading that, the state would be set to read an ID. For the ID you could use sub-states (say readID): initially readID is set to a constant READ_FIRST_CHAR (which you define as a macro or an enum value) that accepts any alphabetic character, and right after reading the first character of the ID you set readID to READ_INTERIOR_CHAR, which accepts any alphanumeric character.
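The loop-and-states idea can be sketched roughly as follows. This is a minimal illustration in Python, not a definitive implementation: it assumes the input is the single string built in the question ("EVAL:" followed by inputs that each end in "#"), and the names `State`, `is_valid_eval`, `READ_KEYWORD`, `READ_FIRST_DIGIT`, and `READ_DIGITS` are made up for this example. Only `READ_FIRST_CHAR` and `READ_INTERIOR_CHAR` come from the description above.

```python
import string
from enum import Enum, auto


class State(Enum):
    READ_KEYWORD = auto()        # expecting the literal "INPUT "
    READ_FIRST_CHAR = auto()     # first character of ID: alphabetic only
    READ_INTERIOR_CHAR = auto()  # rest of ID: alphanumeric, ended by " = "
    READ_FIRST_DIGIT = auto()    # NUM needs at least one digit
    READ_DIGITS = auto()         # rest of NUM, ended by ";#"


def is_valid_eval(text):
    """Return True if text looks like EVAL:(INPUT ID = NUM;#)+ (assumed format)."""
    prefix = "EVAL:"
    if not text.startswith(prefix):
        return False
    i = len(prefix)
    state = State.READ_KEYWORD
    inputs_seen = 0
    while i < len(text):
        ch = text[i]
        if state is State.READ_KEYWORD:
            if not text.startswith("INPUT ", i):
                return False
            i += len("INPUT ")
            state = State.READ_FIRST_CHAR
        elif state is State.READ_FIRST_CHAR:
            if ch not in string.ascii_letters:
                return False
            i += 1
            state = State.READ_INTERIOR_CHAR
        elif state is State.READ_INTERIOR_CHAR:
            if ch in string.ascii_letters or ch in string.digits:
                i += 1  # still inside the ID, however long it is
            elif text.startswith(" = ", i):
                i += len(" = ")
                state = State.READ_FIRST_DIGIT
            else:
                return False
        elif state is State.READ_FIRST_DIGIT:
            if ch not in string.digits:
                return False
            i += 1
            state = State.READ_DIGITS
        else:  # State.READ_DIGITS
            if ch in string.digits:
                i += 1
            elif text.startswith(";#", i):
                i += len(";#")
                inputs_seen += 1
                state = State.READ_KEYWORD
            else:
                return False
    # Valid only if we stopped between inputs and saw at least one of them.
    return state is State.READ_KEYWORD and inputs_seen > 0
```

Because `READ_INTERIOR_CHAR` simply stays in the same state while alphanumeric characters keep arriving, the length of the variable name never has to be known in advance.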
I hope you get the idea. The language you defined is very simple and I don't think you would need a parser for it.
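For comparison, the pattern from the question can also be written out directly as a regular expression. The following is a hedged sketch using Python's `re` module; it assumes ASCII letters and digits only and exactly one space around the `=`, and the names `ID`, `NUM`, `BLOCK_RE`, `INPUT_RE`, and `parse_eval_block` are invented for this example.

```python
import re

ID = r"[A-Za-z][A-Za-z0-9]*"   # one letter, then any number of letters or digits
NUM = r"[0-9]+"                # nonnegative integer

# Validates the whole string: EVAL: followed by one or more inputs.
BLOCK_RE = re.compile(rf"EVAL:(?:INPUT {ID} = {NUM};#)+")
# Extracts the name and value of each individual input.
INPUT_RE = re.compile(rf"INPUT ({ID}) = ({NUM});#")


def parse_eval_block(text):
    """Return a list of (name, value) pairs, or None if text is malformed."""
    if BLOCK_RE.fullmatch(text) is None:
        return None
    return [(name, int(value)) for name, value in INPUT_RE.findall(text)]
```

The `*` after the second character class is what lets the ID have any length; validation and extraction are kept as two separate patterns because a repeated capturing group only retains its last repetition.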
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
