Character being skipped when looping through string
I'm making a hobby programming language, and there's an issue with my lexer when it's reading an integer.
Here is the branch that runs when the current character is one of the digits in the integers string:
integers = "1234567890"
elif currentChar in integers:
    res = ""
    while pos < length and src[pos] in integers:
        print(src[pos])
        res += src[pos]
        pos += 1
        column += 1
    pos += 1
    column += 1
    tokens.append({"type": "INTEGER", "value": res})
If you need the entire main lexer function here it is:
def tokenize(self):
    tokens = []
    pos = 0
    line = 1
    column = 1
    src = self.src
    length = len(src)
    varChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
    integers = "1234567890"
    KEYWORDS = ["print"]
    while pos < length:
        currentChar = src[pos]
        if currentChar == " ":
            pos += 1
            column += 1
            continue
        elif currentChar == "\n":
            line += 1
            column = 0
            pos += 1
            continue
        elif currentChar == '"':
            pos += 1
            column += 1
            res = ""
            while pos < length and src[pos] != '"':
                res += src[pos]
                pos += 1
                column += 1
            try:
                if src[pos] != '"':
                    return [], f"Unterminated string at line {line}, column {column}"
            except IndexError:
                if src[pos - 1] != '"':
                    return [], f"Unterminated string at line {line}, column {column}"
            pos += 1
            column += 1
            tokens.append({"type": "STRING", "value": res})
        elif currentChar in varChars:
            pos += 1
            column += 1
            res = currentChar
            while pos < length and src[pos] in varChars:
                res += src[pos]
                pos += 1
                column += 1
            if res not in KEYWORDS:
                tokens.append({"type": "VARIABLE_NAME", "value": res})
            elif res in KEYWORDS:
                tokens.append({"type": "KEYWORD", "value": res})
        elif currentChar == "=":
            pos += 1
            column += 1
            tokens.append({"type": "OPERATOR", "value": currentChar})
        elif currentChar in integers:
            res = ""
            while pos < length and src[pos] in integers:
                print(src[pos])
                res += src[pos]
                pos += 1
                column += 1
            tokens.append({"type": "INTEGER", "value": res})
        elif currentChar == "(":
            pos += 1
            column += 1
            tokens.append({"type": "OPEN_PAREN", "value": currentChar})
        elif currentChar == ")":
            pos += 1
            column += 1
            tokens.append({"type": "CLOSE_PAREN", "value": currentChar})
        elif currentChar == ";":
            res = ""
            pos += 1
            column += 1
            while pos < length and src[pos] != "\n":
                res += src[pos]
                pos += 1
                column += 1
            pos += 1
            column += 1
            tokens.append({"type": "COMMENT", "value": res})
        else:
            return [], f"Unexpected character {currentChar} at line {line}, column {column}"
P.S.: pos is the current index into src, and src is the source code being lexed.
When I eventually reach the end of my parser, it reports a missing character, and it is always the character immediately after a number.
For example:
    print(10)
In this code, the lexer skips the closing parenthesis.
Any help would be appreciated!
Solution 1:[1]
You're accidentally incrementing your position again outside of your loop.
    while pos < length and src[pos] in integers:
        print(src[pos])
        res += src[pos]
        pos += 1
        column += 1
    pos += 1 # <-------------------
    column += 1 # <----------------
When the loop reads the final digit, it increments pos, so by the time the loop exits, pos already points at the first character after the number. The extra increment outside the loop then moves past that character, so it is never tokenized.
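To see the fix in isolation, here is a minimal sketch of the digit-scanning step with the extra increments removed. The helper name read_integer and the constant INTEGERS are hypothetical and used only for illustration; in the original lexer the same loop sits inside the elif branch, and column would be advanced alongside pos.

```python
INTEGERS = "1234567890"


def read_integer(src, pos):
    """Scan a run of digits starting at src[pos].

    Returns (digits, new_pos), where new_pos is the index of the first
    character that is not a digit, or len(src) at the end of the input.
    """
    res = ""
    while pos < len(src) and src[pos] in INTEGERS:
        res += src[pos]
        pos += 1
    # No extra increment here: pos already points just past the last digit.
    return res, pos


src = "print(10)"
value, pos = read_integer(src, 6)  # index 6 is the "1" in "10"
print(value, src[pos])  # prints: 10 )
```

The same rule should hold for every branch of the lexer: when a branch finishes, pos should sit on the first character it did not consume, so the main loop can pick up from there.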
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Amit Kulkarni |
