How to separate tokens (parentheses, colons, etc.) when a scanner is scanning a file?
I have written a scanner in Python, a lexical analyzer that classifies tokens using dictionaries of patterns. A file is passed as a command-line argument, and the scanner reads it and prints each token. The problem is that when there is no space between two tokens, they are treated as a single string, which is obviously not what I want. Is there a way to write a single rule that handles the missing whitespace, inserting a separation and then continuing, or will it take a separate conditional for each token? Here is an example of the command-line output that shows the problem:
Line # 2 - Program: hello_world
Token: Program:, token, 3004
Token: hello_world, token, 3004
----------------------------------------
Line # 3 - Author: example
Token: Author:, token, 3004
Token: example, token, 3004
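The usual way to get that separation in Python is to split on the punctuation itself rather than only on whitespace, keeping the punctuation in the result. re.split with a capturing group does exactly that. Below is a minimal sketch, not the scanner's own code: the character class only covers ':', '(', ')' and ',' as an illustration, and split_line is a made-up helper name.

    import re

    # re.split keeps the text captured by the group, so punctuation comes back
    # as its own piece even when it is glued to a word.
    def split_line(line):
        parts = re.split(r'([():,])', line)
        # ''.split() == [], so empty pieces disappear and leftover whitespace
        # around words is trimmed at the same time.
        return [piece for part in parts for piece in part.split()]

    print(split_line("Program: hello_world"))   # ['Program', ':', 'hello_world']

The relevant part of the current scanner, which only splits on whitespace, is shown below.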
# Implement split() for separation
for word in line.split():
    # Implementation of block comment encounter
    # If encountered, skip
    if tokenize(word).value == 3006:
        self.insideComment = True
        return ""
    if tokenize(word).value == 3007:
        self.insideComment = False
        return ""
    if tokenize(word).value == 3008:
        return (tokenizedLine)
    if tokenize(word).value == 3014:
        self.insideComment = False
        return ""
    # If program encounters ':' then white space
    if tokenize(word).value == 3013:
        return...
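To keep this word-by-word loop, the same splitting idea can be applied one level earlier: expand each whitespace-separated word into sub-pieces before handing it to tokenize, instead of adding a conditional per punctuation code. A rough sketch, where split_punctuation is a hypothetical helper and tokenize / insideComment refer to the existing code above:

    import re

    def split_punctuation(word):
        # ':', '(', ')' and ',' become separate pieces; empty strings are dropped
        return [piece for piece in re.split(r'([():,])', word) if piece]

    # inside the existing scanner method:
    # for word in line.split():
    #     for piece in split_punctuation(word):
    #         token = tokenize(piece)   # existing helper from the question
    #         ...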
# Special characters, other tokens
"otherTokens": {
    '\"': 3001,
    # cannot start with number
    '\[([a-zA-Z]|([\-]?[0-9]?.[0-9]))+\]': 3002,
    '[\-]?[0-9]': 3003,
    '[a-zA-Z]+': 3004,
    '\(': 3005,
    '\)': 3006,
    '\/\*': 3007,
    '\*\/': 3008,
    '\/\/': 3009,
    '\[\]': 3010,
    '\,': 3011,
    '\s+': 3012,
    ':': 3013,  # colon
}
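Since the question asks for a single rule rather than a pile of conditionals, another option is to compile all of the patterns above into one alternation and let re.finditer walk the line; the regex engine then matches punctuation even with no surrounding whitespace, so no pre-splitting is needed. The following is a rough sketch, not the scanner's actual code: it assumes the token codes from the dictionary above, simplifies a couple of the patterns, and silently skips any character that no pattern covers. Order matters, so longer patterns such as '/*' are listed before shorter ones.

    import re

    # (pattern, token code) pairs, most specific first
    TOKEN_PATTERNS = [
        (r'/\*', 3007),                     # block comment start
        (r'\*/', 3008),                     # block comment end
        (r'//', 3009),                      # line comment
        (r'[A-Za-z_][A-Za-z0-9_]*', 3004),  # identifier (simplified)
        (r'-?[0-9]+', 3003),                # number (simplified)
        (r'\(', 3005),
        (r'\)', 3006),
        (r',', 3011),
        (r':', 3013),
        (r'\s+', 3012),                     # whitespace, skipped below
    ]

    MASTER_RE = re.compile('|'.join('(%s)' % p for p, _ in TOKEN_PATTERNS))

    def tokenize_line(line):
        # Yield (lexeme, code) pairs; lastindex tells us which alternative matched.
        for match in MASTER_RE.finditer(line):
            code = TOKEN_PATTERNS[match.lastindex - 1][1]
            if code == 3012:    # drop whitespace tokens
                continue
            yield match.group(0), code

    for lexeme, code in tokenize_line("Program: hello_world"):
        print("Token:", lexeme, code)
    # Token: Program 3004
    # Token: : 3013
    # Token: hello_world 3004

With this approach the scanner no longer needs line.split() at all: the one compiled expression handles both separation and classification in a single pass over each line.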
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.
