How to recursively iterate through a txt or html file using functions and returning each individual character
I am trying to create an input stream for the tokenization stage of an HTML parser I am building. Here is some context:
The input stream consists of the characters pushed into it as the input byte stream is decoded.
Before the tokenization stage, the input stream must be preprocessed by normalizing newlines. Thus, newlines in HTML DOMs are represented by U+000A LF characters, and there are never any U+000D CR characters in the input to the tokenization stage.
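As a rough illustration of the normalization step described above, here is a minimal sketch. The function name normalize_newlines is hypothetical, and the sketch assumes the decoded input is already available as a Python string. Note that it maps CR LF pairs and lone CR characters to LF ('\n'), whereas '\f' is the form feed character (U+000C).

```python
def normalize_newlines(text: str) -> str:
    """Replace CR LF pairs and lone CR characters with a single LF.

    Hypothetical helper; assumes `text` is the already decoded input.
    """
    # Handle CR LF first so that a pair does not turn into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "<!DOCTYPE html>\r\n<head>Hi</head>\r"
normalized = normalize_newlines(sample)
print(repr(normalized))
```

When a file is read with open() in text mode and the default newline argument, Python already translates '\r\n' and '\r' to '\n', so an explicit step like this mostly matters when the text comes from another source or the file was opened with newline=''.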
The next input character is the first character in the input stream that has not yet been consumed. Initially, the next input character is the first character in the input. The current input character is the last character to have been consumed.
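The terms "next input character" and "current input character" can be modelled with a single position index. The sketch below is only one possible reading of the description above: the class InputStream and its method names are hypothetical, and None is used here as an assumed end-of-file marker.

```python
from typing import Optional


class InputStream:
    """Hypothetical helper that tracks how much of a string has been consumed."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next input character

    def next_input_character(self) -> Optional[str]:
        """Peek at the first unconsumed character; None means end of file."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def current_input_character(self) -> Optional[str]:
        """Return the last consumed character; None before anything is consumed."""
        return self.text[self.pos - 1] if self.pos > 0 else None

    def consume(self) -> Optional[str]:
        """Consume and return the next input character, or None at end of file."""
        char = self.next_input_character()
        if char is not None:
            self.pos += 1
        return char


stream = InputStream("<p>")
first = stream.consume()
print(first, stream.current_input_character(), stream.next_input_character())
```

Keeping the position inside an object means the caller can ask for one character at a time without the stream restarting from the beginning on every call.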
My test.html file:
< !DOCTYPE html > on line 0
< head >Hi< /head > on line 1
My code:
with open('test.html', 'r') as f:
    file = f.readlines()
    file = [item.replace('\n', '\f') for item in file]
    file = [str(item) for item in file]

def input_stream():
    for line_no, line in enumerate(file):  # the whole line
        eof_no = len(file[line_no]) - 1
        for char_no, char in enumerate(line):  # each character in that line
            eof_no = len(file[line_no]) - 1
            if char_no == eof_no:
                eof = True
                return eof
            return char

def run():
    eof = False
    while eof == False:
        result = input_stream()
        if result == True:
            break
        else:
            return result

print(run())

def state_machine(input):  # Output of run() is to be passed in here
    # Statements..
So far I believe I have managed to include everything apart from the recursive portion of it. I need to return one character at a time, pass it into the state_machine function for it to perform certain operations and eventually return token(s) - all until the end of the file.
I know that a return statement exits the function, and with it any enclosing loops, but I do not know how else to model this.
Recap: returning result from inside the loop does not work, since the function exits on its first iteration. Any ideas?
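One way to hand out a single character at a time without leaving the loop for good is a generator, which uses yield instead of return. The following is a sketch under stated assumptions, not a definitive design: input_stream takes the lines as a parameter (the lines argument is an assumption replacing the global file list), and the body of state_machine is a placeholder that merely upper-cases its input.

```python
from typing import Iterable, Iterator, List


def input_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield every character of every line, one at a time."""
    for line in lines:
        for char in line:
            yield char  # execution pauses here and resumes on the next request


def state_machine(char: str) -> str:
    """Placeholder for the real tokenizer logic (an assumption for this sketch)."""
    return char.upper()


def run(lines: Iterable[str]) -> List[str]:
    """Feed each character to state_machine until the input is exhausted."""
    results = []
    for char in input_stream(lines):  # the loop ends when the generator runs out
        results.append(state_machine(char))
    return results


print(run(["<b>\n", "hi"]))
```

With this shape no explicit end-of-file flag is needed, because exhausting the generator ends the for loop. No recursion is required either; plain iteration covers the whole file.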
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
