I'm trying to craft an LL(1) parser for a deterministic context-free grammar. One thing I'd like to be able to use is k tokens of lookahead instead of just 1, because it would enable much simpler, less greedy and more maintainable parsing of literal records like numbers, strings, comments and quotations.
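For reference, k tokens of lookahead over any token iterable can be sketched as a small buffered wrapper. `Lookahead` and its `peek` method are hypothetical names, not a library API, and the usage part assumes tokens are plain strings:

```python
from collections import deque
from itertools import islice
from typing import Iterable


class Lookahead:
    """Hypothetical wrapper: iterate over tokens, peeking up to k tokens ahead."""

    def __init__(self, tokens: Iterable[str]) -> None:
        self._it = iter(tokens)
        self._buf = deque()  # tokens pulled from _it but not yet consumed

    def peek(self, k: int = 1) -> list:
        """Return up to the next k tokens without consuming them (fewer at end of input)."""
        while len(self._buf) < k:
            try:
                self._buf.append(next(self._it))
            except StopIteration:
                break
        return list(islice(self._buf, k))

    def __iter__(self) -> "Lookahead":
        return self

    def __next__(self) -> str:
        if self._buf:
            return self._buf.popleft()
        return next(self._it)  # raises StopIteration at end of input


# Usage: lex "1 . 5" as one number by peeking two tokens ahead.
numbers = []
toks = Lookahead(["1", ".", "5", "+", "2"])
for tok in toks:
    nxt = toks.peek(2)
    if tok.isdigit() and len(nxt) == 2 and nxt[0] == "." and nxt[1].isdigit():
        numbers.append(tok + next(toks) + next(toks))  # consume "." and the fraction
    elif tok.isdigit():
        numbers.append(tok)
print(numbers)
```

Because `__iter__` returns the wrapper itself, calling `next(toks)` inside the `for` loop consumes tokens that the loop will then skip.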
Currently, my solution (which works, but which I feel is suboptimal) looks roughly like the following (simplified):
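```python
for idx, tok in enumerate(toklist):
    if tok == "blah":
        do(stuff)
    elif tok == "notblah":
        try:
            toklist[idx + 1]      # 1 token of lookahead: is there a next token?
        except IndexError:        # no: tok is the last token
            whatever()
        else:                     # yes: a next token exists
            something_else()
```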
(You can see my actual, much larger implementation at the link above.)
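The bounds check in the loop above does not need try/except: slicing past the end of a list returns a shorter (possibly empty) list instead of raising `IndexError`, so k tokens of lookahead can be taken with a slice while keeping `enumerate`. A minimal sketch; `lookahead_pairs` and its `k` parameter are hypothetical:

```python
def lookahead_pairs(toklist, k=2):
    """Yield (idx, tok, ahead), where ahead holds up to k following tokens.

    Slicing past the end of a list returns a shorter (possibly empty) list
    instead of raising IndexError, so no try/except is needed.
    """
    for idx, tok in enumerate(toklist):
        yield idx, tok, toklist[idx + 1 : idx + 1 + k]


for idx, tok, ahead in lookahead_pairs(["x", "notblah", "y", "notblah"], k=1):
    if tok == "notblah":
        if ahead:
            print(idx, "followed by", ahead[0])
        else:
            print(idx, "is the last token")
```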
Sometimes, for example when the parser finds the beginning of a string or block comment, it would be nice to "jump" the iterator's current position forward so that many indices are skipped at once.
In theory this can be done with (for example) `idx += 1 + toklist[idx+1:].index(COMMENT)`; in practice, however, each time the loop repeats, `idx` and `tok` are reassigned from the next pair produced by the `enumerate` iterator (via `next()`), overwriting any changes to the variables.
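A small demonstration of why the change is lost, assuming a hypothetical `COMMENT` token: `for` binds `idx, tok` afresh from the `enumerate` iterator on every pass, and rebinding the local name `idx` never touches that iterator. It also shows the jump arithmetic: `.index()` on the slice returns an offset relative to `idx + 1`, so the absolute position is `idx + 1 + offset`.

```python
COMMENT = "#"                      # hypothetical comment token
toklist = ["a", "b", "#", "c"]

seen = []
jumped_to = None
for idx, tok in enumerate(toklist):
    seen.append(idx)
    if idx == 0:
        offset = toklist[idx + 1:].index(COMMENT)  # position within the slice
        idx += 1 + offset                          # absolute index of COMMENT
        jumped_to = idx
    # Whatever idx is now, the next pass takes the next pair from enumerate().

print(seen)       # [0, 1, 2, 3]
print(jumped_to)  # 2
```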
The obvious solution is a `while True:` or `while i < len(toklist): ... i += 1` loop, but there are a few glaring problems with those:
- Using `while` to walk a list is very C-like and not very Pythonic, and it is far less readable and clear than `enumerate` over the list. (Also, with `while True:`, which may sometimes be desirable, you have to deal with `IndexError: list index out of range` yourself.)
- On each cycle of the `while` loop, there are two ways to get the current token:
  - using `toklist[i]` everywhere (ugly, when you could just iterate)
  - assigning `toklist[i]` to a shorter, more readable, less typo-prone name each cycle, which adds an extra assignment on every cycle that I suspect is slow and wasteful.
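On the memory concern in the second point: in Python, assigning `toklist[i]` to a name binds that name to the existing object; it does not copy it. A quick check, using a hypothetical token list with one deliberately large token:

```python
toklist = ["x" * 1_000_000, "y"]   # hypothetical token list with one large token
i = 0
tok = toklist[i]            # binds a new name; no copy of the string is made
print(tok is toklist[i])    # True: both names refer to the same object
```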
Perhaps it can be argued that a `while` loop is what I should use, but I think `while` loops are for doing things until a condition is no longer true, whereas `for` loops are for iterating finitely over an iterable, and an (iterative LL) parser should clearly do the latter.
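One way to skip ahead while keeping a `for` loop is to hold on to the `enumerate` iterator and advance it from inside the loop body. The sketch below uses the `consume` recipe from the `itertools` documentation (`next(islice(iterator, n, n), None)`); the `SKIP3` token is hypothetical, standing in for whatever tells the parser how far to jump:

```python
from itertools import islice


def consume(iterator, n):
    """Advance iterator n steps (recipe from the itertools documentation)."""
    next(islice(iterator, n, n), None)


toklist = ["a", "SKIP3", "x", "y", "z", "b"]
kept = []
it = enumerate(toklist)          # keep a reference so the body can advance it
for idx, tok in it:
    if tok == "SKIP3":
        consume(it, 3)           # the next 3 (idx, tok) pairs are never seen by the loop
        continue
    kept.append((idx, tok))
print(kept)                      # [(0, 'a'), (5, 'b')]
```

Indices stay correct after the jump because they still come from the same `enumerate` object.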
Is there a clean, Pythonic, efficient way to arbitrarily control and change the iterator's current index?
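Another option is a small custom iterator that keeps its position in an attribute the loop body can change, so a `for` loop can still drive it. The `Cursor` class and its `seek`/`skip`/`peek` methods are hypothetical, a sketch rather than a standard-library API; the usage assumes `"` tokens delimit strings:

```python
class Cursor:
    """Hypothetical list iterator whose position can be moved from inside a for loop.

    Iterating yields (index, token) pairs like enumerate(), but seek() and
    skip() change where the next iteration starts.
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the next token to yield

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.tokens):
            raise StopIteration
        idx = self.pos
        self.pos += 1
        return idx, self.tokens[idx]

    def skip(self, n):
        """Skip the next n tokens."""
        self.pos += n

    def seek(self, idx):
        """Make the next iteration yield the token at absolute index idx."""
        self.pos = idx

    def peek(self, k=1):
        """Return up to k upcoming tokens without moving."""
        return self.tokens[self.pos:self.pos + k]


toks = ["a", '"', "hello", "world", '"', "b"]
cur = Cursor(toks)
out = []
for idx, tok in cur:
    if tok == '"':
        end = toks.index('"', idx + 1)   # closing quote; ValueError if unterminated
        out.append(" ".join(toks[idx + 1:end]))
        cur.seek(end + 1)                # resume after the closing quote
    else:
        out.append(tok)
print(out)  # ['a', 'hello world', 'b']
```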
This is not a dupe of this because all those answers use complicated, unreadable `while` loops, which is what I don't want.
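For the string/comment case specifically, a nested `for` over the same iterator gives the jump without a `while` loop: the inner loop consumes tokens up to the closing delimiter, and the outer loop resumes after it. A sketch assuming hypothetical `/*` and `*/` tokens:

```python
COMMENT_START, COMMENT_END = "/*", "*/"   # hypothetical delimiter tokens


def strip_block_comments(toklist):
    """Return (idx, tok) pairs for tokens outside /* ... */ blocks."""
    out = []
    it = enumerate(toklist)          # one iterator shared by both loops below
    for idx, tok in it:
        if tok == COMMENT_START:
            # The inner loop pulls from the SAME iterator, so every token it
            # consumes is skipped by the outer loop too. An unterminated
            # comment simply exhausts the iterator and ends both loops.
            for _, inner in it:
                if inner == COMMENT_END:
                    break
            continue
        out.append((idx, tok))
    return out


print(strip_block_comments(["a", "/*", "x", "y", "*/", "b"]))  # [(0, 'a'), (5, 'b')]
```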