Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.
After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).
My problem is with the search()
function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.
Example:
postings = [[1, 100, 142, 322, 12312],
[2, 100, 101, 322, 1221],
[100, 142, 322, 956, 1222]]
Should yield:
[100, 322]
There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools
abuse, or any other kind of code golf is more than welcome. :-)
It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.
For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…