I'm doing data-scraping calls with urllib2, and each one takes around 1 second to complete. I wanted to test whether I could multi-thread the URL-call loop with the threading module, using different offsets.
I'm doing this now with my update_items() method, whose first and second parameters are the offset and the limit for the loop:
import threading
t1 = threading.Thread(target=trade.update_items(1, 100))
t2 = threading.Thread(target=trade.update_items(101, 200))
t3 = threading.Thread(target=trade.update_items(201, 300))
t1.start()
t2.start()
t3.start()
#t1.join()
#t2.join()
#t3.join()
As the code shows, I tried commenting out the join() calls to keep from waiting on the threads, but it seems I've misunderstood how this library works. I inserted print() calls into the update_items() method, and funnily enough the output shows that it's still looping serially, not running all 3 threads in parallel as I wanted.
My normal scraping run takes about 5 hours to complete, even though the individual pieces of data are very small; the HTTP call is what takes the time. I want to multi-thread this task with at least a few workers to shorten the fetching to around 30-45 minutes.
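For context, this is roughly the kind of parallel fetch I'm trying to end up with. It's only a minimal sketch: fetch_range and the 1-second sleep are placeholders I made up for my real trade.update_items() and its HTTP call, not the actual code.

import threading
import time

def fetch_range(offset, limit):
    # stand-in for trade.update_items(offset, limit);
    # the sleep simulates the ~1 second HTTP call
    print("fetching items %d-%d" % (offset, limit))
    time.sleep(1)
    print("done with items %d-%d" % (offset, limit))

threads = []
for offset, limit in [(1, 100), (101, 200), (201, 300)]:
    # pass the callable and its arguments separately so the call
    # runs inside the new thread instead of the main thread
    t = threading.Thread(target=fetch_range, args=(offset, limit))
    t.start()
    threads.append(t)

for t in threads:
    t.join()  # wait for all three fetches to finish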