When a Spout is executed, it runs in a single thread. This thread loops "forever" and has multiple duties:
- call Spout.nextTuple()
- retrieve "acks" and process them
- retrieve "fails" and process them
- time-out tuples
For this to happen, it is essential that you do not stay "forever" (i.e., loop or block) in nextTuple(), but return after emitting a tuple to the system (or just return if no tuple can be emitted, but do not block). Otherwise, the Spout cannot do its work properly. Storm calls nextTuple() in a loop, so once ack/fail messages and the like have been processed, the next call to nextTuple() happens quickly.
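The loop described above can be sketched roughly as follows. This is a simplified model in plain Java, not Storm's actual executor code: SimpleSpout, runOnce and the two queues are hypothetical names made up for illustration, and the order of the steps inside one iteration is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Simplified, hypothetical model of the single spout thread (not Storm's real code).
final class SpoutLoopSketch {

    // Hypothetical stand-in for the callbacks a spout provides.
    interface SimpleSpout {
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }

    // One loop iteration: all duties run one after another on the same thread,
    // so a nextTuple() that blocks would delay everything after it.
    static void runOnce(SimpleSpout spout, Queue<Object> acks, Queue<Object> fails) {
        spout.nextTuple();
        Object id;
        while ((id = acks.poll()) != null) {
            spout.ack(id);
        }
        while ((id = fails.poll()) != null) {
            spout.fail(id);
        }
        // Timing out overdue tuples would also happen here (omitted in this sketch).
    }

    // Runs two iterations with one pending ack and one pending fail,
    // and returns the order in which the callbacks were invoked.
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        SimpleSpout spout = new SimpleSpout() {
            @Override public void nextTuple() { log.add("nextTuple"); }
            @Override public void ack(Object msgId) { log.add("ack:" + msgId); }
            @Override public void fail(Object msgId) { log.add("fail:" + msgId); }
        };
        Queue<Object> acks = new ArrayDeque<>();
        Queue<Object> fails = new ArrayDeque<>();
        acks.add("m1");
        fails.add("m2");
        runOnce(spout, acks, fails);
        runOnce(spout, acks, fails);
        return log;
    }
}
```

In this model the ack for m1 is only handled after nextTuple() has returned, which is why a long-running nextTuple() holds up everything else.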
Therefore, it is also considered bad practice to emit multiple tuples in a single call to nextTuple(). As long as the code stays in nextTuple(), the spout thread cannot (for example) react to incoming acks. This might lead to unnecessary time-outs because acks cannot be processed in time.
Best practice is to emit a single tuple for each call to nextTuple(). If no tuple is available to be emitted, you should return (without emitting) and not wait until a tuple is available.
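A minimal sketch of that pattern, assuming the data to emit arrives in an in-memory queue. The class is plain Java and purely illustrative: a real spout would extend one of Storm's spout base classes and emit through its collector, which is replaced here by the hypothetical emitted list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a non-blocking nextTuple(); not a real Storm spout.
final class NonBlockingSpoutSketch {
    // Assumed input buffer, filled by some other part of the application.
    private final Queue<String> source = new ConcurrentLinkedQueue<>();
    // Stand-in for the output collector: records what would have been emitted.
    private final List<String> emitted = new ArrayList<>();

    public void offer(String message) {
        source.offer(message);
    }

    // Emits at most one tuple per call and never waits for input.
    public void nextTuple() {
        // poll() returns null immediately if the queue is empty;
        // a blocking call such as BlockingQueue.take() would stall the spout thread.
        String message = source.poll();
        if (message == null) {
            return; // nothing to emit right now, just return
        }
        emitted.add(message);
    }

    public List<String> emitted() {
        return emitted;
    }
}
```

If two messages are offered and nextTuple() is called three times, the first two calls emit one message each and the third returns without emitting.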