In both the first and the final examples, the original sequence passed to split-with is retained while being realized in full in memory; hence the OOME. The way this happens is indirect: what is retained directly is t, while the original sequence is being held onto by t, a lazy seq, in its unrealized state.
The way t causes the original sequence to be held is as follows. Prior to being realized, t is a LazySeq object storing a thunk which may be called upon at some point to realize t; this thunk needs to store a pointer to the original sequence argument to split-with so that it can pass it on to take-while when t is realized (see the implementation of split-with). Once t is realized, the thunk becomes eligible for GC (the field which holds it in the LazySeq object is set to null) and t no longer holds the head of the huge input seq.
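
The thunk-dropping behaviour described above can be sketched in Java, the language LazySeq is written in. This is a minimal illustration, not the actual clojure.lang.LazySeq source: the names LazyCellDemo, LazyCell, get and holdsThunk are hypothetical, and an int array stands in for the huge input seq.

```java
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a lazy seq cell; not the real clojure.lang.LazySeq.
public class LazyCellDemo {
    static final class LazyCell<T> {
        private Supplier<T> thunk; // captures whatever it needs to compute the value
        private T value;

        LazyCell(Supplier<T> thunk) { this.thunk = thunk; }

        synchronized T get() {
            if (thunk != null) {
                value = thunk.get();
                thunk = null; // drop the thunk so that everything it captured can be collected
            }
            return value;
        }

        synchronized boolean holdsThunk() { return thunk != null; }
    }

    public static void main(String[] args) {
        int[] source = new int[1_000_000]; // stands in for the huge input seq
        LazyCell<Integer> t = new LazyCell<>(() -> source.length); // the thunk captures source
        System.out.println(t.holdsThunk()); // true: unrealized, so source stays reachable via t
        System.out.println(t.get());        // 1000000
        System.out.println(t.holdsThunk()); // false: realized, the thunk has been dropped
    }
}
```

While the cell is unrealized, anything that keeps the cell alive also keeps the captured input alive, which is the indirect retention described above.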
The input seq itself is being realized in full by (count d), which needs to realize d, and thus the original input seq.
Moving on to why t is being retained:
In the first case, this is because (count d) gets evaluated before (count t). Since Clojure evaluates these expressions left to right, the local t needs to hang around for the second call to count, and since it happens to hold on to a huge seq (as explained above), that leads to the OOME.
The final example, where only (count d) is returned, should ideally not hold on to t; the reason that is not the case is somewhat subtle and best explained by referring to the second example.
The second example happens to work fine, because after (count t) is evaluated, t is no longer needed. The Clojure compiler notices this and uses a clever trick to have the local reset to nil simultaneously with the count call being made. The crucial piece of Java code does something like f(t, t=null), so that the current value of t is passed to the appropriate function, but the local is cleared before control is handed over to f, since this happens as a side effect of the expression t=null, which is an argument to f; clearly, Java's left-to-right evaluation of arguments is key to this working.
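
The f(t, t=null) pattern can be tried out in plain Java. The sketch below is only an illustration of the evaluation order, not the code the Clojure compiler emits; the class name ClearOnLastUse is made up, and t is a static field rather than a local purely so that the callee can observe it.

```java
public class ClearOnLastUse {
    // A field rather than a local, only so that f can inspect it from inside the call.
    static Object t = new Object();

    // Reports what the callee sees: its first argument and the current state of t.
    static String f(Object arg, Object ignored) {
        return "arg live: " + (arg != null) + ", t cleared: " + (t == null);
    }

    public static void main(String[] args) {
        // Arguments are evaluated left to right: the first argument reads t,
        // then the second argument's assignment clears t, and only then is f invoked.
        System.out.println(f(t, t = null)); // prints "arg live: true, t cleared: true"
    }
}
```

By the time f starts running, the only remaining reference to the object is f's own parameter, so the caller no longer keeps it alive.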
Back to the final example: the trick doesn't apply there, because t is not actually used anywhere and unused locals are not handled by the locals clearing process. (The clearing happens at the point of last use; in the absence of such a point in the program, there is no clearing.)
As for count realizing lazy sequences: it must do that, as there is no general way of predicting the length of a lazy seq without realizing it.