Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - Where is the combination order of the combiner in collect(supplier, accumulator, combiner) defined?

The Java API documentation states that the combiner parameter of the collect method must be:

an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function

A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.

For instance, the following example may give different results depending on whether the combiner is m1.addAll(m2) or m2.addAll(m1).

List<String> res = LongStream
     .rangeClosed(1, 1_000_000)
     .parallel()
     .mapToObj(n -> "" + n)
     .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));

I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are cases where a lambda is required and we must combine the items in the correct order, otherwise we could get an inconsistent result when processing in parallel.

Is this stated anywhere in the Java 8 API documentation? Or does it really not matter?



1 Answer


Of course, it matters, as when you use m2.addAll(m1) instead of m1.addAll(m2), it doesn’t just change the order of elements, but completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller will use as the result and since the caller will use the first one, modifying the second instead will cause data loss.
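The data loss can be observed directly. The following is a minimal sketch, not part of the original post: the class name CombinerDirectionDemo and its two helper methods are made up for illustration. It runs the question's pipeline once with each combiner and compares the sizes of the results; with current JDK implementations the second variant returns only the elements of one partial container, though the exact number depends on how the stream happens to be split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

public class CombinerDirectionDemo {

    // Combiner folds the second container into the first one.
    static List<String> collectIntoFirst() {
        return LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .mapToObj(n -> "" + n)
                .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));
    }

    // Combiner folds the first container into the second one (the wrong way round).
    static List<String> collectIntoSecond() {
        return LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .mapToObj(n -> "" + n)
                .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m2.addAll(m1));
    }

    public static void main(String[] args) {
        // The first size should be 1000000; the second is expected to be smaller.
        System.out.println("into first:  " + collectIntoFirst().size());
        System.out.println("into second: " + collectIntoSecond().size());
    }
}
```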

There is a hint if you look at the accumulator function, which has the type BiConsumer<R,? super T>. In other words, it can’t do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.

If you look at the documentation of Collector, which uses a BinaryOperator as combiner function, hence allows the combiner to decide which argument to return (or even an entirely different result instance), you find:

The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:

A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1);  // result without splitting

A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting

So if we assume that the accumulator is applied in encounter order, the combiner has to combine the first and second argument in left-to-right order to produce an equivalent result.


Now, the three-arg version of Stream.collect has a slightly different signature, using a BiConsumer as combiner exactly for supporting method references like ArrayList::addAll. Assuming consistency throughout all these operations and considering the purpose of this signature change, we can safely assume that it has to be the first argument which is the container to modify.
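For the cases the question mentions, where a lambda is needed because no single method reference does the merge, the same rule applies: mutate the first argument. Here is a small sketch under that assumption; the class MergeIntoFirstDemo and the method countWords are hypothetical names chosen for illustration. It counts words into a Map and merges the right partial map into the left one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class MergeIntoFirstDemo {

    // Counts occurrences of each word using the three-arg collect.
    static Map<String, Integer> countWords(Stream<String> words) {
        return words.<Map<String, Integer>>collect(
                HashMap::new,
                // accumulator: store one element into the container (first argument)
                (map, word) -> map.merge(word, 1, Integer::sum),
                // combiner: fold the second container into the first one
                (left, right) -> right.forEach((k, v) -> left.merge(k, v, Integer::sum)));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(Stream.of("a", "b", "a", "c", "b", "a").parallel());
        System.out.println(counts);
    }
}
```

Writing the combiner as left.forEach(... right.merge ...) instead would leave the left map, which is the one the caller keeps, without the counts from the right map.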

But it seems that this was a late change and the documentation wasn’t adapted accordingly. If you look at the Mutable reduction section of the package documentation, you will find that it has been adapted to show the actual Stream.collect signature and usage examples, but repeats exactly the same definition regarding the associativity constraint as shown above, despite the fact that finisher.apply(combiner.apply(a2, a3)) doesn’t work if combiner is a BiConsumer.
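One way to read the associativity constraint for a BiConsumer combiner is to drop the finisher and treat the first container as the result after combining. This is an interpretation, not wording taken from the documentation, and the class SplitEquivalenceCheck and its method are hypothetical helpers written for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

public class SplitEquivalenceCheck {

    // Returns true if accumulating t1 and t2 into one container gives the same
    // result as accumulating them separately and then combining.
    static <T, R> boolean splitMatchesSequential(Supplier<R> supplier,
                                                 BiConsumer<R, ? super T> accumulator,
                                                 BiConsumer<R, R> combiner,
                                                 T t1, T t2) {
        R a1 = supplier.get();
        accumulator.accept(a1, t1);
        accumulator.accept(a1, t2);
        R r1 = a1;                      // result without splitting

        R a2 = supplier.get();
        accumulator.accept(a2, t1);
        R a3 = supplier.get();
        accumulator.accept(a3, t2);
        combiner.accept(a2, a3);        // assumed to fold a3 into a2
        R r2 = a2;                      // result with splitting

        return r1.equals(r2);
    }

    public static void main(String[] args) {
        boolean intoFirst = SplitEquivalenceCheck.<String, List<String>>splitMatchesSequential(
                ArrayList::new, List::add, (m1, m2) -> m1.addAll(m2), "x", "y");
        boolean intoSecond = SplitEquivalenceCheck.<String, List<String>>splitMatchesSequential(
                ArrayList::new, List::add, (m1, m2) -> m2.addAll(m1), "x", "y");
        System.out.println("m1.addAll(m2) passes: " + intoFirst);
        System.out.println("m2.addAll(m1) passes: " + intoSecond);
    }
}
```

Under this reading, only the combiner that modifies its first argument satisfies the constraint for lists.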


The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.

