The difference between reduce
and collect
is that collect
is an enhanced form of reduction that can deal with mutable objects in parallel. The collect
algorithm thread-confines the various result objects, so that they can be mutated safely, even if they aren't thread-safe. That's why Averager
works using collect
. For sequential computation using reduce
this doesn't usually matter, but for parallel computation it will give incorrect results, as you observed.
A key point is that reduce
works as long as it is dealing with values but not mutable objects. You can see this by looking at the first argument to reduce
. The example code passes new Averager()
which is a single object that's used as the identity value by multiple threads in the parallel reduction. The way parallel streams work is that the workload is split into segments that are processed by individual threads. If multiple threads are mutating the same (non-thread-safe) object, it should be clear why this will lead to incorrect results.
It is possible to use reduce
to compute an average, but you need to make your accumulation object be immutable. Consider an object ImmutableAverager
:
static class ImmutableAverager {
private final int total;
private final int count;
public ImmutableAverager() {
this.total = 0;
this.count = 0;
}
public ImmutableAverager(int total, int count) {
this.total = total;
this.count = count;
}
public double average() {
return count > 0 ? ((double) total) / count : 0;
}
public ImmutableAverager accept(int i) {
return new ImmutableAverager(total + i, count + 1);
}
public ImmutableAverager combine(ImmutableAverager other) {
return new ImmutableAverager(total + other.total, count + other.count);
}
}
Note that I've adjusted the signatures of accept
and combine
to return a new ImmutableAverager
instead of mutating this
. (These changes also make the methods match the function arguments to reduce
so we can use method references.) You'd use ImmutableAverager
like this:
double average = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
.parallel()
.reduce(new ImmutableAverager(),
ImmutableAverager::accept,
ImmutableAverager::combine)
.average();
System.out.println("Average: "+average);
Using immutable value objects with reduce
should give the correct results in parallel.
Finally, note that IntStream
and DoubleStream
have summaryStatistics()
methods and Collectors
has averagingDouble
, averagingInt
, and averagingLong
methods that can do these computations for you. However, I think the question is more about the mechanics of collection and reduction than about how to do averaging most concisely.