The <T> T[] toArray(T[] a) method on Collection is weird, because it's trying to fulfill two purposes at once.
First, let's look at toArray(). This takes the elements from the collection and returns them in an Object[]. That is, the component type of the returned array is always Object. That's useful, but it doesn't satisfy a couple other use cases:
1) The caller wants to re-use an existing array, if possible; and
2) The caller wants to specify the component type of the returned array.
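To make the limitation of the no-arg toArray() concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes an ArrayList as the collection. It shows that the returned array's runtime component type is Object, so a cast to String[] fails even though every element is a String.

```java
import java.util.ArrayList;
import java.util.List;

class NoArgToArrayDemo {
    // Runtime component type of the array returned by the no-arg toArray().
    static Class<?> componentType(List<String> coll) {
        return coll.toArray().getClass().getComponentType();
    }

    // True if casting the result to String[] fails, which happens
    // when the returned array is really an Object[].
    static boolean castToStringArrayFails(List<String> coll) {
        try {
            String[] s = (String[]) coll.toArray();
            return s == null; // only reached if the cast succeeds
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(componentType(coll));
        System.out.println(castToStringArrayFails(coll));
    }
}
```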
Handling case (1) turns out to be a fairly subtle API problem. The caller wants to re-use an array, so it clearly needs to be passed in. Unlike the no-arg toArray() method, which returns an array of the right size, if the caller's array is re-used, we need a way to return the number of elements copied. OK, let's have an API that looks like this:
int toArray(T[] a)
The caller passes in an array, which is reused, and the return value is the number of elements copied into it. The array doesn't need to be returned, because the caller already has a reference to it. But what if the array is too small? Well, maybe throw an exception. In fact, that's what Vector.copyInto does.
void copyInto(Object[] anArray)
This is a terrible API. Not only does it not return the number of elements copied, it throws IndexOutOfBoundsException if the destination array is too short. Since Vector is a synchronized collection that other threads may modify, its size might change at any time before the call, so the caller cannot guarantee that the destination array is of sufficient size, nor can it know the number of elements copied. The only thing the caller can do is to lock the Vector around the entire sequence:
synchronized (vec) {
    Object[] a = new Object[vec.size()];
    vec.copyInto(a);
}
Ugh!
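For illustration, a small single-threaded sketch of the failure mode (the class and method names are hypothetical): copyInto succeeds when the destination is large enough and throws when it is one element short. In a multi-threaded program the same exception could appear if another thread added an element between the size() call and the copyInto() call.

```java
import java.util.Vector;

class CopyIntoDemo {
    // Returns true if vec.copyInto() throws for a destination of the given length.
    static boolean copyIntoThrows(Vector<?> vec, int destLength) {
        Object[] dest = new Object[destLength];
        try {
            vec.copyInto(dest);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // Covers ArrayIndexOutOfBoundsException, which is a subclass.
            return true;
        }
    }

    public static void main(String[] args) {
        Vector<String> vec = new Vector<>();
        vec.add("a");
        vec.add("b");
        vec.add("c");
        System.out.println(copyIntoThrows(vec, 3)); // destination is exactly large enough
        System.out.println(copyIntoThrows(vec, 2)); // destination is one element short
    }
}
```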
The Collection.toArray(T[]) API avoids this problem by having different behavior if the destination array is too small. Instead of throwing an exception like Vector.copyInto(), it allocates a new array of the right size. This trades away the array-reuse case for more reliable operation. The problem is now that the caller can't tell whether its array was reused or a new one was allocated. Thus, toArray(T[]) needs to return an array: the argument array, if it was large enough, or the newly allocated array.
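A quick way to see the two behaviors is to compare the returned reference with the argument. This is only a sketch with made-up names, using ArrayList as the collection:

```java
import java.util.ArrayList;
import java.util.List;

class ToArrayReuseDemo {
    // True if toArray(T[]) handed back the very array that was passed in.
    static boolean reusesArray(List<String> coll, String[] dest) {
        return coll.toArray(dest) == dest;
    }

    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(reusesArray(coll, new String[3])); // exactly large enough
        System.out.println(reusesArray(coll, new String[1])); // too small
    }
}
```

Because the result may or may not be the argument, callers should always use the returned array rather than the one they passed in.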
But now we have another problem. We no longer have a way to tell the caller the number of elements that were copied from the collection into the array. If the destination array was newly allocated, or the array happens to be exactly the right size, then the length of the array is the number of elements copied. If the destination array is larger than the number of elements copied, the method attempts to communicate to the caller the number of elements copied, by writing a null to the array location one beyond the last element copied from the collection. If it's known that the source collection has no null values, this enables the caller to determine the number of elements copied. After the call, the caller can search for the first null value in the array. If there is one, its position determines the number of elements copied. If there is no null in the array, it knows that the number of elements copied equals the length of the array.
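The null-search procedure described above might look like the following sketch. The helper names are hypothetical, and it assumes the source collection contains no null elements. Note that only the single slot after the last copied element is overwritten with null; anything further along in the destination array is left as it was.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NullMarkerDemo {
    // Number of elements copied, inferred from the position of the first null.
    // Only valid if the source collection is known to contain no nulls.
    static int countCopied(Object[] filled) {
        for (int i = 0; i < filled.length; i++) {
            if (filled[i] == null) {
                return i;
            }
        }
        return filled.length; // no null found: the array is completely full
    }

    // Copies coll into dest (or into a new array if dest is too small)
    // and reports how many elements were copied.
    static int copyAndCount(List<String> coll, String[] dest) {
        String[] result = coll.toArray(dest);
        return countCopied(result);
    }

    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(List.of("a", "b", "c"));
        String[] dest = new String[5];
        Arrays.fill(dest, "x"); // stale contents from a previous use
        System.out.println(copyAndCount(coll, dest));
        System.out.println(Arrays.toString(dest));
    }
}
```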
Quite frankly, this is pretty lame. However, given the constraints on the language at the time, I admit I don't have a better alternative.
I don't think I've ever seen any code that reuses arrays or that checks for nulls this way. This is probably a holdover from the early days when memory allocation and garbage collection were expensive, so people wanted to reuse memory as much as possible. More recently, the accepted idiom for using this method has been the second use case described above, that is, to establish the desired component type of the array as follows:
MyType[] a = coll.toArray(new MyType[0]);
(It seems wasteful to allocate a zero-length array for this purpose, but it turns out that this allocation can be optimized away by the JIT compiler, and the obvious alternative toArray(new MyType[coll.size()]) is actually slower. This is because of the need to initialize the array to nulls, and then to fill it in with the collection's contents. See Aleksey Shipilëv's article on this topic, Arrays of Wisdom of the Ancients.)
However, many people find the zero-length array counterintuitive. In JDK 11, there is a new API that allows one to use an array constructor reference instead:
MyType[] a = coll.toArray(MyType[]::new);
This lets the caller specify the component type of the array, but it lets the collection provide the size information.
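As a side-by-side sketch (hypothetical class and method names, String standing in for MyType, and JDK 11 or later assumed for the last form), all three calls produce an array with the requested component type and the same contents:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypedToArrayDemo {
    // Zero-length array: the collection allocates a correctly sized String[].
    static String[] viaZeroLength(List<String> coll) {
        return coll.toArray(new String[0]);
    }

    // Presized array: the caller supplies both the type and the size.
    static String[] viaPresized(List<String> coll) {
        return coll.toArray(new String[coll.size()]);
    }

    // Array constructor reference (JDK 11+): the caller supplies only the type.
    static String[] viaGenerator(List<String> coll) {
        return coll.toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(List.of("a", "b", "c"));
        System.out.println(Arrays.toString(viaZeroLength(coll)));
        System.out.println(Arrays.toString(viaPresized(coll)));
        System.out.println(Arrays.toString(viaGenerator(coll)));
    }
}
```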