I was reading this blogpost.
And the author was talking about breaking the hashCode()
in String
in multithread environment.
By having:
public int hashCode() {
int h = hash;
if (h == 0) {
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
Changed to:
public int hashCode() {
if (hash == 0) {
int off = offset;
char val[] = value;
int len = count;
int h = 0;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return hash;
}
Which the author says and I quote:
"What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!"
So further going through comments, someone says it can be reordered to
int h = hash;
if (hash == 0) {
...
}
return h;
How is that possible? I thought reordering only involves moving program statements up and down. What rules is it following? I've Googled, read the JSR133 FAQ, checked the Java Concurrency in Practice book, but I can't seem to find a place that helps me to understand particularly on reordering. If anyone can point me to the right direction, I would really appreciate it.
Thanks to Louis clarifying the meaning of "Reordering", I wasn't thinking in terms of "byteCode"
However, I still don't understand why is it allowed to move the 2nd Read to the front, this is my naive attempt to translate it to somewhat "bytecode" format.
For simplification purpose, operations that are used to calculate the hashcode are express as calchash()
. Therefore, I express the program as:
if (hash == 0) {
h = calchash();
hash = h;
}
return hash;
And my attempt to express it in "bytecode" form:
R1,R2,R3 are in the operands stack, or the registers
h is in the array of local variables
In program order:
if (hash == 0) { ---------- R1 = read hash from memory (1st read)
---------- Compare (R1 == 0)
h = calchash(); ---------- R2 = calchash()
---------- h = R2 (Storing the R2 to local variable h)
hash = h; ---------- Hash = h (write to hash)
}
return hash ---------- R3 = read hash from memory again(2nd read)
---------- return R3
Reordered transformation (My version based on comments):
---------- R3 = read hash from memory (2nd read) *moved*
if (hash == 0) { ---------- R1 = read hash from memory (1st read)
---------- Compare (R1 == 0)
h = calchash(); ---------- R2 = calchash()
---------- h = R2 (Storing the R2 to local variable h)
hash = h; ---------- hash = h (write to hash)
}
return hash ---------- return R3
Checking the comments again, I found this answered by the author:
Reordered Transformation (From the blog)
r1 = hash;
if (hash == 0) {
r1 = hash = // calculate hash
}
return r1;
This case actually works on single thread, but it's possible to fail with multiple threads.
It seems that the JVM are making simplifications based on
h = hash and it simplifies the use of R1, R2, R3 to single R1
Therefore, JVM does more than reordering instructions, it also seems reducing the amount of registers being used.
See Question&Answers more detail:
os