Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
925 views
in Technique[技术] by (71.8m points)

performance - High cost of polymorphism in Java Hotspot server

When I run my timing test program in Java Hotspot client, I get consistent behavior. However, when I run it in Hotspot server, I get unexpected result. Essentially, the cost of polymorphism is unacceptably high in certain situations that I've tried to duplicate bellow.

Is this a known issue/bug with Hotspot server, or am I doing something wrong?

Test program and timing are given bellow:

Intel i7, Windows 8
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Mine2: 0.387028831 <--- polymorphic call with expected timing
Trivial: 1.545411765 <--- some more polymorphic calls
Mine: 0.727726371 <--- polymorphic call with unexpected timing. Should be about 0.38
Mine: 0.383132698 <--- direct call with expected timing

The situation gets worse as I add additional tests. Timing of the tests near the end of the list are completely off.

interface canDoIsSquare {
    boolean isSquare(long x);
}

final class Trivial implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        if (x > 0) {
            long t = (long) Math.sqrt(x);
            return t * t == x;
        }
        return x == 0;
    }
    @Override public String toString() {return "Trivial";}
}

final class Mine implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        if (x > 0) {
            while ((x & 3) == 0)
                x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5)
                return false;
            final long t = (long) Math.sqrt(x);
            return (t * t == x);
        }
        return x == 0;
    }

    @Override public String toString() {return "Mine";}
}

final class Mine2 implements canDoIsSquare {
    @Override final public boolean isSquare(long x) {
        // just duplicated code for this test
        if (x > 0) {
            while ((x & 3) == 0)
                x >>= 2;
            if ((x & 2) != 0 || (x & 7) == 5)
                return false;
            final long t = (long) Math.sqrt(x);
            return (t * t == x);
        }
        return x == 0;
    }
    @Override final public String toString() {return "Mine2";}
}

public class IsSquared {
    static final long init = (long) (Integer.MAX_VALUE / 8)
            * (Integer.MAX_VALUE / 2) + 1L;

    static long test1(final canDoIsSquare fun) {
        long r = init;
        long startTimeNano = System.nanoTime();
        while (!fun.isSquare(r))
            ++r;
        long taskTimeNano = System.nanoTime() - startTimeNano;
        System.out.println(fun + ": " + taskTimeNano / 1e9);
        return r;
    }

    static public void main(String[] args) {
        Mine mine = new Mine();
        Trivial trivial = new Trivial();
        Mine2 mine2 = new Mine2();

        test1(mine2);
        test1(trivial);
        test1(mine);

        long r = init;
        long startTimeNano = System.nanoTime();
        while (!mine.isSquare(r))
            ++r;
        long taskTimeNano = System.nanoTime() - startTimeNano;
        System.out.println(mine + ": " + taskTimeNano / 1e9);
        System.out.println(r);
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The cost is high, indeed, but your benchmark doesn't measure anything really relevant. The JIT can optimize away most of the overhead, but you didn't give it any chance. See e.g. here.

In any case, there's no benchmark warmup and there's On Stack Replacement.

The explanation is probably that the Server Hotspot optimizes better but slower. It assumes that it has enough time and collects the necessary stats longer. So while the Client Hotspot optimized your program, the Server Hotspot was preparing itself to produce better code.

The reason for the worsening with additional tests is that the initially monomorphic call site became bimorphic and then megamorphic.

In reality it's possible that only one of the methods gets called. If you want benchmark this, you have to run each test in its own JVM. This is a real pain, but existing benchmarking frameworks do it for you.

Or you may want to measure the polymorphic case, but then you need to warm up the code with all cases first. This way you can find out which method is faster even in a single JVM (though each will be slowed down by the megamorphic call overhead.

Update

The explanation seems to be the change from monomorphic to megamorhic. When the first test was run, the JVM was knew all the classes (as the instances were already created), but was optimistically assuming that only Mine2 occurs on the call site. So it did a quick check (translated as a conditional branch, which was always correctly predicted and thus very fast), and called the proper method. As it later saw the other two instances being used there, it had to create a branch table for them (the branch prediction still works, but the overhead is higher).

Question

What's unclear: The JVM can move this test out of the loop and thus reduce it's cost to nearly nothing. I can't tell why it doesn't happen.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...