Monday, February 6, 2012

Devirtualizing method calls in Java

If you've read code I wrote, chances are you've seen I'm a strong adept of const correctness (WP). Naturally, when I started writing Java code (to my despair), I became equally adept of "final correctness". This is mostly because the keywords const (C/C++) and final (Java/Scala) are truly here to help the compiler help you. Many things aren't supposed to change. References in a given scope are often not made point to another object, various methods aren't supposed to be overridden, most classes aren't designed to be subclassed, etc. In C/C++ const also helps avoid doing unintentional pointer arithmetic. So when something isn't supposed to happen, if you state it explicitly, you allow the compiler to catch and report any violation of this otherwise implicit assumption.

The other aspect of const correctness is that you also help the compiler itself. Often the extra bit of information enables it to produce more efficient code. In Java especially, final plays an important role in thread safety, and when used on Strings as well as built-in types. Here's an example of the latter:

     1 final class concat {
     2   public static void main(final String[] _) {
     3     String a = "a";
     4     String b = "b";
     5     System.out.println(a + b);
     6     final String X = "X";
     7     final String Y = "Y";
     8     System.out.println(X + Y);
     9   }
    10 }
Which gets compiled to:
public static void main(java.lang.String[]);
  Code:
   0: ldc #2; //String a
   2: astore_1
   3: ldc #3; //String b
   5: astore_2
   6: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
   9: new #5; //class java/lang/StringBuilder
   12: dup
   13: invokespecial #6; //Method java/lang/StringBuilder."":()V
   16: aload_1
   17: invokevirtual #7; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   20: aload_2
   21: invokevirtual #7; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   24: invokevirtual #8; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   27: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   30: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
   33: ldc #10; //String XY
   35: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   38: return
}
In the original code, lines 3-4-5 are identical to lines 6-7-8 modulo the presence of two final keywords. Yet, lines 3-4-5 get compiled to 14 byte code instructions (lines 0 through 27), whereas 6-7-8 turn into only 3 (lines 30 through 35). I find it kind of amazing that the compiler doesn't even bother optimizing such a simple piece of code, even when used with the -O flag which, most people say, is almost a no-op as of Java 1.3 – at least I checked in OpenJDK6, and it's truly a no-op there, the flag is only accepted for backwards compatibility. OpenJDK6 has a -XO flag instead, but the Sun Java install that comes on Mac OS X doesn't recognize it...

There was another thing that I thought was a side effect of final. I thought any method marked final, or any method in a class marked final would allow the compiler to devirtualize method calls. Well, it turns out that I was wrong. Not only it doesn't do this, but also the JVM considers this compile-time optimization downright illegal! Only the JIT compiler is allowed to do it.

All method calls in Java are compiled to an invokevirtual byte code instruction, except:

  • Constructors and private method use invokespecial.
  • Static methods use invokestatic.
  • Virtual method calls on objects with a static type that is an interface use invokeinterface.
The last one is weird, one might wonder why special-case virtual method calls when the static type is an interface. The reason essentially boils down to the fact that if the static type is not an interface, then we know at compile-time what entry in the vtable to use for that method, and all we have to do at runtime is essentially to read that entry from the vtable. If the static type is an interface, the compiler doesn't even know which entry in the vtable will be used, as this will depend at what point in the class hierarchy the interface will be used.

Anyway, I always imagined that having a final method meant that the compiler would compile all calls to it using invokespecial instead of invokevirtual, to "devirtualize" the method calls since it already knows for sure at compile-time where to transfer execution. Doing this at compile time seems like a trivial optimization, while leaving this up to the JIT is far more complex. But no, the compiler doesn't do this. It's not even legal to do it!

interface iface {
  int foo();
}

class base implements iface {
  public int foo() {
    return (int) System.nanoTime();
  }
}

final class sealed extends base {  // Implies that foo is final
}

final class sealedfinal extends base {
  public final int foo() {  // Redefine it to be sure / help the compiler.
    return super.foo();
  }
}

public final class devirt {
  public static void main(String[] a) {
    int n = 0;
    final iface i = new base();
    n ^= i.foo();              // invokeinterface
    final base b = new base();
    n ^= b.foo();              // invokevirtual
    final sealed s = new sealed();
    n ^= s.foo();              // invokevirtual
    final sealedfinal s = new sealedfinal();
    n ^= s.foo();              // invokevirtual
  }
}
A simple Caliper benchmark also shows that in practice all 4 calls above have exactly the same performance characteristic (see full microbenchmark). This seems to indicate that the JIT compiler is able to devirtualize the method calls in all these cases.

To try to manually devirtualize one of the last two calls, I applied a binary patch (courtesy of xxd) on the .class generated by javac. After doing this, javap correctly shows an invokespecial instruction. To my dismay the JVM then rejects the byte code: Exception in thread "main" java.lang.VerifyError: (class: devirt, method: timeInvokeFinalFinal signature: (I)I) Illegal use of nonvirtual function call

I find the wording of the JLS slightly ambiguous as to whether or not this is truly illegal, but in any case the Sun JVM rejects it, so it can't be used anyway.

The moral of the story is that javac is really only translating Java code into pre-parsed Java code. Nothing interesting happens at all in the "compiler", which should really be called the pre-parser. They don't even bother doing any kind of trivial optimization. Everything is left up to the JIT compiler. Also Java byte code is bloated, but then it's normal, it's Java :)

11 comments:

Stanimir Simeonoff said...

The _devirtualizing_ does happen in the JIT and it's very efficient. It runs CHA (class hierarchy analysis) and final or not doesn't matter. As long as the method is not overridden is statically linked/inline. Dual-morph methods are usually inlined w/ a guard type check.

One of the reasons javac doesn't do anything fancy is the class-loading (and tons of "POJO" tools/frameworks)which is/are free to mangle the code, incl. removal of finals and extending the class, if it sees fit.

I, myself, am very happy java byte code is almost the source and the heavy lifting is performed by the JIT.

Benoit Sigoure said...

It's just that the JIT has to figure out what the runtime type of a particular reference always is in order to devirtualize the calls.

Say I have a class that has a reference to a HashMap. The JIT can't just devirtualize the calls directly, because the instances of my class could in theory have references to various sub-classes of HashMap (if there are subclasses). So first it has to figure out that my class will always hold references to an actual instance of a HashMap, and then it can devirtualize the call. This might be non-trivial.

Regarding the classloaders and frameworks that do rather evil things, if I have a reference with a static type that is a final class, it doesn't matter what the classloader does, it will always be a reference to that static type, so in theory I think the compiler could devirtualize the calls right there.

Stanimir Simeonoff said...

JITs do truly amazing optimization.

The Optimization the virtual calls of HashMap (btw HashMap sucks horribly much, it's just the worst hashing structure there is) is implemented via guarded calls. If the object pointer can keep the class metadata as well (usually possible on 64bit), it's quite cheap mask/check and no extra loads, well predicted branch + trap for the uncommon case. Most important the code can be inlined.

Falling into the trap may lead to deoptimization of the code and use of inline caches.

If you have a public method, make it final at some point and you want it invoked like static (nonvirtual) if may lead to binary incompatibility with previously compiled code that uses invokevirtual.

Here much better explanation on how JVM: http://www.azulsystems.com/blog/cliff/2007-11-06-clone-or-not-clone

and this one: http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem

Benoit Sigoure said...

I was just using HashMap as an example. It might suck but whether or not we like it, it's the most commonly used hashing structure by Java programmers. If you have a good, lean & mean alternative to propose, I'd love to hear about it.

The first article you linked explains exactly the problem I was trying to describe.

The second article is very interesting, thanks for sharing it.

Stanimir Simeonoff said...

The downsides of java.util.HashMap are: bucket based, one extra indirection/cache miss on get, allocation on put, volatile write on put (!!, modCount is volatile for no reason), standard/bad coping w/ collisions via linked list, also unpredictable iterator() that also checks modCount. On x86 volatile reads are sorta free, though.

Open address hash table would outperform HashMap and has significantly lower memory footprint, i.e. an implementation similar to IdentityHashMap.

As Map I used LinkedHashMap since it offers a good and useful ordered iterator, helps debugging too.

It's a just a pity JDK doesn't offer anything similar to IdentityHashMap but w/ real hashCode. I can't think of any major selling point of HashMap besides being a general purpose single threaded hash table, yet nothing it particularly shines at.
----

Anyways, back to final method, the JIT cannot take a benefit of a final method or class. On runtime if JIT doesn't see an overridden method, similar to final, would not use vtable. My point is that it doesn't actually matter what bytecode is generated, JIT is an optimizing compiler, similar to gcc -o2, it also includes a built-in profiler.

----
As for the linked articles, Cliff Click was the lead architect behind HotSpot JVM.

Apology for the HashMap offtopic, it also ticks me x.x

Stanimir Simeonoff said...

prematurely posted, meant to add another link on inline caches by the same author.

Benoit Sigoure said...

I'm not sure why you're talking about LinkedHashMap – in Sun's JDK, it's implemented by subclassing HashMap so it inherits from all its problems (quite literally!).

I agree HashMap is non-ideal, and it sucks that it's what's in the JDK because that's what everyone uses by default. The JDK has certain things like that, that are surprisingly low-quality for a platform that is "industry standard".

Is there a good clean-room open-source implementation that you'd recommend to replace all j.u.HashMap? Yes this is off-topic but interesting :)

I understand your point that the JIT can devirtualize the calls if it doesn't see another overriding method, but I was just surprised that the javac "compiler" didn't even use the invokespecial byte code instruction. I somehow always thought it would. I didn't find much information about this online, so I thought maybe I'd blog my findings :)

Stanimir Simeonoff said...

Sure LHM subclasses the HashMap but the links between the entries is a saving grace, consider it a LinkedList with O(1) lookup. It's useful for dumb LRU caches and so.

I tend to write my/company's datastructures, incl. concurrent ones, when needed. That makes me very bad at open-source libs. Decided to try a fast search and come up with... and impl of Doug Lea. here is the original presentation. Need to check it myself, from first glance (10min) it does look well, though

Benoit Sigoure said...

I don't understand how you can be concerned with the cache inefficiency of HashMap, its volatile write on put, and other things like that, and yet say that LinkedHashMap is OK or usable as a (non-thread-safe) LRU cache, since we agreed that LHM has inherits all of HM's problems.

The only thing I've found to be usable for LRU caches is Guava's LoadingCache, which is based on a modified version of ConcurrentHashMap.

Stanimir Simeonoff said...

That's an interesting question how I put up w/. I guess I've appreciate for the ordering along with the standard hash map and the fact I am unaware of a better algorithm to keep the insertion order.

The LHM is nice when iteration is a relatively often operation and preservation the order is useful.

Doh, my posts totally distorted the topic.

Anonymous said...

Java has a separate compilation model, thus it forbids cross-file optimization (with a notable exception, compile-time constant inlining). What if you change the method to non-final, and do not recompile the clients? What if you do runtime bytecode replacement (search for "instrumentation")?

Side note: as an engineer, your expectations should be a function of the tool. This is not C++. Where you can afford an interpreter, bytecode optimization is premature optimization.

Keep your mind object-oriented. You ask the compiler to do it, let it decide the best behind the scenes; if a 20yo compiler does not do it, it's probably unimportant. -O is well documented (at least on Oracle's JDK) and it just inlines the private and static methods inside the file that declares them.