Tsuna's blog: java

Showing posts with label java. Show all posts

Saturday, April 14, 2012

How Apache Hadoop is molesting IOException all day

Today I'd like to rant on one thing that's been bugging me for last couple years with Apache Hadoop (and all its derived projects). It's a big issue that concerns us all. We have to admit it, each time we write code for the Apache Hadoop stack, we feel bad about it, but we try hard to ignore what's happening right before our eyes. I'm talking, of course, about the constant abuse and molestation of IOException.

I'm not even going to debate how checked exceptions are like communism (good idea in theory, totally fails in practice). Even if people don't get that, I wish they at least stopped the madness with this poor little IOException.
Let's review again what IOException is for:

"Signals that an I/O exception of some sort has occurred. This class is the general class of exceptions produced by failed or interrupted I/O operations."

In Hadoop everything is an IOException. Everything. Some assertion fails, IOException. A number exceeds the maximum allowed by the config, IOException. Some protocol versions don't match, IOException. Hadoop needs to fart, IOException.
How are you supposed to handle these exceptions? Everything is declared as throws IOException and everything is catching, wrapping, re-throwing, logging, eating, and ignoring IOExceptions. Impossible. No matter what goes wrong, you're left clueless. And it's not like there is a nice exception hierarchy to help you handle them. No, virtually everything is just a bare IOException.
Because of this, it's not uncommon to see code that inspects the message of the exception (a bare String) to try to figure out what's wrong and what to do with it. A friend of mine was recently explaining to me how Apache Kafka was "stringly typed" (a new cutting-edge paradigm whereby you show the middle finger to the type system and stuff everything in Strings). Well Hadoop has invented better than checked exceptions, they have stringed exceptions. Unfortunately, half of the time you can't even leverage this awesome new idiom because the message of the exception itself is useless. For example when a MapReduce chokes on a corrupted file, it will just throw an IOException without telling you the path of the problematic file. This way it's more fun, once you nail it down (with a binary search of course), you feel like you accomplished something. Or you'll get messages like "IOException: Split metadata size exceeded 10000000.". Figuring out what was the actual value is left as an exercise to the reader.
So, seriously Apache folks...

Stop Abusing IOException!

Leave this poor little IOException alone!
Hadoop (0.20.2) currently has a whopping 1300+ lines of code creating bare IOExceptions. HBase (0.92.1) has over 400. Apache committers should consider every single one of these lines as a code smell that needs to be fixed, that's begging to be fixed. Please introduce a new base exception type, and create a sound exception hierarchy.
Updates:

Apr 15: There is now an issue for HBase to fix their abuse of IOException (HBASE-5796).
Will update if someone from Hadoop/HDFS/MapReduce files a similar issue on their side.

Monday, February 6, 2012

Devirtualizing method calls in Java

If you've read code I wrote, chances are you've seen I'm a strong adept of const correctness (WP). Naturally, when I started writing Java code (to my despair), I became equally adept of "final correctness". This is mostly because the keywords const (C/C++) and final (Java/Scala) are truly here to help the compiler help you. Many things aren't supposed to change. References in a given scope are often not made point to another object, various methods aren't supposed to be overridden, most classes aren't designed to be subclassed, etc. In C/C++ const also helps avoid doing unintentional pointer arithmetic. So when something isn't supposed to happen, if you state it explicitly, you allow the compiler to catch and report any violation of this otherwise implicit assumption.

The other aspect of const correctness is that you also help the compiler itself. Often the extra bit of information enables it to produce more efficient code. In Java especially, final plays an important role in thread safety, and when used on Strings as well as built-in types. Here's an example of the latter:

     1 final class concat {
     2   public static void main(final String[] _) {
     3     String a = "a";
     4     String b = "b";
     5     System.out.println(a + b);
     6     final String X = "X";
     7     final String Y = "Y";
     8     System.out.println(X + Y);
     9   }
    10 }

Which gets compiled to:

public static void main(java.lang.String[]);
  Code:
   0: ldc #2; //String a
   2: astore_1
   3: ldc #3; //String b
   5: astore_2
   6: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
   9: new #5; //class java/lang/StringBuilder
   12: dup
   13: invokespecial #6; //Method java/lang/StringBuilder."":()V
   16: aload_1
   17: invokevirtual #7; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   20: aload_2
   21: invokevirtual #7; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   24: invokevirtual #8; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   27: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   30: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
   33: ldc #10; //String XY
   35: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   38: return
}

In the original code, lines 3-4-5 are identical to lines 6-7-8 modulo the presence of two final keywords. Yet, lines 3-4-5 get compiled to 14 byte code instructions (lines 0 through 27), whereas 6-7-8 turn into only 3 (lines 30 through 35). I find it kind of amazing that the compiler doesn't even bother optimizing such a simple piece of code, even when used with the -O flag which, most people say, is almost a no-op as of Java 1.3 – at least I checked in OpenJDK6, and it's truly a no-op there, the flag is only accepted for backwards compatibility. OpenJDK6 has a -XO flag instead, but the Sun Java install that comes on Mac OS X doesn't recognize it...

There was another thing that I thought was a side effect of final. I thought any method marked final, or any method in a class marked final would allow the compiler to devirtualize method calls. Well, it turns out that I was wrong. Not only it doesn't do this, but also the JVM considers this compile-time optimization downright illegal! Only the JIT compiler is allowed to do it.

All method calls in Java are compiled to an invokevirtual byte code instruction, except:

Constructors and private method use invokespecial.
Static methods use invokestatic.
Virtual method calls on objects with a static type that is an interface use invokeinterface.

The last one is weird, one might wonder why special-case virtual method calls when the static type is an interface. The reason essentially boils down to the fact that if the static type is not an interface, then we know at compile-time what entry in the vtable to use for that method, and all we have to do at runtime is essentially to read that entry from the vtable. If the static type is an interface, the compiler doesn't even know which entry in the vtable will be used, as this will depend at what point in the class hierarchy the interface will be used.

Anyway, I always imagined that having a final method meant that the compiler would compile all calls to it using invokespecial instead of invokevirtual, to "devirtualize" the method calls since it already knows for sure at compile-time where to transfer execution. Doing this at compile time seems like a trivial optimization, while leaving this up to the JIT is far more complex. But no, the compiler doesn't do this. It's not even legal to do it!

interface iface {
  int foo();
}

class base implements iface {
  public int foo() {
    return (int) System.nanoTime();
  }
}

final class sealed extends base {  // Implies that foo is final
}

final class sealedfinal extends base {
  public final int foo() {  // Redefine it to be sure / help the compiler.
    return super.foo();
  }
}

public final class devirt {
  public static void main(String[] a) {
    int n = 0;
    final iface i = new base();
    n ^= i.foo();              // invokeinterface
    final base b = new base();
    n ^= b.foo();              // invokevirtual
    final sealed s = new sealed();
    n ^= s.foo();              // invokevirtual
    final sealedfinal s = new sealedfinal();
    n ^= s.foo();              // invokevirtual
  }
}

A simple Caliper benchmark also shows that in practice all 4 calls above have exactly the same performance characteristic (see full microbenchmark). This seems to indicate that the JIT compiler is able to devirtualize the method calls in all these cases.

To try to manually devirtualize one of the last two calls, I applied a binary patch (courtesy of xxd) on the .class generated by javac. After doing this, javap correctly shows an invokespecial instruction. To my dismay the JVM then rejects the byte code: Exception in thread "main" java.lang.VerifyError: (class: devirt, method: timeInvokeFinalFinal signature: (I)I) Illegal use of nonvirtual function call

I find the wording of the JLS slightly ambiguous as to whether or not this is truly illegal, but in any case the Sun JVM rejects it, so it can't be used anyway.

The moral of the story is that javac is really only translating Java code into pre-parsed Java code. Nothing interesting happens at all in the "compiler", which should really be called the pre-parser. They don't even bother doing any kind of trivial optimization. Everything is left up to the JIT compiler. Also Java byte code is bloated, but then it's normal, it's Java :)

Thursday, July 28, 2011

VM warning: GC locker is held; pre-dump GC was skipped

If you ever run into this message while using the Sun JVM / OpenJDK:

Java HotSpot(TM) 64-Bit Server VM warning: GC locker is held; pre-dump GC was skipped

then I wouldn't worry too much about it as it seems like it's printed when running a jmap -histo:live while the GC is already running or holding a certain lock in the jVM.

Saturday, May 7, 2011

JVM u24 segfault in clearerr on Jaunty

At StumbleUpon we've been tracking down a weird problem with one of our application servers written in Java. We run Sun's jdk1.6.0_24 on Ubuntu Jaunty (9.04 – yes, these servers are old and due for an upgrade) and this application seems to do something that causes the JVM to segfault:

[6972247.491417] hbase_regionser[32760]: segfault at 8 ip 00007f26cabd608b sp 00007fffb0798270 error 4 in libc-2.9.so[7f26cab66000+168000]
[6972799.682147] hbase_regionser[30904]: segfault at 8 ip 00007f8878fb608b sp 00007fff09b69900 error 4 in libc-2.9.so[7f8878f46000+168000]

What's odd is that the problem always happens on different hosts, and almost always around 6:30 - 6:40 am. Go figure.

Understanding segfault messages from the Linux kernel

Let's try to make sense of the messages shown above, logged by the Linux kernel. Back in Linux v2.6.28, it was logged by do_page_fault, but since then this big function has been refactored into multiple smaller functions, so look for show_signal_msg now.

 791                         printk(
 792                         "%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
 793                         task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
 794                         tsk->comm, task_pid_nr(tsk), address,
 795                         (void *) regs->ip, (void *) regs->sp, error_code);
 796                         print_vma_addr(" in ", regs->ip);

From the above, we see that segfault at 8 means that the code attempted to access the address "8", which is what caused the segfault (because there is no page ever mapped at address 0 (normally)). ip stands for instruction pointer, so the code that triggered the segfault was mapped at the address 0x00007f8878fb608b. sp is stack pointer and isn't very relevant here. error 4 means that this was a read access (4 = PF_USER, which used to be a #define but is now part of enum x86_pf_error_code). The rest of the message tells us that the address of the instruction pointer falls inside the memory region mapped for the code of the libc, and it tells us in square brackets that the libc is mapped at the base address 0x7f8878f46000 and that there's 168000 bytes of code mapped. So that means that we were at 0x00007f8878fb608b - 0x7f8878f46000 = 0x7008b into the libc when the segfault occurred.

So where did the segfault occur exactly?

Since now we know what offset into the libc we were while the segfault happened, we can fire gdb and see what's up with that code:

$ gdb -q /lib/libc.so.6
(no debugging symbols found)
(gdb) x/i 0x7008b
0x7008b <clearerr+27>: cmp    %r8,0x8(%r10)

Interesting... So the JVM is segfaulting in clearerr. We're 27 bytes into this function when the segfault happens. Let's see what the function does up to here:

(gdb) disas clearerr
Dump of assembler code for function clearerr:
0x0000000000070070 <clearerr+0>:        push   %rbx
0x0000000000070071 <clearerr+1>:        mov    (%rdi),%eax
0x0000000000070073 <clearerr+3>:        mov    %rdi,%rbx
0x0000000000070076 <clearerr+6>:        test   %ax,%ax
0x0000000000070079 <clearerr+9>:        js     0x700c7 <clearerr+87>
0x000000000007007b <clearerr+11>:       mov    0x88(%rdi),%r10
0x0000000000070082 <clearerr+18>:       mov    %fs:0x10,%r8
0x000000000007008b <clearerr+27>:       cmp    %r8,0x8(%r10)
0x000000000007008f <clearerr+31>:       je     0x700c0 <clearerr+80>
0x0000000000070091 <clearerr+33>:       xor    %edx,%edx
0x0000000000070093 <clearerr+35>:       mov    $0x1,%esi
0x0000000000070098 <clearerr+40>:       mov    %edx,%eax
0x000000000007009a <clearerr+42>:       cmpl   $0x0,0x300fa7(%rip)        # 0x371048
0x00000000000700a1 <clearerr+49>:       je     0x700ac <clearerr+60>
0x00000000000700a3 <clearerr+51>:       lock cmpxchg %esi,(%r10)
[...]

Reminder: the prototype of the function is void clearerr(FILE *stream); so there's one pointer argument and no return value. The code above starts by saving rbx (because it's the callee's responsibility to save this register), then dereferences the first (and only) argument (passed in rdi) and saves the dereferenced address in eax. Then it copies the pointer passed in argument in rbx. It then tests whether low 16 bits in eax are negative and jumps over some code if it is, because they contain the _flags field of the FILE* passed in argument. At this point it helps to know what a FILE looks like. This structure is opaque so it depends on the libc implementation. In this case, it's the glibc's:

 271 struct _IO_FILE {
 272   int _flags;           /* High-order word is _IO_MAGIC; rest is flags. */
[...]
 310   _IO_lock_t *_lock;
 311 #ifdef _IO_USE_OLD_IO_FILE
 312 };

Then it's looking 0x88 = 136 bytes into the FILE* passed in argument and storing this in r10. If you look at the definition of FILE* and add up the offsets, 136 bytes into the FILE* you'll find the _IO_lock_t *_lock; member of the struct, the mutex that protects this FILE*. Then we're loading address 0x10 from the FS segment in r8. On Linux x86_64, the F segment is used for thread-local data. In this case it's loading a pointer to a structure that corresponds to the local thread. Finally, we're comparing r8 to the value 8 bytes into the value pointed to by r10, and kaboom, we get a segfault. This suggest that r10 is a NULL pointer, meaning that the _lock of the FILE* given in argument is NULL. Now that's weird. I'm not sure how this happened. So the assembly code above is essentially doing:

void clearerr(FILE *stream) {
  if (stream->_flags & 0xFFFF >= 0) {
    struct pthread* self = /* mov %fs:0x10,%r8 -- (can't express this in C, but you can use arch_prctl) */;
    struct lock_t lock = *stream->_lock;
    if (lock.owner != self) {  // We segfault here, when doing lock->owner
      mutex_lock(lock.lock);
      lock.owner = self;
    }
    // ...
  }
  // ...
}

What's odd is that the return value of the JVM is 143 (128+SIGTERM) and not 139 (=128+SIGSEGV). Maybe it's because the JVM is always catching and handling SIGSEGV (they do this to allow the JIT to optimize away some NULL-pointer checks and translate them into NullPointerExceptions, among other things). But even then, normally the JVM will write a file where it complains about the segfault, asks you to file a bug, and dumps all the registers and whatnot... We should see that file somewhere. Yet it's nowhere to be found in the JVM's current working directory or anywhere else I looked.

So this segfault remains a mystery so far. Next step: run the application server with ulimit -c unlimited and analyze a core dump.

Friday, December 10, 2010

Java IO: slowest readLine ever

I have a fairly simple problem: I want to count the number of lines in a file, then seek back to after the first line, and then read the file line by line. Easy heh? Not in Java. Enter the utterly retarded world of the JDK.

So if you're n00b, you'll start with a FileInputStream, but quickly you'll realize that seeking around with it isn't really possible... Indeed, the only way to go back to a previous position in the file is to call reset(), which will take you back to the previous location you marked with mark(int). The argument to mark is "the maximum limit of bytes that can be read before the mark position becomes invalid". OK WTF.

If you dig around some more, you'll see that you should really be using a RandomAccessFile – so much for good OO design. The other seemingly cool thing about RandomAccessFile is that it's got a readLine() method. Unfortunately, this method was implemented by a 1st year CS student who probably dropped out before understanding the basics of systems programming.

Believe it or not, but readLine() reads the file one byte at a time. It does one system call to read per byte. As such, it's 2 orders of magnitude slower than it could be... In fact, you can't really implement a readline function that's much slower than that. facepalm.

PS: This is with Sun's JRE version 1.6.0_22-b04. JDK7/OpenJDK has the same implementation. Apache's Harmony implementation is the same, so Android has the same retarded implementation.

Monday, October 4, 2010

IOException: well-known file is not secure

I implemented a command-line tool to discover and read JMX attributes on Java processes to make it easier to write monitoring scripts for things that, unfortunately, use JMX. The tool uses Sun's Attach API, which is a Sun specific API to attach to a running, local VM, which allows you to load a JMX agent in an existing JVM without requiring that JVM to be started with all the -Djmx flags crap.

It worked like a charm, except when run as root. As root, it would instead fail most of the time (but not always!) with this perfectly intelligible exception:

java.io.IOException: well-known file is not secure
 at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
 at sun.tools.attach.LinuxVirtualMachine.(LinuxVirtualMachine.java:93)
 at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:46)
 at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:195)

I Googled that to find it was bug #6649594, but the bug report was completely unhelpful. It mentions a race condition but doesn't explain what the race condition is or how to work around it. Awesome.

So, Google Code Search to the rescue! This exception is thrown by Java_sun_tools_attach_LinuxVirtualMachine_checkPermissions, which is checking that the permissions and mode of a so-called "well-known file" match that of the running JVM. I couldn't trace the code to the location that produces the path, as it's the usual Java-style code with a bazillion layers of abstractions and indirections, but I'm pretty certain this is due to the /tmp/hsperfdata_$USER/$PID files. Sure enough, one of my files was owned by the group root instead of the group of the user (this is because before forking the JVM, I was calling setuid to drop the privileges, but I forgot to call setgid). Fixing the permissions on the file solved the error.

So if you too are scratching your head over this mysterious "well-known file", make sure the permissions on all the /tmp/hsperfdata_$USER/$PID files are consistent.

The jLOL of the day: it's impossible, in Java, to portably find the PID of the current JVM. So when iterating on all the VMs running on the localhost, here's a workaround to detect whether a VirtualMachine instance corresponds to the current JVM. Insert something in the system properties (System.setProperty(..)) and then check whether the VirtualMachine you have at hand has this property. There's a bug filed in — I kid you not — 1999 about this: Bug #4244896 with a whopping 109 votes for it. The bug is in state "Fix Understood", I guess I'm too dumb to understand the fix.

Friday, August 27, 2010

JPMP (Java's Poor Man's Profiler)

I recently came across PMP and found it surprisingly useful to troubleshoot and analyze live production systems, despite it being so ridiculously simple and naive. Unfortunately for me, I've had to deal with quite a bit of Java lately. So here we go, JPMP:

$ cat jpmp
#!/bin/bash
pid=$1
nsamples=$2
sleeptime=$3

for x in $(seq 1 $nsamples)
  do
    jstack $pid
    sleep $sleeptime
  done | \
awk 'BEGIN { s = "" }
/^"/ { if (s) print s; s = "" }
/^ at / { sub(/\([^)]*\)?$/, "", $2); sub(/^java/, "j", $2);
               if (s) s = s "," $2; else s = $2 }
END { if(s) print s }' | \
sort | uniq -c | sort -rnk1

Output:

$ jpmp 28881 5 0
    120 sun.misc.Unsafe.park,j.util.concurrent.locks.LockSupport.park,j.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await,j.util.concurrent.LinkedBlockingQueue.take,j.util.concurrent.ThreadPoolExecutor.getTask,j.util.concurrent.ThreadPoolExecutor$Worker.run,j.lang.Thread.run
     15 sun.nio.ch.EPollArrayWrapper.epollWait,sun.nio.ch.EPollArrayWrapper.poll,sun.nio.ch.EPollSelectorImpl.doSelect,sun.nio.ch.SelectorImpl.lockAndDoSelect,sun.nio.ch.SelectorImpl.select,org.simpleframework.transport.reactor.ActionDistributor.distribute,org.simpleframework.transport.reactor.ActionDistributor.execute,org.simpleframework.transport.reactor.ActionDistributor.run,j.lang.Thread.run
      5 sun.nio.ch.EPollArrayWrapper.epollWait,sun.nio.ch.EPollArrayWrapper.poll,sun.nio.ch.EPollSelectorImpl.doSelect,sun.nio.ch.SelectorImpl.lockAndDoSelect,sun.nio.ch.SelectorImpl.select,org.apache.zookeeper.ClientCnxn$SendThread.run
      5 sun.misc.Unsafe.park,j.util.concurrent.locks.LockSupport.park,j.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await,j.util.concurrent.LinkedBlockingQueue.take,org.apache.zookeeper.ClientCnxn$EventThread.run
      5 sun.misc.Unsafe.park,j.util.concurrent.locks.LockSupport.park,j.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await,j.util.concurrent.DelayQueue.take,org.simpleframework.util.lease.LeaseCleaner.clean,org.simpleframework.util.lease.LeaseCleaner.run,j.lang.Thread.run
      5 j.lang.Thread.sleep,org.simpleframework.util.buffer.FileManager.run,j.lang.Thread.run
      5 j.lang.Object.wait,j.lang.ref.ReferenceQueue.remove,j.lang.ref.ReferenceQueue.remove,j.lang.ref.Finalizer$FinalizerThread.run
      5 j.lang.Object.wait,j.lang.Object.wait,j.lang.ref.Reference$ReferenceHandler.run

Java NIO and overengineering

In an earlier post about some of the most horrible APIs in the JDK (the date related APIs), I mentioned how the Java's IO APIs were equally (if not more) retarded. Today I had the joy to dig in the code of Java NIO try to understand why NIO doesn't have "built-in" proxy support like OIO through -D socksProxyHost and -DsocksProxyPort. I wanted to see if I could find a way to work around this limitation, possibly by doing evil things with reflection.

The problems boils down to the fact that with NIO, you use a SocketChannel, and under the hood, some kind of a Socket must be used. Here the fun starts, because as the javadoc says, "The actual work of the socket is performed by an instance of the SocketImpl class", which is itself an abstract class (it's an abstract Impl, yeah, right...).

So let's see how the code goes from a SocketChannel to a Socket or a SocketImpl:

People typically create a SocketChannel like so: SocketChannel.open()
SocketChannel.open() does return SelectorProvider.provider().openSocketChannel();
SelectorProvider.provider() checks to see if the system property java.nio.channels.spi.SelectorProvider is set and a few other things. Unless you specifically do something manually, none of that stuff will end up finding a provider, so the code will end up returning sun.nio.ch.DefaultSelectorProvider.create();.
Now here things get a little bit blurry. I think we end up in here (on Linux) doing return new sun.nio.ch.EPollSelectorProvider();
So in step 2., when openSocketChannel() is called on the provider, we end up in here, doing return new SocketChannelImpl(this);
SocketChannelImpl then does all the work for NIO sockets, mostly using sun.nio.ch.Net which is chiefly made of native methods.
If you want to view a SocketChannelImpl as a Socket by calling socket(), the SocketChannelImpl wraps itself in a SocketAdaptor, but the adaptor simply transforms all the calls to calls on the underlying SocketChannelImpl which uses the native functions in sun.nio.ch.Net, so there's nothing really you can do to make it use a SOCKS proxy.

So the bottom line of the story is that there's no way to use a SOCKS proxy with NIO unless you implement the SOCKS proxy protocol yourself (and if you do, make sure you use enough abstract factories implementations and other layers of indirections, so that your code will fit well in the Java IO philosophy).

Also, just for the fun of it, I was curious to see how many objects were involved under the hood of a single SocketChannel. The NIO code is so bloated that I was expecting quite a bit of overhead. I used the following not-very-scientific method (which is kinda similar to PMP) to try to get an answer:

$ cat t.java
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

final class t {
  public static void main(String[]a) throws Exception {
    System.out.println("ready");
    Thread.sleep(5000);
    SocketChannel socket = SocketChannel.open();
    InetSocketAddress addr = new InetSocketAddress("www.google.com", 80);
    socket.connect(addr);
    System.out.println("connected");
    Thread.sleep(5000);
  }
}

then I run this code and use jmap -histo:live right after seeing the "ready" message and right after seeing the "connected" message. In between both of those messages, 2933 objects occupying 371744 bytes of RAM (on my MBP with the JDK 1.6.0_20-b02-279-10M3065 and HotSpot JVM 16.3-b01-279, both shipped by Apple with OS X 10.6.4). I don't know if I'm supposed to laugh or cry.

Class name                         Diff:       # instances Memory used (bytes)
<methodKlass>                                         +615 +74264
<constMethodKlass>                                    +615 +72416
<constantPoolKlass>                                    +65 +47288
<instanceKlassKlass>                                   +65 +46264
<symbolKlass>                                         +705 +33312
<constantPoolCacheKlass>                               +67 +32024
java.lang.Class                                        +69 +12696
[C                                                     +85 +10664
[[I                                                   +121  +8760
[I                                                     +73  +5928
[S                                                     +96  +5232
[B                                                     +58  +4600
java.lang.String                                       +94  +3760
java.util.LinkedList$Entry                             +65  +2600
<objArrayKlassKlass>                                    +4  +2336
<methodDataKlass>                                       +4  +1800
java.net.URL                                           +15  +1560
java.util.HashMap$Entry                                +29  +1392
[Ljava.util.HashMap$Entry;                              +8  +1344
java.util.LinkedHashMap$Entry                          +15   +960
java.util.LinkedList                                   +23   +920
sun.misc.URLClassPath$JarLoader                         +6   +432
java.util.HashMap                                       +6   +384
java.io.ExpiringCache$Entry                            +11   +352
java.util.jar.JarFile$JarFileEntry                      +3   +336
java.lang.reflect.Constructor                           +2   +240
[Ljava.lang.Object;                                     +2   +208
java.net.Inet4Address                                   +5   +200
sun.nio.ch.SocketChannelImpl                            +1   +176
java.util.LinkedHashMap                                 +2   +160
java.lang.Object                                        +8   +128
java.lang.ThreadLocal                                   +5   +120
java.lang.ClassLoader$NativeLibrary                     +2    +96
[Ljava.net.InetAddress;                                 +2    +88
java.util.ArrayList                                     +2    +80
sun.misc.JarIndex                                       +2    +80
[Ljava.lang.String;                                     +2    +64
java.net.InetAddress$CacheEntry                         +2    +64
java.net.InetAddress$Cache                              +2    +64
java.net.InetAddress$Cache$Type                         +2    +64
sun.misc.URLClassPath                                   +1    +56
java.lang.ref.SoftReference                             +1    +56
[Ljava.lang.ThreadLocal;                                +1    +48
java.util.Stack                                         +1    +40
sun.reflect.NativeConstructorAccessorImpl               +1    +40
java.net.InetSocketAddress                              +1    +40
[Ljava.net.InetAddress$Cache$Type;                      +1    +40
[Ljava.lang.reflect.Constructor;                        +1    +32
java.net.Inet6AddressImpl                               +1    +32
sun.reflect.DelegatingConstructorAccessorImpl           +1    +24
java.util.HashMap$KeySet                                +1    +24
java.nio.channels.spi.AbstractInterruptibleChannel$1    +1    +24
[Ljava.lang.Class;                                      +1    +24
java.net.InetAddress$1                                  +1    +16
sun.net.www.protocol.jar.Handler                        +1    +16
sun.nio.ch.KQueueSelectorProvider                       +1    +16
sun.nio.ch.SocketDispatcher                             +1    +16
java.io.FileDescriptor                                  -1    -24
java.util.Vector                                        -1    -40
java.io.FileInputStream                                 -2    -64
java.util.jar.JarFile                                   -1    -80
java.util.zip.ZStreamRef                                -4    -96
java.util.zip.Inflater                                  -4   -192
java.util.zip.ZipFile$ZipFileInputStream               -12   -672
java.lang.ref.Finalizer                                -17  -1088

Er.. OK I guess that wasn't entirely fair, since 2286 objects (taking 328928 bytes of RAM) have been created by the VM for classes that have been loaded and compiled. So if we discount those, we're still left with 647 objects taking 42816 bytes of RAM... Ahem... Just for a little SocketChannel.

PS: I'm half-ashamed but to produce the output above, I used what could possibly be the most horrible "one-liner" I ever wrote in Python. I put the output of jmap in /tmp/a for the first run and /tmp/b for the second run:

python -c 'load = lambda path: dict((klass, (int(instances), int(bytes))) for (num, instances, bytes, klass) in (line.split() for line in open(path) if line[4] == ":")); a = load("/tmp/a"); b = load("/tmp/b"); print "\n".join("%s\033[55G%+4d\t%+6d" % (klass, diffinst, diffbytes) for (klass, diffinst, diffbytes) in sorted(((klass, b[klass][0] - a.get(klass, (0, 0))[0], b[klass][1] - a.get(klass, (0, 0))[1]) for klass in b), key=lambda i: i[2], reverse=True) if diffbytes != 0).replace(">", "&lt;").replace(">", "&gt;")'

Tuesday, June 22, 2010

SimpleDateFormat and "AssertionError: cache control: inconsictency"

Oh boy, the JDK was really poorly """designed""". Today's fail is:

java.lang.AssertionError: cache control: inconsictency, cachedFixedDate=733763, computed=733763, date=2009-12-22T00:00:00.000Z
        at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2070)
        at java.util.GregorianCalendar.computeTime(GregorianCalendar.java:2472)
        at java.util.Calendar.updateTime(Calendar.java:2468)
        at java.util.Calendar.getTimeInMillis(Calendar.java:1087)
        at java.util.Calendar.getTime(Calendar.java:1060)
        at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1368)
        at java.text.DateFormat.parse(DateFormat.java:335)

(Courtesy of -enablesystemassertions). Because, yeah, SimpleDateFormat is not thread-safe.

Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.

Like many things in Java, this is just retarded. If SimpleDateFormat was written in a more functional way instead of storing the result of the last call to parse in itself, this problem wouldn't exist.
So yeah, I'm probably a JDK n00b and did not RTFM properly, but this just adds up to the already overflowing list of crap in the JDK.

In the next episode we'll review the OMGWTFBBQ code you have to write to use Java's crappy IO library – and I'm not even talking about the fact that the library exclusively allows 1975-style programming... which is generally the case for everything written in Java anyway.

Also, it's spelled "inconsistency" not "inconsictency", dumbass.

The moral of the story is that if grep 'static .* SimpleDateFormat' on your code returns anything, you have a bug!

Edit 2010-09-27: This can also manifest itself if parsing a date fails randomly with weird error messages such as: ParseException: For input string: "" (even though the string passed in argument was definitely not empty). With such concurrency bugs, a number of weird things can happen...

Tsuna's blog