Saturday, May 7, 2011

JVM u24 segfault in clearerr on Jaunty

At StumbleUpon we've been tracking down a weird problem with one of our application servers written in Java. We run Sun's jdk1.6.0_24 on Ubuntu Jaunty (9.04 – yes, these servers are old and due for an upgrade) and this application seems to do something that causes the JVM to segfault:
[6972247.491417] hbase_regionser[32760]: segfault at 8 ip 00007f26cabd608b sp 00007fffb0798270 error 4 in libc-2.9.so[7f26cab66000+168000]
[6972799.682147] hbase_regionser[30904]: segfault at 8 ip 00007f8878fb608b sp 00007fff09b69900 error 4 in libc-2.9.so[7f8878f46000+168000]
What's odd is that the problem always happens on different hosts, and almost always around 6:30 - 6:40 am. Go figure.

Understanding segfault messages from the Linux kernel

Let's try to make sense of the messages shown above, logged by the Linux kernel. Back in Linux v2.6.28 this message was logged from do_page_fault, but since then that big function has been refactored into multiple smaller ones, so nowadays look for show_signal_msg.
printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",
       task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
       tsk->comm, task_pid_nr(tsk), address,
       (void *) regs->ip, (void *) regs->sp, error_code);
print_vma_addr(" in ", regs->ip);
From the above, we see that segfault at 8 means that the code attempted to access address 8, which is what caused the segfault (the page at address 0 is normally never mapped, and faulting on a small address like 8 is the typical signature of dereferencing a field of a NULL pointer). ip stands for instruction pointer, so the faulting instruction was at address 0x00007f8878fb608b. sp is the stack pointer and isn't very relevant here. error 4 is the page-fault error code: bit 2 (PF_USER, which used to be a #define but is now part of enum x86_pf_error_code) is set, meaning the fault happened in user mode, and the write bit is clear, so this was a read access. The rest of the message tells us that the instruction pointer falls inside the memory region where the code of the libc is mapped, and the square brackets tell us that the libc is mapped at base address 0x7f8878f46000 with 0x168000 bytes of code mapped. So that means we were at 0x00007f8878fb608b - 0x7f8878f46000 = 0x7008b into the libc when the segfault occurred.
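If you want to decode such an error code without guessing, here is a minimal C sketch. The PF_* names mirror the constants the kernel uses internally in arch/x86/mm/fault.c; the decode_pf_error helper is mine, not something from the kernel headers:
#include <stdio.h>

/* Low bits of the x86 page-fault error code, as used by the kernel. */
#define PF_PROT  (1 << 0)   /* 0: page not present, 1: protection violation */
#define PF_WRITE (1 << 1)   /* 0: read access,      1: write access */
#define PF_USER  (1 << 2)   /* 0: kernel mode,      1: user mode */

static void decode_pf_error(unsigned long error_code) {
  printf("error %lx: %s access, %s mode, %s\n", error_code,
         (error_code & PF_WRITE) ? "write" : "read",
         (error_code & PF_USER)  ? "user"  : "kernel",
         (error_code & PF_PROT)  ? "protection violation"
                                 : "page not present");
}

int main(void) {
  decode_pf_error(4);  /* the "error 4" from the log: user-mode read of an unmapped page */
  return 0;
}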

So where did the segfault occur exactly?

Now that we know at what offset into the libc we were when the segfault happened, we can fire up gdb and see what's up with that code:
$ gdb -q /lib/libc.so.6
(no debugging symbols found)
(gdb) x/i 0x7008b
0x7008b <clearerr+27>: cmp %r8,0x8(%r10)
Interesting... So the JVM is segfaulting in clearerr. We're 27 bytes into this function when the segfault happens. Let's see what the function does up to here:
(gdb) disas clearerr
Dump of assembler code for function clearerr:
0x0000000000070070 <clearerr+0>: push %rbx
0x0000000000070071 <clearerr+1>: mov (%rdi),%eax
0x0000000000070073 <clearerr+3>: mov %rdi,%rbx
0x0000000000070076 <clearerr+6>: test %ax,%ax
0x0000000000070079 <clearerr+9>: js 0x700c7 <clearerr+87>
0x000000000007007b <clearerr+11>: mov 0x88(%rdi),%r10
0x0000000000070082 <clearerr+18>: mov %fs:0x10,%r8
0x000000000007008b <clearerr+27>: cmp %r8,0x8(%r10)
0x000000000007008f <clearerr+31>: je 0x700c0 <clearerr+80>
0x0000000000070091 <clearerr+33>: xor %edx,%edx
0x0000000000070093 <clearerr+35>: mov $0x1,%esi
0x0000000000070098 <clearerr+40>: mov %edx,%eax
0x000000000007009a <clearerr+42>: cmpl $0x0,0x300fa7(%rip) # 0x371048
0x00000000000700a1 <clearerr+49>: je 0x700ac <clearerr+60>
0x00000000000700a3 <clearerr+51>: lock cmpxchg %esi,(%r10)
[...]
Reminder: the prototype of the function is void clearerr(FILE *stream); so there's one pointer argument and no return value. The code above starts by saving rbx (because it's the callee's responsibility to preserve this register), then dereferences the first (and only) argument (passed in rdi) and loads the 4 bytes at that address, the _flags field, into eax. Then it copies the pointer passed in argument into rbx. It then tests whether the low 16 bits of eax, which hold the actual flag bits of _flags (the high-order word is the magic number), are negative, that is whether the _IO_USER_LOCK flag is set, and jumps over the locking code if it is. At this point it helps to know what a FILE looks like. This structure is opaque so it depends on the libc implementation. In this case, it's glibc's:
struct _IO_FILE {
  int _flags;           /* High-order word is _IO_MAGIC; rest is flags. */
  [...]
  _IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};
Then it loads the 8 bytes located 0x88 = 136 bytes into the FILE passed in argument and stores them in r10. If you look at the definition of the struct and add up the offsets, 136 bytes into the FILE you'll find the _IO_lock_t *_lock; member, the mutex that protects this FILE (a quick offsetof check, shown right after the pseudo-code below, confirms this). Then we load the 8 bytes at offset 0x10 in the FS segment into r8. On Linux x86_64, the FS segment is used for thread-local data; in this case it's loading a pointer to the structure that represents the current thread. Finally, we compare r8 to the value located 8 bytes into the structure pointed to by r10, and kaboom, we get a segfault. This suggests that r10 is a NULL pointer, meaning that the _lock of the FILE given in argument is NULL. Now that's weird. I'm not sure how this happened. So the assembly code above is essentially doing:
void clearerr(FILE *stream) {
  if ((short) stream->_flags >= 0) {  // i.e. the _IO_USER_LOCK bit (the sign bit) isn't set
    struct pthread *self = /* mov %fs:0x10,%r8 -- can't express this in C, but you can use arch_prctl */;
    struct lock_t *lock = stream->_lock;
    if (lock->owner != self) {  // We segfault here, reading lock->owner while lock is NULL
      mutex_lock(&lock->lock);
      lock->owner = self;
    }
    // ...
  }
  // ...
}
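To double-check that 136 (0x88) really is where _lock lives, here is a quick sketch, assuming glibc (where FILE is a typedef for struct _IO_FILE and the _lock member is visible from <stdio.h>), that asks the compiler directly:
#include <stddef.h>
#include <stdio.h>

int main(void) {
  /* On x86_64 glibc this prints 136 (0x88), the offset hard-coded in the
     "mov 0x88(%rdi),%r10" instruction in the disassembly above. */
  printf("offsetof(FILE, _lock) = %zu (0x%zx)\n",
         offsetof(FILE, _lock), offsetof(FILE, _lock));
  return 0;
}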
What's odd is that the exit status of the JVM is 143 (128+SIGTERM) and not 139 (128+SIGSEGV). Maybe it's because the JVM is always catching and handling SIGSEGV (they do this to allow the JIT to optimize away some NULL-pointer checks and turn them into NullPointerExceptions, among other things). But even then, normally the JVM writes an hs_err_pid*.log file where it complains about the segfault, asks you to file a bug, and dumps all the registers and whatnot... We should see that file somewhere. Yet it's nowhere to be found in the JVM's current working directory or anywhere else I looked.
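For illustration, here is a toy sketch (definitely not the JVM's actual code) of how a process can catch SIGSEGV and keep going instead of dying with exit status 139, which is the general idea behind HotSpot turning a faulting implicit NULL check into a NullPointerException:
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf recovery_point;

static void on_segv(int sig) {
  (void) sig;
  siglongjmp(recovery_point, 1);  /* jump back to the safe point below */
}

int main(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_segv;
  sigaction(SIGSEGV, &sa, NULL);

  if (sigsetjmp(recovery_point, 1) == 0) {
    volatile int *p = NULL;
    printf("%d\n", p[2]);  /* reads address 8: "segfault at 8", like in the log */
  } else {
    printf("caught SIGSEGV and recovered instead of exiting with 139\n");
  }
  return 0;
}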

So this segfault remains a mystery so far. Next step: run the application server with ulimit -c unlimited and analyze a core dump.