Friday, December 10, 2010

Java IO: slowest readLine ever

I have a fairly simple problem: I want to count the number of lines in a file, then seek back to after the first line, and then read the file line by line. Easy heh? Not in Java. Enter the utterly retarded world of the JDK.

So if you're n00b, you'll start with a FileInputStream, but quickly you'll realize that seeking around with it isn't really possible... Indeed, the only way to go back to a previous position in the file is to call reset(), which will take you back to the previous location you marked with mark(int). The argument to mark is "the maximum limit of bytes that can be read before the mark position becomes invalid". OK WTF.

If you dig around some more, you'll see that you should really be using a RandomAccessFile – so much for good OO design. The other seemingly cool thing about RandomAccessFile is that it's got a readLine() method. Unfortunately, this method was implemented by a 1st year CS student who probably dropped out before understanding the basics of systems programming.

Believe it or not, but readLine() reads the file one byte at a time. It does one system call to read per byte. As such, it's 2 orders of magnitude slower than it could be... In fact, you can't really implement a readline function that's much slower than that. facepalm.

PS: This is with Sun's JRE version 1.6.0_22-b04. JDK7/OpenJDK has the same implementation. Apache's Harmony implementation is the same, so Android has the same retarded implementation.

Thursday, December 9, 2010

OpenTSDB at Strata'11

I will be speaking about OpenTSDB at the Strata conference, Wednesday, February 02, 2011, in Santa Clara, CA. You can sign up with this promo code and get a 25% discount: str11fsd.
Strata is a new conference about large scale systems put together by O'Reilly.
Strata 2011