So if you're n00b, you'll start with a
FileInputStream
, but quickly you'll realize that seeking around with it isn't really possible... Indeed, the only way to go back to a previous position in the file is to call reset()
, which will take you back to the previous location you marked with mark(int)
. The argument to mark
is "the maximum limit of bytes that can be read before the mark position becomes invalid". OK WTF.If you dig around some more, you'll see that you should really be using a
RandomAccessFile
– so much for good OO design. The other seemingly cool thing about RandomAccessFile
is that it's got a readLine()
method. Unfortunately, this method was implemented by a 1st year CS student who probably dropped out before understanding the basics of systems programming.Believe it or not, but
readLine()
reads the file one byte at a time. It does one system call to read
per byte. As such, it's 2 orders of magnitude slower than it could be... In fact, you can't really implement a readline function that's much slower than that. facepalm.PS: This is with Sun's JRE version 1.6.0_22-b04. JDK7/OpenJDK has the same implementation. Apache's Harmony implementation is the same, so Android has the same retarded implementation.
Wow, thats very interesting (and sort of dumb). How expensive is to reopen the file if its huge file, or cache the contents of small files to avoid the "re-seeking start of file" issue.
ReplyDeleteNice history !
ReplyDeleteHow did you dive that deep into readLine() implementation ? Strace ?
What about readFully() ? Did it also fill the byte array... byte after byte ?
@royans: with a RandomAccessFile you can seek around freely, no need to reopen the file.
ReplyDelete@Adrien: I always read the code of library functions before calling them from my code. I know not to trust the JDK especially, because there are a lot of dumb things in it. I spend a lot of time reading the code of the JDK on Google Code Search, but in this case I also double-checked that the JRE I was using had the problem with strace indeed. A number of packages in the JDK are very poorly implemented. Some packages are pretty awesome however, such as everything that Doug Lea wrote (no surprise there) for java.util.concurrent.
I'm still new to programming, but wouldn't it be better to use
ReplyDeleteString s = "";
ArrayList a = new ArrayList();
while ((s=in.readline())!=null) a.add(s);
to get the entire file which you intend to read line-by-line anyway. Then you'd have a.size() as the number of lines in the file, and could use a for loop to access them quickly for whatever processing needs done. I don't know much about speed or memory usage yet, so please don't yell at me if I said something stupid. T.T
@Anonymous: you could do this if you know the file isn't too large and can fit in memory – but still, you'd have to implement readLine yourself because the implementation in the JDK is unacceptable. In my case though, there's another reason why I require two passes through the file, the first pass pre-computes some data that the second pass needs.
ReplyDeleteAnd what about using java BufferReader, or read(byte[] b, int off, int len) if you need to use a RandomAccessFile ?
ReplyDeleteIf I'm not wrong, javadocs tell us that it'll be read byte per byte, and might be inefficient...
@Anonymous: the javadoc of RandomAccessFile doesn't mention anything about reading byte by byte, it just says "This method successively reads bytes from the file". Either way, there's no reason whatsoever to have such an utterly stupid implementation of readLine. BufferedReader doesn't have a seek method. So you can either seek around or have a non-stupid readLine, but not both. Yay.
ReplyDeleteyou posted this on my birthday!
ReplyDeleteYO BEN! you in sf now? come visit!
For any speedy file reading on Android you need to build some class that uses an InputStream like DataInputStream then use chunked buffers and parse.
ReplyDeleteBuffering the whole file obv gives you random access - into the buffer.
If you have a size problem (most of us do! ;) ) then
Performant random access and seeking makes things more complicated.
If you can help it, always use mapped files with DirectBuffers. RandomAccessFile is retarded in reading all primitives, incl. readInt(), readLong(). Full JNI switch for each bytes, magic!!
ReplyDeleteOf course mapping comes w/ a price of being unable ( to unmap (baring hacks)
In case anyone was looking for a good solution using just the standard JDK...
ReplyDeleteBufferedReader bufferedReader = new BufferedReader(new FileReader(file));
String line;
while ((line = bufferedReader.readLine()) != null)
{
// process line
}
bufferedReader.close();