Tuesday, September 13, 2011

ext4 2x faster than XFS?

For a lot of people, the conventional wisdom is that XFS outperforms ext4. I'm not sure whether this is just because XFS used to be a lot faster than ext2 or ext3, or what. I don't have anything against XFS; in fact, I'd like to see it outperform ext4. Unfortunately, my benchmarks show otherwise, and I'm wondering whether I'm doing something wrong.

In the benchmark below, the same machine and the same HDDs were tested with two different RAID controllers. In most tests, ext4 posts better results than XFS, and in some tests the difference is as much as 2x. Here are the details of the config:

Both RAID controllers are equipped with 512MB of RAM and were left in their respective factory default configurations, except that WriteBack mode was enabled on the LSI because it's disabled by default (!). One other notable difference between the default configurations is that the Adaptec uses a strip size of 256k whereas the LSI uses 64k; this was left unchanged. Both arrays were created as RAID10 (6 pairs of 2 disks, so no spares). One controller was tested at a time, in the same machine and with the same disks. The OS (Linux 2.6.32) was on a separate RAID1 of 2 drives. The I/O scheduler in use was "deadline". SysBench was using O_DIRECT on 64 files, for a total of 100GB of data.
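
The exact SysBench invocations are available in the raw outputs linked at the end of the post; as a rough sketch, a run matching the description above would look something like this with SysBench 0.4 (the file count, total size and direct-I/O flag come from the setup described here; the thread count, test mode and duration are purely illustrative):

  # create the 64 test files totalling 100GB
  sysbench --test=fileio --file-num=64 --file-total-size=100G prepare
  # run one of the test modes (here: combined random read/write) with O_DIRECT
  sysbench --test=fileio --file-num=64 --file-total-size=100G \
           --file-test-mode=rndrw --file-extra-flags=direct \
           --num-threads=16 --max-time=300 --max-requests=0 run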

Some observations:

  • Formatting XFS with the optimal values for sunit and swidth doesn't lead to much better performance. The gain is about 2%, except for sequential writes, where it actually makes things worse. (And yes, there was no partition table; the whole array was formatted directly as one single big filesystem. See the mkfs sketch right after this list.)
  • Creating more allocation groups in XFS than physical threads doesn't lead to better performance.
  • XFS has much better random write throughput at low concurrency levels, but quickly degrades to the same performance level as ext4 with more than 8 threads.
  • ext4 has consistently better throughput and latency in the combined random read/write test, even at high concurrency levels.
  • Similarly, for random reads ext4 also has much better throughput and latency.
  • By default XFS creates too few allocation groups, which artificially limits its performance at high concurrency levels. It's important to create as many AGs as there are hardware threads. ext4, on the other hand, doesn't really need any tuning; it performs well out of the box.
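
For the record, here's roughly what aligning XFS to the LSI array described above would look like (a 64k strip across 6 mirrored pairs gives su=64k and sw=6; the AG count of 16 and the device name are placeholders, since the post doesn't spell out how many hardware threads the box has):

  # su/sw match the LSI's 64k strip and 6 data spindles; agcount and device are illustrative
  mkfs.xfs -d su=64k,sw=6,agcount=16 /dev/sdX
  # the same alignment expressed in 512-byte sectors would be sunit=128, swidth=768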

See the benchmark results in full screen or look at the raw outputs of SysBench.

14 comments:

  1. logbsize and logbufs

  2. That's kinda terse... What do you mean more precisely?

  3. I assume he means try those mount options for tuning.

    Also, I would try a newer kernel; there have been lots of changes in XFS and ext4 since 2.6.32. http://sandeen.net/wordpress/?p=532

  4. I'm curious - which "2.6.32" did you test? Sounds like a good chance it's a distro kernel. :)

  5. @Eric yes it's the kernel that comes with Ubuntu Server 10.04 LTS (2.6.32-21-server).

    So how do you guys recommend logbsize and logbufs to be tuned? What's the rule? I couldn't find anything on Google other than people using logbufs=8,logbsize=256k with no explanation whatsoever.
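
    Just so we're talking about the same thing, I assume that means mounting with something along these lines (the device and mount point here are only placeholders):

    mount -o logbufs=8,logbsize=256k /dev/sdX1 /data

    What I'm really after is the reasoning behind those particular values.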

  6. In the absence of indications to the contrary, we recommend the defaults.

    I don't know how hard it'd be to test something upstream for comparison, but it might be interesting. A lot has changed in both filesystems since those days, and to be honest I have no idea what filesystem code is in that Ubuntu kernel.

  7. In my experience with many, many kernels around 2.6.32 and earlier, XFS has worse performance than ext4 when both filesystems are used with their defaults.

    I'm also puzzled by the small performance gain when specifying sunit/swidth for my RAID array. I expected it would make a bigger difference. I'm guessing that something is causing misaligned accesses, but I can't see why, since I don't use a partition table (which seems to be a common cause of alignment issues).

    Unfortunately the FAQ (which I've already read) isn't very precise. It's not easy to tune something when there's no rule of thumb to orient you in the search space of tuning parameters, and I can't do a brute-force search because benchmarking each individual combination of parameter changes takes too long.
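
    For what it's worth, one way I can double-check what the filesystem actually ended up with is xfs_info (the mount point below is just a placeholder); its data section reports sunit/swidth in filesystem blocks, with 0 meaning no alignment was set:

    xfs_info /data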

  8. Can you post the actual sysbench command that you used so I can attempt to verify your results on my local installation?

  9. The sysbench command along with the raw sysbench outputs are all available under the subdirectories of http://tsunanet.net/~tsuna/benchmarks/ext4-xfs-raid10/

  10. Not sure how I missed that one. Thanks!

  11. I was able to squeeze a bit more performance out of XFS by disabling Hyperthreading (it doesn't look like that would apply to you) and by using the noop scheduler instead of deadline. Only anecdotal at this point, but I thought I would pass it on. I'm also running Lucid.
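
    In case it helps, the scheduler can be changed at runtime per block device (sdX being whatever device backs the array):

    cat /sys/block/sdX/queue/scheduler   # the active scheduler is shown in brackets
    echo noop > /sys/block/sdX/queue/scheduler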

  12. The "delaylog" mount option for XFS in the later kernels increases small file proformance and is the default mount option in kernel 3.0 or higher.

  13. I recently experienced the same thing when I did a debootstrap onto a simple 1 GB USB stick. The debootstrap took about 10 minutes with XFS and about 3 or 4 minutes with ext4. After that I did some testing on my workstation and notebook, and in every case ext4 was much faster. I used a 2.6.32 Debian Squeeze kernel.
    I was surprised by this and switched to ext4 everywhere, although I'm a big fan of XFS.

  14. You may also want to benchmark the filesystem's performance impact on a MySQL workload using the mysqlslap command that ships with it, e.g. to simulate writing 10 million rows from 100 clients in parallel (100,000 rows from each) and then from 500 clients, you could do something like:
    mysqlslap --concurrency=100,500 --iterations=2 --number-int-cols=1 --auto-generate-sql-add-autoincrement --number-char-cols=3 --auto-generate-sql --csv=/tmp/mysqlslap_10m_rows_insert_my5.1lvm.csv --engine=innodb --auto-generate-sql-load-type=write --number-of-queries=10000000
