For a lot of people, the conventional wisdom is that XFS outperforms ext4. I'm not sure whether this is just because XFS used to be a lot faster than ext2 or ext3 or what. I don't have anything against XFS, and actually I would like to see it outperform ext4, unfortunately my benchmarks show otherwise. I'm wondering whether I'm doing something wrong.
In the benchmark below, the same machine and same HDDs were tested with 2 different RAID controllers. In most tests, ext4 has better results than XFS. In some tests, the difference is as much as 2x. Here are the details of the config:
Both RAID controllers are equipped with 512MB of RAM and are in their respective default factory config, except that WriteBack mode was enabled on the LSI because it's disabled by default (!). One other notable difference between the default configurations is that the Adaptec uses a strip size of 256k whereas the LSI uses 64k – this was left unchanged. Both arrays were created as RAID10 (6 pairs of 2 disks, so no spares). One controller was tested at a time, in the same machine and with the same disks. The OS (Linux 2.6.32) was on a separate RAID1 of 2 drives. The IO scheduler in use was "deadline".
SysBench was using
O_DIRECT
on 64 files, for a total of 100GB of data.
Some observations:
- Formatting XFS with the optimal values for
sunit
and swidth
doesn't lead to much better performance. The gain is about 2%, except for sequential writes where it actually makes things worse. Yes, there was no partition table, the whole array was formatted directly as one single big filesystem.
- Creating more allocation groups in XFS than physical threads doesn't lead to better performance.
- XFS has much better random write throughput at low concurrency levels, but quickly degrades to the same performance level as ext4 with more than 8 threads.
- ext4 has consistently better random read/write throughput and latency, even at high concurrency levels.
- Similarly, for random reads ext4 also has much better throughput and latency.
- By default XFS creates too few allocation groups, which artificially limits its performance at high concurrency levels. It's important to create as many AGs as hardware threads. ext4, on the other hand, doesn't really need any tuning as it performs well out of the box.
See the
benchmark results in full screen or look at the
raw outputs of SysBench.