[JDSS] Resilvering speed and resilvering time explained
Additional information:
- product name: JovianDSS
- product version: all
- build: all
Subject:
Resilvering speed and resilvering time explained
Contents:
What is resilvering, and why can it be very slow for ZFS-based pools?
When a device is replaced, a resilvering operation is initiated to move the data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can occur at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing and restarts it after the resilvering is completed. RAIDz resilvering is quite slow in ZFS-based pools for the following reasons:
1. The resilvering process starts with every transaction that has ever happened in the pool and plays them back one by one to the new drive. This is very I/O-intensive and takes a long time, especially with hard drives larger than 1TB. From a certain point of view, one might think that RAIDz's only legitimate use case is all-SSD pools, where rebuilds are faster.
2. RAID-Z2 is slower than RAID-Z1, but it offers a very good balance between capacity and redundancy. Wider RAID-Z2 vdevs are more space-efficient, but resilver operations take longer, by a factor of about 2, because of the double parity. Because RAID-Z2 can tolerate the loss of two drives, longer resilver times are a reasonable tradeoff.
3. The type of files can also matter: many small files can take longer to resilver than a few large files.
Resilvering speed and resilvering time of traditional RAID vs ZFS RAIDz.
After a drive goes bad and is replaced, the data from the bad drive needs to be regenerated onto the new drive. This process is typically called rebuild, but ZFS calls it resilvering. There are two significant metrics:
1. Rebuild speed, measured in megabytes per second.
2. Rebuild time, the amount of time required to rebuild all the missing data.
and two significant considerations:
1. Rebuild speed for traditional RAID is much faster. However, traditional RAID has to rebuild both used and free blocks.
2. Rebuild speed for ZFS RAIDz is slower. However, RAIDz only needs to rebuild blocks that actually hold data. RAIDz does not rebuild empty blocks, so it completes rebuilds faster when the pool has significant free space.
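The relationship between the two metrics and the two considerations above can be put into a rough back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, the rebuild speeds are made-up inputs rather than measured values, and it assumes a constant rebuild speed with no user I/O during the rebuild.

```python
def rebuild_hours(bytes_to_rebuild: float, speed_mb_s: float) -> float:
    """Hours needed to rebuild the given amount of data at a steady speed."""
    seconds = bytes_to_rebuild / (speed_mb_s * 1_000_000)
    return seconds / 3600


def compare_rebuilds(drive_tb: float, fill_fraction: float,
                     raid_mb_s: float, raidz_mb_s: float) -> tuple[float, float]:
    """Return (traditional RAID hours, RAIDz hours) for one replaced drive.

    Traditional RAID rebuilds the whole drive, used and free blocks alike.
    RAIDz rebuilds only the used share of the drive (fill_fraction, 0.0 to 1.0).
    """
    drive_bytes = drive_tb * 1e12
    raid_hours = rebuild_hours(drive_bytes, raid_mb_s)
    raidz_hours = rebuild_hours(drive_bytes * fill_fraction, raidz_mb_s)
    return raid_hours, raidz_hours


if __name__ == "__main__":
    # Assumed example values: 4TB drive, 150 MB/s for RAID, 50 MB/s for RAIDz.
    for fill in (0.2, 0.5, 0.9):
        raid, raidz = compare_rebuilds(4, fill, 150, 50)
        print(f"fill {fill:.0%}: RAID {raid:.1f} h, RAIDz {raidz:.1f} h")
```

With these assumed speeds, RAIDz finishes first on a 20% full pool and falls well behind on a 90% full pool, even though its megabytes-per-second figure is the same in both cases.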
In a traditional RAID, where all blocks are regular, you take block 0 from each of the old drives, compute correct data for block 0 on the missing drive and write the data onto a new device. This process is then repeated for all blocks, even for the blocks that hold no data. This is because the traditional RAID does not know which blocks on the RAID are in use and which are not. If the array is otherwise idle, serving no user requests during the rebuild, the process is done sequentially from start to end, which is the fastest way to access rotational hard drives.
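The block-by-block rebuild described above can be sketched as follows. This is a minimal illustration that assumes single-parity XOR (RAID 5 style), which is only one possible parity scheme; drives are modeled as in-memory lists of equal-sized blocks, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_drive(surviving_drives: list[list[bytes]]) -> list[bytes]:
    """Regenerate the missing drive from the surviving drives.

    Every block index is processed in order from start to end, whether or
    not the block holds data, because the array has no notion of which
    blocks are in use.
    """
    block_count = len(surviving_drives[0])
    new_drive = []
    for index in range(block_count):
        stripe = [drive[index] for drive in surviving_drives]
        new_drive.append(xor_blocks(stripe))
    return new_drive
```

Note that an all-zero (unused) block costs exactly as much work as a block full of data, which is why the rebuild time of a traditional array does not depend on how full it is.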
ZFS uses variable-sized blocks. Therefore, for each recordsize worth of data, which can be anywhere from 4KB to 1MB, ZFS needs to consult the block pointer tree to see how the data is laid out on the disks. Because block pointer trees and files are often fragmented, there is quite a lot of head movement involved. Rotational hard drives perform much slower with a lot of head movement, so the rebuild speed in megabytes per second is lower than that of a traditional RAID. On the other hand, ZFS only rebuilds the part of the array that is in use and does not rebuild free space. Therefore, on lightly used pools it may actually complete faster than a traditional RAID. However, this advantage disappears as the pool fills up.
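The effect of head movement on rebuild speed, and the point at which the free-space advantage disappears, can be approximated with a simple model. This is a deliberately pessimistic sketch rather than a description of how ZFS schedules I/O: it assumes one full seek per block, and the seek time, sequential rate, and function names are all illustrative assumptions.

```python
def effective_mb_s(block_kib: float, seek_ms: float, sequential_mb_s: float) -> float:
    """Throughput when every block costs one seek plus its transfer time."""
    block_mb = block_kib * 1024 / 1_000_000
    seconds_per_block = seek_ms / 1000 + block_mb / sequential_mb_s
    return block_mb / seconds_per_block


def break_even_fill(raidz_mb_s: float, raid_mb_s: float) -> float:
    """Pool fill fraction at which both rebuilds take the same time.

    Below this fraction RAIDz finishes first; above it traditional RAID does.
    """
    return min(1.0, raidz_mb_s / raid_mb_s)


if __name__ == "__main__":
    # Assumed drive: 8 ms average seek, 150 MB/s sequential.
    for block_kib in (4, 128, 1024):
        speed = effective_mb_s(block_kib, seek_ms=8, sequential_mb_s=150)
        fill = break_even_fill(speed, 150)
        print(f"{block_kib:>5} KiB blocks: {speed:6.1f} MB/s, break-even fill {fill:.0%}")
```

Under these assumptions, 128 KiB blocks yield roughly 15 MB/s against 150 MB/s sequential, so the rebuild would only beat a traditional array on a pool that is about 10% full. Real resilvers usually do better than this worst case, because not every block requires a full seek.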