
[0/3] btrfs: use btrfs_path::reada to replace the

Message ID 20211213131054.102526-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Dec. 13, 2021, 1:10 p.m. UTC
[PROBLEMS]
The metadata readahead code was introduced in 2011 (surprisingly, the
commit message even contains a changelog), but by now there is only one
user left, and even for that single user the readahead mechanism can't
provide everything it needs:

- No support for commit tree readahead
  It only supports readahead on the current tree.

- Bad layer separation
  To manage in-flight bios, the btrfs_reada_add() mechanism internally
  maintains a rather complex zone system that is bound to each device.

  This goes against the usual layer separation: metadata lives entirely
  in the btrfs logical address space and should not be tied to the
  physical device layer.

  In fact, this is the cause of all the recent reada-related bugs.

- No caller for asynchronous metadata readahead
  Even though btrfs_reada_add() is designed to be fully asynchronous,
  scrub only calls it synchronously (btrfs_reada_add() followed
  immediately by btrfs_reada_wait(), see the sketch after this list),
  which renders a lot of the code unnecessary.
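
A minimal sketch of that synchronous pattern, roughly modelled on the
scrub caller of the soon-to-be-removed interfaces (the logical,
logic_end and extent_root variables stand in for the scrub context and
are illustrative, not the exact scrub code):

  struct btrfs_key key_start;
  struct btrfs_key key_end;
  void *handle;

  /* Cover the whole logical range of the stripe being scrubbed. */
  key_start.objectid = logical;
  key_start.type = BTRFS_EXTENT_ITEM_KEY;
  key_start.offset = 0;
  key_end.objectid = logic_end;
  key_end.type = BTRFS_METADATA_ITEM_KEY;
  key_end.offset = (u64)-1;

  /* Asynchronous by design... */
  handle = btrfs_reada_add(extent_root, &key_start, &key_end);

  /* ...but scrub waits right away, so it is synchronous in practice. */
  if (!IS_ERR(handle))
      btrfs_reada_wait(handle);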

[ALTERNATIVE]
On the other hand, we already have the btrfs_path::reada mechanism,
which is used by plenty of btrfs sub-systems such as send.

[MODIFICATION]
This patchset uses btrfs_path::reada to replace the btrfs_reada_add()
mechanism.
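
As a rough illustration (not the exact patch), the replacement boils
down to setting the readahead hint on the path before the tree search;
extent_root, logical and the surrounding error handling are assumed
from the scrub context:

  struct btrfs_path *path;
  struct btrfs_key key;
  int ret;

  path = btrfs_alloc_path();
  if (!path)
      return -ENOMEM;

  /*
   * Ask the search to read ahead in key order.  Unlike reada.c this
   * works purely in the logical address space, and it also works on
   * the commit root.
   */
  path->reada = READA_FORWARD;
  path->search_commit_root = 1;
  path->skip_locking = 1;

  key.objectid = logical;
  key.type = BTRFS_EXTENT_ITEM_KEY;
  key.offset = 0;

  ret = btrfs_search_slot(NULL, extent_root, &key, path, 0, 0);
  /* ... walk the extent tree as before ... */

  btrfs_free_path(path);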

[BENCHMARK]
The conclusion is as follows:

For the worst case (no dirty metadata, slow HDD), there is around a 5%
performance drop for scrub.
For other cases (even SATA SSD), there is no distinguishable performance
difference.

The numbers are the reported scrub speed, in MiB/s.
The accuracy is limited by the reported duration, which only has a
resolution of 1 second.

	Old		New		Diff
SSD	455.3		466.332		+2.42%
HDD	103.927 	98.012		-5.69%


[BENCHMARK DETAILS]
Both tests were done on the same host and VM; the VM has one dedicated
SSD and one dedicated HDD attached to it (through the virtio driver).

Host:
CPU:	5900X
RAM:	DDR4 3200MT, no ECC

	During the benchmark there was no activity other than light
	desktop environment (DE) operations.

VM:
vCPU:	16x
RAM:	4G

	The VM kernel doesn't have any heavy debug options enabled that
	would skew the benchmark results.

Test drives:
SSD:	500G SATA SSD
	Crucial CT500MX500SSD1
	(With quite some wear, as it's my main test drive for fstests)

HDD:	2T 5400rpm SATA HDD (device-managed SMR)
	WDC WD20SPZX-22UA7T0
	(Very new, bought just for this benchmark)

Filesystem contents:
The filesystems on both the SSD and the HDD were populated using the
following fio job file:

  [scrub-populate]
  directory=/mnt/btrfs
  nrfiles=16384
  openfiles=16
  filesize=2k-512k
  readwrite=randwrite
  ioengine=libaio
  fallocate=none
  numjobs=4
  
Then 4096 files (1/16th of the total) were randomly removed to create
enough gaps in the extent tree.

Finally, scrub was run on each filesystem 5 times, with a mount cycle
and module reload between runs.

The full results can be fetched here:
https://docs.google.com/spreadsheets/d/1cwUAlbKPfp4baKrS92debsCt6Ejqvxr_Ylspj_SDFT0/edit?usp=sharing

[CHANGELOG]
RFC->v1:
- Add benchmark result
- Add extent tree readahead using btrfs_path::reada

Qu Wenruo (3):
  btrfs: remove the unnecessary path parameter for scrub_raid56_parity()
  btrfs: remove reada mechanism
  btrfs: use btrfs_path::reada for scrub extent tree readahead

 fs/btrfs/Makefile      |    2 +-
 fs/btrfs/ctree.h       |   25 -
 fs/btrfs/dev-replace.c |    5 -
 fs/btrfs/disk-io.c     |   20 +-
 fs/btrfs/extent_io.c   |    3 -
 fs/btrfs/reada.c       | 1086 ----------------------------------------
 fs/btrfs/scrub.c       |   65 +--
 fs/btrfs/super.c       |    1 -
 fs/btrfs/volumes.c     |    7 -
 fs/btrfs/volumes.h     |    7 -
 10 files changed, 18 insertions(+), 1203 deletions(-)
 delete mode 100644 fs/btrfs/reada.c

Comments

David Sterba Dec. 14, 2021, 12:41 p.m. UTC | #1
On Mon, Dec 13, 2021 at 09:10:51PM +0800, Qu Wenruo wrote:
> [PROBLEMS]
> The metadata readahead code was introduced in 2011 (surprisingly, the
> commit message even contains a changelog), but by now there is only one
> user left, and even for that single user the readahead mechanism can't
> provide everything it needs:
> 
> - No support for commit tree readahead
>   It only supports readahead on the current tree.
> 
> - Bad layer separation
>   To manage in-flight bios, the btrfs_reada_add() mechanism internally
>   maintains a rather complex zone system that is bound to each device.
> 
>   This goes against the usual layer separation: metadata lives entirely
>   in the btrfs logical address space and should not be tied to the
>   physical device layer.
> 
>   In fact, this is the cause of all the recent reada-related bugs.
> 
> - No caller for asynchronous metadata readahead
>   Even though btrfs_reada_add() is designed to be fully asynchronous,
>   scrub only calls it synchronously (btrfs_reada_add() followed
>   immediately by btrfs_reada_wait()), which renders a lot of the code
>   unnecessary.

I agree with removing the reada.c code. It's overengineered, perhaps
with the intention of using it in more places, but that hasn't happened
and nobody is interested in doing the work. We have the path readahead,
so it's not as if we'd lose readahead capabilities at all.

Thanks for benchmarking it, the drop is acceptable and we know people
are more interested in limiting the load anyway.

> [BENCHMARK]
> The conclusion is as follows:
> 
> For the worst case (no dirty metadata, slow HDD), there is around a 5%
> performance drop for scrub.
> For other cases (even SATA SSD), there is no distinguishable performance
> difference.
> 
> The numbers are the reported scrub speed, in MiB/s.
> The accuracy is limited by the reported duration, which only has a
> resolution of 1 second.
> 
> 	Old		New		Diff
> SSD	455.3		466.332		+2.42%
> HDD	103.927 	98.012		-5.69%

I'll copy this information to the last patch changelog.
Qu Wenruo Dec. 14, 2021, 12:45 p.m. UTC | #2
On 2021/12/14 20:41, David Sterba wrote:
> On Mon, Dec 13, 2021 at 09:10:51PM +0800, Qu Wenruo wrote:
>> [PROBLEMS]
>> The metadata readahead code was introduced in 2011 (surprisingly, the
>> commit message even contains a changelog), but by now there is only one
>> user left, and even for that single user the readahead mechanism can't
>> provide everything it needs:
>>
>> - No support for commit tree readahead
>>    It only supports readahead on the current tree.
>>
>> - Bad layer separation
>>    To manage in-flight bios, the btrfs_reada_add() mechanism internally
>>    maintains a rather complex zone system that is bound to each device.
>>
>>    This goes against the usual layer separation: metadata lives entirely
>>    in the btrfs logical address space and should not be tied to the
>>    physical device layer.
>>
>>    In fact, this is the cause of all the recent reada-related bugs.
>>
>> - No caller for asynchronous metadata readahead
>>    Even though btrfs_reada_add() is designed to be fully asynchronous,
>>    scrub only calls it synchronously (btrfs_reada_add() followed
>>    immediately by btrfs_reada_wait()), which renders a lot of the code
>>    unnecessary.
>
> I agree with removing the reada.c code. It's overengineered, perhaps
> with the intention of using it in more places, but that hasn't happened
> and nobody is interested in doing the work. We have the path readahead,
> so it's not as if we'd lose readahead capabilities at all.
>
> Thanks for benchmarking it, the drop is acceptable and we know people
> are more interested in limiting the load anyway.
>
>> [BENCHMARK]
>> The conclusion is as follows:
>>
>> For the worst case (no dirty metadata, slow HDD), there is around a 5%
>> performance drop for scrub.
>> For other cases (even SATA SSD), there is no distinguishable performance
>> difference.
>>
>> The numbers are the reported scrub speed, in MiB/s.
>> The accuracy is limited by the reported duration, which only has a
>> resolution of 1 second.
>>
>> 	Old		New		Diff
>> SSD	455.3		466.332		+2.42%
>> HDD	103.927 	98.012		-5.69%
>
> I'll copy this information to the last patch changelog.

Since I found a bug in the first patch, let me take the chance to
re-order the patches and put this information into the patch removing
reada.

Thanks,
Qu