diff mbox series

[v14,32/42] btrfs: avoid async metadata checksum on ZONED mode

Message ID 13728adcc4f433c928b00be73ea5466f62ccb4b9.1611627788.git.naohiro.aota@wdc.com (mailing list archive)
State New
Headers show
Series btrfs: zoned block device support | expand

Commit Message

Naohiro Aota Jan. 26, 2021, 2:25 a.m. UTC
In ZONED, btrfs uses per-FS zoned_meta_io_lock to serialize the metadata
write IOs.

Even with these serialization, write bios sent from btree_write_cache_pages
can be reordered by async checksum workers as these workers are per CPU and
not per zone.

To preserve write BIO ordering, we can disable async metadata checksum on
ZONED.  This does not result in lower performance with HDDs as a single CPU
core is fast enough to do checksum for a single zone write stream with the
maximum possible bandwidth of the device. If multiple zones are being
written simultaneously, HDD seek overhead lowers the achievable maximum
bandwidth, resulting again in a per zone checksum serialization not
affecting performance.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/disk-io.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

David Sterba Feb. 2, 2021, 2:54 p.m. UTC | #1
On Tue, Jan 26, 2021 at 11:25:10AM +0900, Naohiro Aota wrote:
> In ZONED, btrfs uses per-FS zoned_meta_io_lock to serialize the metadata
> write IOs.
> 
> Even with these serialization, write bios sent from btree_write_cache_pages
> can be reordered by async checksum workers as these workers are per CPU and
> not per zone.
> 
> To preserve write BIO ordering, we can disable async metadata checksum on
> ZONED.  This does not result in lower performance with HDDs as a single CPU
> core is fast enough to do checksum for a single zone write stream with the
> maximum possible bandwidth of the device. If multiple zones are being
> written simultaneously, HDD seek overhead lowers the achievable maximum
> bandwidth, resulting again in a per zone checksum serialization not
> affecting performance.
> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/disk-io.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index a41bdf9312d6..5d14100ecf72 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -814,6 +814,8 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
>  static int check_async_write(struct btrfs_fs_info *fs_info,
>  			     struct btrfs_inode *bi)
>  {
> +	if (btrfs_is_zoned(fs_info))
> +		return 0;

This check need to be after the other ones as zoned is a static per-fs
status, while other others depend on either current state or system
state (crypto implementation).

>  	if (atomic_read(&bi->sync_writers))
>  		return 0;
>  	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
> -- 
> 2.27.0
Johannes Thumshirn Feb. 2, 2021, 4:50 p.m. UTC | #2
On 02/02/2021 15:58, David Sterba wrote:
>> static int check_async_write(struct btrfs_fs_info *fs_info,
>>  			     struct btrfs_inode *bi)
>>  {
>> +	if (btrfs_is_zoned(fs_info))
>> +		return 0;
> This check need to be after the other ones as zoned is a static per-fs
> status, while other others depend on either current state or system
> state (crypto implementation).
> 
>>  	if (atomic_read(&bi->sync_writers))
>>  		return 0;
>>  	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))

Can you explain the reasoning behind that rule? For a non-zoned FS this won't
make a huge difference to check fs_info->zoned and for a  zoned FS we're 
bailing out fast as we can't support async checksums.
David Sterba Feb. 2, 2021, 7:28 p.m. UTC | #3
On Tue, Feb 02, 2021 at 04:50:26PM +0000, Johannes Thumshirn wrote:
> On 02/02/2021 15:58, David Sterba wrote:
> >> static int check_async_write(struct btrfs_fs_info *fs_info,
> >>  			     struct btrfs_inode *bi)
> >>  {
> >> +	if (btrfs_is_zoned(fs_info))
> >> +		return 0;
> > This check need to be after the other ones as zoned is a static per-fs
> > status, while other others depend on either current state or system
> > state (crypto implementation).
> > 
> >>  	if (atomic_read(&bi->sync_writers))
> >>  		return 0;
> >>  	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
> 
> Can you explain the reasoning behind that rule? For a non-zoned FS this won't
> make a huge difference to check fs_info->zoned and for a  zoned FS we're 
> bailing out fast as we can't support async checksums.

On first sight it looked like a special case for zoned while it's not
the major usecase but the test is cheap and fast, it's ok to keep it
first.
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a41bdf9312d6..5d14100ecf72 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -814,6 +814,8 @@  static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 static int check_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
+	if (btrfs_is_zoned(fs_info))
+		return 0;
 	if (atomic_read(&bi->sync_writers))
 		return 0;
 	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))