btrfs: properly record errors for super block writeback

Message ID	6726644169a9f4affbb6894a8a560c96072be9cf.1654841347.git.wqu@suse.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-btrfs-owner@kernel.org> From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org Subject: [PATCH] btrfs: properly record errors for super block writeback Date: Fri, 10 Jun 2022 14:09:22 +0800 Message-Id: <6726644169a9f4affbb6894a8a560c96072be9cf.1654841347.git.wqu@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	btrfs: properly record errors for super block writeback \| expand btrfs: properly record errors for super block writeback

Message ID

6726644169a9f4affbb6894a8a560c96072be9cf.1654841347.git.wqu@suse.com (mailing list archive)

State

New, archived

Headers

From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH] btrfs: properly record errors for super block writeback
Date: Fri, 10 Jun 2022 14:09:22 +0800
Message-Id: 
 <6726644169a9f4affbb6894a8a560c96072be9cf.1654841347.git.wqu@suse.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

btrfs: properly record errors for super block writeback | expand

Commit Message

Qu Wenruo June 10, 2022, 6:09 a.m. UTC

Although function write_dev_supers() will report error, it only report
things like failed to grab the page, all those errors before we submit
the bio.

But if our bio really failed, due to real IO error, we just set the page
error, output an error message, and call it a day.

This makes btrfs to completely ignore super block writeback error.

Thankfully, this is not that a big deal, as normally we should got error
when flushing the device before we reached super block writeback.

Anyway we should make write_dev_supers() to include the IO errors.

This patch will enhance the error reporting by:

- Introduce a new on-stack structure to record all errors including IO
  errors
  The new strucuture, super_block_io_status, will have a wait queue and
  accounting for flying bios along with errors we hit so far.

- Output a human readable error message
  Instead of something like:

   lost page write due to IO error on dm-1 (-5)

  Now we will have:

   failed to write super block at 65536 on dm-1 (-5)

- Wait for all super block IO finished before returning
  write_dev_supers()
  So now write_dev_supers() will wait for all flying bios to finish, and
  use real number of errors to determine if the write failed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 54 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 14 deletions(-)

Comments

Qu Wenruo June 10, 2022, 8:22 a.m. UTC | #1

On 2022/6/10 14:09, Qu Wenruo wrote:
> Although function write_dev_supers() will report error, it only report
> things like failed to grab the page, all those errors before we submit
> the bio.
> 
> But if our bio really failed, due to real IO error, we just set the page
> error, output an error message, and call it a day.
> 
> This makes btrfs to completely ignore super block writeback error.
> 
> Thankfully, this is not that a big deal, as normally we should got error
> when flushing the device before we reached super block writeback.
> 
> Anyway we should make write_dev_supers() to include the IO errors.
> 
> This patch will enhance the error reporting by:
> 
> - Introduce a new on-stack structure to record all errors including IO
>    errors
>    The new strucuture, super_block_io_status, will have a wait queue and
>    accounting for flying bios along with errors we hit so far.
> 
> - Output a human readable error message
>    Instead of something like:
> 
>     lost page write due to IO error on dm-1 (-5)
> 
>    Now we will have:
> 
>     failed to write super block at 65536 on dm-1 (-5)
> 
> - Wait for all super block IO finished before returning
>    write_dev_supers()
>    So now write_dev_supers() will wait for all flying bios to finish, and
>    use real number of errors to determine if the write failed.

My bad, I didn't check the later wait_dev_supers() call, which uses 
PageError to detect if our previous write failed.

So please drop this patch.

Although it looks like we don't even need to bother the parallel 
submission, since in super block writeback, we're already in 
TRANS_STATE_UNBLOCKED status, thus we're fine to do the work in serial.

Thanks,
Qu
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/disk-io.c | 54 ++++++++++++++++++++++++++++++++++------------
>   1 file changed, 40 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 800ad3a9c68e..9ed88a921465 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3864,9 +3864,21 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>   }
>   ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
>   
> +struct super_block_io_status {
> +	struct btrfs_device *device;
> +	wait_queue_head_t wait;
> +
> +	/* Pending sb bios. */
> +	atomic_t pending;
> +
> +	/* Total errors, including both primary and backup sb writeback errors. */
> +	atomic_t errors;
> +};
> +
>   static void btrfs_end_super_write(struct bio *bio)
>   {
> -	struct btrfs_device *device = bio->bi_private;
> +	struct super_block_io_status *sbios = bio->bi_private;
> +	struct btrfs_device *device = sbios->device;
>   	struct bio_vec *bvec;
>   	struct bvec_iter_all iter_all;
>   	struct page *page;
> @@ -3875,21 +3887,29 @@ static void btrfs_end_super_write(struct bio *bio)
>   		page = bvec->bv_page;
>   
>   		if (bio->bi_status) {
> +			struct btrfs_super_block *sb;
> +			void *addr;
> +			u64 bytenr;
> +
> +			addr = kmap_local_page(page) + bvec->bv_offset;
> +			sb = addr;
> +			bytenr = btrfs_super_bytenr(sb);
> +			kunmap_local(addr);
> +
> +
> +			atomic_inc(&sbios->errors);
>   			btrfs_warn_rl_in_rcu(device->fs_info,
> -				"lost page write due to IO error on %s (%d)",
> -				rcu_str_deref(device->name),
> +				"failed to write super block at %llu on %s (%d)",
> +				bytenr, rcu_str_deref(device->name),
>   				blk_status_to_errno(bio->bi_status));
> -			ClearPageUptodate(page);
> -			SetPageError(page);
>   			btrfs_dev_stat_inc_and_print(device,
>   						     BTRFS_DEV_STAT_WRITE_ERRS);
> -		} else {
> -			SetPageUptodate(page);
>   		}
> -
>   		put_page(page);
>   		unlock_page(page);
>   	}
> +	atomic_dec(&sbios->pending);
> +	wake_up(&sbios->wait);
>   
>   	bio_put(bio);
>   }
> @@ -3975,9 +3995,9 @@ static int write_dev_supers(struct btrfs_device *device,
>   {
>   	struct btrfs_fs_info *fs_info = device->fs_info;
>   	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
> +	struct super_block_io_status sbios = {.device = device};
>   	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>   	int i;
> -	int errors = 0;
>   	int ret;
>   	u64 bytenr, bytenr_orig;
>   
> @@ -3986,6 +4006,10 @@ static int write_dev_supers(struct btrfs_device *device,
>   
>   	shash->tfm = fs_info->csum_shash;
>   
> +	atomic_set(&sbios.pending, 0);
> +	atomic_set(&sbios.errors, 0);
> +	init_waitqueue_head(&sbios.wait);
> +
>   	for (i = 0; i < max_mirrors; i++) {
>   		struct page *page;
>   		struct bio *bio;
> @@ -3999,7 +4023,7 @@ static int write_dev_supers(struct btrfs_device *device,
>   			btrfs_err(device->fs_info,
>   				"couldn't get super block location for mirror %d",
>   				i);
> -			errors++;
> +			atomic_inc(&sbios.errors);
>   			continue;
>   		}
>   		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
> @@ -4018,7 +4042,7 @@ static int write_dev_supers(struct btrfs_device *device,
>   			btrfs_err(device->fs_info,
>   			    "couldn't get super block page for bytenr %llu",
>   			    bytenr);
> -			errors++;
> +			atomic_inc(&sbios.errors);
>   			continue;
>   		}
>   
> @@ -4028,6 +4052,7 @@ static int write_dev_supers(struct btrfs_device *device,
>   		disk_super = page_address(page);
>   		memcpy(disk_super, sb, BTRFS_SUPER_INFO_SIZE);
>   
> +		atomic_inc(&sbios.pending);
>   		/*
>   		 * Directly use bios here instead of relying on the page cache
>   		 * to do I/O, so we don't lose the ability to do integrity
> @@ -4037,7 +4062,7 @@ static int write_dev_supers(struct btrfs_device *device,
>   				REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO,
>   				GFP_NOFS);
>   		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
> -		bio->bi_private = device;
> +		bio->bi_private = &sbios;
>   		bio->bi_end_io = btrfs_end_super_write;
>   		__bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
>   			       offset_in_page(bytenr));
> @@ -4054,9 +4079,10 @@ static int write_dev_supers(struct btrfs_device *device,
>   		submit_bio(bio);
>   
>   		if (btrfs_advance_sb_log(device, i))
> -			errors++;
> +			atomic_inc(&sbios.errors);
>   	}
> -	return errors < i ? 0 : -1;
> +	wait_event(sbios.wait, atomic_read(&sbios.pending) == 0);
> +	return atomic_read(&sbios.errors) < i ? 0 : -EIO;
>   }
>   
>   /*

David Sterba June 13, 2022, 3:42 p.m. UTC | #2

On Fri, Jun 10, 2022 at 04:22:17PM +0800, Qu Wenruo wrote:
> On 2022/6/10 14:09, Qu Wenruo wrote:
> > Although function write_dev_supers() will report error, it only report
> > things like failed to grab the page, all those errors before we submit
> > the bio.
> > 
> > But if our bio really failed, due to real IO error, we just set the page
> > error, output an error message, and call it a day.
> > 
> > This makes btrfs to completely ignore super block writeback error.
> > 
> > Thankfully, this is not that a big deal, as normally we should got error
> > when flushing the device before we reached super block writeback.
> > 
> > Anyway we should make write_dev_supers() to include the IO errors.
> > 
> > This patch will enhance the error reporting by:
> > 
> > - Introduce a new on-stack structure to record all errors including IO
> >    errors
> >    The new strucuture, super_block_io_status, will have a wait queue and
> >    accounting for flying bios along with errors we hit so far.
> > 
> > - Output a human readable error message
> >    Instead of something like:
> > 
> >     lost page write due to IO error on dm-1 (-5)
> > 
> >    Now we will have:
> > 
> >     failed to write super block at 65536 on dm-1 (-5)
> > 
> > - Wait for all super block IO finished before returning
> >    write_dev_supers()
> >    So now write_dev_supers() will wait for all flying bios to finish, and
> >    use real number of errors to determine if the write failed.
> 
> My bad, I didn't check the later wait_dev_supers() call, which uses 
> PageError to detect if our previous write failed.
> 
> So please drop this patch.
> 
> Although it looks like we don't even need to bother the parallel 
> submission, since in super block writeback, we're already in 
> TRANS_STATE_UNBLOCKED status, thus we're fine to do the work in serial.

The discussions about the super block writes are scattered in several
places, so the parallelism of IO will stay.

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 800ad3a9c68e..9ed88a921465 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3864,9 +3864,21 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 
+struct super_block_io_status {
+	struct btrfs_device *device;
+	wait_queue_head_t wait;
+
+	/* Pending sb bios. */
+	atomic_t pending;
+
+	/* Total errors, including both primary and backup sb writeback errors. */
+	atomic_t errors;
+};
+
 static void btrfs_end_super_write(struct bio *bio)
 {
-	struct btrfs_device *device = bio->bi_private;
+	struct super_block_io_status *sbios = bio->bi_private;
+	struct btrfs_device *device = sbios->device;
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
 	struct page *page;
@@ -3875,21 +3887,29 @@  static void btrfs_end_super_write(struct bio *bio)
 		page = bvec->bv_page;
 
 		if (bio->bi_status) {
+			struct btrfs_super_block *sb;
+			void *addr;
+			u64 bytenr;
+
+			addr = kmap_local_page(page) + bvec->bv_offset;
+			sb = addr;
+			bytenr = btrfs_super_bytenr(sb);
+			kunmap_local(addr);
+
+
+			atomic_inc(&sbios->errors);
 			btrfs_warn_rl_in_rcu(device->fs_info,
-				"lost page write due to IO error on %s (%d)",
-				rcu_str_deref(device->name),
+				"failed to write super block at %llu on %s (%d)",
+				bytenr, rcu_str_deref(device->name),
 				blk_status_to_errno(bio->bi_status));
-			ClearPageUptodate(page);
-			SetPageError(page);
 			btrfs_dev_stat_inc_and_print(device,
 						     BTRFS_DEV_STAT_WRITE_ERRS);
-		} else {
-			SetPageUptodate(page);
 		}
-
 		put_page(page);
 		unlock_page(page);
 	}
+	atomic_dec(&sbios->pending);
+	wake_up(&sbios->wait);
 
 	bio_put(bio);
 }
@@ -3975,9 +3995,9 @@  static int write_dev_supers(struct btrfs_device *device,
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
 	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
+	struct super_block_io_status sbios = {.device = device};
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
-	int errors = 0;
 	int ret;
 	u64 bytenr, bytenr_orig;
 
@@ -3986,6 +4006,10 @@  static int write_dev_supers(struct btrfs_device *device,
 
 	shash->tfm = fs_info->csum_shash;
 
+	atomic_set(&sbios.pending, 0);
+	atomic_set(&sbios.errors, 0);
+	init_waitqueue_head(&sbios.wait);
+
 	for (i = 0; i < max_mirrors; i++) {
 		struct page *page;
 		struct bio *bio;
@@ -3999,7 +4023,7 @@  static int write_dev_supers(struct btrfs_device *device,
 			btrfs_err(device->fs_info,
 				"couldn't get super block location for mirror %d",
 				i);
-			errors++;
+			atomic_inc(&sbios.errors);
 			continue;
 		}
 		if (bytenr + BTRFS_SUPER_INFO_SIZE >=
@@ -4018,7 +4042,7 @@  static int write_dev_supers(struct btrfs_device *device,
 			btrfs_err(device->fs_info,
 			    "couldn't get super block page for bytenr %llu",
 			    bytenr);
-			errors++;
+			atomic_inc(&sbios.errors);
 			continue;
 		}
 
@@ -4028,6 +4052,7 @@  static int write_dev_supers(struct btrfs_device *device,
 		disk_super = page_address(page);
 		memcpy(disk_super, sb, BTRFS_SUPER_INFO_SIZE);
 
+		atomic_inc(&sbios.pending);
 		/*
 		 * Directly use bios here instead of relying on the page cache
 		 * to do I/O, so we don't lose the ability to do integrity
@@ -4037,7 +4062,7 @@  static int write_dev_supers(struct btrfs_device *device,
 				REQ_OP_WRITE | REQ_SYNC | REQ_META | REQ_PRIO,
 				GFP_NOFS);
 		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
-		bio->bi_private = device;
+		bio->bi_private = &sbios;
 		bio->bi_end_io = btrfs_end_super_write;
 		__bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
 			       offset_in_page(bytenr));
@@ -4054,9 +4079,10 @@  static int write_dev_supers(struct btrfs_device *device,
 		submit_bio(bio);
 
 		if (btrfs_advance_sb_log(device, i))
-			errors++;
+			atomic_inc(&sbios.errors);
 	}
-	return errors < i ? 0 : -1;
+	wait_event(sbios.wait, atomic_read(&sbios.pending) == 0);
+	return atomic_read(&sbios.errors) < i ? 0 : -EIO;
 }
 
 /*

btrfs: properly record errors for super block writeback

Commit Message

Comments

Patch