[v2] btrfs: add framework to handle device flush error as a volume
diff mbox

Message ID 20170505231754.9201-1-anand.jain@oracle.com
State New
Headers show

Commit Message

Anand Jain May 5, 2017, 11:17 p.m. UTC
This adds comments to the flush error handling part of
the code, and hopes to maintain the same logic with a
framework which can be used to handle the errors at the
volume level.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2:
 fix -ENOMEM at two places
 add code readability changes in check_barrier_error()

 fs/btrfs/disk-io.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h |  1 +
 2 files changed, 54 insertions(+), 4 deletions(-)

Comments

David Sterba May 9, 2017, 5:12 p.m. UTC | #1
On Sat, May 06, 2017 at 07:17:54AM +0800, Anand Jain wrote:
> This adds comments to the flush error handling part of
> the code, and hopes to maintain the same logic with a
> framework which can be used to handle the errors at the
> volume level.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Reviewed-by: David Sterba <dsterba@suse.com>

So the next step is to preallocate the flush_bio and remove the ENOMEM
handling.

I think we can skip the error handling on the submission path, even if
submit_bio can fail, the error code will be stored to the bio and this
will be checked in the waiting side.

Then write_dev_flush can be split into 2 functions by the parameter
'wait', as the paths are completely independent.

> +static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
> +{
> +	int submit_flush_error = 0;
> +	int dev_flush_error = 0;
> +	struct btrfs_device *dev;
> +	int tolerance;
> +
> +	list_for_each_entry_rcu(dev, &fsdevs->devices, dev_list) {
> +		if (!dev->bdev) {
> +			submit_flush_error++;
> +			dev_flush_error++;
> +			continue;
> +		}
> +		if (dev->last_flush_error == -ENOMEM)
> +			submit_flush_error++;
> +		if (dev->last_flush_error && dev->last_flush_error != -ENOMEM)
> +			dev_flush_error++;
> +	}
> +
> +	tolerance = fsdevs->fs_info->num_tolerated_disk_barrier_failures;
> +	if (submit_flush_error > tolerance || dev_flush_error > tolerance)
> +		return -EIO;

Actually, after reading submit_bio and the comment above, do we need to
care about the submission errors? As submit_bio could return -EIO
immeditaelly, but does not help us anyway. We'll wait and check again
later.  It's the bio completion what counts.

This should simplify the code.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain May 16, 2017, 9:52 a.m. UTC | #2
On 05/10/2017 01:12 AM, David Sterba wrote:
> On Sat, May 06, 2017 at 07:17:54AM +0800, Anand Jain wrote:
>> This adds comments to the flush error handling part of
>> the code, and hopes to maintain the same logic with a
>> framework which can be used to handle the errors at the
>> volume level.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>
> Reviewed-by: David Sterba <dsterba@suse.com>

  Thanks.

> So the next step is to preallocate the flush_bio and remove the ENOMEM
> handling.

  one or the other way ENOMEM is going to go away. Thinking to fix
  blkdev_issue_flush(), which IMO is the right way, its RFC is sent out.


> submit_bio can fail, the error code will be stored to the bio and this
> will be checked in the waiting side.
>
> Then write_dev_flush can be split into 2 functions by the parameter
> 'wait', as the paths are completely independent.

  The new KPI blkdev_issue_flush_no_wait() makes it much simpler,
  I have sent out the btrfs part of the RFC patch as well. Thanks.

>> +static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
>> +{
>> +	int submit_flush_error = 0;
>> +	int dev_flush_error = 0;
>> +	struct btrfs_device *dev;
>> +	int tolerance;
>> +
>> +	list_for_each_entry_rcu(dev, &fsdevs->devices, dev_list) {
>> +		if (!dev->bdev) {
>> +			submit_flush_error++;
>> +			dev_flush_error++;
>> +			continue;
>> +		}
>> +		if (dev->last_flush_error == -ENOMEM)
>> +			submit_flush_error++;
>> +		if (dev->last_flush_error && dev->last_flush_error != -ENOMEM)
>> +			dev_flush_error++;
>> +	}
>> +
>> +	tolerance = fsdevs->fs_info->num_tolerated_disk_barrier_failures;
>> +	if (submit_flush_error > tolerance || dev_flush_error > tolerance)
>> +		return -EIO;
>
> Actually, after reading submit_bio and the comment above, do we need to
> care about the submission errors? As submit_bio could return -EIO
> immeditaelly, but does not help us anyway. We'll wait and check again
> later.  It's the bio completion what counts.
>
> This should simplify the code.

  Yeah. we don;t have to worry about the send part of the error, when
  all the error that may encounter is all related to the device itself.

  There are RFC patches for the device state: offline and failed, now
  depending on whether the flush returned EIO or transport error ENXIO
  I intend to choice the device state such as of failed or offline
  respectively.

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e064913fa62f..2f0d0688e0a6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3625,6 +3625,10 @@  static int write_dev_flush(struct btrfs_device *device, int wait)
 	if (wait) {
 		bio = device->flush_bio;
 		if (!bio)
+			/*
+			 * This means the alloc has failed with ENOMEM, however
+			 * here we return 0, as its not a device error.
+			 */
 			return 0;
 
 		wait_for_completion(&device->flush_wait);
@@ -3664,6 +3668,32 @@  static int write_dev_flush(struct btrfs_device *device, int wait)
 	return 0;
 }
 
+static int check_barrier_error(struct btrfs_fs_devices *fsdevs)
+{
+	int submit_flush_error = 0;
+	int dev_flush_error = 0;
+	struct btrfs_device *dev;
+	int tolerance;
+
+	list_for_each_entry_rcu(dev, &fsdevs->devices, dev_list) {
+		if (!dev->bdev) {
+			submit_flush_error++;
+			dev_flush_error++;
+			continue;
+		}
+		if (dev->last_flush_error == -ENOMEM)
+			submit_flush_error++;
+		if (dev->last_flush_error && dev->last_flush_error != -ENOMEM)
+			dev_flush_error++;
+	}
+
+	tolerance = fsdevs->fs_info->num_tolerated_disk_barrier_failures;
+	if (submit_flush_error > tolerance || dev_flush_error > tolerance)
+		return -EIO;
+
+	return 0;
+}
+
 /*
  * send an empty flush down to each device in parallel,
  * then wait for them
@@ -3691,6 +3721,7 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 		ret = write_dev_flush(dev, 0);
 		if (ret)
 			errors_send++;
+		dev->last_flush_error = ret;
 	}
 
 	/* wait for all the barriers */
@@ -3705,12 +3736,30 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 			continue;
 
 		ret = write_dev_flush(dev, 1);
-		if (ret)
+		if (ret) {
+			dev->last_flush_error = ret;
 			errors_wait++;
+		}
+	}
+
+	/*
+	 * Try hard in case of flush. Lets say, in RAID1 we have
+	 * the following situation
+	 *  dev1: EIO dev2: ENOMEM
+	 * this is not a fatal error as we hope to recover from
+	 * ENOMEM in the next attempt to flush.
+	 * But the following is considered as fatal
+	 *  dev1: ENOMEM dev2: ENOMEM
+	 *  dev1: bdev == NULL dev2: ENOMEM
+	 */
+	if (errors_send || errors_wait) {
+		/*
+		 * At some point we need the status of all disks
+		 * to arrive at the volume status. So error checking
+		 * is being pushed to a separate loop.
+		 */
+		return check_barrier_error(info->fs_devices);
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
-		return -EIO;
 	return 0;
 }
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index cfb817384cb5..9bb248b3fa0e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -87,6 +87,7 @@  struct btrfs_device {
 	int offline;
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
+	int last_flush_error;
 
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;