diff mbox

[v2,4/6] btrfs: Allow barrier_all_devices to do chunk level device check

Message ID 20170306085855.11403-5-quwenruo@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qu Wenruo March 6, 2017, 8:58 a.m. UTC
The last user of num_tolerated_disk_barrier_failures is
barrier_all_devices().
But it's can be easily changed to new per-chunk degradable check
framework.

Now btrfs_device will have two extra members, representing send/wait
error, set at write_dev_flush() time.
With these 2 new members, btrfs_check_rw_degradable() can check if the
fs is still OK when the fs is committed to disk.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 15 +++++++--------
 fs/btrfs/volumes.c |  4 +++-
 fs/btrfs/volumes.h |  4 ++++
 3 files changed, 14 insertions(+), 9 deletions(-)

Comments

Anand Jain March 7, 2017, 4:48 a.m. UTC | #1
On 03/06/2017 04:58 PM, Qu Wenruo wrote:
> The last user of num_tolerated_disk_barrier_failures is
> barrier_all_devices().
> But it's can be easily changed to new per-chunk degradable check
> framework.
>
> Now btrfs_device will have two extra members, representing send/wait
> error, set at write_dev_flush() time.
> With these 2 new members, btrfs_check_rw_degradable() can check if the
> fs is still OK when the fs is committed to disk.

  This logic isn't reentrant, earlier it was. How about using
  stack variable instead ?

Thanks, Anand


> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> ---
>  fs/btrfs/disk-io.c | 15 +++++++--------
>  fs/btrfs/volumes.c |  4 +++-
>  fs/btrfs/volumes.h |  4 ++++
>  3 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c26b8a0b121c..f596bd130524 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  {
>  	struct list_head *head;
>  	struct btrfs_device *dev;
> -	int errors_send = 0;
> -	int errors_wait = 0;
>  	int ret;
>
>  	/* send down all the barriers */
>  	head = &info->fs_devices->devices;
>  	list_for_each_entry_rcu(dev, head, dev_list) {
> +		dev->err_wait = false;
> +		dev->err_send = false;
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_send++;
> +			dev->err_send = true;
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 0);
>  		if (ret)
> -			errors_send++;
> +			dev->err_send = true;
>  	}
>
>  	/* wait for all the barriers */
> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>  		if (dev->missing)
>  			continue;
>  		if (!dev->bdev) {
> -			errors_wait++;
> +			dev->err_wait = true;
>  			continue;
>  		}
>  		if (!dev->in_fs_metadata || !dev->writeable)
> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct btrfs_fs_info *info)
>
>  		ret = write_dev_flush(dev, 1);
>  		if (ret)
> -			errors_wait++;
> +			dev->err_wait = true;
>  	}
> -	if (errors_send > info->num_tolerated_disk_barrier_failures ||
> -	    errors_wait > info->num_tolerated_disk_barrier_failures)
> +	if (!btrfs_check_rw_degradable(info))
>  		return -EIO;
>  	return 0;
>  }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dd9dd94d7043..729cbd0d2b60 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
>  			btrfs_get_num_tolerated_disk_barrier_failures(
>  					map->type);
>  		for (i = 0; i < map->num_stripes; i++) {
> -			if (map->stripes[i].dev->missing)
> +			if (map->stripes[i].dev->missing ||
> +			    map->stripes[i].dev->err_wait ||
> +			    map->stripes[i].dev->err_send)
>  				missing++;
>  		}
>  		if (missing > max_tolerated) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index db1b5ef479cf..112fccacdabc 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -75,6 +75,10 @@ struct btrfs_device {
>  	int can_discard;
>  	int is_tgtdev_for_dev_replace;
>
> +	/* If this devices fails to send/wait dev flush */
> +	bool err_send;
> +	bool err_wait;



>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>  	seqcount_t data_seqcount;
>  #endif
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo March 7, 2017, 5:36 a.m. UTC | #2
At 03/07/2017 12:48 PM, Anand Jain wrote:
>
>
> On 03/06/2017 04:58 PM, Qu Wenruo wrote:
>> The last user of num_tolerated_disk_barrier_failures is
>> barrier_all_devices().
>> But it's can be easily changed to new per-chunk degradable check
>> framework.
>>
>> Now btrfs_device will have two extra members, representing send/wait
>> error, set at write_dev_flush() time.
>> With these 2 new members, btrfs_check_rw_degradable() can check if the
>> fs is still OK when the fs is committed to disk.
>
>  This logic isn't reentrant, earlier it was. How about using
>  stack variable instead ?
>
> Thanks, Anand

Thanks for the review.

However I didn't quite get the point about the re-entrance and stack 
variable here.

1) About reentrancy
In previous version, the err_* bits are still put into btrfs_devices 
structure, just timing of resetting these bits are changes.

So either way, it's not reentrant in theory.

But that doesn't make a problem, as we have other things to protect when 
calling write_all_supers(), the only caller of barrier_all_devices().

So would you give me an example why we need to make it reentrant?

2) About using stack variable?
Did you mean build a array on stack to record which devices fails to 
send/wait and use the array as check condition other than 
btrfs_device->err_* and btrfs_device->missing ?

The only problem is, it sounds more complex than needed.
Despite the err_*, we also needs to check already missing devices, so I 
prefer the easy way, just checking btrfs_device->err_* and 
btrfs_device->missing.

Any simple example to explain your suggestion here?

Thanks,
Qu

>
>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> ---
>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>  fs/btrfs/volumes.c |  4 +++-
>>  fs/btrfs/volumes.h |  4 ++++
>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index c26b8a0b121c..f596bd130524 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>  {
>>      struct list_head *head;
>>      struct btrfs_device *dev;
>> -    int errors_send = 0;
>> -    int errors_wait = 0;
>>      int ret;
>>
>>      /* send down all the barriers */
>>      head = &info->fs_devices->devices;
>>      list_for_each_entry_rcu(dev, head, dev_list) {
>> +        dev->err_wait = false;
>> +        dev->err_send = false;
>>          if (dev->missing)
>>              continue;
>>          if (!dev->bdev) {
>> -            errors_send++;
>> +            dev->err_send = true;
>>              continue;
>>          }
>>          if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>          ret = write_dev_flush(dev, 0);
>>          if (ret)
>> -            errors_send++;
>> +            dev->err_send = true;
>>      }
>>
>>      /* wait for all the barriers */
>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>          if (dev->missing)
>>              continue;
>>          if (!dev->bdev) {
>> -            errors_wait++;
>> +            dev->err_wait = true;
>>              continue;
>>          }
>>          if (!dev->in_fs_metadata || !dev->writeable)
>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>> btrfs_fs_info *info)
>>
>>          ret = write_dev_flush(dev, 1);
>>          if (ret)
>> -            errors_wait++;
>> +            dev->err_wait = true;
>>      }
>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>> +    if (!btrfs_check_rw_degradable(info))
>>          return -EIO;
>>      return 0;
>>  }
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index dd9dd94d7043..729cbd0d2b60 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>> btrfs_fs_info *fs_info)
>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>                      map->type);
>>          for (i = 0; i < map->num_stripes; i++) {
>> -            if (map->stripes[i].dev->missing)
>> +            if (map->stripes[i].dev->missing ||
>> +                map->stripes[i].dev->err_wait ||
>> +                map->stripes[i].dev->err_send)
>>                  missing++;
>>          }
>>          if (missing > max_tolerated) {
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index db1b5ef479cf..112fccacdabc 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>      int can_discard;
>>      int is_tgtdev_for_dev_replace;
>>
>> +    /* If this devices fails to send/wait dev flush */
>> +    bool err_send;
>> +    bool err_wait;
>
>
>
>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>      seqcount_t data_seqcount;
>>  #endif
>>
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain March 7, 2017, 6:55 a.m. UTC | #3
> 1) About reentrancy
> In previous version, the err_* bits are still put into btrfs_devices
> structure, just timing of resetting these bits are changes.
>
> So either way, it's not reentrant in theory.
>
> But that doesn't make a problem, as we have other things to protect when
> calling write_all_supers(), the only caller of barrier_all_devices().
>
> So would you give me an example why we need to make it reentrant?

  Its updating the device struct I would avoid such a change
  for the reasons of this patch. (I notice that in V1 as well).
  Further btrfs does not handle online intermittent device failure,
  unless the complete story is taken care, I would avoid such a change.

  Theoretically this patch is buggy, btrfs_check_rw_degradable() has
  more consumers than just the barrier_all_devices(). However the
  dev->err_wait and dev->err_send are managed by only
  barrier_all_devices(). And the bad news is dev->err_wait and
  dev->err_send would affect the result of "missing" coming out of
  btrfs_check_rw_degradable() which is wrong for the threads other
  than barrier_all_devices(). Further, the only way dev->err_wait and
  dev->err_send gets reset is by the next call to
  barrier_all_devices(). And if lock is an answer that would makes
  it messy and complicated. I won't do that.

  There is a simple solution as below..

> 2) About using stack variable?

  pass err_send and err_write to btrfs_check_rw_degradable() through
  argument so to compute degradable for the barrier_all_devices().
  In this way changes are kept local thread specific.

Thanks, Anand


> Did you mean build a array on stack to record which devices fails to
> send/wait and use the array as check condition other than
> btrfs_device->err_* and btrfs_device->missing ?
>
> The only problem is, it sounds more complex than needed.
> Despite the err_*, we also needs to check already missing devices, so I
> prefer the easy way, just checking btrfs_device->err_* and
> btrfs_device->missing.
>
> Any simple example to explain your suggestion here?
>
> Thanks,
> Qu
>
>>
>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> ---
>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>  fs/btrfs/volumes.c |  4 +++-
>>>  fs/btrfs/volumes.h |  4 ++++
>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index c26b8a0b121c..f596bd130524 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>  {
>>>      struct list_head *head;
>>>      struct btrfs_device *dev;
>>> -    int errors_send = 0;
>>> -    int errors_wait = 0;
>>>      int ret;
>>>
>>>      /* send down all the barriers */
>>>      head = &info->fs_devices->devices;
>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>> +        dev->err_wait = false;
>>> +        dev->err_send = false;
>>>          if (dev->missing)
>>>              continue;
>>>          if (!dev->bdev) {
>>> -            errors_send++;
>>> +            dev->err_send = true;
>>>              continue;
>>>          }
>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>          ret = write_dev_flush(dev, 0);
>>>          if (ret)
>>> -            errors_send++;
>>> +            dev->err_send = true;
>>>      }
>>>
>>>      /* wait for all the barriers */
>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>          if (dev->missing)
>>>              continue;
>>>          if (!dev->bdev) {
>>> -            errors_wait++;
>>> +            dev->err_wait = true;
>>>              continue;
>>>          }
>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>>> btrfs_fs_info *info)
>>>
>>>          ret = write_dev_flush(dev, 1);
>>>          if (ret)
>>> -            errors_wait++;
>>> +            dev->err_wait = true;
>>>      }
>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>> +    if (!btrfs_check_rw_degradable(info))
>>>          return -EIO;
>>>      return 0;
>>>  }
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index dd9dd94d7043..729cbd0d2b60 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>>> btrfs_fs_info *fs_info)
>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>                      map->type);
>>>          for (i = 0; i < map->num_stripes; i++) {
>>> -            if (map->stripes[i].dev->missing)
>>> +            if (map->stripes[i].dev->missing ||
>>> +                map->stripes[i].dev->err_wait ||
>>> +                map->stripes[i].dev->err_send)
>>>                  missing++;
>>>          }

  This is rather wrong.


>>>          if (missing > max_tolerated) {
>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>> index db1b5ef479cf..112fccacdabc 100644
>>> --- a/fs/btrfs/volumes.h
>>> +++ b/fs/btrfs/volumes.h
>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>      int can_discard;
>>>      int is_tgtdev_for_dev_replace;
>>>
>>> +    /* If this devices fails to send/wait dev flush */
>>> +    bool err_send;
>>> +    bool err_wait;
>>
>>
>>
>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>      seqcount_t data_seqcount;
>>>  #endif
>>>
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo March 7, 2017, 7:08 a.m. UTC | #4
At 03/07/2017 02:55 PM, Anand Jain wrote:
>
>
>
>
>> 1) About reentrancy
>> In previous version, the err_* bits are still put into btrfs_devices
>> structure, just timing of resetting these bits are changes.
>>
>> So either way, it's not reentrant in theory.
>>
>> But that doesn't make a problem, as we have other things to protect when
>> calling write_all_supers(), the only caller of barrier_all_devices().
>>
>> So would you give me an example why we need to make it reentrant?
>
>  Its updating the device struct I would avoid such a change
>  for the reasons of this patch. (I notice that in V1 as well).
>  Further btrfs does not handle online intermittent device failure,
>  unless the complete story is taken care, I would avoid such a change.
>
>  Theoretically this patch is buggy, btrfs_check_rw_degradable() has
>  more consumers than just the barrier_all_devices(). However the
>  dev->err_wait and dev->err_send are managed by only
>  barrier_all_devices(). And the bad news is dev->err_wait and
>  dev->err_send would affect the result of "missing" coming out of
>  btrfs_check_rw_degradable() which is wrong for the threads other
>  than barrier_all_devices(). Further, the only way dev->err_wait and
>  dev->err_send gets reset is by the next call to
>  barrier_all_devices(). And if lock is an answer that would makes
>  it messy and complicated. I won't do that.
>
>  There is a simple solution as below..
>
>> 2) About using stack variable?
>
>  pass err_send and err_write to btrfs_check_rw_degradable() through
>  argument so to compute degradable for the barrier_all_devices().
>  In this way changes are kept local thread specific.

In this way, we need to make a expandable structure to record devid <-> 
err_send/wait mapping.

Simple array is not suitable here, as the starting devid can be either 1 
or 0 depending on whether we're replacing.
Furthermore devid is not always sequential, we can have holes in devid 
allocation.

Although this behavior will indeed makes less impact on btrfs_device 
structure.

I'll update the patchset and try such method, just hopes this won't 
introduce too much new code.

Thanks for the advice,
Qu

>
> Thanks, Anand
>
>
>> Did you mean build a array on stack to record which devices fails to
>> send/wait and use the array as check condition other than
>> btrfs_device->err_* and btrfs_device->missing ?
>>
>> The only problem is, it sounds more complex than needed.
>> Despite the err_*, we also needs to check already missing devices, so I
>> prefer the easy way, just checking btrfs_device->err_* and
>> btrfs_device->missing.
>>
>> Any simple example to explain your suggestion here?
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> ---
>>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>>  fs/btrfs/volumes.c |  4 +++-
>>>>  fs/btrfs/volumes.h |  4 ++++
>>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index c26b8a0b121c..f596bd130524 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>>>> btrfs_fs_info *info)
>>>>  {
>>>>      struct list_head *head;
>>>>      struct btrfs_device *dev;
>>>> -    int errors_send = 0;
>>>> -    int errors_wait = 0;
>>>>      int ret;
>>>>
>>>>      /* send down all the barriers */
>>>>      head = &info->fs_devices->devices;
>>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>>> +        dev->err_wait = false;
>>>> +        dev->err_send = false;
>>>>          if (dev->missing)
>>>>              continue;
>>>>          if (!dev->bdev) {
>>>> -            errors_send++;
>>>> +            dev->err_send = true;
>>>>              continue;
>>>>          }
>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>>>> btrfs_fs_info *info)
>>>>
>>>>          ret = write_dev_flush(dev, 0);
>>>>          if (ret)
>>>> -            errors_send++;
>>>> +            dev->err_send = true;
>>>>      }
>>>>
>>>>      /* wait for all the barriers */
>>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>>>> btrfs_fs_info *info)
>>>>          if (dev->missing)
>>>>              continue;
>>>>          if (!dev->bdev) {
>>>> -            errors_wait++;
>>>> +            dev->err_wait = true;
>>>>              continue;
>>>>          }
>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>>>> btrfs_fs_info *info)
>>>>
>>>>          ret = write_dev_flush(dev, 1);
>>>>          if (ret)
>>>> -            errors_wait++;
>>>> +            dev->err_wait = true;
>>>>      }
>>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>>> +    if (!btrfs_check_rw_degradable(info))
>>>>          return -EIO;
>>>>      return 0;
>>>>  }
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index dd9dd94d7043..729cbd0d2b60 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>>>> btrfs_fs_info *fs_info)
>>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>>                      map->type);
>>>>          for (i = 0; i < map->num_stripes; i++) {
>>>> -            if (map->stripes[i].dev->missing)
>>>> +            if (map->stripes[i].dev->missing ||
>>>> +                map->stripes[i].dev->err_wait ||
>>>> +                map->stripes[i].dev->err_send)
>>>>                  missing++;
>>>>          }
>
>  This is rather wrong.
>
>
>>>>          if (missing > max_tolerated) {
>>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>>> index db1b5ef479cf..112fccacdabc 100644
>>>> --- a/fs/btrfs/volumes.h
>>>> +++ b/fs/btrfs/volumes.h
>>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>>      int can_discard;
>>>>      int is_tgtdev_for_dev_replace;
>>>>
>>>> +    /* If this devices fails to send/wait dev flush */
>>>> +    bool err_send;
>>>> +    bool err_wait;
>>>
>>>
>>>
>>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>>      seqcount_t data_seqcount;
>>>>  #endif
>>>>
>>>
>>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo March 7, 2017, 8:07 a.m. UTC | #5
At 03/07/2017 03:08 PM, Qu Wenruo wrote:
>
>
> At 03/07/2017 02:55 PM, Anand Jain wrote:
>>
>>
>>
>>
>>> 1) About reentrancy
>>> In previous version, the err_* bits are still put into btrfs_devices
>>> structure, just timing of resetting these bits are changes.
>>>
>>> So either way, it's not reentrant in theory.
>>>
>>> But that doesn't make a problem, as we have other things to protect when
>>> calling write_all_supers(), the only caller of barrier_all_devices().
>>>
>>> So would you give me an example why we need to make it reentrant?
>>
>>  Its updating the device struct I would avoid such a change
>>  for the reasons of this patch. (I notice that in V1 as well).
>>  Further btrfs does not handle online intermittent device failure,
>>  unless the complete story is taken care, I would avoid such a change.
>>
>>  Theoretically this patch is buggy, btrfs_check_rw_degradable() has
>>  more consumers than just the barrier_all_devices(). However the
>>  dev->err_wait and dev->err_send are managed by only
>>  barrier_all_devices(). And the bad news is dev->err_wait and
>>  dev->err_send would affect the result of "missing" coming out of
>>  btrfs_check_rw_degradable() which is wrong for the threads other
>>  than barrier_all_devices(). Further, the only way dev->err_wait and
>>  dev->err_send gets reset is by the next call to
>>  barrier_all_devices(). And if lock is an answer that would makes
>>  it messy and complicated. I won't do that.
>>
>>  There is a simple solution as below..
>>
>>> 2) About using stack variable?
>>
>>  pass err_send and err_write to btrfs_check_rw_degradable() through
>>  argument so to compute degradable for the barrier_all_devices().
>>  In this way changes are kept local thread specific.
>
> In this way, we need to make a expandable structure to record devid <->
s/a expandable/an expendable/

Sorry for the confusion.

Thanks,
Qu
> err_send/wait mapping.
>
> Simple array is not suitable here, as the starting devid can be either 1
> or 0 depending on whether we're replacing.
> Furthermore devid is not always sequential, we can have holes in devid
> allocation.
>
> Although this behavior will indeed makes less impact on btrfs_device
> structure.
>
> I'll update the patchset and try such method, just hopes this won't
> introduce too much new code.
>
> Thanks for the advice,
> Qu
>
>>
>> Thanks, Anand
>>
>>
>>> Did you mean build a array on stack to record which devices fails to
>>> send/wait and use the array as check condition other than
>>> btrfs_device->err_* and btrfs_device->missing ?
>>>
>>> The only problem is, it sounds more complex than needed.
>>> Despite the err_*, we also needs to check already missing devices, so I
>>> prefer the easy way, just checking btrfs_device->err_* and
>>> btrfs_device->missing.
>>>
>>> Any simple example to explain your suggestion here?
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>>
>>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>>> ---
>>>>>  fs/btrfs/disk-io.c | 15 +++++++--------
>>>>>  fs/btrfs/volumes.c |  4 +++-
>>>>>  fs/btrfs/volumes.h |  4 ++++
>>>>>  3 files changed, 14 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index c26b8a0b121c..f596bd130524 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -3569,17 +3569,17 @@ static int barrier_all_devices(struct
>>>>> btrfs_fs_info *info)
>>>>>  {
>>>>>      struct list_head *head;
>>>>>      struct btrfs_device *dev;
>>>>> -    int errors_send = 0;
>>>>> -    int errors_wait = 0;
>>>>>      int ret;
>>>>>
>>>>>      /* send down all the barriers */
>>>>>      head = &info->fs_devices->devices;
>>>>>      list_for_each_entry_rcu(dev, head, dev_list) {
>>>>> +        dev->err_wait = false;
>>>>> +        dev->err_send = false;
>>>>>          if (dev->missing)
>>>>>              continue;
>>>>>          if (!dev->bdev) {
>>>>> -            errors_send++;
>>>>> +            dev->err_send = true;
>>>>>              continue;
>>>>>          }
>>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>>> @@ -3587,7 +3587,7 @@ static int barrier_all_devices(struct
>>>>> btrfs_fs_info *info)
>>>>>
>>>>>          ret = write_dev_flush(dev, 0);
>>>>>          if (ret)
>>>>> -            errors_send++;
>>>>> +            dev->err_send = true;
>>>>>      }
>>>>>
>>>>>      /* wait for all the barriers */
>>>>> @@ -3595,7 +3595,7 @@ static int barrier_all_devices(struct
>>>>> btrfs_fs_info *info)
>>>>>          if (dev->missing)
>>>>>              continue;
>>>>>          if (!dev->bdev) {
>>>>> -            errors_wait++;
>>>>> +            dev->err_wait = true;
>>>>>              continue;
>>>>>          }
>>>>>          if (!dev->in_fs_metadata || !dev->writeable)
>>>>> @@ -3603,10 +3603,9 @@ static int barrier_all_devices(struct
>>>>> btrfs_fs_info *info)
>>>>>
>>>>>          ret = write_dev_flush(dev, 1);
>>>>>          if (ret)
>>>>> -            errors_wait++;
>>>>> +            dev->err_wait = true;
>>>>>      }
>>>>> -    if (errors_send > info->num_tolerated_disk_barrier_failures ||
>>>>> -        errors_wait > info->num_tolerated_disk_barrier_failures)
>>>>> +    if (!btrfs_check_rw_degradable(info))
>>>>>          return -EIO;
>>>>>      return 0;
>>>>>  }
>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>> index dd9dd94d7043..729cbd0d2b60 100644
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -6796,7 +6796,9 @@ bool btrfs_check_rw_degradable(struct
>>>>> btrfs_fs_info *fs_info)
>>>>>              btrfs_get_num_tolerated_disk_barrier_failures(
>>>>>                      map->type);
>>>>>          for (i = 0; i < map->num_stripes; i++) {
>>>>> -            if (map->stripes[i].dev->missing)
>>>>> +            if (map->stripes[i].dev->missing ||
>>>>> +                map->stripes[i].dev->err_wait ||
>>>>> +                map->stripes[i].dev->err_send)
>>>>>                  missing++;
>>>>>          }
>>
>>  This is rather wrong.
>>
>>
>>>>>          if (missing > max_tolerated) {
>>>>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>>>>> index db1b5ef479cf..112fccacdabc 100644
>>>>> --- a/fs/btrfs/volumes.h
>>>>> +++ b/fs/btrfs/volumes.h
>>>>> @@ -75,6 +75,10 @@ struct btrfs_device {
>>>>>      int can_discard;
>>>>>      int is_tgtdev_for_dev_replace;
>>>>>
>>>>> +    /* If this devices fails to send/wait dev flush */
>>>>> +    bool err_send;
>>>>> +    bool err_wait;
>>>>
>>>>
>>>>
>>>>>  #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
>>>>>      seqcount_t data_seqcount;
>>>>>  #endif
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c26b8a0b121c..f596bd130524 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3569,17 +3569,17 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 {
 	struct list_head *head;
 	struct btrfs_device *dev;
-	int errors_send = 0;
-	int errors_wait = 0;
 	int ret;
 
 	/* send down all the barriers */
 	head = &info->fs_devices->devices;
 	list_for_each_entry_rcu(dev, head, dev_list) {
+		dev->err_wait = false;
+		dev->err_send = false;
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_send++;
+			dev->err_send = true;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3587,7 +3587,7 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 0);
 		if (ret)
-			errors_send++;
+			dev->err_send = true;
 	}
 
 	/* wait for all the barriers */
@@ -3595,7 +3595,7 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 		if (dev->missing)
 			continue;
 		if (!dev->bdev) {
-			errors_wait++;
+			dev->err_wait = true;
 			continue;
 		}
 		if (!dev->in_fs_metadata || !dev->writeable)
@@ -3603,10 +3603,9 @@  static int barrier_all_devices(struct btrfs_fs_info *info)
 
 		ret = write_dev_flush(dev, 1);
 		if (ret)
-			errors_wait++;
+			dev->err_wait = true;
 	}
-	if (errors_send > info->num_tolerated_disk_barrier_failures ||
-	    errors_wait > info->num_tolerated_disk_barrier_failures)
+	if (!btrfs_check_rw_degradable(info))
 		return -EIO;
 	return 0;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dd9dd94d7043..729cbd0d2b60 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6796,7 +6796,9 @@  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info)
 			btrfs_get_num_tolerated_disk_barrier_failures(
 					map->type);
 		for (i = 0; i < map->num_stripes; i++) {
-			if (map->stripes[i].dev->missing)
+			if (map->stripes[i].dev->missing ||
+			    map->stripes[i].dev->err_wait ||
+			    map->stripes[i].dev->err_send)
 				missing++;
 		}
 		if (missing > max_tolerated) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index db1b5ef479cf..112fccacdabc 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -75,6 +75,10 @@  struct btrfs_device {
 	int can_discard;
 	int is_tgtdev_for_dev_replace;
 
+	/* If this devices fails to send/wait dev flush */
+	bool err_send;
+	bool err_wait;
+
 #ifdef __BTRFS_NEED_DEVICE_DATA_ORDERED
 	seqcount_t data_seqcount;
 #endif