diff mbox

[v2,2/2] btrfs: wait for bdev put

Message ID 1b7bb28b-939e-c111-9bb0-5091ab1cdcf1@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Anand Jain June 22, 2016, 10:18 a.m. UTC
Thanks for the review Chris.

On 06/21/2016 09:00 PM, Chris Mason wrote:
> On 06/21/2016 06:24 AM, Anand Jain wrote:
>> From: Anand Jain <Anand.Jain@oracle.com>
>>
>> Further to the commit
>>       bc178622d40d87e75abc131007342429c9b03351
>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>
>> This patch implements a method to time wait on the __free_device()
>> which actually does the bdev put. This is needed as the user space
>> running 'btrfs fi show -d' immediately after the replace and
>> unmount, is still reading older information from the device.
>
> Thanks for working on this Anand.  Since it looks like blkdev_put can
> deadlock against us, can we please switch to making sure we fully flush
> the outstanding IO?  It's probably enough to do a sync_blockdev() call
> before we allow the unmount to finish, but we can toss in an
> invalidate_bdev for good measure.


------------
# git diff
         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
-----------


  However, theoretically still there might be a problem - at the end of
  unmount, if the device exclusive open is not actually closed, then
  there might be a race with another program which is trying to open
  the device in exclusive mode. Like for eg:
       unmount /btrfs; fsck /dev/X
  and here fsck might fail to open the device if it wins the race.


> Then we can get rid of the mdelay loop completely, which seems pretty
> error prone to me.

  Yes, the code got little complex here (and also when sysfs fixes
  were wrote) as struct btrfs_device is getting migrated to a new
  struct btrfs_device at unmount, which I don't think was necessary?


Thanks, Anand


> Thanks!
>
> -chris
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chris Mason June 22, 2016, 9:47 p.m. UTC | #1
On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> 
wrote:
> 
> 
>  Thanks for the review Chris.
> 
> On 06/21/2016 09:00 PM, Chris Mason wrote:
>> On 06/21/2016 06:24 AM, Anand Jain wrote:
>>> From: Anand Jain <Anand.Jain@oracle.com>
>>> 
>>> Further to the commit
>>>       bc178622d40d87e75abc131007342429c9b03351
>>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>> 
>>> This patch implements a method to time wait on the __free_device()
>>> which actually does the bdev put. This is needed as the user space
>>> running 'btrfs fi show -d' immediately after the replace and
>>> unmount, is still reading older information from the device.
>> 
>> Thanks for working on this Anand.  Since it looks like blkdev_put can
>> deadlock against us, can we please switch to making sure we fully 
>> flush
>> the outstanding IO?  It's probably enough to do a sync_blockdev() 
>> call
>> before we allow the unmount to finish, but we can toss in an
>> invalidate_bdev for good measure.
> 
> 
> ------------
> # git diff
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 604daf315669..e0ad29d6fe9a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct 
> btrfs_device *device)
>         if (device->missing)
>                 fs_devices->missing_devices--;
> 
> +       if (device->bdev && device->writeable) {
> +               sync_blockdev(device->bdev);
> +               invalidate_bdev(device->bdev);
> +       }
> +
>         new_device = btrfs_alloc_device(NULL, &device->devid,
>                                         device->uuid);
>         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
> -----------
> 
> 
>  However, theoretically still there might be a problem - at the end of
>  unmount, if the device exclusive open is not actually closed, then
>  there might be a race with another program which is trying to open
>  the device in exclusive mode. Like for eg:
>       unmount /btrfs; fsck /dev/X
>  and here fsck might fail to open the device if it wins the race.

This true, but at least we know he'll have up to date buffers if he 
does manage to open the device.

With the generic code, the blkdev_put happens after the super is gone, 
so I'm not sure we can completely fix this from inside our callback.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain June 23, 2016, 1:07 p.m. UTC | #2
On 06/23/2016 05:47 AM, Chris Mason wrote:
>
>
> On Wed, Jun 22, 2016 at 6:18 AM, Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>>  Thanks for the review Chris.
>>
>> On 06/21/2016 09:00 PM, Chris Mason wrote:
>>> On 06/21/2016 06:24 AM, Anand Jain wrote:
>>>> From: Anand Jain <Anand.Jain@oracle.com>
>>>>
>>>> Further to the commit
>>>>       bc178622d40d87e75abc131007342429c9b03351
>>>>       btrfs: use rcu_barrier() to wait for bdev puts at unmount
>>>>
>>>> This patch implements a method to time wait on the __free_device()
>>>> which actually does the bdev put. This is needed as the user space
>>>> running 'btrfs fi show -d' immediately after the replace and
>>>> unmount, is still reading older information from the device.
>>>
>>> Thanks for working on this Anand.  Since it looks like blkdev_put can
>>> deadlock against us, can we please switch to making sure we fully flush
>>> the outstanding IO?  It's probably enough to do a sync_blockdev() call
>>> before we allow the unmount to finish, but we can toss in an
>>> invalidate_bdev for good measure.
>>
>>
>> ------------
>> # git diff
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 604daf315669..e0ad29d6fe9a 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -870,6 +870,11 @@ static void btrfs_close_one_device(struct
>> btrfs_device *device)
>>         if (device->missing)
>>                 fs_devices->missing_devices--;
>>
>> +       if (device->bdev && device->writeable) {
>> +               sync_blockdev(device->bdev);
>> +               invalidate_bdev(device->bdev);
>> +       }
>> +
>>         new_device = btrfs_alloc_device(NULL, &device->devid,
>>                                         device->uuid);
>>         BUG_ON(IS_ERR(new_device)); /* -ENOMEM */
>> -----------
>>
>>
>>  However, theoretically still there might be a problem - at the end of
>>  unmount, if the device exclusive open is not actually closed, then
>>  there might be a race with another program which is trying to open
>>  the device in exclusive mode. Like for eg:
>>       unmount /btrfs; fsck /dev/X
>>  and here fsck might fail to open the device if it wins the race.
>
> This true, but at least we know he'll have up to date buffers if he does
> manage to open the device.
>
> With the generic code, the blkdev_put happens after the super is gone,
> so I'm not sure we can completely fix this from inside our callback.

  Makes sense, sent out v3 with title
   (btrfs: make sure device is synced before return)

  Also sent out RFC patch
     btrfs: make sure device is synced before return
  where I have tried not to background blkdev_put(),
  which seems to be better, it works fine per fstests.


Thanks, Anand


> -chris
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 604daf315669..e0ad29d6fe9a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -870,6 +870,11 @@  static void btrfs_close_one_device(struct 
btrfs_device *device)
         if (device->missing)
                 fs_devices->missing_devices--;

+       if (device->bdev && device->writeable) {
+               sync_blockdev(device->bdev);
+               invalidate_bdev(device->bdev);
+       }
+
         new_device = btrfs_alloc_device(NULL, &device->devid,
                                         device->uuid);