[3/3] btrfs: fix race between mkfs and mount

Message ID	20180604150030.12883-3-anand.jain@oracle.com (mailing list archive)
State	New, archived
Headers	Return-Path: <linux-btrfs-owner@kernel.org>
	From: Anand Jain <anand.jain@oracle.com>
	To: linux-btrfs@vger.kernel.org
	Subject: [PATCH 3/3] btrfs: fix race between mkfs and mount
	Date: Mon, 4 Jun 2018 23:00:30 +0800
	Message-Id: <20180604150030.12883-3-anand.jain@oracle.com>
	In-Reply-To: <20180604150030.12883-1-anand.jain@oracle.com>
	References: <20180604150030.12883-1-anand.jain@oracle.com>
	Sender: linux-btrfs-owner@vger.kernel.org
	Precedence: bulk

Anand Jain June 4, 2018, 3 p.m. UTC

With instrumented testing, a mount and a newer mkfs.btrfs thread on the
same device can race. If the new mkfs.btrfs wins, it frees the older
fs_devices, and the mount thread then oopses on the stale fs_devices.

Thread1						Thread2
-------						-------
mkfs.btrfs -fq /dev/sdb
mount /dev/sdb /btrfs
|_btrfs_mount_root()
  |_btrfs_scan_one_device(... &fs_devices)

						mkfs.btrfs -fq /dev/sdb
						|_btrfs_control_ioctl()
						  |_btrfs_scan_one_device(... &fs_devices)
						    |_::
						      |_btrfs_free_stale_devices()

  |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.

Fix this with a mutually exclusive bit, BTRFS_VOLUME_STATE_EXCL_OPS,
set in fs_devices->volume_state.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/super.c   |  6 ++++++
 fs/btrfs/volumes.c | 10 +++++++++-
 fs/btrfs/volumes.h |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

David Sterba June 7, 2018, 4:39 p.m. UTC | #1

On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> In an instrumented testing it is possible that the mount and
> a newer mkfs.btrfs thread on the same device can race and if the new
> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
> will lead to oops.
> 
> Thread1						Thread2
> -------						-------
> mkfs.btrfs -fq /dev/sdb
> mount /dev/sdb /btrfs
> |_btrfs_mount_root()
>   |_btrfs_scan_one_device(... &fs_devices)
> 
> 						mkfs.btrfs -fq /dev/sdb
> 						|_btrfs_contol_ioctl()
> 						  |_btrfs_scan_one_device(... &fs_devices)
> 						    |_::
> 						      |_btrfs_free_stale_devices()
> 
>   |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> 
> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.

I'm not sure this is the right way to fix it, adding another bit to the
uuid_mutex and device_list_mutex combo.

To fix the race between mount and mkfs we can add a bit of logic to the
device scanning, so that a scan called from mount tracks its purpose and
a scan triggered by mkfs is not allowed to free that device.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba June 19, 2018, 1:53 p.m. UTC | #2

On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote:
> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> > In an instrumented testing it is possible that the mount and
> > a newer mkfs.btrfs thread on the same device can race and if the new
> > mkfs.btrfs wins it will free the older fs_devices, then the mount thread
> > will lead to oops.
> > 
> > Thread1						Thread2
> > -------						-------
> > mkfs.btrfs -fq /dev/sdb
> > mount /dev/sdb /btrfs
> > |_btrfs_mount_root()
> >   |_btrfs_scan_one_device(... &fs_devices)
> > 
> > 						mkfs.btrfs -fq /dev/sdb
> > 						|_btrfs_contol_ioctl()
> > 						  |_btrfs_scan_one_device(... &fs_devices)
> > 						    |_::
> > 						      |_btrfs_free_stale_devices()
> > 
> >   |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> > 
> > Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
> 
> I'm not sure this is the right way to fix it, adding another bit to the
> uuid_mutex and device_list_mutex combo.
> 
> To fix the race between mount and mkfs we can add a bit of logic to the
> device scanning so that mount calling scan will track the purpose and
> mkfs scan will not be allowed to free that device.

The latest version of the proposed fix extends the uuid_mutex over the
whole mount callback and takes it around calls to btrfs_scan_one_device.
That way we can be sure that mount always reaches the device it scanned
and that a parallel scan cannot free it.
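
As a rough illustration of that idea (a userspace sketch only, not the
actual patch): one global mutex is held across both the scan and the
open, so the stale-device cleanup of a parallel scan cannot run in
between. The names fs_devices_model, uuid_mutex_model, mount_model and
rescan_model are made up for this example and merely stand in for the
kernel structures and paths.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_fs_devices. */
struct fs_devices_model {
	int opened;	/* > 0 once a mount has opened the devices */
};

/* Hypothetical stand-in for the global uuid_mutex. */
static pthread_mutex_t uuid_mutex_model = PTHREAD_MUTEX_INITIALIZER;
static struct fs_devices_model *registered;

/* Scan: register an entry if none is known. Caller holds the mutex. */
static struct fs_devices_model *scan_locked(void)
{
	if (!registered)
		registered = calloc(1, sizeof(*registered));
	return registered;
}

/* Mount path: scan and open inside one critical section. */
static int mount_model(void)
{
	struct fs_devices_model *fsd;
	int ret = -1;

	pthread_mutex_lock(&uuid_mutex_model);
	fsd = scan_locked();
	if (fsd) {
		fsd->opened++;	/* still valid: nobody could free it */
		ret = 0;
	}
	pthread_mutex_unlock(&uuid_mutex_model);
	return ret;
}

/*
 * Scan triggered by mkfs: free a stale (unopened) entry, then register
 * a fresh one. Returns 1 if a stale entry was freed, 0 otherwise.
 */
static int rescan_model(void)
{
	int freed = 0;

	pthread_mutex_lock(&uuid_mutex_model);
	if (registered && registered->opened == 0) {
		free(registered);
		registered = NULL;
		freed = 1;
	}
	scan_locked();
	pthread_mutex_unlock(&uuid_mutex_model);
	return freed;
}
```

Because mount_model() never drops the mutex between scanning and
opening, rescan_model() either runs entirely before the mount (and the
mount scans the fresh entry) or entirely after it (and sees opened > 0).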

David Sterba June 20, 2018, 2:06 p.m. UTC | #3

On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> In an instrumented testing it is possible that the mount and
> a newer mkfs.btrfs thread on the same device can race and if the new
> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
> will lead to oops.
> 
> Thread1						Thread2
> -------						-------
> mkfs.btrfs -fq /dev/sdb
> mount /dev/sdb /btrfs
> |_btrfs_mount_root()
>   |_btrfs_scan_one_device(... &fs_devices)
> 
> 						mkfs.btrfs -fq /dev/sdb
> 						|_btrfs_contol_ioctl()
> 						  |_btrfs_scan_one_device(... &fs_devices)
> 						    |_::
> 						      |_btrfs_free_stale_devices()
> 
>   |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> 
> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/super.c   |  6 ++++++
>  fs/btrfs/volumes.c | 10 +++++++++-
>  fs/btrfs/volumes.h |  1 +
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f0c13defc9eb..b60e7cbe39f5 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1565,7 +1565,13 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>  		goto error_fs_info;
>  	}
>  
> +	if (test_and_set_bit(BTRFS_VOLUME_STATE_EXCL_OPS, &fs_devices->volume_state)) {
> +		error = -EBUSY;

We'd need to wait until the bit is cleared instead of returning EBUSY,
as a parallel scan is not really a reason to fail the whole mount.

I'll post a patch series to address this problem today. It uses the
uuid_mutex in a similar way to what you are trying to do with the new
bit, but it will not lead to EBUSY.
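
To make the difference concrete, here is a minimal userspace sketch,
assuming a C11 atomic_flag as a stand-in for the EXCL_OPS bit.
excl_enter_try() mirrors the fail-fast test_and_set_bit() check from
the patch, while excl_enter_wait() shows the waiting behaviour asked for
above. All function names are hypothetical, and a kernel implementation
would presumably sleep rather than spin.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the EXCL_OPS bit in volume_state. */
static atomic_flag excl_ops = ATOMIC_FLAG_INIT;

/* Behaviour of the posted patch: fail right away if the bit is taken. */
static int excl_enter_try(void)
{
	if (atomic_flag_test_and_set(&excl_ops))
		return -EBUSY;
	return 0;
}

/* Alternative: wait until the current holder clears the bit. */
static int excl_enter_wait(void)
{
	while (atomic_flag_test_and_set(&excl_ops))
		sched_yield();	/* userspace only; yields instead of sleeping */
	return 0;
}

/* Release the bit so that a waiter or a later caller can proceed. */
static void excl_exit(void)
{
	atomic_flag_clear(&excl_ops);
}
```

With the waiting variant a mount that collides with a scan is only
delayed, whereas the try variant makes the collision visible to the
user as a failed mount.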

Anand Jain June 26, 2018, 6:25 a.m. UTC | #4

(Sorry for the delayed reply; I was on vacation and other urgent
  things took priority too.)

  Please see inline below.

On 06/19/2018 09:53 PM, David Sterba wrote:
> On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote:
>> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
>>> In an instrumented testing it is possible that the mount and
>>> a newer mkfs.btrfs thread on the same device can race and if the new
>>> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
>>> will lead to oops.
>>>
>>> Thread1						Thread2
>>> -------						-------
>>> mkfs.btrfs -fq /dev/sdb
>>> mount /dev/sdb /btrfs
>>> |_btrfs_mount_root()
>>>    |_btrfs_scan_one_device(... &fs_devices)
>>>
>>> 						mkfs.btrfs -fq /dev/sdb
>>> 						|_btrfs_contol_ioctl()
>>> 						  |_btrfs_scan_one_device(... &fs_devices)
>>> 						    |_::
>>> 						      |_btrfs_free_stale_devices()
>>>
>>>    |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
>>>
>>> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
>>
>> I'm not sure this is the right way to fix it, adding another bit to the
>> uuid_mutex and device_list_mutex combo.

  Hmm I wonder why?

>> To fix the race between mount and mkfs we can add a bit of logic to the
>> device scanning so that mount calling scan will track the purpose and
>> mkfs scan will not be allowed to free that device.

  Right. The requisites for implementing such logic are:
   #1 The lock must be FSID local so that concurrent mount and or scan
      of two independent FSID+DEV is possible.
   #2 It should not return EBUSY when either a scan or a mount is in
      progress, but be smart enough to complete the (re)scan and or
      mount in parallel.

  #1 is a must, but #2 is good to have; returning EBUSY is not wrong
  either.


> Last version of the proposed fix is to extend the uuid_mutex over the
> whole mount callback and use it around calls to btrfs_scan_one_device.
> That way we'll be sure the mount will always get to the device it
> scanned and that will not be freed by the parallel scan.

  That will break the requisite #1 as above.

Thanks, Anand


Anand Jain June 26, 2018, 6:38 a.m. UTC | #5

On 06/20/2018 10:06 PM, David Sterba wrote:
> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
>> In an instrumented testing it is possible that the mount and
>> a newer mkfs.btrfs thread on the same device can race and if the new
>> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
>> will lead to oops.
>>
>> Thread1						Thread2
>> -------						-------
>> mkfs.btrfs -fq /dev/sdb
>> mount /dev/sdb /btrfs
>> |_btrfs_mount_root()
>>    |_btrfs_scan_one_device(... &fs_devices)
>>
>> 						mkfs.btrfs -fq /dev/sdb
>> 						|_btrfs_contol_ioctl()
>> 						  |_btrfs_scan_one_device(... &fs_devices)
>> 						    |_::
>> 						      |_btrfs_free_stale_devices()
>>
>>    |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
>>
>> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/super.c   |  6 ++++++
>>   fs/btrfs/volumes.c | 10 +++++++++-
>>   fs/btrfs/volumes.h |  1 +
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index f0c13defc9eb..b60e7cbe39f5 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1565,7 +1565,13 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>>   		goto error_fs_info;
>>   	}
>>   
>> +	if (test_and_set_bit(BTRFS_VOLUME_STATE_EXCL_OPS, &fs_devices->volume_state)) {
>> +		error = -EBUSY;
> 
> We'd need to wait until the bit is not set instead of BUSY, as the
> parallel scan is not really a reason to fail the whole mount.

> I'll post the patch series to address this problem today, it utilizes
> the uuid_mutex in a similar way you try to do with the new bit, but it
> will not lead to EBUSY.

  Ok. Shall review.

Thanks, Anand


David Sterba June 26, 2018, 12:19 p.m. UTC | #6

On Tue, Jun 26, 2018 at 02:25:11PM +0800, Anand Jain wrote:
> 
> 
> (sorry for the delay in reply due to my vacation and, other
>   urgent things took my priority too).
> 
>   Please see inline below.
> 
> On 06/19/2018 09:53 PM, David Sterba wrote:
> > On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote:
> >> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> >>> In an instrumented testing it is possible that the mount and
> >>> a newer mkfs.btrfs thread on the same device can race and if the new
> >>> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
> >>> will lead to oops.
> >>>
> >>> Thread1						Thread2
> >>> -------						-------
> >>> mkfs.btrfs -fq /dev/sdb
> >>> mount /dev/sdb /btrfs
> >>> |_btrfs_mount_root()
> >>>    |_btrfs_scan_one_device(... &fs_devices)
> >>>
> >>> 						mkfs.btrfs -fq /dev/sdb
> >>> 						|_btrfs_contol_ioctl()
> >>> 						  |_btrfs_scan_one_device(... &fs_devices)
> >>> 						    |_::
> >>> 						      |_btrfs_free_stale_devices()
> >>>
> >>>    |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> >>>
> >>> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
> >>
> >> I'm not sure this is the right way to fix it, adding another bit to the
> >> uuid_mutex and device_list_mutex combo.
> 
>   Hmm I wonder why?
> 
> >> To fix the race between mount and mkfs we can add a bit of logic to the
> >> device scanning so that mount calling scan will track the purpose and
> >> mkfs scan will not be allowed to free that device.
> 
>   Right. To implement such a logic requisites are..
>    #1 The lock must be FSID local so that concurrent mount and or scan
>       of two independent FSID+DEV is possible.
>    #2 It should not return EBUSY when either of scan or mount is in
>       progress but smart enough to complete the (re)scan and or mount
>       in parallel
> 
>   #1 is must, but #2 is good to have and if EBUSY is returned its not
>   wrong as well.
> 
> 
> > Last version of the proposed fix is to extend the uuid_mutex over the
> > whole mount callback and use it around calls to btrfs_scan_one_device.
> > That way we'll be sure the mount will always get to the device it
> > scanned and that will not be freed by the parallel scan.
> 
>   That will break the requisite #1 as above.

I don't see how it breaks 'mount and or scan of two independent fsid+dev
is possible'. It is possible, but the lock does not need to be
filesystem local.

Concurrent mount or scan will block on the uuid_mutex, so all callers
will proceed once the lock is released and there's no unexpected
behaviour.

Anand Jain June 26, 2018, 2:42 p.m. UTC | #7

On 06/26/2018 08:19 PM, David Sterba wrote:
> On Tue, Jun 26, 2018 at 02:25:11PM +0800, Anand Jain wrote:
>>
>>
>> (sorry for the delay in reply due to my vacation and, other
>>    urgent things took my priority too).
>>
>>    Please see inline below.
>>
>> On 06/19/2018 09:53 PM, David Sterba wrote:
>>> On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote:
>>>> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
>>>>> In an instrumented testing it is possible that the mount and
>>>>> a newer mkfs.btrfs thread on the same device can race and if the new
>>>>> mkfs.btrfs wins it will free the older fs_devices, then the mount thread
>>>>> will lead to oops.
>>>>>
>>>>> Thread1						Thread2
>>>>> -------						-------
>>>>> mkfs.btrfs -fq /dev/sdb
>>>>> mount /dev/sdb /btrfs
>>>>> |_btrfs_mount_root()
>>>>>     |_btrfs_scan_one_device(... &fs_devices)
>>>>>
>>>>> 						mkfs.btrfs -fq /dev/sdb
>>>>> 						|_btrfs_contol_ioctl()
>>>>> 						  |_btrfs_scan_one_device(... &fs_devices)
>>>>> 						    |_::
>>>>> 						      |_btrfs_free_stale_devices()
>>>>>
>>>>>     |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
>>>>>
>>>>> Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
>>>>
>>>> I'm not sure this is the right way to fix it, adding another bit to the
>>>> uuid_mutex and device_list_mutex combo.
>>
>>    Hmm I wonder why?
>>
>>>> To fix the race between mount and mkfs we can add a bit of logic to the
>>>> device scanning so that mount calling scan will track the purpose and
>>>> mkfs scan will not be allowed to free that device.
>>
>>    Right. To implement such a logic requisites are..
>>     #1 The lock must be FSID local so that concurrent mount and or scan
>>        of two independent FSID+DEV is possible.
>>     #2 It should not return EBUSY when either of scan or mount is in
>>        progress but smart enough to complete the (re)scan and or mount
>>        in parallel
>>
>>    #1 is must, but #2 is good to have and if EBUSY is returned its not
>>    wrong as well.
>>
>>
>>> Last version of the proposed fix is to extend the uuid_mutex over the
>>> whole mount callback and use it around calls to btrfs_scan_one_device.
>>> That way we'll be sure the mount will always get to the device it
>>> scanned and that will not be freed by the parallel scan.
>>
>>    That will break the requisite #1 as above.
> 
> I don't see how it breaks 'mount and or scan of two independent fsid+dev
> is possible'. It is possible, but the lock does not need to be
> filesystem local.
> 
> Concurrent mount or scan will block on the uuid_mutex,

  As uuid_mutex is global, if fsid1 is being mounted, the fsid2 mount
  will wait to a certain extent with this code. My effort here was to
  use uuid_mutex only to protect the fs_uuid update part. That way the
  mounts of fsid1 and fsid2 can proceed concurrently, which shortens
  boot time when the user uses the mount-at-boot option.

Thanks, Anand

> so all callers
> will proceed once the lock is released and there's no unexpected
> behaviour.

David Sterba June 29, 2018, 12:06 p.m. UTC | #8

On Tue, Jun 26, 2018 at 10:42:32PM +0800, Anand Jain wrote:
> >>> Last version of the proposed fix is to extend the uuid_mutex over the
> >>> whole mount callback and use it around calls to btrfs_scan_one_device.
> >>> That way we'll be sure the mount will always get to the device it
> >>> scanned and that will not be freed by the parallel scan.
> >>
> >>    That will break the requisite #1 as above.
> > 
> > I don't see how it breaks 'mount and or scan of two independent fsid+dev
> > is possible'. It is possible, but the lock does not need to be
> > filesystem local.
> > 
> > Concurrent mount or scan will block on the uuid_mutex,
> 
>   As uuid_mutex is global, if fsid1 is being mounted, the fsid2 mount
>   will wait upto certain extent with this code.

Yes, it will wait a bit, but the critical section is short. Some IO is
done, namely a 4K read in btrfs_read_disk_super. The rest is cpu-bound
and hardly measurable in practice, in the context of concurrent mount
and scanning.

I took the approach of fixing the bug first and then making it faster or
cleaner, and of fixing it in a way that's still acceptable for the
current development cycle.

>   My efforts here was to
>   use uuid_mutex only to protect the fs_uuid update part, in this way
>   there is concurrency in the mount process of fsid1 and fsid2 and
>   provides shorter bootup time when the user uses the mount at boot
>   option.

The locking can still be improved, but the uuid_mutex is not a contended
lock. Mount is an operation that does not happen a thousand times a
second, and the same goes for scanning.

So even if there are several mounts happening during boot, it's just a
few and the delay is IMO unnoticeable. If the boot time is longer by
say 100ms at worst, I'm still ok with my patches as a fix.

But not as a final fix, so the updates to locking you mentioned are
still possible. We now have a point of reference where syzbot is not
able to reproduce the bugs.

Anand Jain June 30, 2018, 2:16 a.m. UTC | #9

On 06/29/2018 08:06 PM, David Sterba wrote:
> On Tue, Jun 26, 2018 at 10:42:32PM +0800, Anand Jain wrote:
>>>>> Last version of the proposed fix is to extend the uuid_mutex over the
>>>>> whole mount callback and use it around calls to btrfs_scan_one_device.
>>>>> That way we'll be sure the mount will always get to the device it
>>>>> scanned and that will not be freed by the parallel scan.
>>>>
>>>>     That will break the requisite #1 as above.
>>>
>>> I don't see how it breaks 'mount and or scan of two independent fsid+dev
>>> is possible'. It is possible, but the lock does not need to be
>>> filesystem local.
>>>
>>> Concurrent mount or scan will block on the uuid_mutex,
>>
>>    As uuid_mutex is global, if fsid1 is being mounted, the fsid2 mount
>>    will wait upto certain extent with this code.
> 
> Yes it will wait a bit, but the critical section is short. There's some
> IO done and it's reading of 4K in btrfs_read_disk_super. The rest is
> cpu-bound and hardly measurable in practice, in context of concurrent
> mount and scanning.
> 
> I took the approach to fix the bug first and then make it faster or
> cleaner, also to fix it in a way that's still acceptable for current
> development cycle.
>>    My efforts here was to
>>    use uuid_mutex only to protect the fs_uuid update part, in this way
>>    there is concurrency in the mount process of fsid1 and fsid2 and
>>    provides shorter bootup time when the user uses the mount at boot
>>    option.
> 
> The locking can be still improved but the uuid_mutex is not a contended
> lock, mount is an operation that does not happen thousand times a
> second, same for the scanning.

  My concern is the boot time when a large number of LUNs are
  configured with independent btrfs FSIDs to mount at boot. Since
  BTRFS is a kind of infrastructure for servers, we can't rule out
  that such configurations exist. Anyway, as you said, we can use
  uuid_mutex for the current development cycle.

  However, a review of [1], which does address your earlier concern
  about returning -EBUSY, would be appreciated. Please also let me know
  whether going ahead with this approach would be acceptable in the
  current development cycle.

> So even if there are several mounts happening during boot, it's just a
> few and the delay is IMO unnoticeable. If the boot time is longer by
> say 100ms at worst, I'm still ok with my patches as a fix.
> 
> But not as a final fix, so the updates to locking you mentioned are
> still possible. We now have a point of reference where syzbot does is
> not able to reproduce the bugs.


[1]
----------------------------------
When %fs_devices::opened > 0 the device is mounted and %fs_devices is
never freed, so it's safe to use %fs_devices without any locks or
flags.

However, when %fs_devices::opened == 0 (idle) the device is not yet
mounted, and it can transition either to opened or to freed. When it
transitions to freed, fs_devices is freed and any later access through
an old pointer ends up as a UAF error.

Here are the places where we access fs_devices and need locking; using
uuid_mutex is one method:

1. READY ioctl
2. Mount
3. SCAN ioctl
4. Stale cleanup during SCAN
5. Planned device FORGET ioctl

#4 and #5 may free the fs_devices while #1, #2, and #3 are still
accessing it.

Using uuid_mutex is one choice; however, it would slow down the mount
at boot when a large number of independent btrfs fsids are being
mounted at boot.

Proposed Fix
-------------

This does not return -EBUSY. If there is a better way,
I am OK with using it.

struct btrfs_fs_devices
{
::
    int ref_count;
::
}

To acquire a fs_devices..

volume_devices_excl_state_enter(x)
{
  fs_devices = NULL
  lock uuid_mutex
    if (fs_devices = find fs_devices(x))
         fs_devices::ref_count++
  unlock uuid_mutex
  return fs_devices
}


To release a fs_devices..

volume_devices_excl_state_exit(fs_devices)
{
   lock uuid_mutex
     fs_devices::ref_count--
   unlock uuid_mutex
}


To delete a fs_devices..

again:
  lock uuid_mutex
    find fs_devices
    if (fs_devices::ref_count != 0) {
       unlock uuid_mutex
       goto again;
    }
    if (fs_devices->opened > 0) {
      unlock uuid_mutex
      return -EBUSY
    }

    free_fs_devices()
  unlock uuid_mutex

------------------------------------------
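
The pseudocode above could be modelled in userspace C roughly as
follows. This is only a sketch under assumptions: a small fixed table
stands in for the global list of fs_devices, a pthread mutex stands in
for uuid_mutex, and every name here (fsdev_model, register_fsid,
excl_state_enter, excl_state_exit, try_delete) is made up for
illustration. One deliberate difference: instead of looping with
"goto again", try_delete() returns -EAGAIN while references are held
and leaves the retry to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FSIDS 4

/* Hypothetical userspace model of struct btrfs_fs_devices. */
struct fsdev_model {
	int fsid;
	int opened;	/* > 0 while mounted */
	int ref_count;	/* users between enter() and exit() */
};

static pthread_mutex_t uuid_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fsdev_model *fsid_table[MAX_FSIDS];

/* Caller must hold uuid_lock. Returns the slot index or -1. */
static int find_slot_locked(int fsid)
{
	for (int i = 0; i < MAX_FSIDS; i++)
		if (fsid_table[i] && fsid_table[i]->fsid == fsid)
			return i;
	return -1;
}

/* Register a new idle entry, as a scan would. */
static int register_fsid(int fsid)
{
	int ret = -ENOSPC;

	pthread_mutex_lock(&uuid_lock);
	if (find_slot_locked(fsid) >= 0) {
		ret = -EEXIST;
		goto out;
	}
	for (int i = 0; i < MAX_FSIDS; i++) {
		if (fsid_table[i])
			continue;
		fsid_table[i] = calloc(1, sizeof(*fsid_table[i]));
		if (fsid_table[i]) {
			fsid_table[i]->fsid = fsid;
			ret = 0;
		} else {
			ret = -ENOMEM;
		}
		break;
	}
out:
	pthread_mutex_unlock(&uuid_lock);
	return ret;
}

/* Acquire: look up the entry and pin it with a reference. */
static struct fsdev_model *excl_state_enter(int fsid)
{
	struct fsdev_model *fsd = NULL;
	int slot;

	pthread_mutex_lock(&uuid_lock);
	slot = find_slot_locked(fsid);
	if (slot >= 0) {
		fsd = fsid_table[slot];
		fsd->ref_count++;
	}
	pthread_mutex_unlock(&uuid_lock);
	return fsd;
}

/* Release: drop the reference taken by excl_state_enter(). */
static void excl_state_exit(struct fsdev_model *fsd)
{
	pthread_mutex_lock(&uuid_lock);
	assert(fsd->ref_count > 0);
	fsd->ref_count--;
	pthread_mutex_unlock(&uuid_lock);
}

/* Delete: refuse while the entry is referenced or mounted. */
static int try_delete(int fsid)
{
	int ret = 0;
	int slot;

	pthread_mutex_lock(&uuid_lock);
	slot = find_slot_locked(fsid);
	if (slot < 0) {
		ret = -ENOENT;
	} else if (fsid_table[slot]->ref_count != 0) {
		ret = -EAGAIN;	/* the proposal would retry here */
	} else if (fsid_table[slot]->opened > 0) {
		ret = -EBUSY;
	} else {
		free(fsid_table[slot]);
		fsid_table[slot] = NULL;
	}
	pthread_mutex_unlock(&uuid_lock);
	return ret;
}
```

In this model the mutex is held only for the lookup and the counter
update, so the work done between enter and exit for one fsid does not
serialize against another fsid.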


Thanks, Anand


