diff mbox

[06/11] btrfs: document device locking

Message ID 5f3dc05b0b44803e9d20498970f259ead2bfe7de.1509471604.git.dsterba@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Sterba Oct. 31, 2017, 5:44 p.m. UTC
Overview of the main locks protecting various device-related structures.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

Comments

Nikolay Borisov Nov. 2, 2017, 10:29 a.m. UTC | #1
On 31.10.2017 19:44, David Sterba wrote:
> Overview of the main locks protecting various device-related structures.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 66 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 75aed8ec64bd..098affc58361 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>  			     struct btrfs_bio **bbio_ret,
>  			     int mirror_num, int need_raid_map);
>  
> +/*
> + * Device locking
> + * ==============
> + *
> + * There are several mutexes that protect manipulation of devices and low-level
> + * structures like chunks but not block groups, extents or files

This sentence suggests you are describing the various mutexes but it
seems you are also describing data structure below i.e.
global::fs_devs/btrfs_device::name

> + *
> + * - uuid_mutex (global lock)
> + *
> + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
> + *   from the SCAN_DEV ioctl registration or from mount either implicitly
> + *   (the first device) or requested by the device= mount option
> + *
> + *   the mutex can be very coarse and can cover long-running operations
> + *
> + *   protects: updates to fs_devices counters like missing devices, rw devices,
> + *   seeding, structure cloning, openning/closing devices at mount/umount time

Perhaps move the uuid_mutex description after btrfs_device::name. That
way mutexes are grouped together and data structures are grouped as well.

> + *
> + *   global::fs_devs - add, remove, updates to the global list

You seem to be describing a data structure here rather than a mutex

> + *
> + *   does not protect: manipulation of the fs_devices::devices list!

but this sentence is written as if you've just described a mutex. The
end result is that it's not clear what mutex you are referring to here.

> + *
> + *   btrfs_device::name - renames (write side), read is RCU

It's not clear how the write side is protected.

> + *
> + * - fs_devices::device_list_mutex (per-fs, with RCU)
> + *
> + *   protects updates to fs_devices::devices, ie. adding and deleting
> + *
> + *   simple list traversal with read-only actions can be done with RCU
> + *   protection
> + *
> + *   may be used to exclude some operations from running concurrently without
> + *   any modifications to the list (see write_all_supers)
> + *
> + * - volume_mutex
> + *
> + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> + *   that cannot run in parallel and affect the higher-level properties of the
> + *   filesystem like: device add/deleting/resize/replace, or balance
> + *
> + * - balance_mutex
> + *
> + *   protects balance structures (status, state) and context accessed from
> + *   several places (internally, ioctl)
> + *
> + * - chunk_mutex
> + *
> + *   protects chunks, adding or removing during allocation, trim or when
> + *   a new device is added/removed
> + *
> + * - cleaner_mutex
> + *
> + *   a big lock that is held by the cleaner thread and prevents running
> + *   subvolume cleaning together with relocation or delayed iputs
> + *
> + *
> + * Lock nesting
> + * ------------
> + *
> + * uuid_mutex
> + *   volume_mutex
> + *     device_list_mutex
> + *       chunk_mutex
> + *     balance_mutex
> + *
> + */
>  DEFINE_MUTEX(uuid_mutex);
>  static LIST_HEAD(fs_uuids);
>  struct list_head *btrfs_get_fs_uuids(void)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain Nov. 3, 2017, 11:13 a.m. UTC | #2
Thanks for writing this.

> + * - fs_devices::device_list_mutex (per-fs, with RCU)
> + *
> + *   protects updates to fs_devices::devices, ie. adding and deleting
> + *
> + *   simple list traversal with read-only actions can be done with RCU
> + *   protection
> + *
> + *   may be used to exclude some operations from running concurrently without
> + *   any modifications to the list (see write_all_supers)

> + * - volume_mutex
> + *
> + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> + *   that cannot run in parallel and affect the higher-level properties of the
> + *   filesystem like: device add/deleting/resize/replace, or balance

> + * - chunk_mutex
> + *
> + *   protects chunks, adding or removing during allocation, trim or when
> + *   a new device is added/removed

::

> + * Lock nesting
> + * ------------
> + *
> + * uuid_mutex
> + *   volume_mutex
> + *     device_list_mutex
> + *       chunk_mutex
> + *     balance_mutex


If we have a list of operations that would consume these locks then we 
can map it accordingly for better clarity.

To me it looks like we have too many locks.
  - we don't have to differentiate the mounted and unmounted context
    for device locks.
  - Two lock would be sufficient, one for the device list
    (add/rm,replace,..) and another for device property changes
    (resize, trim,..).

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain Nov. 6, 2017, 2:32 a.m. UTC | #3
On 11/03/2017 07:13 PM, Anand Jain wrote:
> 
> 
> Thanks for writing this.
> 
>> + * - fs_devices::device_list_mutex (per-fs, with RCU)
>> + *
>> + *   protects updates to fs_devices::devices, ie. adding and deleting
>> + *
>> + *   simple list traversal with read-only actions can be done with RCU
>> + *   protection
>> + *
>> + *   may be used to exclude some operations from running concurrently 
>> without
>> + *   any modifications to the list (see write_all_supers)
> 
>> + * - volume_mutex
>> + *
>> + *   coarse lock owned by a mounted filesystem; used to exclude some 
>> operations
>> + *   that cannot run in parallel and affect the higher-level 
>> properties of the
>> + *   filesystem like: device add/deleting/resize/replace, or balance
> 
>> + * - chunk_mutex
>> + *
>> + *   protects chunks, adding or removing during allocation, trim or when
>> + *   a new device is added/removed
> 
> ::
> 
>> + * Lock nesting
>> + * ------------
>> + *
>> + * uuid_mutex
>> + *   volume_mutex

I think it should be nested like this, but as of now its not. Ref [1]
  [1]
  btrfs: move volume_mutex into the btrfs_rm_device()

Thanks, Anand

>> + *     device_list_mutex
>> + *       chunk_mutex
>> + *     balance_mutex
> 
> 
> If we have a list of operations that would consume these locks then we 
> can map it accordingly for better clarity.
> 
> To me it looks like we have too many locks.
>   - we don't have to differentiate the mounted and unmounted context
>     for device locks.
>   - Two lock would be sufficient, one for the device list
>     (add/rm,replace,..) and another for device property changes
>     (resize, trim,..).
> 
> Thanks, Anand
> 
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 6, 2017, 1:36 p.m. UTC | #4
On Fri, Nov 03, 2017 at 07:13:01PM +0800, Anand Jain wrote:
> 
> 
> Thanks for writing this.
> 
> > + * - fs_devices::device_list_mutex (per-fs, with RCU)
> > + *
> > + *   protects updates to fs_devices::devices, ie. adding and deleting
> > + *
> > + *   simple list traversal with read-only actions can be done with RCU
> > + *   protection
> > + *
> > + *   may be used to exclude some operations from running concurrently without
> > + *   any modifications to the list (see write_all_supers)
> 
> > + * - volume_mutex
> > + *
> > + *   coarse lock owned by a mounted filesystem; used to exclude some operations
> > + *   that cannot run in parallel and affect the higher-level properties of the
> > + *   filesystem like: device add/deleting/resize/replace, or balance
> 
> > + * - chunk_mutex
> > + *
> > + *   protects chunks, adding or removing during allocation, trim or when
> > + *   a new device is added/removed
> 
> ::
> 
> > + * Lock nesting
> > + * ------------
> > + *
> > + * uuid_mutex
> > + *   volume_mutex
> > + *     device_list_mutex
> > + *       chunk_mutex
> > + *     balance_mutex
> 
> 
> If we have a list of operations that would consume these locks then we 
> can map it accordingly for better clarity.
> 
> To me it looks like we have too many locks.

I agree. Trivially we can remove the volume_mutex, because it copies
setting of BTRFS_FS_EXCL_OP, but this still needs some preparatory work
in the mount path. I have patches for that but I wanted to send out the
documentation first, that is supposed to reflect the current state.

>   - we don't have to differentiate the mounted and unmounted context
>     for device locks.

I'm not sure about that, the device registering happens outside of
mounted filesystems yet we touch the same structures on mounted
filesystem.

>   - Two lock would be sufficient, one for the device list
>     (add/rm,replace,..) and another for device property changes
>     (resize, trim,..).

I think this will be sorted once the volume_mutex is gone. The nature of
add/remove and trim is different, as trim does not change the high-level
status of devices, only locks the bloc groups. Resize and add/remove are
mutually exclusive so they need to be protected by a common lock.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 6, 2017, 1:40 p.m. UTC | #5
On Mon, Nov 06, 2017 at 10:32:48AM +0800, Anand Jain wrote:
> >> + * Lock nesting
> >> + * ------------
> >> + *
> >> + * uuid_mutex
> >> + *   volume_mutex
> 
> I think it should be nested like this, but as of now its not. Ref [1]
>   [1]
>   btrfs: move volume_mutex into the btrfs_rm_device()

Right, this violates the documented rules and I think that's because of
the ambiguous semantic of the uuid_mutex. Here we're using the "on a
mounted filesystem" part so we lock only in the context where this
applies. Again, once volume_mutex is gone, this will be fixed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 6, 2017, 1:51 p.m. UTC | #6
On Thu, Nov 02, 2017 at 12:29:17PM +0200, Nikolay Borisov wrote:
> On 31.10.2017 19:44, David Sterba wrote:
> > Overview of the main locks protecting various device-related structures.
> > 
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> >  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 66 insertions(+)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 75aed8ec64bd..098affc58361 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> >  			     struct btrfs_bio **bbio_ret,
> >  			     int mirror_num, int need_raid_map);
> >  
> > +/*
> > + * Device locking
> > + * ==============
> > + *
> > + * There are several mutexes that protect manipulation of devices and low-level
> > + * structures like chunks but not block groups, extents or files
> 
> This sentence suggests you are describing the various mutexes but it
> seems you are also describing data structure below i.e.
> global::fs_devs/btrfs_device::name

In my understanding, documenting a mutex means defining the context and
data that are protected in the context. So I can start with the mutex,
or the data itself, but the mutex makes IMO more sense to start.


> > + * - uuid_mutex (global lock)
> > + *
> > + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
> > + *   from the SCAN_DEV ioctl registration or from mount either implicitly
> > + *   (the first device) or requested by the device= mount option
> > + *
> > + *   the mutex can be very coarse and can cover long-running operations
> > + *
> > + *   protects: updates to fs_devices counters like missing devices, rw devices,
> > + *   seeding, structure cloning, openning/closing devices at mount/umount time
> 
> Perhaps move the uuid_mutex description after btrfs_device::name. That
> way mutexes are grouped together and data structures are grouped as well.

The grouping is by mutex.

> > + *
> > + *   global::fs_devs - add, remove, updates to the global list
> 
> You seem to be describing a data structure here rather than a mutex

As I read it, this means "if this mutex is taken, it protects
global::fs_devs in this way".

> > + *   does not protect: manipulation of the fs_devices::devices list!
> 
> but this sentence is written as if you've just described a mutex. The
> end result is that it's not clear what mutex you are referring to here.

That's the mutex in the section above "- uuid_mutex (global lock)".

> > + *   btrfs_device::name - renames (write side), read is RCU
> 
> It's not clear how the write side is protected.

same answer, it's under the 'uuid_mutex', so the write side is protected
by that, with exception of reads protected by RCU.

I wonder what kind of documentation structure do you expect as it seems
to me you missed it completely. Which is a valuable feedback to me, but
I don't know how to fix it, other than make the sections more visible
with underlining or indenting or more whitespace.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov Nov. 6, 2017, 3:02 p.m. UTC | #7
On  6.11.2017 15:51, David Sterba wrote:
> On Thu, Nov 02, 2017 at 12:29:17PM +0200, Nikolay Borisov wrote:
>> On 31.10.2017 19:44, David Sterba wrote:
>>> Overview of the main locks protecting various device-related structures.
>>>
>>> Signed-off-by: David Sterba <dsterba@suse.com>
>>> ---
>>>  fs/btrfs/volumes.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 66 insertions(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 75aed8ec64bd..098affc58361 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -145,6 +145,72 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>>>  			     struct btrfs_bio **bbio_ret,
>>>  			     int mirror_num, int need_raid_map);
>>>  
>>> +/*
>>> + * Device locking
>>> + * ==============
>>> + *
>>> + * There are several mutexes that protect manipulation of devices and low-level
>>> + * structures like chunks but not block groups, extents or files
>>
>> This sentence suggests you are describing the various mutexes but it
>> seems you are also describing data structure below i.e.
>> global::fs_devs/btrfs_device::name
> 
> In my understanding, documenting a mutex means defining the context and
> data that are protected in the context. So I can start with the mutex,
> or the data itself, but the mutex makes IMO more sense to start.
> 
> 
>>> + * - uuid_mutex (global lock)
>>> + *
>>> + *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
>>> + *   from the SCAN_DEV ioctl registration or from mount either implicitly
>>> + *   (the first device) or requested by the device= mount option
>>> + *
>>> + *   the mutex can be very coarse and can cover long-running operations
>>> + *
>>> + *   protects: updates to fs_devices counters like missing devices, rw devices,
>>> + *   seeding, structure cloning, openning/closing devices at mount/umount time
>>
>> Perhaps move the uuid_mutex description after btrfs_device::name. That
>> way mutexes are grouped together and data structures are grouped as well.
> 
> The grouping is by mutex.
> 
>>> + *
>>> + *   global::fs_devs - add, remove, updates to the global list
>>
>> You seem to be describing a data structure here rather than a mutex
> 
> As I read it, this means "if this mutex is taken, it protects
> global::fs_devs in this way".
> 
>>> + *   does not protect: manipulation of the fs_devices::devices list!
>>
>> but this sentence is written as if you've just described a mutex. The
>> end result is that it's not clear what mutex you are referring to here.
> 
> That's the mutex in the section above "- uuid_mutex (global lock)".
> 
>>> + *   btrfs_device::name - renames (write side), read is RCU
>>
>> It's not clear how the write side is protected.
> 
> same answer, it's under the 'uuid_mutex', so the write side is protected
> by that, with exception of reads protected by RCU.
> 
> I wonder what kind of documentation structure do you expect as it seems
> to me you missed it completely. Which is a valuable feedback to me, but
> I don't know how to fix it, other than make the sections more visible
> with underlining or indenting or more whitespace.

Right, I did a fresh read and indeed the structure now makes sense. I
have completely missed the nesting though. Maybe you can nest more
agressively, now every sentence is aligned to the header, perhaps we
want to have it one level deeper? Or use more than one - at the mutex
level ? I don't really have a clear-cut answer for you.


> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 6, 2017, 3:09 p.m. UTC | #8
On Mon, Nov 06, 2017 at 05:02:21PM +0200, Nikolay Borisov wrote:
> Right, I did a fresh read and indeed the structure now makes sense. I
> have completely missed the nesting though. Maybe you can nest more
> agressively, now every sentence is aligned to the header, perhaps we
> want to have it one level deeper? Or use more than one - at the mutex
> level ? I don't really have a clear-cut answer for you.

How about that:

/*
 * Device locking
 * ==============
 *
 * There are several mutexes that protect manipulation of devices and low-level
 * structures like chunks but not block groups, extents or files
 *
 * uuid_mutex (global lock)
 * ------------------------
 * protects the fs_uuids list that tracks all per-fs fs_devices, resulting from
 * the SCAN_DEV ioctl registration or from mount either implicitly (the first
 * device) or requested by the device= mount option
 *
 * the mutex can be very coarse and can cover long-running operations
 *
 * protects: updates to fs_devices counters like missing devices, rw devices,
 * seeding, structure cloning, openning/closing devices at mount/umount time
 *
 * global::fs_devs - add, remove, updates to the global list
 *
 * does not protect: manipulation of the fs_devices::devices list!
 *
 * btrfs_device::name - renames (write side), read is RCU
 *
 * fs_devices::device_list_mutex (per-fs, with RCU)
 * ------------------------------------------------
 * protects updates to fs_devices::devices, ie. adding and deleting
 *
 * simple list traversal with read-only actions can be done with RCU protection
 *
 * may be used to exclude some operations from running concurrently without any
 * modifications to the list (see write_all_supers)
 *
 * volume_mutex
 * ------------
 * coarse lock owned by a mounted filesystem; used to exclude some operations
 * that cannot run in parallel and affect the higher-level properties of the
 * filesystem like: device add/deleting/resize/replace, or balance
 *
 * balance_mutex
 * -------------
 * protects balance structures (status, state) and context accessed from
 * several places (internally, ioctl)
 *
 * chunk_mutex
 * -----------
 * protects chunks, adding or removing during allocation, trim or when a new
 * device is added/removed
 *
 * cleaner_mutex
 * -------------
 * a big lock that is held by the cleaner thread and prevents running subvolume
 * cleaning together with relocation or delayed iputs
 *
 *
 * Lock nesting
 * ============
 *
 * uuid_mutex
 *   volume_mutex
 *     device_list_mutex
 *       chunk_mutex
 *     balance_mutex
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 75aed8ec64bd..098affc58361 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -145,6 +145,72 @@  static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 			     struct btrfs_bio **bbio_ret,
 			     int mirror_num, int need_raid_map);
 
+/*
+ * Device locking
+ * ==============
+ *
+ * There are several mutexes that protect manipulation of devices and low-level
+ * structures like chunks but not block groups, extents or files
+ *
+ * - uuid_mutex (global lock)
+ *
+ *   protects the fs_uuids list that tracks all per-fs fs_devices, resulting
+ *   from the SCAN_DEV ioctl registration or from mount either implicitly
+ *   (the first device) or requested by the device= mount option
+ *
+ *   the mutex can be very coarse and can cover long-running operations
+ *
+ *   protects: updates to fs_devices counters like missing devices, rw devices,
+ *   seeding, structure cloning, openning/closing devices at mount/umount time
+ *
+ *   global::fs_devs - add, remove, updates to the global list
+ *
+ *   does not protect: manipulation of the fs_devices::devices list!
+ *
+ *   btrfs_device::name - renames (write side), read is RCU
+ *
+ * - fs_devices::device_list_mutex (per-fs, with RCU)
+ *
+ *   protects updates to fs_devices::devices, ie. adding and deleting
+ *
+ *   simple list traversal with read-only actions can be done with RCU
+ *   protection
+ *
+ *   may be used to exclude some operations from running concurrently without
+ *   any modifications to the list (see write_all_supers)
+ *
+ * - volume_mutex
+ *
+ *   coarse lock owned by a mounted filesystem; used to exclude some operations
+ *   that cannot run in parallel and affect the higher-level properties of the
+ *   filesystem like: device add/deleting/resize/replace, or balance
+ *
+ * - balance_mutex
+ *
+ *   protects balance structures (status, state) and context accessed from
+ *   several places (internally, ioctl)
+ *
+ * - chunk_mutex
+ *
+ *   protects chunks, adding or removing during allocation, trim or when
+ *   a new device is added/removed
+ *
+ * - cleaner_mutex
+ *
+ *   a big lock that is held by the cleaner thread and prevents running
+ *   subvolume cleaning together with relocation or delayed iputs
+ *
+ *
+ * Lock nesting
+ * ------------
+ *
+ * uuid_mutex
+ *   volume_mutex
+ *     device_list_mutex
+ *       chunk_mutex
+ *     balance_mutex
+ *
+ */
 DEFINE_MUTEX(uuid_mutex);
 static LIST_HEAD(fs_uuids);
 struct list_head *btrfs_get_fs_uuids(void)