mbox series

[0/4] btrfs, sysfs cleanup and add dev_state

Message ID 20191205112706.8125-1-anand.jain@oracle.com (mailing list archive)
Headers show
Series btrfs, sysfs cleanup and add dev_state | expand

Message

Anand Jain Dec. 5, 2019, 11:27 a.m. UTC
Patch 1/4 is a cleanup patch.
Patch 2/4 adds the kobject devinfo which is a preparatory to patch 4/4.
Patch 3/4 was sent before, no functional changes, change log is updated.
Patch 4/4 creates the attribute dev_state.

Anand Jain (4):
  btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
  btrfs: sysfs, add UUID/devinfo kobject
  btrfs: sysfs, rename device_link add,remove functions
  btrfs: sysfs, add devid/dev_state kobject and attribute

 fs/btrfs/dev-replace.c |   4 +-
 fs/btrfs/sysfs.c       | 130 +++++++++++++++++++++++++++++++----------
 fs/btrfs/sysfs.h       |   4 +-
 fs/btrfs/volumes.c     |   8 +--
 fs/btrfs/volumes.h     |   3 +
 5 files changed, 111 insertions(+), 38 deletions(-)

Comments

David Sterba Dec. 5, 2019, 3 p.m. UTC | #1
On Thu, Dec 05, 2019 at 07:27:02PM +0800, Anand Jain wrote:
> Anand Jain (4):
>   btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
>   btrfs: sysfs, add UUID/devinfo kobject
>   btrfs: sysfs, rename device_link add,remove functions
>   btrfs: sysfs, add devid/dev_state kobject and attribute

So we can start adding things to devinfo, I did a quick test how the
sysfs directory looks like, the base structure seems ok.

Unlike other sources for sysfs file data (like superblock), the devices
can appear and disappear during the lifetime of the filesystem and as
pointed out in the patches, some synchronization is needed.

But it could be more tricky. Reading from the sysfs files should not
block normal operation (no device_list_mutex) but also must not lead to
use-after-free in case the device gets deleted.

I haven't found a simple locking scheme that would avoid accessing a
freed device structure, the sysfs callback can happen at any time and
the structure can be freed already.

So this means that btrfs_sysfs_dev_state_show cannot access it directly
(using offsetof(kobj, ...)). The safe (but not necessarily the best) way
I have so far is to track the device kobjects in the superblock and add
own lock for accessing this structure.

This avoids increasing contention of device_list_mutex, each sysfs
callback needs to take the lock first, lookup the device and print the
value if it's found. Otherwise we know the device is gone.

The lock is rwlock_t, sysfs callbacks take read-side, device deletion
takes write possibly outside of the device_list_mutex before the device
is actually going to be deleted. This relies on fairness of the lock so
the write will happen eventually (even if there are many readers).
Anand Jain Dec. 6, 2019, 1:46 p.m. UTC | #2
On 5/12/19 11:00 PM, David Sterba wrote:
> On Thu, Dec 05, 2019 at 07:27:02PM +0800, Anand Jain wrote:
>> Anand Jain (4):
>>    btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
>>    btrfs: sysfs, add UUID/devinfo kobject
>>    btrfs: sysfs, rename device_link add,remove functions
>>    btrfs: sysfs, add devid/dev_state kobject and attribute
> 
> So we can start adding things to devinfo, I did a quick test how the
> sysfs directory looks like, the base structure seems ok.
> 
> Unlike other sources for sysfs file data (like superblock), the devices
> can appear and disappear during the lifetime of the filesystem and as
> pointed out in the patches, some synchronization is needed.
> 
> But it could be more tricky. Reading from the sysfs files should not
> block normal operation (no device_list_mutex) but also must not lead to
> use-after-free in case the device gets deleted.
> 
> I haven't found a simple locking scheme that would avoid accessing a
> freed device structure, the sysfs callback can happen at any time and
> the structure can be freed already.
> 
> So this means that btrfs_sysfs_dev_state_show cannot access it directly
> (using offsetof(kobj, ...)). The safe (but not necessarily the best) way
> I have so far is to track the device kobjects in the superblock and add
> own lock for accessing this structure.
> 
> This avoids increasing contention of device_list_mutex, each sysfs
> callback needs to take the lock first, lookup the device and print the
> value if it's found. Otherwise we know the device is gone.



> The lock is rwlock_t, sysfs callbacks take read-side, device deletion
> takes write possibly outside of the device_list_mutex before the device
> is actually going to be deleted. This relies on fairness of the lock so
> the write will happen eventually (even if there are many readers).
> 

  Yeah this makes sense to me. I completely forgot about the %device
  getting deleted while sysfs is reading. Let me fix in the patch 4/4.
Anand Jain Dec. 9, 2019, 2:09 p.m. UTC | #3
On 12/6/19 9:46 PM, Anand Jain wrote:
> 
> 
> On 5/12/19 11:00 PM, David Sterba wrote:
>> On Thu, Dec 05, 2019 at 07:27:02PM +0800, Anand Jain wrote:
>>> Anand Jain (4):
>>>    btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
>>>    btrfs: sysfs, add UUID/devinfo kobject
>>>    btrfs: sysfs, rename device_link add,remove functions
>>>    btrfs: sysfs, add devid/dev_state kobject and attribute
>>
>> So we can start adding things to devinfo, I did a quick test how the
>> sysfs directory looks like, the base structure seems ok.
>>
>> Unlike other sources for sysfs file data (like superblock), the devices
>> can appear and disappear during the lifetime of the filesystem and as
>> pointed out in the patches, some synchronization is needed.
>>
>> But it could be more tricky. Reading from the sysfs files should not
>> block normal operation (no device_list_mutex) but also must not lead to
>> use-after-free in case the device gets deleted.
>>
>> I haven't found a simple locking scheme that would avoid accessing a
>> freed device structure, the sysfs callback can happen at any time and
>> the structure can be freed already.
>>
>> So this means that btrfs_sysfs_dev_state_show cannot access it directly
>> (using offsetof(kobj, ...)). The safe (but not necessarily the best) way
>> I have so far is to track the device kobjects in the superblock and add
>> own lock for accessing this structure.
>>
>> This avoids increasing contention of device_list_mutex, each sysfs
>> callback needs to take the lock first, lookup the device and print the
>> value if it's found. Otherwise we know the device is gone.
> 
> 
> 
>> The lock is rwlock_t, sysfs callbacks take read-side, device deletion
>> takes write possibly outside of the device_list_mutex before the device
>> is actually going to be deleted. This relies on fairness of the lock so
>> the write will happen eventually (even if there are many readers).
>>
> 
>   Yeah this makes sense to me. I completely forgot about the %device
>   getting deleted while sysfs is reading. Let me fix in the patch 4/4.
> 

  Ah. No we don't need the lock. Sysfs kobject delete/put called by the
  delete thread waits for the sysfs read open to close. And after this
  operation we are freeing the %device. So its synchronized there is no
  race.

Thanks, Anand
Anand Jain Dec. 9, 2019, 10:48 p.m. UTC | #4
On 12/5/19 7:27 PM, Anand Jain wrote:
> Patch 1/4 is a cleanup patch.
> Patch 2/4 adds the kobject devinfo which is a preparatory to patch 4/4.
> Patch 3/4 was sent before, no functional changes, change log is updated.
> Patch 4/4 creates the attribute dev_state.
> 
> Anand Jain (4):
>    btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
>    btrfs: sysfs, add UUID/devinfo kobject
>    btrfs: sysfs, rename device_link add,remove functions
>    btrfs: sysfs, add devid/dev_state kobject and attribute

David,

  Is there a chance that partly/fully this could get into 5.5-rc3?
  1,3 are cleanups and 2,4 are feature add.

Thanks, Anand

>   fs/btrfs/dev-replace.c |   4 +-
>   fs/btrfs/sysfs.c       | 130 +++++++++++++++++++++++++++++++----------
>   fs/btrfs/sysfs.h       |   4 +-
>   fs/btrfs/volumes.c     |   8 +--
>   fs/btrfs/volumes.h     |   3 +
>   5 files changed, 111 insertions(+), 38 deletions(-)
>
David Sterba Dec. 10, 2019, 4:41 p.m. UTC | #5
On Tue, Dec 10, 2019 at 06:48:56AM +0800, Anand Jain wrote:
> On 12/5/19 7:27 PM, Anand Jain wrote:
> > Patch 1/4 is a cleanup patch.
> > Patch 2/4 adds the kobject devinfo which is a preparatory to patch 4/4.
> > Patch 3/4 was sent before, no functional changes, change log is updated.
> > Patch 4/4 creates the attribute dev_state.
> > 
> > Anand Jain (4):
> >    btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
> >    btrfs: sysfs, add UUID/devinfo kobject
> >    btrfs: sysfs, rename device_link add,remove functions
> >    btrfs: sysfs, add devid/dev_state kobject and attribute
> 
>   Is there a chance that partly/fully this could get into 5.5-rc3?
>   1,3 are cleanups and 2,4 are feature add.

... which is exactly why this cannot be added to 5.5-rc3. The merge
target 5.5 is now only for regressions introduced in the new code or
reasonable fixes (eg. for stable trees).

https://www.kernel.org/doc/html/latest/process/2.Process.html#the-big-picture
David Sterba Jan. 14, 2020, 6:39 p.m. UTC | #6
On Thu, Dec 05, 2019 at 07:27:02PM +0800, Anand Jain wrote:
> Patch 1/4 is a cleanup patch.
> Patch 2/4 adds the kobject devinfo which is a preparatory to patch 4/4.
> Patch 3/4 was sent before, no functional changes, change log is updated.
> Patch 4/4 creates the attribute dev_state.
> 
> Anand Jain (4):
>   btrfs: sysfs, use btrfs_sysfs_remove_fsid in fail return in add_fsid
>   btrfs: sysfs, add UUID/devinfo kobject
>   btrfs: sysfs, rename device_link add,remove functions
>   btrfs: sysfs, add devid/dev_state kobject and attribute

I'm not sure which patch causes that but there are more problems, from test
btrfs/064. The log is full of it, the create part for device id 0 and
when the filesystem is unmounted, refcount for the object is wrong and
sysfs complains.

[ 1839.514008] kobject: '(null)' (00000000167bb824): is not initialized, yet kobject_put() is being called.                                                                                                     │··
[ 1839.517437] WARNING: CPU: 3 PID: 2487 at lib/kobject.c:736 kobject_put+0x40/0xb0                                                                                                                             │··
[ 1839.520253] Modules linked in: dm_flakey dm_mod dax xxhash_generic btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop           │··
[ 1839.524566] CPU: 3 PID: 2487 Comm: umount Tainted: G        W         5.5.0-rc5-default+ #932                                                                                                                │··
[ 1839.527075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014                                                                               │··
[ 1839.529898] RIP: 0010:kobject_put+0x40/0xb0                                                                                                                                                                  │··
 0c 80                                                                                                                                                                                                          │··
[ 1839.535471] RSP: 0018:ffffac040741fdc0 EFLAGS: 00010282                                                                                                                                                      │··
[ 1839.537134] RAX: 0000000000000000 RBX: ffff93422bc0f000 RCX: 0000000000000001                                                                                                                                │··
[ 1839.539236] RDX: 0000000000000000 RSI: ffffffff9f100899 RDI: ffffffff9f100971                                                                                                                                │··
[ 1839.541299] RBP: ffff93422bc0f2f8 R08: 0000000000000000 R09: 0000000000000000                                                                                                                                │··
[ 1839.543132] R10: 0000000000000000 R11: 0000000000000000 R12: ffff934239959a00                                                                                                                                │··
[ 1839.545114] R13: ffff934239959b18 R14: 0000000000000000 R15: ffff934235b23fc8                                                                                                                                │··
[ 1839.547131] FS:  00007f9d2c233800(0000) GS:ffff93423dc00000(0000) knlGS:0000000000000000                                                                                                                     │··
[ 1839.549219] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                                                                │··
[ 1839.550380] CR2: 00007fff85c8cff8 CR3: 000000001ff3f006 CR4: 0000000000160ee0                                                                                                                                │··
[ 1839.551841] Call Trace:                                                                                                                                                                                      │··
[ 1839.553050]  btrfs_sysfs_rm_device_link+0xb8/0xe0 [btrfs]                                                                                                                                                    │··
[ 1839.554862]  close_ctree+0x23a/0x334 [btrfs]                                                                                                                                                                 │··
[ 1839.556429]  generic_shutdown_super+0x69/0x100                                                                                                                                                               │··
[ 1839.558030]  kill_anon_super+0x14/0x30                                                                                                                                                                       │··
[ 1839.559461]  btrfs_kill_super+0x12/0xa0 [btrfs]                                                                                                                                                              │··
[ 1839.560894]  deactivate_locked_super+0x2c/0x70                                                                                                                                                               │··
[ 1839.562375]  cleanup_mnt+0x100/0x160                                                                                                                                                                         │··
[ 1839.563682]  task_work_run+0x90/0xc0                                                                                                                                                                         │··
[ 1839.564991]  exit_to_usermode_loop+0x96/0xa0                                                                                                                                                                 │··
[ 1839.566438]  do_syscall_64+0x1df/0x210                                                                                                                                                                       │··
[ 1839.567792]  entry_SYSCALL_64_after_hwframe+0x49/0xbe                                                                                                                                                        │··
[ 1839.569416] RIP: 0033:0x7f9d2c4774e7                                                                                                                                                                         │··
 01 48                                                                                                                                                                                                          │··
[ 1839.576283] RSP: 002b:00007fff85c8e388 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6                                                                                                                           │··
[ 1839.578849] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f9d2c4774e7                                                                                                                                │··
[ 1839.581006] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055e08e9dab80                                                                                                                                │··
[ 1839.583132] RBP: 000055e08e9da970 R08: 0000000000000000 R09: 00007fff85c8d100                                                                                                                                │··
[ 1839.585271] R10: 000055e08e9daba0 R11: 0000000000000246 R12: 000055e08e9dab80                                                                                                                                │··
[ 1839.587420] R13: 0000000000000000 R14: 000055e08e9daa68 R15: 0000000000000000                                                                                                                                │··
[ 1839.589528] irq event stamp: 0                                                                                                                                                                               │··
[ 1839.590731] hardirqs last  enabled at (0): [<0000000000000000>] 0x0                                                                                                                                          │··
[ 1839.592667] hardirqs last disabled at (0): [<ffffffff9f07efb0>] copy_process+0x680/0x1b70                                                                                                                    │··
[ 1839.595447] softirqs last  enabled at (0): [<ffffffff9f07efb0>] copy_process+0x680/0x1b70                                                                                                                    │··
[ 1839.598216] softirqs last disabled at (0): [<0000000000000000>] 0x0                                                                                                                                          │··
[ 1839.600178] ---[ end trace d7fb083174aa2eaf ]---