mbox series

[RFC,0/3] Remove mrlock

Message ID 20210113111707.756662-1-nborisov@suse.com (mailing list archive)
Headers show
Series Remove mrlock | expand

Message

Nikolay Borisov Jan. 13, 2021, 11:17 a.m. UTC
This series removes mrlock_t and directly replaces i_lock and i_mmaplock with
rw_semaphore in xfs_inode. My end game with this is to eventually lift i_mmaplock
in VFS and use it from there. The necessity for the latter came up since BTRFS
is also about to get its own private version of i_mmaplock for the same use case.
This  will mean that all 3 major filesystems on linux (ext4/xfs/btrfs) wil share
the same lock. Christoph naturally suggested for the lock to be lifted to VFS.

Before proceeding with this work I'd like to get the opinion of XFS developers
whether doing that is acceptable for them. I've heard that Dave wants to eventually
convert the mmapsem to a range lock for XFS and implement a callback mechanism
for VFS to call into every filesystem...

I've only compile tested this and also the way the rwsem is checked for write
is admittedly a bit hackish but it can easily be changed to utilize lockdep.
I'm aware of https://lore.kernel.org/linux-xfs/20201102194135.174806-1-preichl@redhat.com/
but frankly that series went too far up to rev 10 which is a bit mind boggling...

Nikolay Borisov (3):
  xfs: Add is_rwsem_write_locked function
  xfs: Convert i_lock/i_mmaplock to  rw_semaphore
  xfs: Remove mrlock

 fs/xfs/mrlock.h          | 78 ----------------------------------------
 fs/xfs/xfs_inode.c       | 48 ++++++++++++++-----------
 fs/xfs/xfs_inode.h       |  6 ++--
 fs/xfs/xfs_linux.h       |  1 -
 fs/xfs/xfs_qm_syscalls.c |  2 +-
 fs/xfs/xfs_super.c       |  7 ++--
 6 files changed, 34 insertions(+), 108 deletions(-)
 delete mode 100644 fs/xfs/mrlock.h

--
2.25.1

Comments

Christoph Hellwig Jan. 13, 2021, 11:27 a.m. UTC | #1
Pavel has looked into this before and got stuck on the allocator
workqueue offloads:

[PATCH v13 0/4] xfs: Remove wrappers for some semaphores

On Wed, Jan 13, 2021 at 01:17:03PM +0200, Nikolay Borisov wrote:
> This series removes mrlock_t and directly replaces i_lock and i_mmaplock with
> rw_semaphore in xfs_inode. My end game with this is to eventually lift i_mmaplock
> in VFS and use it from there. The necessity for the latter came up since BTRFS
> is also about to get its own private version of i_mmaplock for the same use case.
> This  will mean that all 3 major filesystems on linux (ext4/xfs/btrfs) wil share
> the same lock. Christoph naturally suggested for the lock to be lifted to VFS.
> 
> Before proceeding with this work I'd like to get the opinion of XFS developers
> whether doing that is acceptable for them. I've heard that Dave wants to eventually
> convert the mmapsem to a range lock for XFS and implement a callback mechanism
> for VFS to call into every filesystem...
> 
> I've only compile tested this and also the way the rwsem is checked for write
> is admittedly a bit hackish but it can easily be changed to utilize lockdep.
> I'm aware of https://lore.kernel.org/linux-xfs/20201102194135.174806-1-preichl@redhat.com/
> but frankly that series went too far up to rev 10 which is a bit mind boggling...
> 
> Nikolay Borisov (3):
>   xfs: Add is_rwsem_write_locked function
>   xfs: Convert i_lock/i_mmaplock to  rw_semaphore
>   xfs: Remove mrlock
> 
>  fs/xfs/mrlock.h          | 78 ----------------------------------------
>  fs/xfs/xfs_inode.c       | 48 ++++++++++++++-----------
>  fs/xfs/xfs_inode.h       |  6 ++--
>  fs/xfs/xfs_linux.h       |  1 -
>  fs/xfs/xfs_qm_syscalls.c |  2 +-
>  fs/xfs/xfs_super.c       |  7 ++--
>  6 files changed, 34 insertions(+), 108 deletions(-)
>  delete mode 100644 fs/xfs/mrlock.h
> 
> --
> 2.25.1
> 
---end quoted text---
Nikolay Borisov Jan. 13, 2021, 11:41 a.m. UTC | #2
On 13.01.21 г. 13:27 ч., Christoph Hellwig wrote:
> Pavel has looked into this before and got stuck on the allocator
> workqueue offloads:
> 
> [PATCH v13 0/4] xfs: Remove wrappers for some semaphores

I haven't looked into his series but I fail to see how lifting
rwsemaphore out of the nested structure can change the behavior ? It
just removes a level of indirection. My patches are semantically
identical to the original code.

<snip>
Christoph Hellwig Jan. 13, 2021, 12:09 p.m. UTC | #3
On Wed, Jan 13, 2021 at 01:41:09PM +0200, Nikolay Borisov wrote:
> 
> 
> On 13.01.21 ??. 13:27 ??., Christoph Hellwig wrote:
> > Pavel has looked into this before and got stuck on the allocator
> > workqueue offloads:
> > 
> > [PATCH v13 0/4] xfs: Remove wrappers for some semaphores
> 
> I haven't looked into his series but I fail to see how lifting
> rwsemaphore out of the nested structure can change the behavior ? It
> just removes a level of indirection. My patches are semantically
> identical to the original code.

mrlocks have the mr_writer field that annotate that the is a writer
locking the lock.  The XFS asserts use it to assert that the lock that
the current thread holds it for exclusive protection, which isn't
actually what the field says, and this breaks when XFS uses synchronous
execution of work_struct as basically an extension of the kernel stack.
Nikolay Borisov Jan. 13, 2021, 12:17 p.m. UTC | #4
On 13.01.21 г. 14:09 ч., Christoph Hellwig wrote:
> On Wed, Jan 13, 2021 at 01:41:09PM +0200, Nikolay Borisov wrote:
>>
>>
>> On 13.01.21 ??. 13:27 ??., Christoph Hellwig wrote:
>>> Pavel has looked into this before and got stuck on the allocator
>>> workqueue offloads:
>>>
>>> [PATCH v13 0/4] xfs: Remove wrappers for some semaphores
>>
>> I haven't looked into his series but I fail to see how lifting
>> rwsemaphore out of the nested structure can change the behavior ? It
>> just removes a level of indirection. My patches are semantically
>> identical to the original code.
> 
> mrlocks have the mr_writer field that annotate that the is a writer
> locking the lock.  The XFS asserts use it to assert that the lock that
> the current thread holds it for exclusive protection, which isn't
> actually what the field says, and this breaks when XFS uses synchronous
> execution of work_struct as basically an extension of the kernel stack.

I'm still failing to see what's the problem of checking the last bit of
the rwsem ->count field. It is set when the sem is held for writing
(identical to what mr_write does). As I mention in the cover letter this
might be considered a bit hacky because it exposes an internal detail of
the rwsem i.e that the bit of interest is bit 0. But I believe the same
can be achieved using lockdep_is_held_type(xx, 0/1) and making XFS's
debug routines depend on lockdep being on.

>
Darrick J. Wong Jan. 20, 2021, 8:14 p.m. UTC | #5
On Wed, Jan 13, 2021 at 02:17:29PM +0200, Nikolay Borisov wrote:
> 
> 
> On 13.01.21 г. 14:09 ч., Christoph Hellwig wrote:
> > On Wed, Jan 13, 2021 at 01:41:09PM +0200, Nikolay Borisov wrote:
> >>
> >>
> >> On 13.01.21 ??. 13:27 ??., Christoph Hellwig wrote:
> >>> Pavel has looked into this before and got stuck on the allocator
> >>> workqueue offloads:
> >>>
> >>> [PATCH v13 0/4] xfs: Remove wrappers for some semaphores
> >>
> >> I haven't looked into his series but I fail to see how lifting
> >> rwsemaphore out of the nested structure can change the behavior ? It
> >> just removes a level of indirection. My patches are semantically
> >> identical to the original code.
> > 
> > mrlocks have the mr_writer field that annotate that the is a writer
> > locking the lock.  The XFS asserts use it to assert that the lock that
> > the current thread holds it for exclusive protection, which isn't
> > actually what the field says, and this breaks when XFS uses synchronous
> > execution of work_struct as basically an extension of the kernel stack.
> 
> I'm still failing to see what's the problem of checking the last bit of
> the rwsem ->count field. It is set when the sem is held for writing
> (identical to what mr_write does). As I mention in the cover letter this
> might be considered a bit hacky because it exposes an internal detail of
> the rwsem i.e that the bit of interest is bit 0.

I don't want to tear into the implementation details of rwsems if I can
avoid it.  Just because we all have one big happy address space doesn't
mean shortcuts won't hose everyone.

> But I believe the same
> can be achieved using lockdep_is_held_type(xx, 0/1) and making XFS's
> debug routines depend on lockdep being on.

Pavel Reichl tried that two months ago in:
https://lore.kernel.org/linux-xfs/20201102194135.174806-2-preichl@redhat.com/

which resulted in fstests regressions:
https://lore.kernel.org/linux-xfs/20201104005345.GC7115@magnolia/

TLDR: the ILOCK semaphore is data-centric, but lockdep's debugging
chains are task-centric, which causes incorrect lock validation reports.

The solutions as I see them are: (a) figure out if we really still need
to defer bmbt splits to workers to avoid overflowing the kernel stack;
or (b) making it possible to transfer rwsem ownership to shut up
lockdep; or (c) fix the is_held predicate to ignore ownership.

--D

> 
> >