[1/5] locking: Add rwsem_is_write_locked()

Message ID	20230907174705.2976191-2-willy@infradead.org (mailing list archive)
State	Deferred, archived
Headers	show Return-Path: <linux-xfs-owner@vger.kernel.org> From: "Matthew Wilcox (Oracle)" <willy@infradead.org> To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chandan Babu R <chandan.babu@oracle.com>, "Darrick J . Wong" <djwong@kernel.org>, linux-xfs@vger.kernel.org Subject: [PATCH 1/5] locking: Add rwsem_is_write_locked() Date: Thu, 7 Sep 2023 18:47:01 +0100 Message-Id: <20230907174705.2976191-2-willy@infradead.org> In-Reply-To: <20230907174705.2976191-1-willy@infradead.org> References: <20230907174705.2976191-1-willy@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Remove the XFS mrlock \| expand [0/5] Remove the XFS mrlock [1/5] locking: Add rwsem_is_write_locked() [2/5] mm: Use rwsem_is_write_locked in mmap_assert_write_locked [3/5] xfs: Use rwsem_is_write_locked() [4/5] xfs: Remove mrlock wrapper [5/5] xfs: Stop using lockdep to assert that locks are held

Matthew Wilcox Sept. 7, 2023, 5:47 p.m. UTC

Several places want to know whether the lock is held by a writer, instead
of just whether it's held.  We can implement this for both normal and
rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
it outside that file might tempt other people to use it, so just use
a comment to note that's what the 1 means, and help anybody find it if
they're looking to change the implementation.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/rwbase_rt.h |  5 +++++
 include/linux/rwsem.h     | 10 ++++++++++
 2 files changed, 15 insertions(+)

Waiman Long Sept. 7, 2023, 6:05 p.m. UTC | #1

On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> Several places want to know whether the lock is held by a writer, instead
> of just whether it's held.  We can implement this for both normal and
> rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> it outside that file might tempt other people to use it, so just use
> a comment to note that's what the 1 means, and help anybody find it if
> they're looking to change the implementation.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/rwbase_rt.h |  5 +++++
>   include/linux/rwsem.h     | 10 ++++++++++
>   2 files changed, 15 insertions(+)
>
> diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
> index 1d264dd08625..3c25b14edc05 100644
> --- a/include/linux/rwbase_rt.h
> +++ b/include/linux/rwbase_rt.h
> @@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
>   	return atomic_read(&rwb->readers) != READER_BIAS;
>   }
>   
> +static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb)
> +{
> +	return atomic_read(&rwb->readers) == WRITER_BIAS;
> +}
> +
>   static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
>   {
>   	return atomic_read(&rwb->readers) > 0;
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 1dd530ce8b45..0f78b8d2e653 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
>   	return atomic_long_read(&sem->count) != 0;
>   }
>   
> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> +{
> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> +}

I would prefer you move the various RWSEM_* count bit macros from 
kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use 
RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.

Cheers,
Longman

Peter Zijlstra Sept. 7, 2023, 7:08 p.m. UTC | #2

On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> Several places want to know whether the lock is held by a writer, instead
> of just whether it's held.  We can implement this for both normal and
> rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> it outside that file might tempt other people to use it, so just use
> a comment to note that's what the 1 means, and help anybody find it if
> they're looking to change the implementation.

I'm presuming this is deep in a callchain where they know they hold the
lock, but they lost in what capacity?

In general I strongly dislike the whole _is_locked family, because it
gives very poorly defined semantics if used by anybody but the owner.

If these new functions are indeed to be used only by lock holders to
determine what kind of lock they hold, could we please put:

	lockdep_assert_held()

in them?

Matthew Wilcox Sept. 7, 2023, 7:20 p.m. UTC | #3

On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > Several places want to know whether the lock is held by a writer, instead
> > of just whether it's held.  We can implement this for both normal and
> > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > it outside that file might tempt other people to use it, so just use
> > a comment to note that's what the 1 means, and help anybody find it if
> > they're looking to change the implementation.
> 
> I'm presuming this is deep in a callchain where they know they hold the
> lock, but they lost in what capacity?

No, it's just assertions.  You can see that in patch 3 where it's
used in functions called things like "xfs_islocked".

> In general I strongly dislike the whole _is_locked family, because it
> gives very poorly defined semantics if used by anybody but the owner.
> 
> If these new functions are indeed to be used only by lock holders to
> determine what kind of lock they hold, could we please put:
> 
> 	lockdep_assert_held()
> 
> in them?

Patch 2 shows it in use in the MM code.  We already have a
lockdep_assert_held_write(), but most people don't enable lockdep, so
we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
to give us a good assertion when lockdep is disabled.

XFS has a problem with using lockdep in general, which is that a worker
thread can be spawned and use the fact that the spawner is holding the
lock.  There's no mechanism for the worker thread to ask "Does struct
task_struct *p hold the lock?".

Matthew Wilcox Sept. 7, 2023, 7:33 p.m. UTC | #4

On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > +{
> > +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> > +}
> 
> I would prefer you move the various RWSEM_* count bit macros from
> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.

Just to be clear, you want the ~50 lines from:

/*
 * On 64-bit architectures, the bit definitions of the count are:
...
#define RWSEM_READ_FAILED_MASK  (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
                                 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)

moved from rwsem.c to rwsem.h?

Or just these four lines:

#define RWSEM_WRITER_LOCKED     (1UL << 0)
#define RWSEM_FLAG_WAITERS      (1UL << 1)
#define RWSEM_FLAG_HANDOFF      (1UL << 2)
#define RWSEM_FLAG_READFAIL     (1UL << (BITS_PER_LONG - 1))

Peter Zijlstra Sept. 7, 2023, 7:38 p.m. UTC | #5

On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > Several places want to know whether the lock is held by a writer, instead
> > > of just whether it's held.  We can implement this for both normal and
> > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > it outside that file might tempt other people to use it, so just use
> > > a comment to note that's what the 1 means, and help anybody find it if
> > > they're looking to change the implementation.
> > 
> > I'm presuming this is deep in a callchain where they know they hold the
> > lock, but they lost in what capacity?
> 
> No, it's just assertions.  You can see that in patch 3 where it's
> used in functions called things like "xfs_islocked".

Right, but if you're not the lock owner, your answer to the question is
a dice-roll, it might be locked, it might not be.

> > In general I strongly dislike the whole _is_locked family, because it
> > gives very poorly defined semantics if used by anybody but the owner.
> > 
> > If these new functions are indeed to be used only by lock holders to
> > determine what kind of lock they hold, could we please put:
> > 
> > 	lockdep_assert_held()
> > 
> > in them?
> 
> Patch 2 shows it in use in the MM code.  We already have a
> lockdep_assert_held_write(), but most people don't enable lockdep, so

Most devs should run with lockdep on when writing new code, and I know
the sanitizer robots run with lockdep on.

In general there seems to be a ton of lockdep on coverage.

> we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> to give us a good assertion when lockdep is disabled.

Is that really worth it still? I mean, much of these assertions pre-date
lockdep.

> XFS has a problem with using lockdep in general, which is that a worker
> thread can be spawned and use the fact that the spawner is holding the
> lock.  There's no mechanism for the worker thread to ask "Does struct
> task_struct *p hold the lock?".

Will be somewhat tricky to make happen -- but might be doable. It is
however an interface that is *very* hard to use correctly. Basically I
think you want to also assert that your target task 'p' is blocked,
right?

That is: assert @p is blocked and holds @lock.

Waiman Long Sept. 7, 2023, 9:06 p.m. UTC | #6

On 9/7/23 15:33, Matthew Wilcox wrote:
> On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
>> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
>>> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
>>> +{
>>> +	return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
>>> +}
>> I would prefer you move the various RWSEM_* count bit macros from
>> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
>> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
> Just to be clear, you want the ~50 lines from:
>
> /*
>   * On 64-bit architectures, the bit definitions of the count are:
> ...
> #define RWSEM_READ_FAILED_MASK  (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
>                                   RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
>
> moved from rwsem.c to rwsem.h?
>
> Or just these four lines:
>
> #define RWSEM_WRITER_LOCKED     (1UL << 0)
> #define RWSEM_FLAG_WAITERS      (1UL << 1)
> #define RWSEM_FLAG_HANDOFF      (1UL << 2)
> #define RWSEM_FLAG_READFAIL     (1UL << (BITS_PER_LONG - 1))

I think just the first 3 lines will be enough. Maybe a bit of comment 
about these bit flags in the count atomic_long value.

Cheers,
Longman

Dave Chinner Sept. 7, 2023, 11 p.m. UTC | #7

On Thu, Sep 07, 2023 at 09:38:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Several places want to know whether the lock is held by a writer, instead
> > > > of just whether it's held.  We can implement this for both normal and
> > > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > > it outside that file might tempt other people to use it, so just use
> > > > a comment to note that's what the 1 means, and help anybody find it if
> > > > they're looking to change the implementation.
> > > 
> > > I'm presuming this is deep in a callchain where they know they hold the
> > > lock, but they lost in what capacity?
> > 
> > No, it's just assertions.  You can see that in patch 3 where it's
> > used in functions called things like "xfs_islocked".
> 
> Right, but if you're not the lock owner, your answer to the question is
> a dice-roll, it might be locked, it might not be.

Except that the person writing the code knows the call chain that
leads up to that code, and so they have a pretty good idea whether
the object should be locked or not. If we are running that code, and
the object is locked, then it's pretty much guaranteed that the
owner of the lock is code that executed the check, because otherwise
we have a *major lock implementation bug*.

i.e. if we get to a place where rwsem_is_write_locked() fires
because some other task holds the lock, it almost always means we
have *two* tasks holding the lock exclusively.

Yes, it's these non-lockdep checks in XFS that have found rwsem
implementation bugs in the past. We've had them fire when the lock
was write locked when we know a few lines earlier it was taken as a
read lock, or marked write locked when they should have been
unlocked, etc because the rwsem code failed to enforce rw exclusion
semantics correctly.

So, really, these lock checks should be considered in the context of
the code that is running them and what such a "false detection"
would actually mean. In my experience, a false detection like you
talk about above means "rwsems are broken", not that there is a
problem with the code using the rwsems or the rwsem state check.

> > > In general I strongly dislike the whole _is_locked family, because it
> > > gives very poorly defined semantics if used by anybody but the owner.
> > > 
> > > If these new functions are indeed to be used only by lock holders to
> > > determine what kind of lock they hold, could we please put:
> > > 
> > > 	lockdep_assert_held()
> > > 
> > > in them?
> > 
> > Patch 2 shows it in use in the MM code.  We already have a
> > lockdep_assert_held_write(), but most people don't enable lockdep, so
> 
> Most devs should run with lockdep on when writing new code, and I know
> the sanitizer robots run with lockdep on.
> 
> In general there seems to be a ton of lockdep on coverage.

*cough*

Bit locks, semaphores, and all sorts of other constructs for IO
serialisation (like inode_dio_wait()) have no lockdep coverage at
all. IOWs, large chunks of many filesystems, the VFS and the VM have
little to no lockdep coverage at all.

> > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > to give us a good assertion when lockdep is disabled.
> 
> Is that really worth it still? I mean, much of these assertions pre-date
> lockdep.

And we're trying to propagate them because lockdep isn't a viable
option for day to day testing of filesystems because of it's
overhead vs how infrequently it finds new problems.

> > XFS has a problem with using lockdep in general, which is that a worker
> > thread can be spawned and use the fact that the spawner is holding the
> > lock.  There's no mechanism for the worker thread to ask "Does struct
> > task_struct *p hold the lock?".
> 
> Will be somewhat tricky to make happen -- but might be doable. It is
> however an interface that is *very* hard to use correctly. Basically I
> think you want to also assert that your target task 'p' is blocked,
> right?
> 
> That is: assert @p is blocked and holds @lock.

That addresses the immediate symptom; it doesn't address the large
problem with lockdep and needing non-owner rwsem semantics.

i.e. synchronous task based locking models don't work for
asynchronous multi-stage pipeline processing engines like XFS. The
lock protects the data object and follows the data object through
the processing pipeline, whilst the original submitter moves on to
the next operation to processes without blocking.

This is the non-blocking, async processing model that io_uring
development is pushing filesystems towards, so assuming that we only
hand a lock to a single worker task and then wait for it complete
(i.e. synchronous operation) flies in the face of current
development directions...

-Dave.

Waiman Long Sept. 7, 2023, 11:47 p.m. UTC | #8

On 9/7/23 17:06, Waiman Long wrote:
>
> On 9/7/23 15:33, Matthew Wilcox wrote:
>> On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
>>> On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
>>>> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
>>>> +{
>>>> +    return atomic_long_read(&sem->count) & 1 /* 
>>>> RWSEM_WRITER_LOCKED */;
>>>> +}
>>> I would prefer you move the various RWSEM_* count bit macros from
>>> kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
>>> RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
>> Just to be clear, you want the ~50 lines from:
>>
>> /*
>>   * On 64-bit architectures, the bit definitions of the count are:
>> ...
>> #define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
>> RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
>>
>> moved from rwsem.c to rwsem.h?
>>
>> Or just these four lines:
>>
>> #define RWSEM_WRITER_LOCKED     (1UL << 0)
>> #define RWSEM_FLAG_WAITERS      (1UL << 1)
>> #define RWSEM_FLAG_HANDOFF      (1UL << 2)
>> #define RWSEM_FLAG_READFAIL     (1UL << (BITS_PER_LONG - 1))
>
> I think just the first 3 lines will be enough. Maybe a bit of comment 
> about these bit flags in the count atomic_long value.

Actually, the old rwsem implementation won't allow you to reliably 
determine if a rwsem is write locked because the xadd instruction is 
used for write locking and the code had to back out the WRITER_BIAS if 
the attempt failed. Maybe that is why XFS has its own code to check if a 
rwsem is write locked which is needed with the old rwsem implementation.

The new implementation makes this check reliable. Still it is not easy 
to check if a rwsem is read locked as the check will be rather 
complicated and probably racy.

Cheers,
Longman

Matthew Wilcox Sept. 8, 2023, 12:01 a.m. UTC | #9

On Thu, Sep 07, 2023 at 09:38:38PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 08:20:30PM +0100, Matthew Wilcox wrote:
> > On Thu, Sep 07, 2023 at 09:08:10PM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 07, 2023 at 06:47:01PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Several places want to know whether the lock is held by a writer, instead
> > > > of just whether it's held.  We can implement this for both normal and
> > > > rt rwsems.  RWSEM_WRITER_LOCKED is declared in rwsem.c and exposing
> > > > it outside that file might tempt other people to use it, so just use
> > > > a comment to note that's what the 1 means, and help anybody find it if
> > > > they're looking to change the implementation.
> > > 
> > > I'm presuming this is deep in a callchain where they know they hold the
> > > lock, but they lost in what capacity?
> > 
> > No, it's just assertions.  You can see that in patch 3 where it's
> > used in functions called things like "xfs_islocked".
> 
> Right, but if you're not the lock owner, your answer to the question is
> a dice-roll, it might be locked, it might not be.

Sure, but if you are the lock owner, it's guaranteed to work.  And if
you hold it for read, and check that it's held for write, that will
definitely fail.  I agree "somebody else is holding it" gives you a
false negative, but most locks aren't contended, so it'll fail the
assertion often enough.

> > Patch 2 shows it in use in the MM code.  We already have a
> > lockdep_assert_held_write(), but most people don't enable lockdep, so
> 
> Most devs should run with lockdep on when writing new code, and I know
> the sanitizer robots run with lockdep on.
> 
> In general there seems to be a ton of lockdep on coverage.

Hm.  Enabling lockdep causes an xfstests run to go up from 6000 seconds
to 8000 seconds for me.  That's a significant reduction in the number
of test cycles I can run per day.

> > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > to give us a good assertion when lockdep is disabled.
> 
> Is that really worth it still? I mean, much of these assertions pre-date
> lockdep.

That's true.  Is it possible to do a cheaper version of lockdep where it
only records that you have the lock and doesn't, eg, check for locking
dependencies?  I haven't analysed lockdep to see what the expensive part
is; for all I know it's recording the holders, and this idea is
worthless.

> > XFS has a problem with using lockdep in general, which is that a worker
> > thread can be spawned and use the fact that the spawner is holding the
> > lock.  There's no mechanism for the worker thread to ask "Does struct
> > task_struct *p hold the lock?".
> 
> Will be somewhat tricky to make happen -- but might be doable. It is
> however an interface that is *very* hard to use correctly. Basically I
> think you want to also assert that your target task 'p' is blocked,
> right?
> 
> That is: assert @p is blocked and holds @lock.

Probably, although I think there might be a race where the worker thread
starts running before the waiter starts waiting?

In conversation with Darrick & Dave, XFS does this in order to avoid
overflowing the kernel stack.  So if you've been thinking about "we
could support segmented stacks", it would avoid having to support this
lockdep handoff.  It'd probably work a lot better for the scheduler too.

Dave Chinner Sept. 8, 2023, 12:44 a.m. UTC | #10

On Thu, Sep 07, 2023 at 07:47:31PM -0400, Waiman Long wrote:
> 
> On 9/7/23 17:06, Waiman Long wrote:
> > 
> > On 9/7/23 15:33, Matthew Wilcox wrote:
> > > On Thu, Sep 07, 2023 at 02:05:54PM -0400, Waiman Long wrote:
> > > > On 9/7/23 13:47, Matthew Wilcox (Oracle) wrote:
> > > > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > > > +{
> > > > > +    return atomic_long_read(&sem->count) & 1 /*
> > > > > RWSEM_WRITER_LOCKED */;
> > > > > +}
> > > > I would prefer you move the various RWSEM_* count bit macros from
> > > > kernel/locking/rwsem.c to under the !PREEMPT_RT block and directly use
> > > > RWSEM_WRITER_LOCKED instead of hardcoding a value of 1.
> > > Just to be clear, you want the ~50 lines from:
> > > 
> > > /*
> > >   * On 64-bit architectures, the bit definitions of the count are:
> > > ...
> > > #define RWSEM_READ_FAILED_MASK (RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
> > > RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
> > > 
> > > moved from rwsem.c to rwsem.h?
> > > 
> > > Or just these four lines:
> > > 
> > > #define RWSEM_WRITER_LOCKED     (1UL << 0)
> > > #define RWSEM_FLAG_WAITERS      (1UL << 1)
> > > #define RWSEM_FLAG_HANDOFF      (1UL << 2)
> > > #define RWSEM_FLAG_READFAIL     (1UL << (BITS_PER_LONG - 1))
> > 
> > I think just the first 3 lines will be enough. Maybe a bit of comment
> > about these bit flags in the count atomic_long value.
> 
> Actually, the old rwsem implementation won't allow you to reliably determine
> if a rwsem is write locked because the xadd instruction is used for write
> locking and the code had to back out the WRITER_BIAS if the attempt failed.
> Maybe that is why XFS has its own code to check if a rwsem is write locked
> which is needed with the old rwsem implementation.

mrlocks pre-date rwsems entirely on Linux.  mrlocks were introduced
to XFS as part of the port from Irix back in 2000. This originally
had a 'ismrlocked()' function for checking lock state.

In 2003, this was expanded to allow explicit lock type checks via
'mrislocked_access() and 'mrislocked_update()' wrappers that checked
internal counters to determine how it was locked.

In 2004, the mrlock was converted to use the generic kernel rwsem
implementation, and because that couldn't be used to track writers,
the mrlock included a mr_writer boolean field to indicate it was
write locked for the purpose of implementing the existing debug
checks. Hence the mrlock debug code has always had reliable
differentiation of read vs write state, whereas we couldn't do that
natively with rwsems for a real long time.

The mrlocks have essentially remained unchanged since 2004 - this
long predates lockdep, and it lives on because gives us something
lockdep doesn't: zero overhead locking validation.

> The new implementation makes this check reliable. Still it is not easy to
> check if a rwsem is read locked as the check will be rather complicated and
> probably racy.

You can't look at these locking checks in isolation. These checks
are done in code paths where we expect the caller to have already
locked the rwsem in the manner required. Hence there should be no races
with rwsem state changes at all.

If we see a locking assert to fire, it means we either screwed up
the XFS locking completely (no races necessary), or there's a bug in
the rwsem implementation. The latter case has occurred several
times; the rwsem locking checks in XFS have uncovered more than one
rwsem implementation bug in the past...

IOWs, the explicit lock state checks and asserts provide a
lock implemetnation validation mechanism that lockdep doesn't....

Cheers,

Dave.

Peter Zijlstra Sept. 8, 2023, 10:44 a.m. UTC | #11

On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote:

> > Right, but if you're not the lock owner, your answer to the question is
> > a dice-roll, it might be locked, it might not be.
> 
> Except that the person writing the code knows the call chain that
> leads up to that code, and so they have a pretty good idea whether
> the object should be locked or not. If we are running that code, and
> the object is locked, then it's pretty much guaranteed that the
> owner of the lock is code that executed the check, because otherwise
> we have a *major lock implementation bug*.

Agreed, and this is fine. However there's been some very creative
'use' of the _is_locked() class of functions in the past that did not
follow 'common' sense.

If all usage was: I should be holding this, lets check. I probably
wouldn't have this bad feeling about things.

> > Most devs should run with lockdep on when writing new code, and I know
> > the sanitizer robots run with lockdep on.
> > 
> > In general there seems to be a ton of lockdep on coverage.
> 
> *cough*
> 
> Bit locks, semaphores, and all sorts of other constructs for IO
> serialisation (like inode_dio_wait()) have no lockdep coverage at
> all. IOWs, large chunks of many filesystems, the VFS and the VM have
> little to no lockdep coverage at all.

True, however I was commenting on the assertion that vm code has
duplicate asserts with the implication that was because not a lot of
people run with lockdep on.

> > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > > to give us a good assertion when lockdep is disabled.
> > 
> > Is that really worth it still? I mean, much of these assertions pre-date
> > lockdep.
> 
> And we're trying to propagate them because lockdep isn't a viable
> option for day to day testing of filesystems because of it's
> overhead vs how infrequently it finds new problems.

... in XFS. Lockdep avoids a giant pile of broken from entering the
kernel and the robots still report plenty.

> > > XFS has a problem with using lockdep in general, which is that a worker
> > > thread can be spawned and use the fact that the spawner is holding the
> > > lock.  There's no mechanism for the worker thread to ask "Does struct
> > > task_struct *p hold the lock?".
> > 
> > Will be somewhat tricky to make happen -- but might be doable. It is
> > however an interface that is *very* hard to use correctly. Basically I
> > think you want to also assert that your target task 'p' is blocked,
> > right?
> > 
> > That is: assert @p is blocked and holds @lock.
> 
> That addresses the immediate symptom; it doesn't address the large
> problem with lockdep and needing non-owner rwsem semantics.
> 
> i.e. synchronous task based locking models don't work for
> asynchronous multi-stage pipeline processing engines like XFS. The
> lock protects the data object and follows the data object through
> the processing pipeline, whilst the original submitter moves on to
> the next operation to processes without blocking.
> 
> This is the non-blocking, async processing model that io_uring
> development is pushing filesystems towards, so assuming that we only
> hand a lock to a single worker task and then wait for it complete
> (i.e. synchronous operation) flies in the face of current
> development directions...

I was looking at things from an interface abuse perspective. How easy is
it to do the wrong thing. As said, we've had a bunch of really dodgy
code with the _is_locked class of functions, hence my desire to find
something else.

As to the whole non-owner locking, yes, that's problematic. I'm not
convinced async operations require non-owner locking, at the same time I
do see that IO completions pose a challence.

Coming from the schedulability and real-time corner, non-owner locks are
a nightmare because of the inversions. So yeah, fun to be had I'm sure.

Dave Chinner Sept. 10, 2023, 10:56 p.m. UTC | #12

On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote:
> 
> > > Right, but if you're not the lock owner, your answer to the question is
> > > a dice-roll, it might be locked, it might not be.
> > 
> > Except that the person writing the code knows the call chain that
> > leads up to that code, and so they have a pretty good idea whether
> > the object should be locked or not. If we are running that code, and
> > the object is locked, then it's pretty much guaranteed that the
> > owner of the lock is code that executed the check, because otherwise
> > we have a *major lock implementation bug*.
> 
> Agreed, and this is fine. However there's been some very creative
> 'use' of the _is_locked() class of functions in the past that did not
> follow 'common' sense.
> 
> If all usage was: I should be holding this, lets check. I probably
> wouldn't have this bad feeling about things.

So your argument against such an interface is essentially "we can't
have nice things because someone might abuse them"?

> > > Most devs should run with lockdep on when writing new code, and I know
> > > the sanitizer robots run with lockdep on.
> > > 
> > > In general there seems to be a ton of lockdep on coverage.
> > 
> > *cough*
> > 
> > Bit locks, semaphores, and all sorts of other constructs for IO
> > serialisation (like inode_dio_wait()) have no lockdep coverage at
> > all. IOWs, large chunks of many filesystems, the VFS and the VM have
> > little to no lockdep coverage at all.
> 
> True, however I was commenting on the assertion that vm code has
> duplicate asserts with the implication that was because not a lot of
> people run with lockdep on.

I think that implication is pretty much spot on the money for any
subsystem that has significant correctness testing overhead. A
single fstests regression test pass for a single configuration for
XFS takes well over 4 hours to run. If I add lockdep, it's about 7
hours. If I add lockdep and KASAN, it's closer to 12 hours. It just
isn't viable to run test kernels with these options day-in, day-out.
Maybe once a release I'll run a weekend sanity check with them
enabled, but otherwise the rare issue they find just isn't worth the
cost of enabling them....

> > > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm)
> > > > to give us a good assertion when lockdep is disabled.
> > > 
> > > Is that really worth it still? I mean, much of these assertions pre-date
> > > lockdep.
> > 
> > And we're trying to propagate them because lockdep isn't a viable
> > option for day to day testing of filesystems because of it's
> > overhead vs how infrequently it finds new problems.
> 
> ... in XFS. Lockdep avoids a giant pile of broken from entering the
> kernel and the robots still report plenty.

Nobody is suggesting that lockdep gets replaced by these functions.
They are *in addition* to lockdep, and are used to give use 99.9%
certainty that locks are being used correctly without adding any
runtime overhead at all.

That's the whole point - it is simple introspection code that will
find most gross locking mistakes that people make very quickly,
without any real costs. If you're worried about people abusing
introspection code, then you're forgetting that they can just dig
directly at the rwsem guts directly themselves. Adding an interface
that does introspection right and enables the internal
imp[lementation to change without breaking anything is a no brainer;
it stops people from just digging in the guts of the structure and
guaranteeing that code will break if the implementation changes...

> > > > XFS has a problem with using lockdep in general, which is that a worker
> > > > thread can be spawned and use the fact that the spawner is holding the
> > > > lock.  There's no mechanism for the worker thread to ask "Does struct
> > > > task_struct *p hold the lock?".
> > > 
> > > Will be somewhat tricky to make happen -- but might be doable. It is
> > > however an interface that is *very* hard to use correctly. Basically I
> > > think you want to also assert that your target task 'p' is blocked,
> > > right?
> > > 
> > > That is: assert @p is blocked and holds @lock.
> > 
> > That addresses the immediate symptom; it doesn't address the large
> > problem with lockdep and needing non-owner rwsem semantics.
> > 
> > i.e. synchronous task based locking models don't work for
> > asynchronous multi-stage pipeline processing engines like XFS. The
> > lock protects the data object and follows the data object through
> > the processing pipeline, whilst the original submitter moves on to
> > the next operation to processes without blocking.
> > 
> > This is the non-blocking, async processing model that io_uring
> > development is pushing filesystems towards, so assuming that we only
> > hand a lock to a single worker task and then wait for it complete
> > (i.e. synchronous operation) flies in the face of current
> > development directions...
> 
> I was looking at things from an interface abuse perspective. How easy is
> it to do the wrong thing. As said, we've had a bunch of really dodgy
> code with the _is_locked class of functions, hence my desire to find
> something else.
> 
> As to the whole non-owner locking, yes, that's problematic. I'm not
> convinced async operations require non-owner locking, at the same time I
> do see that IO completions pose a challence.
> 
> Coming from the schedulability and real-time corner, non-owner locks are
> a nightmare because of the inversions. So yeah, fun to be had I'm sure.

I'm not sure you understand the scope of the problem with modern
filesystems vs RT processing. The moment code enters a modern
filesystem, it gives up all hope of real-time response guarantees.
There is currently nothing a RT process can do but wait for the
filesystem to finish with the locks it holds, and the wait times are
effectively unbound because there may be a requirement for tens of
thousands of IOs to be done before the lock is dropped and the RT
task can make progress.

Priority inheritance for the lock owner won't make any difference
here, because the latency is not caused by something running on a
CPU.  IOWs, lock inversions and non-owner locks are the very least
of the problems for RT priority apps when it comes to filesystem
operations.

The solution for RT priority apps avoiding priority inversions in
filesystems is going be io_uring. i.e. the initial NOWAIT operation
is done with RT priority in the RT task itself, but if that is going
to be blocked it gets punted to a background worker for async
processing and the RT priority task goes on to processing the next
thing it needs to do.

All the  background async operations are performed with the same
(non-RT) priority and we just don't need to care about priority
inversions or the problems RT has with non-owner lock contexts. The
RT tasks themselves don't care, either, because they don't ever get
stuck waiting on a filesystem lock that a lower priority task might
hold, or get stuck on an operation that might require unbound
amounts of IO to complete (e.g. transaction reservations).

IOWs, if we want to make "RT with filesystems" a reality, we need to
stop worrying about constraining lock implementations and handling
priority inversions. Instead, we need to look towards how to make
filesystem infrastructure fully non-blocking for RT priority tasks
and writing RT applications to use that infrastructure....

Cheers,

Dave.

Matthew Wilcox Sept. 10, 2023, 11:17 p.m. UTC | #13

On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote:
> On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote:
> > Agreed, and this is fine. However there's been some very creative
> > 'use' of the _is_locked() class of functions in the past that did not
> > follow 'common' sense.
> > 
> > If all usage was: I should be holding this, lets check. I probably
> > wouldn't have this bad feeling about things.
> 
> So your argument against such an interface is essentially "we can't
> have nice things because someone might abuse them"?

Some people are very creative ...

I was thinking about how to handle this better.  We could have

static inline void rwsem_assert_locked(const struct rw_semaphore *sem)
{
	BUG_ON(atomic_long_read(&sem->count) == 0);
}

static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem)
{
	BUG_ON((atomic_long_read(&sem->count) & 1) != 1);
}

but then we'd also need to change how XFS currently uses the ASSERT()
macro to be ASSERT_LOCKED(ip, flags), and I understand this code is also
used in userspace, so it'd involve changing that shim, and we're getting
way past the amount of code I'm comfortable changing, and way past the
amount of time I should be spending on this.

And then there'd be the inevitable bikeshedding about "don't use BUG_ON"
and it's probably just for the best if I walk away at this point,
becoming the third person to fail to remove the mrlock.

I'll keep reading this thread to see if some consensus emerges, but I'm
not optimistic.

Dave Chinner Sept. 11, 2023, 12:55 a.m. UTC | #14

On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote:
> > On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote:
> > > Agreed, and this is fine. However there's been some very creative
> > > 'use' of the _is_locked() class of functions in the past that did not
> > > follow 'common' sense.
> > > 
> > > If all usage was: I should be holding this, lets check. I probably
> > > wouldn't have this bad feeling about things.
> > 
> > So your argument against such an interface is essentially "we can't
> > have nice things because someone might abuse them"?
> 
> Some people are very creative ...

Sure, but that's no reason to stop anyone else from making progress.

> I was thinking about how to handle this better.  We could have
> 
> static inline void rwsem_assert_locked(const struct rw_semaphore *sem)
> {
> 	BUG_ON(atomic_long_read(&sem->count) == 0);
> }
> 
> static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem)
> {
> 	BUG_ON((atomic_long_read(&sem->count) & 1) != 1);
> }

We already have CONFIG_DEBUG_RWSEMS, so we can put these
introspection interfaces inside debug code, and make any attempt to
use them outside of debug builds break the build. e.g:

#if DEBUG_RWSEMS
/*
 * rwsem locked checks can only be used by conditionally compiled
 * subsystem debug code. It is not valid to use them in normal
 * production code.
 */
static inline bool rwsem_is_write_locked()
{
	....
}

static inline bool rwsem_is_locked()
{
	....
}
#else /* !DEBUG_RWSEMS */
#define rwsem_is_write_locked()		BUILD_BUG()
#define rwsem_is_locked()		BUILD_BUG()
#endif /* DEBUG_RWSEMS */

And now we simply add a single line to subsystem Kconfig debug
options to turn on rwsem introspection for their debug checks like
so:

 config XFS_DEBUG
 	bool "XFS Debugging support"
 	depends on XFS_FS
+	select RWSEM_DEBUG
 	help
 	  Say Y here to get an XFS build with many debugging features,
 	  including ASSERT checks, function wrappers around macros,

> but then we'd also need to change how XFS currently uses the ASSERT()
> macro to be ASSERT_LOCKED(ip, flags), and I understand this code is also
> used in userspace, so it'd involve changing that shim, and we're getting
> way past the amount of code I'm comfortable changing, and way past the
> amount of time I should be spending on this.
> 
> And then there'd be the inevitable bikeshedding about "don't use BUG_ON"
> and it's probably just for the best if I walk away at this point,
> becoming the third person to fail to remove the mrlock.

Yeah, which further points out how ridiculous the situation is. This
is useful debug code and it can *obviously* and *easily* be
constrained to debug environments.

Cheers,

Dave.

Waiman Long Sept. 11, 2023, 2:15 a.m. UTC | #15

On 9/10/23 20:55, Dave Chinner wrote:
> On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote:
>> On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote:
>>> On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote:
>>>> Agreed, and this is fine. However there's been some very creative
>>>> 'use' of the _is_locked() class of functions in the past that did not
>>>> follow 'common' sense.
>>>>
>>>> If all usage was: I should be holding this, lets check. I probably
>>>> wouldn't have this bad feeling about things.
>>> So your argument against such an interface is essentially "we can't
>>> have nice things because someone might abuse them"?
>> Some people are very creative ...
> Sure, but that's no reason to stop anyone else from making progress.
>
>> I was thinking about how to handle this better.  We could have
>>
>> static inline void rwsem_assert_locked(const struct rw_semaphore *sem)
>> {
>> 	BUG_ON(atomic_long_read(&sem->count) == 0);
>> }
>>
>> static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem)
>> {
>> 	BUG_ON((atomic_long_read(&sem->count) & 1) != 1);
>> }
> We already have CONFIG_DEBUG_RWSEMS, so we can put these
> introspection interfaces inside debug code, and make any attempt to
> use them outside of debug builds break the build. e.g:
>
> #if DEBUG_RWSEMS
> /*
>   * rwsem locked checks can only be used by conditionally compiled
>   * subsystem debug code. It is not valid to use them in normal
>   * production code.
>   */
> static inline bool rwsem_is_write_locked()
> {
> 	....
> }
>
> static inline bool rwsem_is_locked()
> {
> 	....
> }
> #else /* !DEBUG_RWSEMS */
> #define rwsem_is_write_locked()		BUILD_BUG()
> #define rwsem_is_locked()		BUILD_BUG()
> #endif /* DEBUG_RWSEMS */
>
> And now we simply add a single line to subsystem Kconfig debug
> options to turn on rwsem introspection for their debug checks like
> so:
>
>   config XFS_DEBUG
>   	bool "XFS Debugging support"
>   	depends on XFS_FS
> +	select RWSEM_DEBUG
>   	help
>   	  Say Y here to get an XFS build with many debugging features,
>   	  including ASSERT checks, function wrappers around macros,

That may be a possible compromise. Actually, I am not against having 
them defined even outside the DEBUG_RWSEMS. We already have 
mutex_is_locked() defined and used in a lot of places. So this is just 
providing the rwsem equivalents.

I also agreed that these APIs can be misused by other users. I think we 
should clearly document the caveats of using these. So unless there are 
other means of maintaining the stability of the lock state, the test 
result may no longer be true right after the test. It is simply just the 
lock state at a certain moment in time. Callers are using them at their 
own risk.

Cheers,
Longman

Dave Chinner Sept. 11, 2023, 10:29 p.m. UTC | #16

On Sun, Sep 10, 2023 at 10:15:59PM -0400, Waiman Long wrote:
> 
> On 9/10/23 20:55, Dave Chinner wrote:
> > On Mon, Sep 11, 2023 at 12:17:18AM +0100, Matthew Wilcox wrote:
> > > On Mon, Sep 11, 2023 at 08:56:45AM +1000, Dave Chinner wrote:
> > > > On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote:
> > > > > Agreed, and this is fine. However there's been some very creative
> > > > > 'use' of the _is_locked() class of functions in the past that did not
> > > > > follow 'common' sense.
> > > > > 
> > > > > If all usage was: I should be holding this, lets check. I probably
> > > > > wouldn't have this bad feeling about things.
> > > > So your argument against such an interface is essentially "we can't
> > > > have nice things because someone might abuse them"?
> > > Some people are very creative ...
> > Sure, but that's no reason to stop anyone else from making progress.
> > 
> > > I was thinking about how to handle this better.  We could have
> > > 
> > > static inline void rwsem_assert_locked(const struct rw_semaphore *sem)
> > > {
> > > 	BUG_ON(atomic_long_read(&sem->count) == 0);
> > > }
> > > 
> > > static inline void rwsem_assert_write_locked(const struct rw_semaphore *sem)
> > > {
> > > 	BUG_ON((atomic_long_read(&sem->count) & 1) != 1);
> > > }
> > We already have CONFIG_DEBUG_RWSEMS, so we can put these
> > introspection interfaces inside debug code, and make any attempt to
> > use them outside of debug builds break the build. e.g:
> > 
> > #if DEBUG_RWSEMS
> > /*
> >   * rwsem locked checks can only be used by conditionally compiled
> >   * subsystem debug code. It is not valid to use them in normal
> >   * production code.
> >   */
> > static inline bool rwsem_is_write_locked()
> > {
> > 	....
> > }
> > 
> > static inline bool rwsem_is_locked()
> > {
> > 	....
> > }
> > #else /* !DEBUG_RWSEMS */
> > #define rwsem_is_write_locked()		BUILD_BUG()
> > #define rwsem_is_locked()		BUILD_BUG()
> > #endif /* DEBUG_RWSEMS */
> > 
> > And now we simply add a single line to subsystem Kconfig debug
> > options to turn on rwsem introspection for their debug checks like
> > so:
> > 
> >   config XFS_DEBUG
> >   	bool "XFS Debugging support"
> >   	depends on XFS_FS
> > +	select RWSEM_DEBUG
> >   	help
> >   	  Say Y here to get an XFS build with many debugging features,
> >   	  including ASSERT checks, function wrappers around macros,
> 
> That may be a possible compromise. Actually, I am not against having them
> defined even outside the DEBUG_RWSEMS. We already have mutex_is_locked()
> defined and used in a lot of places. So this is just providing the rwsem
> equivalents.

So, once again, we have mixed messages from the lock maintainers.
One says "no, it might get abused", another says "I'm fine with
that", and now we have a maintainer disagreement stalemate.

This is dysfunctional.

Peter, Waiman, please make a decision one way or the other about
allowing rwsems ito support native write lock checking. In the
absence of an actual yes/no decision, do we assume that the
maintainers don't actually care about it and we should just
submit it straight to Linus?

-Dave.

Peter Zijlstra Sept. 12, 2023, 9:03 a.m. UTC | #17

On Tue, Sep 12, 2023 at 08:29:55AM +1000, Dave Chinner wrote:

> So, once again, we have mixed messages from the lock maintainers.
> One says "no, it might get abused", another says "I'm fine with
> that", and now we have a maintainer disagreement stalemate.

I didn't say no, I was trying to see if there's alternatives because the
is_locked pattern has a history of abuse.

If not, then sure we can do this; it's not like I managed to get rid of
muteX_is_locked() -- and I actually tried at some point :/

And just now I grepped for it, and look what I find:

drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)

And there's more :-(

Also, please just calm down already..

Matthew Wilcox Sept. 12, 2023, 12:28 p.m. UTC | #18

On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> If not, then sure we can do this; it's not like I managed to get rid of
> muteX_is_locked() -- and I actually tried at some point :/
> 
> And just now I grepped for it, and look what I find:
> 
> drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> 
> And there's more :-(

Are these actually abuse?  I looked at these two, and they both seem to
be asking "Does somebody else currently have this mutex?" rather than
"Do I have this mutex?".

Peter Zijlstra Sept. 12, 2023, 1:52 p.m. UTC | #19

On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> > If not, then sure we can do this; it's not like I managed to get rid of
> > muteX_is_locked() -- and I actually tried at some point :/
> > 
> > And just now I grepped for it, and look what I find:
> > 
> > drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> > drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> > 
> > And there's more :-(
> 
> Are these actually abuse?  I looked at these two, and they both seem to
> be asking "Does somebody else currently have this mutex?" rather than
> "Do I have this mutex?".

It's effectively a random number generator in that capacity. Someone
might have it or might have had it when you looked and no longer have
it, or might have it now but not when you asked.

Matthew Wilcox Sept. 12, 2023, 1:58 p.m. UTC | #20

On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> > > If not, then sure we can do this; it's not like I managed to get rid of
> > > muteX_is_locked() -- and I actually tried at some point :/
> > > 
> > > And just now I grepped for it, and look what I find:
> > > 
> > > drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> > > drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> > > 
> > > And there's more :-(
> > 
> > Are these actually abuse?  I looked at these two, and they both seem to
> > be asking "Does somebody else currently have this mutex?" rather than
> > "Do I have this mutex?".
> 
> It's effectively a random number generator in that capacity. Someone
> might have it or might have had it when you looked and no longer have
> it, or might have it now but not when you asked.

Well, no.

                if (mutex_is_locked(&arena->err_lock)
                                || arena->freelist[lane].has_err) {
                        nd_region_release_lane(btt->nd_region, lane);

                        ret = arena_clear_freelist_error(arena, lane);

So that's "Is somebody currently processing an error, or have they
already finished setting an error".  Sure, it's somewhat racy, but
it looks like a performance optimisation, not something that needs
100% accuracy.

The other one's in a similar boat; an optimisation if anyone else is
currently holding this mutex:

        /*
         * Immediately after receiving a report is the most reliable time to
         * send a subcommand to the controller. Wake any subcommand senders
         * waiting for a report.
         */
        if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
                spin_lock_irqsave(&ctlr->lock, flags);
                ctlr->received_input_report = true;
                spin_unlock_irqrestore(&ctlr->lock, flags);
                wake_up(&ctlr->wait);
        }

Sure, they might not still be holding it, or it may have been grabbed
one clock tick later; that just means they miss out on this optimisation.

Peter Zijlstra Sept. 12, 2023, 2:02 p.m. UTC | #21

On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> > > If not, then sure we can do this; it's not like I managed to get rid of
> > > muteX_is_locked() -- and I actually tried at some point :/
> > > 
> > > And just now I grepped for it, and look what I find:
> > > 
> > > drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> > > drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> > > 
> > > And there's more :-(
> > 
> > Are these actually abuse?  I looked at these two, and they both seem to
> > be asking "Does somebody else currently have this mutex?" rather than
> > "Do I have this mutex?".
> 
> It's effectively a random number generator in that capacity. Someone
> might have it or might have had it when you looked and no longer have
> it, or might have it now but not when you asked.

Also, there's more fun; the 'is_locked' store from spin_lock() (or
mutex, or whatever) is not ordered vs any other write inside the
critical section.

So something like:

	bar = 0;

	CPU0			CPU1

	spin_lock(&foo)		
	bar = 1;		x = READ_ONCE(bar)
				y = spin_is_locked(&foo);
	spin_unlock(&foo);

can have x==1 && y==0, even though CPU0 is currently inside the critical
section.

Normally that doesn't matter, and for the program-order case where you
ask 'am I holding the lock' this obviously cannot go wrong. But the
moment you ask: 'is someone else holding the lock' it all goes sideways
real fast.

We've been there, done that, got a t-shirt etc..

Peter Zijlstra Sept. 12, 2023, 2:23 p.m. UTC | #22

On Tue, Sep 12, 2023 at 02:58:32PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote:
> > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> > > > If not, then sure we can do this; it's not like I managed to get rid of
> > > > muteX_is_locked() -- and I actually tried at some point :/
> > > > 
> > > > And just now I grepped for it, and look what I find:
> > > > 
> > > > drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> > > > drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> > > > 
> > > > And there's more :-(
> > > 
> > > Are these actually abuse?  I looked at these two, and they both seem to
> > > be asking "Does somebody else currently have this mutex?" rather than
> > > "Do I have this mutex?".
> > 
> > It's effectively a random number generator in that capacity. Someone
> > might have it or might have had it when you looked and no longer have
> > it, or might have it now but not when you asked.
> 
> Well, no.
> 
>                 if (mutex_is_locked(&arena->err_lock)
>                                 || arena->freelist[lane].has_err) {
>                         nd_region_release_lane(btt->nd_region, lane);
> 
>                         ret = arena_clear_freelist_error(arena, lane);
> 
> So that's "Is somebody currently processing an error, or have they
> already finished setting an error".  Sure, it's somewhat racy, but
> it looks like a performance optimisation, not something that needs
> 100% accuracy.

We're arguing past one another I think. Yes mutex_is_locked() is a
random number generator when asked for something you don't own. But it
might not be a bug because the code is ok with races.

It is still fully dodgy IMO, such usage is pretty close to UB.

Darrick J. Wong Sept. 12, 2023, 3:27 p.m. UTC | #23

On Tue, Sep 12, 2023 at 04:23:00PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 02:58:32PM +0100, Matthew Wilcox wrote:
> > On Tue, Sep 12, 2023 at 03:52:13PM +0200, Peter Zijlstra wrote:
> > > On Tue, Sep 12, 2023 at 01:28:13PM +0100, Matthew Wilcox wrote:
> > > > On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> > > > > If not, then sure we can do this; it's not like I managed to get rid of
> > > > > muteX_is_locked() -- and I actually tried at some point :/
> > > > > 
> > > > > And just now I grepped for it, and look what I find:
> > > > > 
> > > > > drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> > > > > drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> > > > > 
> > > > > And there's more :-(
> > > > 
> > > > Are these actually abuse?  I looked at these two, and they both seem to
> > > > be asking "Does somebody else currently have this mutex?" rather than
> > > > "Do I have this mutex?".
> > > 
> > > It's effectively a random number generator in that capacity. Someone
> > > might have it or might have had it when you looked and no longer have
> > > it, or might have it now but not when you asked.
> > 
> > Well, no.
> > 
> >                 if (mutex_is_locked(&arena->err_lock)
> >                                 || arena->freelist[lane].has_err) {
> >                         nd_region_release_lane(btt->nd_region, lane);
> > 
> >                         ret = arena_clear_freelist_error(arena, lane);
> > 
> > So that's "Is somebody currently processing an error, or have they
> > already finished setting an error".  Sure, it's somewhat racy, but
> > it looks like a performance optimisation, not something that needs
> > 100% accuracy.
> 
> We're arguing past one another I think. Yes mutex_is_locked() is a
> random number generator when asked for something you don't own. But it
> might not be a bug because the code is ok with races.
> 
> It is still fully dodgy IMO, such usage is pretty close to UB.

My 2 cents here:

I could live with Longman's suggestion of an rwsem_assert_is_locked that
only exists if DEBUG_RWSEMS is enabled.  Something like:

#ifdef CONFIG_DEBUG_RWSEMS
static inline bool __rwsem_assert_is_locked(struct rw_semaphore *rwsem,
		const char *file, int line)
{
	bool ret = rwsem_is_locked(rwsem);
	if (!ret)
		WARN(1, "!rwsem_is_locked(rwsem) at %s line %d", file, line);
	return ret;
}
#define rwsem_assert_is_locked(r) \
	__rwsem_assert_is_locked((r), __FILE__, __LINE__)
#endif

and then XFS could do:

	ASSERT(rwsem_assert_is_locked(&VFS_I(ip)->i_rwsem));

Wherein ASSERT is only #defined if CONFIG_XFS_DEBUG, and XFS_DEBUG
selects DEBUG_RWSEMS, per Longman's suggestion.  That's work for what we
want it for (simple cheap lock checking) without becoming a general
lockabuse predicate.

--D

Dave Chinner Sept. 12, 2023, 11:16 p.m. UTC | #24

On Tue, Sep 12, 2023 at 11:03:42AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 12, 2023 at 08:29:55AM +1000, Dave Chinner wrote:
> 
> > So, once again, we have mixed messages from the lock maintainers.
> > One says "no, it might get abused", another says "I'm fine with
> > that", and now we have a maintainer disagreement stalemate.
> 
> I didn't say no, I was trying to see if there's alternatives because the
> is_locked pattern has a history of abuse.

Yet you haven't suggested or commented on the proposed methods to
avoid abuse - you are still arguing that it can be abused. Go back
and read what I proposed before continuing to argue about
mutex_is_locked()....

> If not, then sure we can do this; it's not like I managed to get rid of
> muteX_is_locked() -- and I actually tried at some point :/
> 
> And just now I grepped for it, and look what I find:
> 
> drivers/hid/hid-nintendo.c:     if (unlikely(mutex_is_locked(&ctlr->output_mutex))) {
> drivers/nvdimm/btt.c:           if (mutex_is_locked(&arena->err_lock)
> 
> And there's more :-(

.... like this.

Seriously: if we put it behind CONFIG_DEBUG_RWSEM, and then turn
that on when other subsystem debug code wants the rwsem
introspection, why does anyone outside that subsystem even how it
gets used? It won't even get used in production kernels, because
nobody will turn something on that requires rwsem debug in a
production kernel.

If you are still concerned that this will happen, then do the same
that we've done for trace_printk and other debug only functions:
dump a big warning at boot time that rwsem debug is enabled and this
is not a supported production kernel configuration.

> Also, please just calm down already..

I'm perfectly calm and relaxed.  Asking for a definitive decision
between co-maintainers who are giving decidedly mixed signals is a
very reasonable request to make.  Just because you may not like what
such a request implies about how the code is being maintained, it
doesn't mean I'm the slightest bit upset, hysterical or irrational.

-Dave.

Peter Zijlstra Sept. 13, 2023, 8:59 a.m. UTC | #25

On Tue, Sep 12, 2023 at 08:27:15AM -0700, Darrick J. Wong wrote:

> I could live with Longman's suggestion of an rwsem_assert_is_locked that
> only exists if DEBUG_RWSEMS is enabled.  Something like:
> 
> #ifdef CONFIG_DEBUG_RWSEMS
> static inline bool __rwsem_assert_is_locked(struct rw_semaphore *rwsem,
> 		const char *file, int line)
> {
> 	bool ret = rwsem_is_locked(rwsem);
> 	if (!ret)
> 		WARN(1, "!rwsem_is_locked(rwsem) at %s line %d", file, line);
> 	return ret;
> }
> #define rwsem_assert_is_locked(r) \
> 	__rwsem_assert_is_locked((r), __FILE__, __LINE__)
> #endif
> 
> and then XFS could do:
> 
> 	ASSERT(rwsem_assert_is_locked(&VFS_I(ip)->i_rwsem));
> 
> Wherein ASSERT is only #defined if CONFIG_XFS_DEBUG, and XFS_DEBUG
> selects DEBUG_RWSEMS, per Longman's suggestion.  That's work for what we
> want it for (simple cheap lock checking) without becoming a general
> lockabuse predicate.

Ack.

[1/5] locking: Add rwsem_is_write_locked()

Commit Message

Comments

Patch