diff mbox series

[RFC] vfs: add inode lockdep assertions

Message ID 20230831151414.2714750-1-mjguzik@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] vfs: add inode lockdep assertions | expand

Commit Message

Mateusz Guzik Aug. 31, 2023, 3:14 p.m. UTC
Thread "Use exclusive lock for file_remove_privs" [1] reports an issue
which should have been found by asserts -- inode not write locked by the
caller.

It did not happen because the attempt to do it in notify_change:
WARN_ON_ONCE(!inode_is_locked(inode));

passes if the inode is only read-locked:
static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
        return atomic_long_read(&sem->count) != 0;
}

According to git blame this regressed from 2 commits:
1. 5955102c9984 ("wrappers for ->i_mutex access") which replaced a
   bunch of mutex_is_locked with inode_is_locked
2. 9902af79c01a ("parallel lookups: actual switch to rwsem") which
   implemented inode_is_locked as a mere check on the semaphore being
   held in *any* manner

In order to remedy this I'm proposing lockdep-ing the check with 2
helpers: inode_assert_locked and inode_assert_write_locked

Below I'm adding the helpers and converting *some* of the spots modified
by the first patch. I boot tested it and nothing blow up on ext4, but
btrfs should cause a complaint.

I can finish the other spots originally touched by 1 and touch up the 3
uses I grepped in fs/namei.c, but ultimately filesystem maintainers are
going to have to patch their code at their leasure. On top of that there
are probably quite a few places which should assert, but don't.

Comments?

Link: https://lore.kernel.org/linux-fsdevel/20230830181519.2964941-1-bschubert@ddn.com/

---
 fs/attr.c          |  2 +-
 fs/btrfs/xattr.c   |  2 +-
 fs/ext4/ext4.h     |  4 ++--
 fs/ext4/extents.c  |  4 ++--
 fs/ext4/inode.c    |  4 ++--
 include/linux/fs.h | 10 ++++++++++
 6 files changed, 18 insertions(+), 8 deletions(-)

Comments

Christian Brauner Sept. 1, 2023, 12:43 p.m. UTC | #1
On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> Thread "Use exclusive lock for file_remove_privs" [1] reports an issue
> which should have been found by asserts -- inode not write locked by the
> caller.
> 
> It did not happen because the attempt to do it in notify_change:
> WARN_ON_ONCE(!inode_is_locked(inode));
> 
> passes if the inode is only read-locked:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>         return atomic_long_read(&sem->count) != 0;
> }
> 
> According to git blame this regressed from 2 commits:
> 1. 5955102c9984 ("wrappers for ->i_mutex access") which replaced a
>    bunch of mutex_is_locked with inode_is_locked
> 2. 9902af79c01a ("parallel lookups: actual switch to rwsem") which
>    implemented inode_is_locked as a mere check on the semaphore being
>    held in *any* manner
> 
> In order to remedy this I'm proposing lockdep-ing the check with 2
> helpers: inode_assert_locked and inode_assert_write_locked
> 
> Below I'm adding the helpers and converting *some* of the spots modified
> by the first patch. I boot tested it and nothing blow up on ext4, but
> btrfs should cause a complaint.
> 
> I can finish the other spots originally touched by 1 and touch up the 3
> uses I grepped in fs/namei.c, but ultimately filesystem maintainers are
> going to have to patch their code at their leasure. On top of that there
> are probably quite a few places which should assert, but don't.
> 
> Comments?

I think this is useful and I would be supportive of this.
Matthew Wilcox (Oracle) Sept. 6, 2023, 3:20 p.m. UTC | #2
On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> +++ b/include/linux/fs.h
> @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
>  	down_read_nested(&inode->i_rwsem, subclass);
>  }
>  
> +static inline void inode_assert_locked(struct inode *inode)
> +{
> +	lockdep_assert_held(&inode->i_rwsem);
> +}
> +
> +static inline void inode_assert_write_locked(struct inode *inode)
> +{
> +	lockdep_assert_held_write(&inode->i_rwsem);
> +}

This mirrors what we have in mm, but it's only going to trigger on
builds that have lockdep enabled.  Lockdep is very expensive; it
easily doubles the time it takes to run xfstests on my laptop, so
I don't generally enable it.  So what we also have in MM is:

static inline void mmap_assert_write_locked(struct mm_struct *mm)
{
        lockdep_assert_held_write(&mm->mmap_lock);
        VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
}

Now if you have lockdep enabled, you get the lockdep check which
gives you all the lovely lockdep information, but if you don't, you
at least get the cheap check that someone is holding the lock at all.

ie I would make this:

+static inline void inode_assert_write_locked(struct inode *inode)
+{
+	lockdep_assert_held_write(&inode->i_rwsem);
+	WARN_ON_ONCE(!inode_is_locked(inode));
+}

Maybe the locking people could give us a rwsem_is_write_locked()
predicate, but until then, this is the best solution we came up with.
Bernd Schubert Sept. 6, 2023, 3:23 p.m. UTC | #3
On 9/6/23 17:20, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
>> +++ b/include/linux/fs.h
>> @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
>>   	down_read_nested(&inode->i_rwsem, subclass);
>>   }
>>   
>> +static inline void inode_assert_locked(struct inode *inode)
>> +{
>> +	lockdep_assert_held(&inode->i_rwsem);
>> +}
>> +
>> +static inline void inode_assert_write_locked(struct inode *inode)
>> +{
>> +	lockdep_assert_held_write(&inode->i_rwsem);
>> +}
> 
> This mirrors what we have in mm, but it's only going to trigger on
> builds that have lockdep enabled.  Lockdep is very expensive; it
> easily doubles the time it takes to run xfstests on my laptop, so
> I don't generally enable it.  So what we also have in MM is:
> 
> static inline void mmap_assert_write_locked(struct mm_struct *mm)
> {
>          lockdep_assert_held_write(&mm->mmap_lock);
>          VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> }
> 
> Now if you have lockdep enabled, you get the lockdep check which
> gives you all the lovely lockdep information, but if you don't, you
> at least get the cheap check that someone is holding the lock at all.
> 
> ie I would make this:
> 
> +static inline void inode_assert_write_locked(struct inode *inode)
> +{
> +	lockdep_assert_held_write(&inode->i_rwsem);
> +	WARN_ON_ONCE(!inode_is_locked(inode));
> +}
> 
> Maybe the locking people could give us a rwsem_is_write_locked()
> predicate, but until then, this is the best solution we came up with.


Which is exactly what I had suggested in the other thread :)
Matthew Wilcox (Oracle) Sept. 6, 2023, 3:27 p.m. UTC | #4
On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
> 
> 
> On 9/6/23 17:20, Matthew Wilcox wrote:
> > On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> > > +++ b/include/linux/fs.h
> > > @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
> > >   	down_read_nested(&inode->i_rwsem, subclass);
> > >   }
> > > +static inline void inode_assert_locked(struct inode *inode)
> > > +{
> > > +	lockdep_assert_held(&inode->i_rwsem);
> > > +}
> > > +
> > > +static inline void inode_assert_write_locked(struct inode *inode)
> > > +{
> > > +	lockdep_assert_held_write(&inode->i_rwsem);
> > > +}
> > 
> > This mirrors what we have in mm, but it's only going to trigger on
> > builds that have lockdep enabled.  Lockdep is very expensive; it
> > easily doubles the time it takes to run xfstests on my laptop, so
> > I don't generally enable it.  So what we also have in MM is:
> > 
> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
> > {
> >          lockdep_assert_held_write(&mm->mmap_lock);
> >          VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > }
> > 
> > Now if you have lockdep enabled, you get the lockdep check which
> > gives you all the lovely lockdep information, but if you don't, you
> > at least get the cheap check that someone is holding the lock at all.
> > 
> > ie I would make this:
> > 
> > +static inline void inode_assert_write_locked(struct inode *inode)
> > +{
> > +	lockdep_assert_held_write(&inode->i_rwsem);
> > +	WARN_ON_ONCE(!inode_is_locked(inode));
> > +}
> > 
> > Maybe the locking people could give us a rwsem_is_write_locked()
> > predicate, but until then, this is the best solution we came up with.
> 
> 
> Which is exactly what I had suggested in the other thread :)

Yes, but apparently comments in that thread don't count :eyeroll:
Darrick J. Wong Sept. 6, 2023, 3:29 p.m. UTC | #5
On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
> 
> 
> On 9/6/23 17:20, Matthew Wilcox wrote:
> > On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
> > > +++ b/include/linux/fs.h
> > > @@ -842,6 +842,16 @@ static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
> > >   	down_read_nested(&inode->i_rwsem, subclass);
> > >   }
> > > +static inline void inode_assert_locked(struct inode *inode)
> > > +{
> > > +	lockdep_assert_held(&inode->i_rwsem);
> > > +}
> > > +
> > > +static inline void inode_assert_write_locked(struct inode *inode)
> > > +{
> > > +	lockdep_assert_held_write(&inode->i_rwsem);
> > > +}
> > 
> > This mirrors what we have in mm, but it's only going to trigger on
> > builds that have lockdep enabled.  Lockdep is very expensive; it
> > easily doubles the time it takes to run xfstests on my laptop, so
> > I don't generally enable it.  So what we also have in MM is:
> > 
> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
> > {
> >          lockdep_assert_held_write(&mm->mmap_lock);
> >          VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > }
> > 
> > Now if you have lockdep enabled, you get the lockdep check which
> > gives you all the lovely lockdep information, but if you don't, you
> > at least get the cheap check that someone is holding the lock at all.
> > 
> > ie I would make this:
> > 
> > +static inline void inode_assert_write_locked(struct inode *inode)
> > +{
> > +	lockdep_assert_held_write(&inode->i_rwsem);
> > +	WARN_ON_ONCE(!inode_is_locked(inode));
> > +}
> > 
> > Maybe the locking people could give us a rwsem_is_write_locked()
> > predicate, but until then, this is the best solution we came up with.
> 
> 
> Which is exactly what I had suggested in the other thread :)

Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
held in shared or exclusive mode.

--D
Matthew Wilcox (Oracle) Sept. 6, 2023, 4 p.m. UTC | #6
On Wed, Sep 06, 2023 at 08:29:48AM -0700, Darrick J. Wong wrote:
> Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
> held in shared or exclusive mode.

... or to put it another way, if we had rwsem_is_write_locked(),
we could get rid of mrlock?

diff --git a/fs/xfs/mrlock.h b/fs/xfs/mrlock.h
index 79155eec341b..5530f03aaed1 100644
--- a/fs/xfs/mrlock.h
+++ b/fs/xfs/mrlock.h
@@ -10,18 +10,10 @@
 
 typedef struct {
 	struct rw_semaphore	mr_lock;
-#if defined(DEBUG) || defined(XFS_WARN)
-	int			mr_writer;
-#endif
 } mrlock_t;
 
-#if defined(DEBUG) || defined(XFS_WARN)
-#define mrinit(mrp, name)	\
-	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
-#else
 #define mrinit(mrp, name)	\
 	do { init_rwsem(&(mrp)->mr_lock); } while (0)
-#endif
 
 #define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
 #define mrfree(mrp)		do { } while (0)
@@ -34,9 +26,6 @@ static inline void mraccess_nested(mrlock_t *mrp, int subclass)
 static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
 {
 	down_write_nested(&mrp->mr_lock, subclass);
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
 }
 
 static inline int mrtryaccess(mrlock_t *mrp)
@@ -48,17 +37,11 @@ static inline int mrtryupdate(mrlock_t *mrp)
 {
 	if (!down_write_trylock(&mrp->mr_lock))
 		return 0;
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 1;
-#endif
 	return 1;
 }
 
 static inline void mrunlock_excl(mrlock_t *mrp)
 {
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
 	up_write(&mrp->mr_lock);
 }
 
@@ -69,9 +52,6 @@ static inline void mrunlock_shared(mrlock_t *mrp)
 
 static inline void mrdemote(mrlock_t *mrp)
 {
-#if defined(DEBUG) || defined(XFS_WARN)
-	mrp->mr_writer = 0;
-#endif
 	downgrade_write(&mrp->mr_lock);
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9e62cc500140..b99c3bd78c5e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -361,7 +361,7 @@ xfs_isilocked(
 {
 	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
 		if (!(lock_flags & XFS_ILOCK_SHARED))
-			return !!ip->i_lock.mr_writer;
+			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
 		return rwsem_is_locked(&ip->i_lock.mr_lock);
 	}
 
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index e05e167dbd16..277b8c96bbf9 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -69,7 +69,7 @@ static inline void mmap_assert_locked(struct mm_struct *mm)
 static inline void mmap_assert_write_locked(struct mm_struct *mm)
 {
 	lockdep_assert_held_write(&mm->mmap_lock);
-	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
+	VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm);
 }
 
 #ifdef CONFIG_PER_VMA_LOCK
diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
index 1d264dd08625..3c25b14edc05 100644
--- a/include/linux/rwbase_rt.h
+++ b/include/linux/rwbase_rt.h
@@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
 	return atomic_read(&rwb->readers) != READER_BIAS;
 }
 
+static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb)
+{
+	return atomic_read(&rwb->readers) == WRITER_BIAS;
+}
+
 static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
 {
 	return atomic_read(&rwb->readers) > 0;
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 1dd530ce8b45..241a12c6019e 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 	return atomic_long_read(&sem->count) != 0;
 }
 
+static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
+{
+	return atomic_long_read(&sem->count) & 1;
+}
+
 #define RWSEM_UNLOCKED_VALUE		0L
 #define __RWSEM_COUNT_INIT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
 
@@ -157,6 +162,11 @@ static __always_inline int rwsem_is_locked(struct rw_semaphore *sem)
 	return rw_base_is_locked(&sem->rwbase);
 }
 
+static __always_inline int rwsem_is_write_locked(struct rw_semaphore *sem)
+{
+	return rw_base_is_write_locked(&sem->rwbase);
+}
+
 static __always_inline int rwsem_is_contended(struct rw_semaphore *sem)
 {
 	return rw_base_is_contended(&sem->rwbase);
Darrick J. Wong Sept. 6, 2023, 5:07 p.m. UTC | #7
On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 08:29:48AM -0700, Darrick J. Wong wrote:
> > Or hoist the XFS mrlock, because it actually /does/ know if the rwsem is
> > held in shared or exclusive mode.
> 
> ... or to put it another way, if we had rwsem_is_write_locked(),
> we could get rid of mrlock?
> 
> diff --git a/fs/xfs/mrlock.h b/fs/xfs/mrlock.h
> index 79155eec341b..5530f03aaed1 100644
> --- a/fs/xfs/mrlock.h
> +++ b/fs/xfs/mrlock.h
> @@ -10,18 +10,10 @@
>  
>  typedef struct {
>  	struct rw_semaphore	mr_lock;
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	int			mr_writer;
> -#endif
>  } mrlock_t;
>  
> -#if defined(DEBUG) || defined(XFS_WARN)
> -#define mrinit(mrp, name)	\
> -	do { (mrp)->mr_writer = 0; init_rwsem(&(mrp)->mr_lock); } while (0)
> -#else
>  #define mrinit(mrp, name)	\
>  	do { init_rwsem(&(mrp)->mr_lock); } while (0)
> -#endif
>  
>  #define mrlock_init(mrp, t,n,s)	mrinit(mrp, n)
>  #define mrfree(mrp)		do { } while (0)
> @@ -34,9 +26,6 @@ static inline void mraccess_nested(mrlock_t *mrp, int subclass)
>  static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
>  {
>  	down_write_nested(&mrp->mr_lock, subclass);
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 1;
> -#endif
>  }
>  
>  static inline int mrtryaccess(mrlock_t *mrp)
> @@ -48,17 +37,11 @@ static inline int mrtryupdate(mrlock_t *mrp)
>  {
>  	if (!down_write_trylock(&mrp->mr_lock))
>  		return 0;
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 1;
> -#endif
>  	return 1;
>  }
>  
>  static inline void mrunlock_excl(mrlock_t *mrp)
>  {
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 0;
> -#endif
>  	up_write(&mrp->mr_lock);
>  }
>  
> @@ -69,9 +52,6 @@ static inline void mrunlock_shared(mrlock_t *mrp)
>  
>  static inline void mrdemote(mrlock_t *mrp)
>  {
> -#if defined(DEBUG) || defined(XFS_WARN)
> -	mrp->mr_writer = 0;
> -#endif
>  	downgrade_write(&mrp->mr_lock);
>  }
>  
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 9e62cc500140..b99c3bd78c5e 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -361,7 +361,7 @@ xfs_isilocked(
>  {
>  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
>  		if (!(lock_flags & XFS_ILOCK_SHARED))
> -			return !!ip->i_lock.mr_writer;
> +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);

You'd be better off converting this to:

	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
				(lock_flags & XFS_ILOCK_SHARED));

And then fixing __xfs_rwsem_islocked to do:

static inline bool
__xfs_rwsem_islocked(
	struct rw_semaphore	*rwsem,
	bool			shared)
{
	if (!debug_locks) {
		if (!shared)
			return rwsem_is_write_locked(rwsem);
		return rwsem_is_locked(rwsem);
	}

	...
}

(and then getting rid of mrlock_t.h entirely)

>  		return rwsem_is_locked(&ip->i_lock.mr_lock);
>  	}
>  
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index e05e167dbd16..277b8c96bbf9 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -69,7 +69,7 @@ static inline void mmap_assert_locked(struct mm_struct *mm)
>  static inline void mmap_assert_write_locked(struct mm_struct *mm)
>  {
>  	lockdep_assert_held_write(&mm->mmap_lock);
> -	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> +	VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm);
>  }
>  
>  #ifdef CONFIG_PER_VMA_LOCK
> diff --git a/include/linux/rwbase_rt.h b/include/linux/rwbase_rt.h
> index 1d264dd08625..3c25b14edc05 100644
> --- a/include/linux/rwbase_rt.h
> +++ b/include/linux/rwbase_rt.h
> @@ -31,6 +31,11 @@ static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
>  	return atomic_read(&rwb->readers) != READER_BIAS;
>  }
>  
> +static __always_inline bool rw_base_is_write_locked(struct rwbase_rt *rwb)
> +{
> +	return atomic_read(&rwb->readers) == WRITER_BIAS;
> +}
> +
>  static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
>  {
>  	return atomic_read(&rwb->readers) > 0;
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 1dd530ce8b45..241a12c6019e 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
>  	return atomic_long_read(&sem->count) != 0;
>  }
>  
> +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> +{
> +	return atomic_long_read(&sem->count) & 1;


atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?

In one of the past "replace mrlock_t" threads I complained about
hardcoding this value instead of using the symbol.  I saw it go by,
neglected to copy the url, and ten minutes later I can't find it. :(

--D

> +}
> +
>  #define RWSEM_UNLOCKED_VALUE		0L
>  #define __RWSEM_COUNT_INIT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
>  
> @@ -157,6 +162,11 @@ static __always_inline int rwsem_is_locked(struct rw_semaphore *sem)
>  	return rw_base_is_locked(&sem->rwbase);
>  }
>  
> +static __always_inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> +{
> +	return rw_base_is_write_locked(&sem->rwbase);
> +}
> +
>  static __always_inline int rwsem_is_contended(struct rw_semaphore *sem)
>  {
>  	return rw_base_is_contended(&sem->rwbase);
Matthew Wilcox (Oracle) Sept. 6, 2023, 6:33 p.m. UTC | #8
On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -361,7 +361,7 @@ xfs_isilocked(
> >  {
> >  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> >  		if (!(lock_flags & XFS_ILOCK_SHARED))
> > -			return !!ip->i_lock.mr_writer;
> > +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
> 
> You'd be better off converting this to:
> 
> 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> 				(lock_flags & XFS_ILOCK_SHARED));
> 
> And then fixing __xfs_rwsem_islocked to do:
> 
> static inline bool
> __xfs_rwsem_islocked(
> 	struct rw_semaphore	*rwsem,
> 	bool			shared)
> {
> 	if (!debug_locks) {
> 		if (!shared)
> 			return rwsem_is_write_locked(rwsem);
> 		return rwsem_is_locked(rwsem);
> 	}
> 
> 	...
> }

Thanks.

> > +++ b/include/linux/rwsem.h
> > @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
> >  	return atomic_long_read(&sem->count) != 0;
> >  }
> >  
> > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > +{
> > +	return atomic_long_read(&sem->count) & 1;
> 
> 
> atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?

Then this would either have to be in rwsem.c or we'd have to move the
definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
kind of bad.  I think I hate the bare '1' least.
Darrick J. Wong Sept. 6, 2023, 6:43 p.m. UTC | #9
On Wed, Sep 06, 2023 at 07:33:48PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -361,7 +361,7 @@ xfs_isilocked(
> > >  {
> > >  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> > >  		if (!(lock_flags & XFS_ILOCK_SHARED))
> > > -			return !!ip->i_lock.mr_writer;
> > > +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
> > 
> > You'd be better off converting this to:
> > 
> > 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > 				(lock_flags & XFS_ILOCK_SHARED));
> > 
> > And then fixing __xfs_rwsem_islocked to do:
> > 
> > static inline bool
> > __xfs_rwsem_islocked(
> > 	struct rw_semaphore	*rwsem,
> > 	bool			shared)
> > {
> > 	if (!debug_locks) {
> > 		if (!shared)
> > 			return rwsem_is_write_locked(rwsem);
> > 		return rwsem_is_locked(rwsem);
> > 	}
> > 
> > 	...
> > }
> 
> Thanks.
> 
> > > +++ b/include/linux/rwsem.h
> > > @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
> > >  	return atomic_long_read(&sem->count) != 0;
> > >  }
> > >  
> > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > +{
> > > +	return atomic_long_read(&sem->count) & 1;
> > 
> > 
> > atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?
> 
> Then this would either have to be in rwsem.c or we'd have to move the
> definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
> kind of bad.  I think I hate the bare '1' least.

I disagree, because using the bare 1 brings the most risk that someone
will subtly break the locking assertions some day when they get the
bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
notice this predicate and its magic number.  IMO moving it to the header
file (possibly with the usual __ prefix) would be preferable to leaving
a landmine.

--D
Matthew Wilcox (Oracle) Sept. 6, 2023, 7:01 p.m. UTC | #10
On Wed, Sep 06, 2023 at 11:43:36AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 06, 2023 at 07:33:48PM +0100, Matthew Wilcox wrote:
> > On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > > On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> > > > +++ b/fs/xfs/xfs_inode.c
> > > > @@ -361,7 +361,7 @@ xfs_isilocked(
> > > >  {
> > > >  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> > > >  		if (!(lock_flags & XFS_ILOCK_SHARED))
> > > > -			return !!ip->i_lock.mr_writer;
> > > > +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
> > > 
> > > You'd be better off converting this to:
> > > 
> > > 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > > 				(lock_flags & XFS_ILOCK_SHARED));
> > > 
> > > And then fixing __xfs_rwsem_islocked to do:
> > > 
> > > static inline bool
> > > __xfs_rwsem_islocked(
> > > 	struct rw_semaphore	*rwsem,
> > > 	bool			shared)
> > > {
> > > 	if (!debug_locks) {
> > > 		if (!shared)
> > > 			return rwsem_is_write_locked(rwsem);
> > > 		return rwsem_is_locked(rwsem);
> > > 	}
> > > 
> > > 	...
> > > }
> > 
> > Thanks.
> > 
> > > > +++ b/include/linux/rwsem.h
> > > > @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
> > > >  	return atomic_long_read(&sem->count) != 0;
> > > >  }
> > > >  
> > > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > > +{
> > > > +	return atomic_long_read(&sem->count) & 1;
> > > 
> > > 
> > > atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?
> > 
> > Then this would either have to be in rwsem.c or we'd have to move the
> > definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
> > kind of bad.  I think I hate the bare '1' least.
> 
> I disagree, because using the bare 1 brings the most risk that someone
> will subtly break the locking assertions some day when they get the
> bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
> notice this predicate and its magic number.  IMO moving it to the header
> file (possibly with the usual __ prefix) would be preferable to leaving
> a landmine.

+       return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;

works for you?
Darrick J. Wong Sept. 6, 2023, 7:19 p.m. UTC | #11
On Wed, Sep 06, 2023 at 08:01:56PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 11:43:36AM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 06, 2023 at 07:33:48PM +0100, Matthew Wilcox wrote:
> > > On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > > > On Wed, Sep 06, 2023 at 05:00:14PM +0100, Matthew Wilcox wrote:
> > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > @@ -361,7 +361,7 @@ xfs_isilocked(
> > > > >  {
> > > > >  	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
> > > > >  		if (!(lock_flags & XFS_ILOCK_SHARED))
> > > > > -			return !!ip->i_lock.mr_writer;
> > > > > +			return rwsem_is_write_locked(&ip->i_lock.mr_lock);
> > > > 
> > > > You'd be better off converting this to:
> > > > 
> > > > 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > > > 				(lock_flags & XFS_ILOCK_SHARED));
> > > > 
> > > > And then fixing __xfs_rwsem_islocked to do:
> > > > 
> > > > static inline bool
> > > > __xfs_rwsem_islocked(
> > > > 	struct rw_semaphore	*rwsem,
> > > > 	bool			shared)
> > > > {
> > > > 	if (!debug_locks) {
> > > > 		if (!shared)
> > > > 			return rwsem_is_write_locked(rwsem);
> > > > 		return rwsem_is_locked(rwsem);
> > > > 	}
> > > > 
> > > > 	...
> > > > }
> > > 
> > > Thanks.
> > > 
> > > > > +++ b/include/linux/rwsem.h
> > > > > @@ -72,6 +72,11 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
> > > > >  	return atomic_long_read(&sem->count) != 0;
> > > > >  }
> > > > >  
> > > > > +static inline int rwsem_is_write_locked(struct rw_semaphore *sem)
> > > > > +{
> > > > > +	return atomic_long_read(&sem->count) & 1;
> > > > 
> > > > 
> > > > atomic_long_read(&sem->count) & RWSEM_WRITER_LOCKED ?
> > > 
> > > Then this would either have to be in rwsem.c or we'd have to move the
> > > definition of RWSEM_WRITER_LOCKED to rwsem.h.  All three options are
> > > kind of bad.  I think I hate the bare '1' least.
> > 
> > I disagree, because using the bare 1 brings the most risk that someone
> > will subtly break the locking assertions some day when they get the
> > bright idea to move RWSEM_WRITER_LOCKED to the upper bit and fail to
> > notice this predicate and its magic number.  IMO moving it to the header
> > file (possibly with the usual __ prefix) would be preferable to leaving
> > a landmine.
> 
> +       return atomic_long_read(&sem->count) & 1 /* RWSEM_WRITER_LOCKED */;
> 
> works for you?

Yeah I guess that works.

--D
Matthew Wilcox (Oracle) Sept. 6, 2023, 9:10 p.m. UTC | #12
On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> You'd be better off converting this to:
> 
> 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> 				(lock_flags & XFS_ILOCK_SHARED));
> 
> And then fixing __xfs_rwsem_islocked to do:
> 
> static inline bool
> __xfs_rwsem_islocked(
> 	struct rw_semaphore	*rwsem,
> 	bool			shared)
> {
> 	if (!debug_locks) {
> 		if (!shared)
> 			return rwsem_is_write_locked(rwsem);
> 		return rwsem_is_locked(rwsem);
> 	}
> 
> 	...
> }

so ... I did that.  And then many isilocked() calls started failing.
generic/017 was the first one:

00030 XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 93
00030 ------------[ cut here ]------------
00030 WARNING: CPU: 5 PID: 77 at fs/xfs/xfs_message.c:104 assfail+0x2c/0x40
00030 Modules linked in:
00030 CPU: 5 PID: 77 Comm: kworker/5:1 Kdump: loaded Not tainted 6.5.0-00006-g88a61c17df8f-dirty #14
00030 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
00030 Workqueue: xfsalloc xfs_btree_split_worker
00030 RIP: 0010:assfail+0x2c/0x40
00030 Code: 89 d0 41 89 c9 48 c7 c2 80 70 0f 82 48 89 f1 48 89 e5 48 89 fe 48 c7
 c7 08 4f 07 82 e8 fd fd ff ff 80 3d 26 cc ed 00 00 75 04 <0f> 0b 5d c3 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48
00030 RSP: 0018:ffff888003f0bc28 EFLAGS: 00010246
00030 RAX: 00000000ffffffea RBX: ffff88800d9a0000 RCX: 000000007fffffff
00030 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff82074f08
00030 RBP: ffff888003f0bc28 R08: 0000000000000000 R09: 000000000000000a
00030 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff888009ff6000
00030 R13: ffff88800ac8a000 R14: 0000000000000001 R15: ffff88800af0a000
00030 FS:  0000000000000000(0000) GS:ffff88807d940000(0000) knlGS:0000000000000000
00030 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
00030 CR2: 00007ff177b05d58 CR3: 0000000007aa2002 CR4: 0000000000770ea0
00030 PKRU: 55555554
00030 Call Trace:
00030  <TASK>
00030  ? show_regs+0x5c/0x70
00030  ? __warn+0x84/0x180
00030  ? assfail+0x2c/0x40
00030  ? report_bug+0x18e/0x1c0
00030  ? handle_bug+0x3e/0x70
00030  ? exc_invalid_op+0x18/0x70
00030  ? asm_exc_invalid_op+0x1b/0x20
00030  ? assfail+0x2c/0x40
00030  ? assfail+0x23/0x40
00030  xfs_trans_log_inode+0xf9/0x120
00030  xfs_bmbt_alloc_block+0xf0/0x1c0
00030  __xfs_btree_split+0xf8/0x6c0
00030  ? __this_cpu_preempt_check+0x13/0x20
00030  ? lock_acquire+0xc8/0x280
00030  xfs_btree_split_worker+0x8a/0x110
00030  process_one_work+0x23f/0x510
00030  worker_thread+0x4d/0x3b0
00030  ? process_one_work+0x510/0x510
00030  kthread+0x106/0x140
00030  ? kthread_complete_and_exit+0x20/0x20
00030  ret_from_fork+0x31/0x50
00030  ? kthread_complete_and_exit+0x20/0x20
00030  ret_from_fork_asm+0x11/0x20
00030  </TASK>
00030 irq event stamp: 2901
00030 hardirqs last  enabled at (2909): [<ffffffff810e4d83>] __up_console_sem+0x53/0x60
00030 hardirqs last disabled at (2916): [<ffffffff810e4d68>] __up_console_sem+0x38/0x60
00030 softirqs last  enabled at (0): [<ffffffff81067bd0>] copy_process+0x830/0x1c10
00030 softirqs last disabled at (0): [<0000000000000000>] 0x0
00030 ---[ end trace 0000000000000000 ]---

but here's the thing, I have lockdep enabled.  So it's not testing my
new rwsem_is_write_locked() code, it's testing the current lockdep
stuff.

So I have a feeling that we're not telling lockdep that we've acquired
the i_lock.  Or something?  Seems unlikely that this is a real bug;
surely we'd've noticed before now.

Code here:
https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/mrlock

ie git clone git://git.infradead.org/users/willy/pagecache.git mrlock

You don't need the top commit.  754fb6a68dae is enough to provoke it,
as long as you have CONFIG_LOCKDEP enabled.
Darrick J. Wong Sept. 6, 2023, 9:26 p.m. UTC | #13
On Wed, Sep 06, 2023 at 10:10:25PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 06, 2023 at 10:07:24AM -0700, Darrick J. Wong wrote:
> > You'd be better off converting this to:
> > 
> > 	return __xfs_rwsem_islocked(&ip->i_lock.mr_lock,
> > 				(lock_flags & XFS_ILOCK_SHARED));
> > 
> > And then fixing __xfs_rwsem_islocked to do:
> > 
> > static inline bool
> > __xfs_rwsem_islocked(
> > 	struct rw_semaphore	*rwsem,
> > 	bool			shared)
> > {
> > 	if (!debug_locks) {
> > 		if (!shared)
> > 			return rwsem_is_write_locked(rwsem);
> > 		return rwsem_is_locked(rwsem);
> > 	}
> > 
> > 	...
> > }
> 
> so ... I did that.  And then many isilocked() calls started failing.
> generic/017 was the first one:
> 
> 00030 XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_trans_inode.c, line: 93
> 00030 ------------[ cut here ]------------
> 00030 WARNING: CPU: 5 PID: 77 at fs/xfs/xfs_message.c:104 assfail+0x2c/0x40
> 00030 Modules linked in:
> 00030 CPU: 5 PID: 77 Comm: kworker/5:1 Kdump: loaded Not tainted 6.5.0-00006-g88a61c17df8f-dirty #14
> 00030 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> 00030 Workqueue: xfsalloc xfs_btree_split_worker
> 00030 RIP: 0010:assfail+0x2c/0x40
> 00030 Code: 89 d0 41 89 c9 48 c7 c2 80 70 0f 82 48 89 f1 48 89 e5 48 89 fe 48 c7 c7 08 4f 07 82 e8 fd fd ff ff 80 3d 26 cc ed 00 00 75 04 <0f> 0b 5d c3 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48
> 00030 RSP: 0018:ffff888003f0bc28 EFLAGS: 00010246
> 00030 RAX: 00000000ffffffea RBX: ffff88800d9a0000 RCX: 000000007fffffff
> 00030 RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffffffff82074f08
> 00030 RBP: ffff888003f0bc28 R08: 0000000000000000 R09: 000000000000000a
> 00030 R10: 000000000000000a R11: 0fffffffffffffff R12: ffff888009ff6000
> 00030 R13: ffff88800ac8a000 R14: 0000000000000001 R15: ffff88800af0a000
> 00030 FS:  0000000000000000(0000) GS:ffff88807d940000(0000) knlGS:0000000000000000
> 00030 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 00030 CR2: 00007ff177b05d58 CR3: 0000000007aa2002 CR4: 0000000000770ea0
> 00030 PKRU: 55555554
> 00030 Call Trace:
> 00030  <TASK>
> 00030  ? show_regs+0x5c/0x70
> 00030  ? __warn+0x84/0x180
> 00030  ? assfail+0x2c/0x40
> 00030  ? report_bug+0x18e/0x1c0
> 00030  ? handle_bug+0x3e/0x70
> 00030  ? exc_invalid_op+0x18/0x70
> 00030  ? asm_exc_invalid_op+0x1b/0x20
> 00030  ? assfail+0x2c/0x40
> 00030  ? assfail+0x23/0x40
> 00030  xfs_trans_log_inode+0xf9/0x120
> 00030  xfs_bmbt_alloc_block+0xf0/0x1c0
> 00030  __xfs_btree_split+0xf8/0x6c0
> 00030  ? __this_cpu_preempt_check+0x13/0x20
> 00030  ? lock_acquire+0xc8/0x280
> 00030  xfs_btree_split_worker+0x8a/0x110
> 00030  process_one_work+0x23f/0x510
> 00030  worker_thread+0x4d/0x3b0
> 00030  ? process_one_work+0x510/0x510
> 00030  kthread+0x106/0x140
> 00030  ? kthread_complete_and_exit+0x20/0x20
> 00030  ret_from_fork+0x31/0x50
> 00030  ? kthread_complete_and_exit+0x20/0x20
> 00030  ret_from_fork_asm+0x11/0x20
> 00030  </TASK>
> 00030 irq event stamp: 2901
> 00030 hardirqs last  enabled at (2909): [<ffffffff810e4d83>] __up_console_sem+0x53/0x60
> 00030 hardirqs last disabled at (2916): [<ffffffff810e4d68>] __up_console_sem+0x38/0x60
> 00030 softirqs last  enabled at (0): [<ffffffff81067bd0>] copy_process+0x830/0x1c10
> 00030 softirqs last disabled at (0): [<0000000000000000>] 0x0
> 00030 ---[ end trace 0000000000000000 ]---
> 
> but here's the thing, I have lockdep enabled.  So it's not testing my
> new rwsem_is_write_locked() code, it's testing the current lockdep
> stuff.
> 
> So I have a feeling that we're not telling lockdep that we've acquired
> the i_lock.  Or something?  Seems unlikely that this is a real bug;
> surely we'd've noticed before now.
> 
> Code here:
> https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/mrlock
> 
> ie git clone git://git.infradead.org/users/willy/pagecache.git mrlock
> 
> You don't need the top commit.  754fb6a68dae is enough to provoke it,
> as long as you have CONFIG_LOCKDEP enabled.

Yeah.  The lockdep assertions in __xfs_rwsem_islocked are broken because
the fsstress thread takes the ILOCK_EXCL, decides to defer a bmap btree
split to a workqueue to avoid overflowing the kernel stack, and doesn't
tell lockdep that the workqueue is (effectively) the rwsem owner until
it completes.

The last time we tried to get rid of mrlock_t, dchinner suggested using
lockdep_assert_held_non_owner to fix the assertions, but (a) that
doesn't seem to exist in TOT 6.5 and (b) the inode rwsems protect the
inode data structure, not threads.  Hence you could just get rid of the
lockdep assertions and use the rwsem_* versions unconditionally.

--D
Mateusz Guzik Sept. 14, 2023, 1:16 p.m. UTC | #14
On 9/6/23, Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Sep 06, 2023 at 05:23:42PM +0200, Bernd Schubert wrote:
>>
>>
>> On 9/6/23 17:20, Matthew Wilcox wrote:
>> > On Thu, Aug 31, 2023 at 05:14:14PM +0200, Mateusz Guzik wrote:
>> > > +++ b/include/linux/fs.h
>> > > @@ -842,6 +842,16 @@ static inline void
>> > > inode_lock_shared_nested(struct inode *inode, unsigned subcla
>> > >   	down_read_nested(&inode->i_rwsem, subclass);
>> > >   }
>> > > +static inline void inode_assert_locked(struct inode *inode)
>> > > +{
>> > > +	lockdep_assert_held(&inode->i_rwsem);
>> > > +}
>> > > +
>> > > +static inline void inode_assert_write_locked(struct inode *inode)
>> > > +{
>> > > +	lockdep_assert_held_write(&inode->i_rwsem);
>> > > +}
>> >
>> > This mirrors what we have in mm, but it's only going to trigger on
>> > builds that have lockdep enabled.  Lockdep is very expensive; it
>> > easily doubles the time it takes to run xfstests on my laptop, so
>> > I don't generally enable it.  So what we also have in MM is:
>> >
>> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
>> > {
>> >          lockdep_assert_held_write(&mm->mmap_lock);
>> >          VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
>> > }
>> >
>> > Now if you have lockdep enabled, you get the lockdep check which
>> > gives you all the lovely lockdep information, but if you don't, you
>> > at least get the cheap check that someone is holding the lock at all.
>> >
>> > ie I would make this:
>> >
>> > +static inline void inode_assert_write_locked(struct inode *inode)
>> > +{
>> > +	lockdep_assert_held_write(&inode->i_rwsem);
>> > +	WARN_ON_ONCE(!inode_is_locked(inode));
>> > +}
>> >
>> > Maybe the locking people could give us a rwsem_is_write_locked()
>> > predicate, but until then, this is the best solution we came up with.
>>
>>
>> Which is exactly what I had suggested in the other thread :)
>
> Yes, but apparently comments in that thread don't count :eyeroll:
>

Pretty weird reaction, mate; they very much *do* count, which is why I'm
confused that you re-sent an e-mail with the bogus is_locked check
(which I explicitly pointed out).

Since you posted a separate patch to add write-locking check to rwsem
I'm going to wait for that bit to get sorted out (unless it stalls for
a long time).

fwiw I'm confused what's up with people making kernel changes without
running lockdep. If it adds too much overhead for use in normal
development, someone(tm) should have fixed that aspect a long time ago.
diff mbox series

Patch

diff --git a/fs/attr.c b/fs/attr.c
index a8ae5f6d9b16..90dec999a952 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -387,7 +387,7 @@  int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
 	struct timespec64 now;
 	unsigned int ia_valid = attr->ia_valid;
 
-	WARN_ON_ONCE(!inode_is_locked(inode));
+	inode_assert_write_locked(inode);
 
 	error = may_setattr(idmap, inode, ia_valid);
 	if (error)
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 96828a13dd43..46b268a433dd 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -120,7 +120,7 @@  int btrfs_setxattr(struct btrfs_trans_handle *trans, struct inode *inode,
 	 * locks the inode's i_mutex before calling setxattr or removexattr.
 	 */
 	if (flags & XATTR_REPLACE) {
-		ASSERT(inode_is_locked(inode));
+		inode_assert_write_locked(inode);
 		di = btrfs_lookup_xattr(NULL, root, path,
 				btrfs_ino(BTRFS_I(inode)), name, name_len, 0);
 		if (!di)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 481491e892df..df428f22f624 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3364,8 +3364,8 @@  do {								\
 /* Update i_disksize. Requires i_rwsem to avoid races with truncate */
 static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 {
-	WARN_ON_ONCE(S_ISREG(inode->i_mode) &&
-		     !inode_is_locked(inode));
+	if (S_ISREG(inode->i_mode))
+		inode_assert_write_locked(inode);
 	down_write(&EXT4_I(inode)->i_data_sem);
 	if (newsize > EXT4_I(inode)->i_disksize)
 		WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 202c76996b62..149783ecfe16 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5588,8 +5588,8 @@  ext4_swap_extents(handle_t *handle, struct inode *inode1,
 
 	BUG_ON(!rwsem_is_locked(&EXT4_I(inode1)->i_data_sem));
 	BUG_ON(!rwsem_is_locked(&EXT4_I(inode2)->i_data_sem));
-	BUG_ON(!inode_is_locked(inode1));
-	BUG_ON(!inode_is_locked(inode2));
+	inode_assert_write_locked(inode1);
+	inode_assert_write_locked(inode2);
 
 	ext4_es_remove_extent(inode1, lblk1, count);
 	ext4_es_remove_extent(inode2, lblk2, count);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89737d5a1614..2ecdef6ddc88 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3797,7 +3797,7 @@  int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 
 	loff_t size = i_size_read(inode);
 
-	WARN_ON(!inode_is_locked(inode));
+	inode_assert_write_locked(inode);
 	if (offset > size || offset + len < size)
 		return 0;
 
@@ -4068,7 +4068,7 @@  int ext4_truncate(struct inode *inode)
 	 * have i_rwsem locked because it's not necessary.
 	 */
 	if (!(inode->i_state & (I_NEW|I_FREEING)))
-		WARN_ON(!inode_is_locked(inode));
+		inode_assert_write_locked(inode);
 	trace_ext4_truncate_enter(inode);
 
 	if (!ext4_can_truncate(inode))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c8ff4156a0a1..93d48b6b9f67 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -842,6 +842,16 @@  static inline void inode_lock_shared_nested(struct inode *inode, unsigned subcla
 	down_read_nested(&inode->i_rwsem, subclass);
 }
 
+static inline void inode_assert_locked(struct inode *inode)
+{
+	lockdep_assert_held(&inode->i_rwsem);
+}
+
+static inline void inode_assert_write_locked(struct inode *inode)
+{
+	lockdep_assert_held_write(&inode->i_rwsem);
+}
+
 static inline void filemap_invalidate_lock(struct address_space *mapping)
 {
 	down_write(&mapping->invalidate_lock);