mbox series

[0/2] Use exclusive lock for file_remove_privs

Message ID 20230830181519.2964941-1-bschubert@ddn.com (mailing list archive)
Headers show
Series Use exclusive lock for file_remove_privs | expand

Message

Bernd Schubert Aug. 30, 2023, 6:15 p.m. UTC
While adding shared direct IO write locks to fuse Miklos noticed
that file_remove_privs() needs an exclusive lock. I then
noticed that btrfs actually has the same issue as I had in my patch,
it was calling into that function with a shared lock.
This series adds a new exported function file_needs_remove_privs(),
which used by the follow up btrfs patch and will be used by the
DIO code path in fuse as well. If that function returns any mask
the shared lock needs to be dropped and replaced by the exclusive
variant.

Note: Compilation tested only.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Dharmendra Singh <dsingh@ddn.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org


Bernd Schubert (2):
  fs: Add and export file_needs_remove_privs
  btrfs: file_remove_privs needs an exclusive lock

 fs/btrfs/file.c    | 37 +++++++++++++++++++++++++++++--------
 fs/inode.c         |  8 ++++++++
 include/linux/fs.h |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

Comments

Christoph Hellwig Aug. 31, 2023, 7:40 a.m. UTC | #1
On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
> While adding shared direct IO write locks to fuse Miklos noticed
> that file_remove_privs() needs an exclusive lock. I then
> noticed that btrfs actually has the same issue as I had in my patch,
> it was calling into that function with a shared lock.
> This series adds a new exported function file_needs_remove_privs(),
> which used by the follow up btrfs patch and will be used by the
> DIO code path in fuse as well. If that function returns any mask
> the shared lock needs to be dropped and replaced by the exclusive
> variant.

FYI, xfs and ext4 use a simple IS_NOSEC check for this.  That has
a lot more false positives, but also is a much faster check in the
fast path.   It might be worth benchmarking which way to go, but
we should be consistent across file systems for the behavior.
Bernd Schubert Aug. 31, 2023, 10:17 a.m. UTC | #2
On 8/31/23 09:40, Christoph Hellwig wrote:
> On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
>> While adding shared direct IO write locks to fuse Miklos noticed
>> that file_remove_privs() needs an exclusive lock. I then
>> noticed that btrfs actually has the same issue as I had in my patch,
>> it was calling into that function with a shared lock.
>> This series adds a new exported function file_needs_remove_privs(),
>> which used by the follow up btrfs patch and will be used by the
>> DIO code path in fuse as well. If that function returns any mask
>> the shared lock needs to be dropped and replaced by the exclusive
>> variant.
> 
> FYI, xfs and ext4 use a simple IS_NOSEC check for this.  That has
> a lot more false positives, but also is a much faster check in the
> fast path.   It might be worth benchmarking which way to go, but
> we should be consistent across file systems for the behavior.
> 

Thanks, interesting! It is basically the same as my 
file_needs_remove_privs, as long as IS_NOSEC is set. I can remove the 
new function and export and to change to that check.
Not sure if it is that much faster, but should keep the kernel a bit 
smaller.


Thanks,
Bernd
Mateusz Guzik Aug. 31, 2023, 10:18 a.m. UTC | #3
On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
> While adding shared direct IO write locks to fuse Miklos noticed
> that file_remove_privs() needs an exclusive lock. I then
> noticed that btrfs actually has the same issue as I had in my patch,
> it was calling into that function with a shared lock.
> This series adds a new exported function file_needs_remove_privs(),
> which used by the follow up btrfs patch and will be used by the
> DIO code path in fuse as well. If that function returns any mask
> the shared lock needs to be dropped and replaced by the exclusive
> variant.
> 

No comments on the patchset itself.

So I figured an assert should be there on the write lock held, then the
issue would have been automagically reported.

Turns out notify_change has the following:
        WARN_ON_ONCE(!inode_is_locked(inode));

Which expands to:
static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
        return atomic_long_read(&sem->count) != 0;
}

So it does check the lock, except it passes *any* locked state,
including just readers.

According to git blame this regressed from commit 5955102c9984
("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
were replaced with inode_is_locked, which unintentionally provides
weaker guarantees.

I don't see a rwsem helper for wlock check and I don't think it is all
that beneficial to add. Instead, how about a bunch of lockdep, like so:
diff --git a/fs/attr.c b/fs/attr.c
index a8ae5f6d9b16..f47e718766d1 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
        struct timespec64 now;
        unsigned int ia_valid = attr->ia_valid;

-       WARN_ON_ONCE(!inode_is_locked(inode));
+       lockdep_assert_held_write(&inode->i_rwsem);

        error = may_setattr(idmap, inode, ia_valid);
        if (error)

Alternatively hide it behind inode_assert_is_wlocked() or whatever other
name.

I can do the churn if this sounds like a plan.
Bernd Schubert Aug. 31, 2023, 2:41 p.m. UTC | #4
On 8/31/23 12:18, Mateusz Guzik wrote:
> On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote:
>> While adding shared direct IO write locks to fuse Miklos noticed
>> that file_remove_privs() needs an exclusive lock. I then
>> noticed that btrfs actually has the same issue as I had in my patch,
>> it was calling into that function with a shared lock.
>> This series adds a new exported function file_needs_remove_privs(),
>> which used by the follow up btrfs patch and will be used by the
>> DIO code path in fuse as well. If that function returns any mask
>> the shared lock needs to be dropped and replaced by the exclusive
>> variant.
>>
> 
> No comments on the patchset itself.
> 
> So I figured an assert should be there on the write lock held, then the
> issue would have been automagically reported.
> 
> Turns out notify_change has the following:
>          WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>          return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> diff --git a/fs/attr.c b/fs/attr.c
> index a8ae5f6d9b16..f47e718766d1 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
>          struct timespec64 now;
>          unsigned int ia_valid = attr->ia_valid;
> 
> -       WARN_ON_ONCE(!inode_is_locked(inode));
> +       lockdep_assert_held_write(&inode->i_rwsem);
> 
>          error = may_setattr(idmap, inode, ia_valid);
>          if (error)
> 
> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> name.
> 
> I can do the churn if this sounds like a plan.
> 

I guess that might help to discover these issues. Maybe keep the 
existing WARN_ON_ONCE, as it would annotate a missing lock, when lockdep 
is turned off?

Another code path is file_modified() and I just wonder if there are more 
issues.

btrfs_punch_hole() takes inode->i_mmap_lock. Although I guess that 
should be found by the existing WARN_ON_ONCE? Actually same in 
btrfs_fallocate(). Or does btrfs always have IS_NOSEC?


Thanks,
Bernd
Christoph Hellwig Sept. 1, 2023, 6:49 a.m. UTC | #5
On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> Turns out notify_change has the following:
>         WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>         return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:

Yes, that's a good idea.
Matthew Wilcox (Oracle) Sept. 1, 2023, 11:38 a.m. UTC | #6
On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> So I figured an assert should be there on the write lock held, then the
> issue would have been automagically reported.
> 
> Turns out notify_change has the following:
>         WARN_ON_ONCE(!inode_is_locked(inode));
> 
> Which expands to:
> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> {
>         return atomic_long_read(&sem->count) != 0;
> }
> 
> So it does check the lock, except it passes *any* locked state,
> including just readers.
> 
> According to git blame this regressed from commit 5955102c9984
> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> were replaced with inode_is_locked, which unintentionally provides
> weaker guarantees.
> 
> I don't see a rwsem helper for wlock check and I don't think it is all
> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> diff --git a/fs/attr.c b/fs/attr.c
> index a8ae5f6d9b16..f47e718766d1 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
>         struct timespec64 now;
>         unsigned int ia_valid = attr->ia_valid;
> 
> -       WARN_ON_ONCE(!inode_is_locked(inode));
> +       lockdep_assert_held_write(&inode->i_rwsem);
> 
>         error = may_setattr(idmap, inode, ia_valid);
>         if (error)
> 
> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> name.

Better to do it like mmap_lock:

static inline void mmap_assert_write_locked(struct mm_struct *mm)
{
        lockdep_assert_held_write(&mm->mmap_lock);
        VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
}
Mateusz Guzik Sept. 1, 2023, 11:47 a.m. UTC | #7
On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote:
> On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
>> So I figured an assert should be there on the write lock held, then the
>> issue would have been automagically reported.
>>
>> Turns out notify_change has the following:
>>         WARN_ON_ONCE(!inode_is_locked(inode));
>>
>> Which expands to:
>> static inline int rwsem_is_locked(struct rw_semaphore *sem)
>> {
>>         return atomic_long_read(&sem->count) != 0;
>> }
>>
>> So it does check the lock, except it passes *any* locked state,
>> including just readers.
>>
>> According to git blame this regressed from commit 5955102c9984
>> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
>> were replaced with inode_is_locked, which unintentionally provides
>> weaker guarantees.
>>
>> I don't see a rwsem helper for wlock check and I don't think it is all
>> that beneficial to add. Instead, how about a bunch of lockdep, like so:
>> diff --git a/fs/attr.c b/fs/attr.c
>> index a8ae5f6d9b16..f47e718766d1 100644
>> --- a/fs/attr.c
>> +++ b/fs/attr.c
>> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct
>> dentry *dentry,
>>         struct timespec64 now;
>>         unsigned int ia_valid = attr->ia_valid;
>>
>> -       WARN_ON_ONCE(!inode_is_locked(inode));
>> +       lockdep_assert_held_write(&inode->i_rwsem);
>>
>>         error = may_setattr(idmap, inode, ia_valid);
>>         if (error)
>>
>> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
>> name.
>
> Better to do it like mmap_lock:
>
> static inline void mmap_assert_write_locked(struct mm_struct *mm)
> {
>         lockdep_assert_held_write(&mm->mmap_lock);
>         VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> }
>

May I suggest continuing this with responses to the patch I sent? ;)
[RFC PATCH] vfs: add inode lockdep assertions on -fsdevel

I made it line up with asserts for i_mmap_rwsem.

btw your non-lockdep check suffers the very problem I'm trying to fix
here -- checking for *any* locked state.
Matthew Wilcox (Oracle) Sept. 6, 2023, 3:13 p.m. UTC | #8
On Fri, Sep 01, 2023 at 01:47:10PM +0200, Mateusz Guzik wrote:
> On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote:
> > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote:
> >> So I figured an assert should be there on the write lock held, then the
> >> issue would have been automagically reported.
> >>
> >> Turns out notify_change has the following:
> >>         WARN_ON_ONCE(!inode_is_locked(inode));
> >>
> >> Which expands to:
> >> static inline int rwsem_is_locked(struct rw_semaphore *sem)
> >> {
> >>         return atomic_long_read(&sem->count) != 0;
> >> }
> >>
> >> So it does check the lock, except it passes *any* locked state,
> >> including just readers.
> >>
> >> According to git blame this regressed from commit 5955102c9984
> >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked
> >> were replaced with inode_is_locked, which unintentionally provides
> >> weaker guarantees.
> >>
> >> I don't see a rwsem helper for wlock check and I don't think it is all
> >> that beneficial to add. Instead, how about a bunch of lockdep, like so:
> >> diff --git a/fs/attr.c b/fs/attr.c
> >> index a8ae5f6d9b16..f47e718766d1 100644
> >> --- a/fs/attr.c
> >> +++ b/fs/attr.c
> >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct
> >> dentry *dentry,
> >>         struct timespec64 now;
> >>         unsigned int ia_valid = attr->ia_valid;
> >>
> >> -       WARN_ON_ONCE(!inode_is_locked(inode));
> >> +       lockdep_assert_held_write(&inode->i_rwsem);
> >>
> >>         error = may_setattr(idmap, inode, ia_valid);
> >>         if (error)
> >>
> >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other
> >> name.
> >
> > Better to do it like mmap_lock:
> >
> > static inline void mmap_assert_write_locked(struct mm_struct *mm)
> > {
> >         lockdep_assert_held_write(&mm->mmap_lock);
> >         VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> > }
> >
> 
> May I suggest continuing this with responses to the patch I sent? ;)

That's annoying.  Don't split this kind of conversation up if you don't
have to.

> [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel
> 
> I made it line up with asserts for i_mmap_rwsem.
> 
> btw your non-lockdep check suffers the very problem I'm trying to fix
> here -- checking for *any* locked state.

I'll respond to this over there then.