Message ID | 20230830181519.2964941-1-bschubert@ddn.com (mailing list archive) |
---|---|
Headers | show |
Series | Use exclusive lock for file_remove_privs | expand |
On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: > While adding shared direct IO write locks to fuse Miklos noticed > that file_remove_privs() needs an exclusive lock. I then > noticed that btrfs actually has the same issue as I had in my patch, > it was calling into that function with a shared lock. > This series adds a new exported function file_needs_remove_privs(), > which used by the follow up btrfs patch and will be used by the > DIO code path in fuse as well. If that function returns any mask > the shared lock needs to be dropped and replaced by the exclusive > variant. FYI, xfs and ext4 use a simple IS_NOSEC check for this. That has a lot more false positives, but also is a much faster check in the fast path. It might be worth benchmarking which way to go, but we should be consistent across file systems for the behavior.
On 8/31/23 09:40, Christoph Hellwig wrote: > On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: >> While adding shared direct IO write locks to fuse Miklos noticed >> that file_remove_privs() needs an exclusive lock. I then >> noticed that btrfs actually has the same issue as I had in my patch, >> it was calling into that function with a shared lock. >> This series adds a new exported function file_needs_remove_privs(), >> which used by the follow up btrfs patch and will be used by the >> DIO code path in fuse as well. If that function returns any mask >> the shared lock needs to be dropped and replaced by the exclusive >> variant. > > FYI, xfs and ext4 use a simple IS_NOSEC check for this. That has > a lot more false positives, but also is a much faster check in the > fast path. It might be worth benchmarking which way to go, but > we should be consistent across file systems for the behavior. > Thanks, interesting! It is basically the same as my file_needs_remove_privs, as long as IS_NOSEC is set. I can remove the new function and export and to change to that check. Not sure if it is that much faster, but should keep the kernel a bit smaller. Thanks, Bernd
On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: > While adding shared direct IO write locks to fuse Miklos noticed > that file_remove_privs() needs an exclusive lock. I then > noticed that btrfs actually has the same issue as I had in my patch, > it was calling into that function with a shared lock. > This series adds a new exported function file_needs_remove_privs(), > which used by the follow up btrfs patch and will be used by the > DIO code path in fuse as well. If that function returns any mask > the shared lock needs to be dropped and replaced by the exclusive > variant. > No comments on the patchset itself. So I figured an assert should be there on the write lock held, then the issue would have been automagically reported. Turns out notify_change has the following: WARN_ON_ONCE(!inode_is_locked(inode)); Which expands to: static inline int rwsem_is_locked(struct rw_semaphore *sem) { return atomic_long_read(&sem->count) != 0; } So it does check the lock, except it passes *any* locked state, including just readers. According to git blame this regressed from commit 5955102c9984 ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked were replaced with inode_is_locked, which unintentionally provides weaker guarantees. I don't see a rwsem helper for wlock check and I don't think it is all that beneficial to add. Instead, how about a bunch of lockdep, like so: diff --git a/fs/attr.c b/fs/attr.c index a8ae5f6d9b16..f47e718766d1 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, struct timespec64 now; unsigned int ia_valid = attr->ia_valid; - WARN_ON_ONCE(!inode_is_locked(inode)); + lockdep_assert_held_write(&inode->i_rwsem); error = may_setattr(idmap, inode, ia_valid); if (error) Alternatively hide it behind inode_assert_is_wlocked() or whatever other name. I can do the churn if this sounds like a plan.
On 8/31/23 12:18, Mateusz Guzik wrote: > On Wed, Aug 30, 2023 at 08:15:17PM +0200, Bernd Schubert wrote: >> While adding shared direct IO write locks to fuse Miklos noticed >> that file_remove_privs() needs an exclusive lock. I then >> noticed that btrfs actually has the same issue as I had in my patch, >> it was calling into that function with a shared lock. >> This series adds a new exported function file_needs_remove_privs(), >> which used by the follow up btrfs patch and will be used by the >> DIO code path in fuse as well. If that function returns any mask >> the shared lock needs to be dropped and replaced by the exclusive >> variant. >> > > No comments on the patchset itself. > > So I figured an assert should be there on the write lock held, then the > issue would have been automagically reported. > > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: > diff --git a/fs/attr.c b/fs/attr.c > index a8ae5f6d9b16..f47e718766d1 100644 > --- a/fs/attr.c > +++ b/fs/attr.c > @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, > struct timespec64 now; > unsigned int ia_valid = attr->ia_valid; > > - WARN_ON_ONCE(!inode_is_locked(inode)); > + lockdep_assert_held_write(&inode->i_rwsem); > > error = may_setattr(idmap, inode, ia_valid); > if (error) > > Alternatively hide it behind inode_assert_is_wlocked() or whatever other > name. > > I can do the churn if this sounds like a plan. > I guess that might help to discover these issues. Maybe keep the existing WARN_ON_ONCE, as it would annotate a missing lock, when lockdep is turned off? Another code path is file_modified() and I just wonder if there are more issues. btrfs_punch_hole() takes inode->i_mmap_lock. Although I guess that should be found by the existing WARN_ON_ONCE? Actually same in btrfs_fallocate(). Or does btrfs always have IS_NOSEC? Thanks, Bernd
On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: Yes, that's a good idea.
On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > So I figured an assert should be there on the write lock held, then the > issue would have been automagically reported. > > Turns out notify_change has the following: > WARN_ON_ONCE(!inode_is_locked(inode)); > > Which expands to: > static inline int rwsem_is_locked(struct rw_semaphore *sem) > { > return atomic_long_read(&sem->count) != 0; > } > > So it does check the lock, except it passes *any* locked state, > including just readers. > > According to git blame this regressed from commit 5955102c9984 > ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > were replaced with inode_is_locked, which unintentionally provides > weaker guarantees. > > I don't see a rwsem helper for wlock check and I don't think it is all > that beneficial to add. Instead, how about a bunch of lockdep, like so: > diff --git a/fs/attr.c b/fs/attr.c > index a8ae5f6d9b16..f47e718766d1 100644 > --- a/fs/attr.c > +++ b/fs/attr.c > @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, > struct timespec64 now; > unsigned int ia_valid = attr->ia_valid; > > - WARN_ON_ONCE(!inode_is_locked(inode)); > + lockdep_assert_held_write(&inode->i_rwsem); > > error = may_setattr(idmap, inode, ia_valid); > if (error) > > Alternatively hide it behind inode_assert_is_wlocked() or whatever other > name. Better to do it like mmap_lock: static inline void mmap_assert_write_locked(struct mm_struct *mm) { lockdep_assert_held_write(&mm->mmap_lock); VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); }
On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote: > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: >> So I figured an assert should be there on the write lock held, then the >> issue would have been automagically reported. >> >> Turns out notify_change has the following: >> WARN_ON_ONCE(!inode_is_locked(inode)); >> >> Which expands to: >> static inline int rwsem_is_locked(struct rw_semaphore *sem) >> { >> return atomic_long_read(&sem->count) != 0; >> } >> >> So it does check the lock, except it passes *any* locked state, >> including just readers. >> >> According to git blame this regressed from commit 5955102c9984 >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked >> were replaced with inode_is_locked, which unintentionally provides >> weaker guarantees. >> >> I don't see a rwsem helper for wlock check and I don't think it is all >> that beneficial to add. Instead, how about a bunch of lockdep, like so: >> diff --git a/fs/attr.c b/fs/attr.c >> index a8ae5f6d9b16..f47e718766d1 100644 >> --- a/fs/attr.c >> +++ b/fs/attr.c >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct >> dentry *dentry, >> struct timespec64 now; >> unsigned int ia_valid = attr->ia_valid; >> >> - WARN_ON_ONCE(!inode_is_locked(inode)); >> + lockdep_assert_held_write(&inode->i_rwsem); >> >> error = may_setattr(idmap, inode, ia_valid); >> if (error) >> >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other >> name. > > Better to do it like mmap_lock: > > static inline void mmap_assert_write_locked(struct mm_struct *mm) > { > lockdep_assert_held_write(&mm->mmap_lock); > VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > } > May I suggest continuing this with responses to the patch I sent? ;) [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel I made it line up with asserts for i_mmap_rwsem. btw your non-lockdep check suffers the very problem I'm trying to fix here -- checking for *any* locked state.
On Fri, Sep 01, 2023 at 01:47:10PM +0200, Mateusz Guzik wrote: > On 9/1/23, Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, Aug 31, 2023 at 12:18:24PM +0200, Mateusz Guzik wrote: > >> So I figured an assert should be there on the write lock held, then the > >> issue would have been automagically reported. > >> > >> Turns out notify_change has the following: > >> WARN_ON_ONCE(!inode_is_locked(inode)); > >> > >> Which expands to: > >> static inline int rwsem_is_locked(struct rw_semaphore *sem) > >> { > >> return atomic_long_read(&sem->count) != 0; > >> } > >> > >> So it does check the lock, except it passes *any* locked state, > >> including just readers. > >> > >> According to git blame this regressed from commit 5955102c9984 > >> ("wrappers for ->i_mutex access") by Al -- a bunch of mutex_is_locked > >> were replaced with inode_is_locked, which unintentionally provides > >> weaker guarantees. > >> > >> I don't see a rwsem helper for wlock check and I don't think it is all > >> that beneficial to add. Instead, how about a bunch of lockdep, like so: > >> diff --git a/fs/attr.c b/fs/attr.c > >> index a8ae5f6d9b16..f47e718766d1 100644 > >> --- a/fs/attr.c > >> +++ b/fs/attr.c > >> @@ -387,7 +387,7 @@ int notify_change(struct mnt_idmap *idmap, struct > >> dentry *dentry, > >> struct timespec64 now; > >> unsigned int ia_valid = attr->ia_valid; > >> > >> - WARN_ON_ONCE(!inode_is_locked(inode)); > >> + lockdep_assert_held_write(&inode->i_rwsem); > >> > >> error = may_setattr(idmap, inode, ia_valid); > >> if (error) > >> > >> Alternatively hide it behind inode_assert_is_wlocked() or whatever other > >> name. > > > > Better to do it like mmap_lock: > > > > static inline void mmap_assert_write_locked(struct mm_struct *mm) > > { > > lockdep_assert_held_write(&mm->mmap_lock); > > VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm); > > } > > > > May I suggest continuing this with responses to the patch I sent? ;) That's annoying. Don't split this kind of conversation up if you don't have to. > [RFC PATCH] vfs: add inode lockdep assertions on -fsdevel > > I made it line up with asserts for i_mmap_rwsem. > > btw your non-lockdep check suffers the very problem I'm trying to fix > here -- checking for *any* locked state. I'll respond to this over there then.