diff mbox series

fs: don't needlessly acquire f_lock

Message ID 20250207-daten-mahlzeit-99d2079864fb@brauner (mailing list archive)
State New
Headers show
Series fs: don't needlessly acquire f_lock | expand

Commit Message

Christian Brauner Feb. 7, 2025, 2:10 p.m. UTC
Before 2011 there was no meaningful synchronization between
read/readdir/write/seek. Only in commit
ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
synchronization was added for SEEK_CUR by taking f_lock around
vfs_setpos().

Then in 2014 full synchronization between read/readdir/write/seek was
added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
for directories. At that point taking f_lock became unnecessary for such
files.

So only acquire f_lock for SEEK_CUR if this isn't a file that would have
acquired f_pos_lock if necessary.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/read_write.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Comments

Matthew Wilcox Feb. 7, 2025, 2:18 p.m. UTC | #1
On Fri, Feb 07, 2025 at 03:10:33PM +0100, Christian Brauner wrote:
> Before 2011 there was no meaningful synchronization between
> read/readdir/write/seek. Only in commit
> ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> synchronization was added for SEEK_CUR by taking f_lock around
> vfs_setpos().
> 
> Then in 2014 full synchronization between read/readdir/write/seek was
> added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> for directories. At that point taking f_lock became unnecessary for such
> files.
> 
> So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> acquired f_pos_lock if necessary.

What workloads benefit from this optimisation?
Jan Kara Feb. 7, 2025, 3:50 p.m. UTC | #2
On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> Before 2011 there was no meaningful synchronization between
> read/readdir/write/seek. Only in commit
> ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> synchronization was added for SEEK_CUR by taking f_lock around
> vfs_setpos().
> 
> Then in 2014 full synchronization between read/readdir/write/seek was
> added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> for directories. At that point taking f_lock became unnecessary for such
> files.
> 
> So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> acquired f_pos_lock if necessary.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>

...

>  	if (whence == SEEK_CUR) {
> +		bool locked;
> +
>  		/*
> -		 * f_lock protects against read/modify/write race with
> -		 * other SEEK_CURs. Note that parallel writes and reads
> -		 * behave like SEEK_SET.
> +		 * If the file requires locking via f_pos_lock we know
> +		 * that mutual exclusion for SEEK_CUR on the same file
> +		 * is guaranteed. If the file isn't locked, we take
> +		 * f_lock to protect against f_pos races with other
> +		 * SEEK_CURs.
>  		 */
> -		guard(spinlock)(&file->f_lock);
> -		return vfs_setpos(file, file->f_pos + offset, maxsize);
> +		locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> +			 file->f_op->iterate_shared;

As far as I understand the rationale this should match to
file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
to me that's the case. After thinking about possibilities, I could convince
myself that what you suggest is indeed safe but the condition being in two
completely independent places and leading to subtle bugs if it gets out of
sync seems a bit fragile to me.

								Honza

> +		if (!locked)
> +			spin_lock(&file->f_lock);
> +		offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> +		if (!locked)
> +			spin_unlock(&file->f_lock);
> +		return offset;
>  	}
>  
>  	return vfs_setpos(file, offset, maxsize);
> -- 
> 2.47.2
>
Mateusz Guzik Feb. 7, 2025, 4:42 p.m. UTC | #3
On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > Before 2011 there was no meaningful synchronization between
> > read/readdir/write/seek. Only in commit
> > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > synchronization was added for SEEK_CUR by taking f_lock around
> > vfs_setpos().
> >
> > Then in 2014 full synchronization between read/readdir/write/seek was
> > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > for directories. At that point taking f_lock became unnecessary for such
> > files.
> >
> > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > acquired f_pos_lock if necessary.
> >
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
>
> ...
>
> >       if (whence == SEEK_CUR) {
> > +             bool locked;
> > +
> >               /*
> > -              * f_lock protects against read/modify/write race with
> > -              * other SEEK_CURs. Note that parallel writes and reads
> > -              * behave like SEEK_SET.
> > +              * If the file requires locking via f_pos_lock we know
> > +              * that mutual exclusion for SEEK_CUR on the same file
> > +              * is guaranteed. If the file isn't locked, we take
> > +              * f_lock to protect against f_pos races with other
> > +              * SEEK_CURs.
> >                */
> > -             guard(spinlock)(&file->f_lock);
> > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > +                      file->f_op->iterate_shared;
>
> As far as I understand the rationale this should match to
> file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> to me that's the case. After thinking about possibilities, I could convince
> myself that what you suggest is indeed safe but the condition being in two
> completely independent places and leading to subtle bugs if it gets out of
> sync seems a bit fragile to me.
>

A debug-only assert that the lock is held when expected should sort it out?

> > +             if (!locked)
> > +                     spin_lock(&file->f_lock);
> > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             if (!locked)
> > +                     spin_unlock(&file->f_lock);
> > +             return offset;
> >       }
> >
> >       return vfs_setpos(file, offset, maxsize);

btw I ran this sucker over a kernel build:

bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
*)arg0)->f_mode & (1 << 15), ((struct file
*)arg0)->f_op->iterate_shared, arg2] = count(); }
'
Attaching 1 probe...
^C

@[32768, 0x0, 2]: 9
@[32768, 0x0, 1]: 171797
@[32768, 0x0, 0]: 660866

SEEK_CUR is 1, so this *does* show up.
Christian Brauner Feb. 10, 2025, 12:01 p.m. UTC | #4
On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > Before 2011 there was no meaningful synchronization between
> > > read/readdir/write/seek. Only in commit
> > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > synchronization was added for SEEK_CUR by taking f_lock around
> > > vfs_setpos().
> > >
> > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > for directories. At that point taking f_lock became unnecessary for such
> > > files.
> > >
> > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > acquired f_pos_lock if necessary.
> > >
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> >
> > ...
> >
> > >       if (whence == SEEK_CUR) {
> > > +             bool locked;
> > > +
> > >               /*
> > > -              * f_lock protects against read/modify/write race with
> > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > -              * behave like SEEK_SET.
> > > +              * If the file requires locking via f_pos_lock we know
> > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > +              * is guaranteed. If the file isn't locked, we take
> > > +              * f_lock to protect against f_pos races with other
> > > +              * SEEK_CURs.
> > >                */
> > > -             guard(spinlock)(&file->f_lock);
> > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > +                      file->f_op->iterate_shared;
> >
> > As far as I understand the rationale this should match to
> > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > to me that's the case. After thinking about possibilities, I could convince
> > myself that what you suggest is indeed safe but the condition being in two
> > completely independent places and leading to subtle bugs if it gets out of
> > sync seems a bit fragile to me.
> >
> 
> A debug-only assert that the lock is held when expected should sort it out?

Good idea. Let me reuse the newly added infra:
VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));

> 
> > > +             if (!locked)
> > > +                     spin_lock(&file->f_lock);
> > > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             if (!locked)
> > > +                     spin_unlock(&file->f_lock);
> > > +             return offset;
> > >       }
> > >
> > >       return vfs_setpos(file, offset, maxsize);
> 
> btw I ran this sucker over a kernel build:
> 
> bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
> *)arg0)->f_mode & (1 << 15), ((struct file
> *)arg0)->f_op->iterate_shared, arg2] = count(); }
> '
> Attaching 1 probe...
> ^C
> 
> @[32768, 0x0, 2]: 9
> @[32768, 0x0, 1]: 171797
> @[32768, 0x0, 0]: 660866
> 
> SEEK_CUR is 1, so this *does* show up.

Thanks for the perf.
Jan Kara Feb. 10, 2025, 2:58 p.m. UTC | #5
On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > Before 2011 there was no meaningful synchronization between
> > > > read/readdir/write/seek. Only in commit
> > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > vfs_setpos().
> > > >
> > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > for directories. At that point taking f_lock became unnecessary for such
> > > > files.
> > > >
> > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > acquired f_pos_lock if necessary.
> > > >
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > >
> > > ...
> > >
> > > >       if (whence == SEEK_CUR) {
> > > > +             bool locked;
> > > > +
> > > >               /*
> > > > -              * f_lock protects against read/modify/write race with
> > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > -              * behave like SEEK_SET.
> > > > +              * If the file requires locking via f_pos_lock we know
> > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > +              * is guaranteed. If the file isn't locked, we take
> > > > +              * f_lock to protect against f_pos races with other
> > > > +              * SEEK_CURs.
> > > >                */
> > > > -             guard(spinlock)(&file->f_lock);
> > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > +                      file->f_op->iterate_shared;
> > >
> > > As far as I understand the rationale this should match to
> > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > to me that's the case. After thinking about possibilities, I could convince
> > > myself that what you suggest is indeed safe but the condition being in two
> > > completely independent places and leading to subtle bugs if it gets out of
> > > sync seems a bit fragile to me.
> > >
> > 
> > A debug-only assert that the lock is held when expected should sort it out?
> 
> Good idea. Let me reuse the newly added infra:
> VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));

Fine, but won't this actually trigger? file_needs_f_pos_lock() is:

        return (file->f_mode & FMODE_ATOMIC_POS) &&
                (file_count(file) > 1 || file->f_op->iterate_shared);

and here you have:

        locked = (file->f_mode & FMODE_ATOMIC_POS) ||
                  file->f_op->iterate_shared;

So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
file_needs_f_pos_lock() returns false but locked is true?

								Honza
Christian Brauner Feb. 13, 2025, 3:10 p.m. UTC | #6
On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote:
> On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > > Before 2011 there was no meaningful synchronization between
> > > > > read/readdir/write/seek. Only in commit
> > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > > vfs_setpos().
> > > > >
> > > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > > for directories. At that point taking f_lock became unnecessary for such
> > > > > files.
> > > > >
> > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > > acquired f_pos_lock if necessary.
> > > > >
> > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > >
> > > > ...
> > > >
> > > > >       if (whence == SEEK_CUR) {
> > > > > +             bool locked;
> > > > > +
> > > > >               /*
> > > > > -              * f_lock protects against read/modify/write race with
> > > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > > -              * behave like SEEK_SET.
> > > > > +              * If the file requires locking via f_pos_lock we know
> > > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > > +              * is guaranteed. If the file isn't locked, we take
> > > > > +              * f_lock to protect against f_pos races with other
> > > > > +              * SEEK_CURs.
> > > > >                */
> > > > > -             guard(spinlock)(&file->f_lock);
> > > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > > +                      file->f_op->iterate_shared;
> > > >
> > > > As far as I understand the rationale this should match to
> > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > > to me that's the case. After thinking about possibilities, I could convince
> > > > myself that what you suggest is indeed safe but the condition being in two
> > > > completely independent places and leading to subtle bugs if it gets out of
> > > > sync seems a bit fragile to me.
> > > >
> > > 
> > > A debug-only assert that the lock is held when expected should sort it out?
> > 
> > Good idea. Let me reuse the newly added infra:
> > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));
> 
> Fine, but won't this actually trigger? file_needs_f_pos_lock() is:
> 
>         return (file->f_mode & FMODE_ATOMIC_POS) &&
>                 (file_count(file) > 1 || file->f_op->iterate_shared);
> 
> and here you have:
> 
>         locked = (file->f_mode & FMODE_ATOMIC_POS) ||
>                   file->f_op->iterate_shared;
> 
> So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
> file_needs_f_pos_lock() returns false but locked is true?

Sorry, I got lost in other mails.
Yes, you're right. I had changed the patch in my tree to fix that.
I'll append it here. Sorry about that. Tell me if that looks sane ok to
you now.
Jan Kara Feb. 13, 2025, 4:17 p.m. UTC | #7
On Thu 13-02-25 16:10:55, Christian Brauner wrote:
> On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote:
> > On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> > > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > > > Before 2011 there was no meaningful synchronization between
> > > > > > read/readdir/write/seek. Only in commit
> > > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > > > vfs_setpos().
> > > > > >
> > > > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > > > for directories. At that point taking f_lock became unnecessary for such
> > > > > > files.
> > > > > >
> > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > > > acquired f_pos_lock if necessary.
> > > > > >
> > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > >
> > > > > ...
> > > > >
> > > > > >       if (whence == SEEK_CUR) {
> > > > > > +             bool locked;
> > > > > > +
> > > > > >               /*
> > > > > > -              * f_lock protects against read/modify/write race with
> > > > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > > > -              * behave like SEEK_SET.
> > > > > > +              * If the file requires locking via f_pos_lock we know
> > > > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > > > +              * is guaranteed. If the file isn't locked, we take
> > > > > > +              * f_lock to protect against f_pos races with other
> > > > > > +              * SEEK_CURs.
> > > > > >                */
> > > > > > -             guard(spinlock)(&file->f_lock);
> > > > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > > > +                      file->f_op->iterate_shared;
> > > > >
> > > > > As far as I understand the rationale this should match to
> > > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > > > to me that's the case. After thinking about possibilities, I could convince
> > > > > myself that what you suggest is indeed safe but the condition being in two
> > > > > completely independent places and leading to subtle bugs if it gets out of
> > > > > sync seems a bit fragile to me.
> > > > >
> > > > 
> > > > A debug-only assert that the lock is held when expected should sort it out?
> > > 
> > > Good idea. Let me reuse the newly added infra:
> > > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));
> > 
> > Fine, but won't this actually trigger? file_needs_f_pos_lock() is:
> > 
> >         return (file->f_mode & FMODE_ATOMIC_POS) &&
> >                 (file_count(file) > 1 || file->f_op->iterate_shared);
> > 
> > and here you have:
> > 
> >         locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> >                   file->f_op->iterate_shared;
> > 
> > So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
> > file_needs_f_pos_lock() returns false but locked is true?
> 
> Sorry, I got lost in other mails.
> Yes, you're right. I had changed the patch in my tree to fix that.
> I'll append it here. Sorry about that. Tell me if that looks sane ok to
> you now.

Looks good to me! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
diff mbox series

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index a6133241dfb8..816189f9c56d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -168,13 +168,23 @@  generic_file_llseek_size(struct file *file, loff_t offset, int whence,
 		return offset;
 
 	if (whence == SEEK_CUR) {
+		bool locked;
+
 		/*
-		 * f_lock protects against read/modify/write race with
-		 * other SEEK_CURs. Note that parallel writes and reads
-		 * behave like SEEK_SET.
+		 * If the file requires locking via f_pos_lock we know
+		 * that mutual exclusion for SEEK_CUR on the same file
+		 * is guaranteed. If the file isn't locked, we take
+		 * f_lock to protect against f_pos races with other
+		 * SEEK_CURs.
 		 */
-		guard(spinlock)(&file->f_lock);
-		return vfs_setpos(file, file->f_pos + offset, maxsize);
+		locked = (file->f_mode & FMODE_ATOMIC_POS) ||
+			 file->f_op->iterate_shared;
+		if (!locked)
+			spin_lock(&file->f_lock);
+		offset = vfs_setpos(file, file->f_pos + offset, maxsize);
+		if (!locked)
+			spin_unlock(&file->f_lock);
+		return offset;
 	}
 
 	return vfs_setpos(file, offset, maxsize);