fs: don't needlessly acquire f_lock

Message ID	20250207-daten-mahlzeit-99d2079864fb@brauner (mailing list archive)
State	New
Headers	show Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BE48922068B for <linux-fsdevel@vger.kernel.org>; Fri, 7 Feb 2025 14:10:51 +0000 (UTC) From: Christian Brauner <brauner@kernel.org> To: linux-fsdevel@vger.kernel.org Cc: Christian Brauner <brauner@kernel.org>, Jeff Layton <jlayton@kernel.org>, Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>, Amir Goldstein <amir73il@gmail.com>, Mateusz Guzik <mjguzik@gmail.com> Subject: [PATCH] fs: don't needlessly acquire f_lock Date: Fri, 7 Feb 2025 15:10:33 +0100 Message-ID: <20250207-daten-mahlzeit-99d2079864fb@brauner> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	fs: don't needlessly acquire f_lock \| expand fs: don't needlessly acquire f_lock

Christian Brauner Feb. 7, 2025, 2:10 p.m. UTC

Before 2011 there was no meaningful synchronization between
read/readdir/write/seek. Only in commit
ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
synchronization was added for SEEK_CUR by taking f_lock around
vfs_setpos().

Then in 2014 full synchronization between read/readdir/write/seek was
added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
for directories. At that point taking f_lock became unnecessary for such
files.

So only acquire f_lock for SEEK_CUR if this isn't a file that would have
acquired f_pos_lock if necessary.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/read_write.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Matthew Wilcox Feb. 7, 2025, 2:18 p.m. UTC | #1

On Fri, Feb 07, 2025 at 03:10:33PM +0100, Christian Brauner wrote:
> Before 2011 there was no meaningful synchronization between
> read/readdir/write/seek. Only in commit
> ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> synchronization was added for SEEK_CUR by taking f_lock around
> vfs_setpos().
> 
> Then in 2014 full synchronization between read/readdir/write/seek was
> added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> for directories. At that point taking f_lock became unnecessary for such
> files.
> 
> So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> acquired f_pos_lock if necessary.

What workloads benefit from this optimisation?

Jan Kara Feb. 7, 2025, 3:50 p.m. UTC | #2

On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> Before 2011 there was no meaningful synchronization between
> read/readdir/write/seek. Only in commit
> ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> synchronization was added for SEEK_CUR by taking f_lock around
> vfs_setpos().
> 
> Then in 2014 full synchronization between read/readdir/write/seek was
> added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> for directories. At that point taking f_lock became unnecessary for such
> files.
> 
> So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> acquired f_pos_lock if necessary.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>

...

>  	if (whence == SEEK_CUR) {
> +		bool locked;
> +
>  		/*
> -		 * f_lock protects against read/modify/write race with
> -		 * other SEEK_CURs. Note that parallel writes and reads
> -		 * behave like SEEK_SET.
> +		 * If the file requires locking via f_pos_lock we know
> +		 * that mutual exclusion for SEEK_CUR on the same file
> +		 * is guaranteed. If the file isn't locked, we take
> +		 * f_lock to protect against f_pos races with other
> +		 * SEEK_CURs.
>  		 */
> -		guard(spinlock)(&file->f_lock);
> -		return vfs_setpos(file, file->f_pos + offset, maxsize);
> +		locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> +			 file->f_op->iterate_shared;

As far as I understand the rationale this should match to
file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
to me that's the case. After thinking about possibilities, I could convince
myself that what you suggest is indeed safe but the condition being in two
completely independent places and leading to subtle bugs if it gets out of
sync seems a bit fragile to me.

								Honza

> +		if (!locked)
> +			spin_lock(&file->f_lock);
> +		offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> +		if (!locked)
> +			spin_unlock(&file->f_lock);
> +		return offset;
>  	}
>  
>  	return vfs_setpos(file, offset, maxsize);
> -- 
> 2.47.2
>

Mateusz Guzik Feb. 7, 2025, 4:42 p.m. UTC | #3

On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > Before 2011 there was no meaningful synchronization between
> > read/readdir/write/seek. Only in commit
> > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > synchronization was added for SEEK_CUR by taking f_lock around
> > vfs_setpos().
> >
> > Then in 2014 full synchronization between read/readdir/write/seek was
> > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > for directories. At that point taking f_lock became unnecessary for such
> > files.
> >
> > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > acquired f_pos_lock if necessary.
> >
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
>
> ...
>
> >       if (whence == SEEK_CUR) {
> > +             bool locked;
> > +
> >               /*
> > -              * f_lock protects against read/modify/write race with
> > -              * other SEEK_CURs. Note that parallel writes and reads
> > -              * behave like SEEK_SET.
> > +              * If the file requires locking via f_pos_lock we know
> > +              * that mutual exclusion for SEEK_CUR on the same file
> > +              * is guaranteed. If the file isn't locked, we take
> > +              * f_lock to protect against f_pos races with other
> > +              * SEEK_CURs.
> >                */
> > -             guard(spinlock)(&file->f_lock);
> > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > +                      file->f_op->iterate_shared;
>
> As far as I understand the rationale this should match to
> file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> to me that's the case. After thinking about possibilities, I could convince
> myself that what you suggest is indeed safe but the condition being in two
> completely independent places and leading to subtle bugs if it gets out of
> sync seems a bit fragile to me.
>

A debug-only assert that the lock is held when expected should sort it out?

> > +             if (!locked)
> > +                     spin_lock(&file->f_lock);
> > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > +             if (!locked)
> > +                     spin_unlock(&file->f_lock);
> > +             return offset;
> >       }
> >
> >       return vfs_setpos(file, offset, maxsize);

btw I ran this sucker over a kernel build:

bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
*)arg0)->f_mode & (1 << 15), ((struct file
*)arg0)->f_op->iterate_shared, arg2] = count(); }
'
Attaching 1 probe...
^C

@[32768, 0x0, 2]: 9
@[32768, 0x0, 1]: 171797
@[32768, 0x0, 0]: 660866

SEEK_CUR is 1, so this *does* show up.

Christian Brauner Feb. 10, 2025, 12:01 p.m. UTC | #4

On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > Before 2011 there was no meaningful synchronization between
> > > read/readdir/write/seek. Only in commit
> > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > synchronization was added for SEEK_CUR by taking f_lock around
> > > vfs_setpos().
> > >
> > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > for directories. At that point taking f_lock became unnecessary for such
> > > files.
> > >
> > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > acquired f_pos_lock if necessary.
> > >
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> >
> > ...
> >
> > >       if (whence == SEEK_CUR) {
> > > +             bool locked;
> > > +
> > >               /*
> > > -              * f_lock protects against read/modify/write race with
> > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > -              * behave like SEEK_SET.
> > > +              * If the file requires locking via f_pos_lock we know
> > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > +              * is guaranteed. If the file isn't locked, we take
> > > +              * f_lock to protect against f_pos races with other
> > > +              * SEEK_CURs.
> > >                */
> > > -             guard(spinlock)(&file->f_lock);
> > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > +                      file->f_op->iterate_shared;
> >
> > As far as I understand the rationale this should match to
> > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > to me that's the case. After thinking about possibilities, I could convince
> > myself that what you suggest is indeed safe but the condition being in two
> > completely independent places and leading to subtle bugs if it gets out of
> > sync seems a bit fragile to me.
> >
> 
> A debug-only assert that the lock is held when expected should sort it out?

Good idea. Let me reuse the newly added infra:
VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));

> 
> > > +             if (!locked)
> > > +                     spin_lock(&file->f_lock);
> > > +             offset = vfs_setpos(file, file->f_pos + offset, maxsize);
> > > +             if (!locked)
> > > +                     spin_unlock(&file->f_lock);
> > > +             return offset;
> > >       }
> > >
> > >       return vfs_setpos(file, offset, maxsize);
> 
> btw I ran this sucker over a kernel build:
> 
> bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file
> *)arg0)->f_mode & (1 << 15), ((struct file
> *)arg0)->f_op->iterate_shared, arg2] = count(); }
> '
> Attaching 1 probe...
> ^C
> 
> @[32768, 0x0, 2]: 9
> @[32768, 0x0, 1]: 171797
> @[32768, 0x0, 0]: 660866
> 
> SEEK_CUR is 1, so this *does* show up.

Thanks for the perf.

Jan Kara Feb. 10, 2025, 2:58 p.m. UTC | #5

On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > Before 2011 there was no meaningful synchronization between
> > > > read/readdir/write/seek. Only in commit
> > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > vfs_setpos().
> > > >
> > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > for directories. At that point taking f_lock became unnecessary for such
> > > > files.
> > > >
> > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > acquired f_pos_lock if necessary.
> > > >
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > >
> > > ...
> > >
> > > >       if (whence == SEEK_CUR) {
> > > > +             bool locked;
> > > > +
> > > >               /*
> > > > -              * f_lock protects against read/modify/write race with
> > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > -              * behave like SEEK_SET.
> > > > +              * If the file requires locking via f_pos_lock we know
> > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > +              * is guaranteed. If the file isn't locked, we take
> > > > +              * f_lock to protect against f_pos races with other
> > > > +              * SEEK_CURs.
> > > >                */
> > > > -             guard(spinlock)(&file->f_lock);
> > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > +                      file->f_op->iterate_shared;
> > >
> > > As far as I understand the rationale this should match to
> > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > to me that's the case. After thinking about possibilities, I could convince
> > > myself that what you suggest is indeed safe but the condition being in two
> > > completely independent places and leading to subtle bugs if it gets out of
> > > sync seems a bit fragile to me.
> > >
> > 
> > A debug-only assert that the lock is held when expected should sort it out?
> 
> Good idea. Let me reuse the newly added infra:
> VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));

Fine, but won't this actually trigger? file_needs_f_pos_lock() is:

        return (file->f_mode & FMODE_ATOMIC_POS) &&
                (file_count(file) > 1 || file->f_op->iterate_shared);

and here you have:

        locked = (file->f_mode & FMODE_ATOMIC_POS) ||
                  file->f_op->iterate_shared;

So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
file_needs_f_pos_lock() returns false but locked is true?

								Honza

Christian Brauner Feb. 13, 2025, 3:10 p.m. UTC | #6

On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote:
> On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > > Before 2011 there was no meaningful synchronization between
> > > > > read/readdir/write/seek. Only in commit
> > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > > vfs_setpos().
> > > > >
> > > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > > for directories. At that point taking f_lock became unnecessary for such
> > > > > files.
> > > > >
> > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > > acquired f_pos_lock if necessary.
> > > > >
> > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > >
> > > > ...
> > > >
> > > > >       if (whence == SEEK_CUR) {
> > > > > +             bool locked;
> > > > > +
> > > > >               /*
> > > > > -              * f_lock protects against read/modify/write race with
> > > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > > -              * behave like SEEK_SET.
> > > > > +              * If the file requires locking via f_pos_lock we know
> > > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > > +              * is guaranteed. If the file isn't locked, we take
> > > > > +              * f_lock to protect against f_pos races with other
> > > > > +              * SEEK_CURs.
> > > > >                */
> > > > > -             guard(spinlock)(&file->f_lock);
> > > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > > +                      file->f_op->iterate_shared;
> > > >
> > > > As far as I understand the rationale this should match to
> > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > > to me that's the case. After thinking about possibilities, I could convince
> > > > myself that what you suggest is indeed safe but the condition being in two
> > > > completely independent places and leading to subtle bugs if it gets out of
> > > > sync seems a bit fragile to me.
> > > >
> > > 
> > > A debug-only assert that the lock is held when expected should sort it out?
> > 
> > Good idea. Let me reuse the newly added infra:
> > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));
> 
> Fine, but won't this actually trigger? file_needs_f_pos_lock() is:
> 
>         return (file->f_mode & FMODE_ATOMIC_POS) &&
>                 (file_count(file) > 1 || file->f_op->iterate_shared);
> 
> and here you have:
> 
>         locked = (file->f_mode & FMODE_ATOMIC_POS) ||
>                   file->f_op->iterate_shared;
> 
> So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
> file_needs_f_pos_lock() returns false but locked is true?

Sorry, I got lost in other mails.
Yes, you're right. I had changed the patch in my tree to fix that.
I'll append it here. Sorry about that. Tell me if that looks sane ok to
you now.

Jan Kara Feb. 13, 2025, 4:17 p.m. UTC | #7

On Thu 13-02-25 16:10:55, Christian Brauner wrote:
> On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote:
> > On Mon 10-02-25 13:01:38, Christian Brauner wrote:
> > > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote:
> > > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote:
> > > > > > Before 2011 there was no meaningful synchronization between
> > > > > > read/readdir/write/seek. Only in commit
> > > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek")
> > > > > > synchronization was added for SEEK_CUR by taking f_lock around
> > > > > > vfs_setpos().
> > > > > >
> > > > > > Then in 2014 full synchronization between read/readdir/write/seek was
> > > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX")
> > > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and
> > > > > > for directories. At that point taking f_lock became unnecessary for such
> > > > > > files.
> > > > > >
> > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have
> > > > > > acquired f_pos_lock if necessary.
> > > > > >
> > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > >
> > > > > ...
> > > > >
> > > > > >       if (whence == SEEK_CUR) {
> > > > > > +             bool locked;
> > > > > > +
> > > > > >               /*
> > > > > > -              * f_lock protects against read/modify/write race with
> > > > > > -              * other SEEK_CURs. Note that parallel writes and reads
> > > > > > -              * behave like SEEK_SET.
> > > > > > +              * If the file requires locking via f_pos_lock we know
> > > > > > +              * that mutual exclusion for SEEK_CUR on the same file
> > > > > > +              * is guaranteed. If the file isn't locked, we take
> > > > > > +              * f_lock to protect against f_pos races with other
> > > > > > +              * SEEK_CURs.
> > > > > >                */
> > > > > > -             guard(spinlock)(&file->f_lock);
> > > > > > -             return vfs_setpos(file, file->f_pos + offset, maxsize);
> > > > > > +             locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> > > > > > +                      file->f_op->iterate_shared;
> > > > >
> > > > > As far as I understand the rationale this should match to
> > > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious
> > > > > to me that's the case. After thinking about possibilities, I could convince
> > > > > myself that what you suggest is indeed safe but the condition being in two
> > > > > completely independent places and leading to subtle bugs if it gets out of
> > > > > sync seems a bit fragile to me.
> > > > >
> > > > 
> > > > A debug-only assert that the lock is held when expected should sort it out?
> > > 
> > > Good idea. Let me reuse the newly added infra:
> > > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock));
> > 
> > Fine, but won't this actually trigger? file_needs_f_pos_lock() is:
> > 
> >         return (file->f_mode & FMODE_ATOMIC_POS) &&
> >                 (file_count(file) > 1 || file->f_op->iterate_shared);
> > 
> > and here you have:
> > 
> >         locked = (file->f_mode & FMODE_ATOMIC_POS) ||
> >                   file->f_op->iterate_shared;
> > 
> > So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1),
> > file_needs_f_pos_lock() returns false but locked is true?
> 
> Sorry, I got lost in other mails.
> Yes, you're right. I had changed the patch in my tree to fix that.
> I'll append it here. Sorry about that. Tell me if that looks sane ok to
> you now.

Looks good to me! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

fs: don't needlessly acquire f_lock

Commit Message

Comments

Patch