Message ID | 20250207-daten-mahlzeit-99d2079864fb@brauner (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fs: don't needlessly acquire f_lock | expand |
On Fri, Feb 07, 2025 at 03:10:33PM +0100, Christian Brauner wrote: > Before 2011 there was no meaningful synchronization between > read/readdir/write/seek. Only in commit > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > synchronization was added for SEEK_CUR by taking f_lock around > vfs_setpos(). > > Then in 2014 full synchronization between read/readdir/write/seek was > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > for directories. At that point taking f_lock became unnecessary for such > files. > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > acquired f_pos_lock if necessary. What workloads benefit from this optimisation?
On Fri 07-02-25 15:10:33, Christian Brauner wrote: > Before 2011 there was no meaningful synchronization between > read/readdir/write/seek. Only in commit > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > synchronization was added for SEEK_CUR by taking f_lock around > vfs_setpos(). > > Then in 2014 full synchronization between read/readdir/write/seek was > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > for directories. At that point taking f_lock became unnecessary for such > files. > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > acquired f_pos_lock if necessary. > > Signed-off-by: Christian Brauner <brauner@kernel.org> ... > if (whence == SEEK_CUR) { > + bool locked; > + > /* > - * f_lock protects against read/modify/write race with > - * other SEEK_CURs. Note that parallel writes and reads > - * behave like SEEK_SET. > + * If the file requires locking via f_pos_lock we know > + * that mutual exclusion for SEEK_CUR on the same file > + * is guaranteed. If the file isn't locked, we take > + * f_lock to protect against f_pos races with other > + * SEEK_CURs. > */ > - guard(spinlock)(&file->f_lock); > - return vfs_setpos(file, file->f_pos + offset, maxsize); > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > + file->f_op->iterate_shared; As far as I understand the rationale this should match to file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious to me that's the case. After thinking about possibilities, I could convince myself that what you suggest is indeed safe but the condition being in two completely independent places and leading to subtle bugs if it gets out of sync seems a bit fragile to me. Honza > + if (!locked) > + spin_lock(&file->f_lock); > + offset = vfs_setpos(file, file->f_pos + offset, maxsize); > + if (!locked) > + spin_unlock(&file->f_lock); > + return offset; > } > > return vfs_setpos(file, offset, maxsize); > -- > 2.47.2 >
On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote: > > On Fri 07-02-25 15:10:33, Christian Brauner wrote: > > Before 2011 there was no meaningful synchronization between > > read/readdir/write/seek. Only in commit > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > > synchronization was added for SEEK_CUR by taking f_lock around > > vfs_setpos(). > > > > Then in 2014 full synchronization between read/readdir/write/seek was > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > > for directories. At that point taking f_lock became unnecessary for such > > files. > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > > acquired f_pos_lock if necessary. > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > ... > > > if (whence == SEEK_CUR) { > > + bool locked; > > + > > /* > > - * f_lock protects against read/modify/write race with > > - * other SEEK_CURs. Note that parallel writes and reads > > - * behave like SEEK_SET. > > + * If the file requires locking via f_pos_lock we know > > + * that mutual exclusion for SEEK_CUR on the same file > > + * is guaranteed. If the file isn't locked, we take > > + * f_lock to protect against f_pos races with other > > + * SEEK_CURs. > > */ > > - guard(spinlock)(&file->f_lock); > > - return vfs_setpos(file, file->f_pos + offset, maxsize); > > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > > + file->f_op->iterate_shared; > > As far as I understand the rationale this should match to > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious > to me that's the case. After thinking about possibilities, I could convince > myself that what you suggest is indeed safe but the condition being in two > completely independent places and leading to subtle bugs if it gets out of > sync seems a bit fragile to me. > A debug-only assert that the lock is held when expected should sort it out? > > + if (!locked) > > + spin_lock(&file->f_lock); > > + offset = vfs_setpos(file, file->f_pos + offset, maxsize); > > + if (!locked) > > + spin_unlock(&file->f_lock); > > + return offset; > > } > > > > return vfs_setpos(file, offset, maxsize); btw I ran this sucker over a kernel build: bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file *)arg0)->f_mode & (1 << 15), ((struct file *)arg0)->f_op->iterate_shared, arg2] = count(); } ' Attaching 1 probe... ^C @[32768, 0x0, 2]: 9 @[32768, 0x0, 1]: 171797 @[32768, 0x0, 0]: 660866 SEEK_CUR is 1, so this *does* show up.
On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote: > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote: > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote: > > > Before 2011 there was no meaningful synchronization between > > > read/readdir/write/seek. Only in commit > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > > > synchronization was added for SEEK_CUR by taking f_lock around > > > vfs_setpos(). > > > > > > Then in 2014 full synchronization between read/readdir/write/seek was > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > > > for directories. At that point taking f_lock became unnecessary for such > > > files. > > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > > > acquired f_pos_lock if necessary. > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > ... > > > > > if (whence == SEEK_CUR) { > > > + bool locked; > > > + > > > /* > > > - * f_lock protects against read/modify/write race with > > > - * other SEEK_CURs. Note that parallel writes and reads > > > - * behave like SEEK_SET. > > > + * If the file requires locking via f_pos_lock we know > > > + * that mutual exclusion for SEEK_CUR on the same file > > > + * is guaranteed. If the file isn't locked, we take > > > + * f_lock to protect against f_pos races with other > > > + * SEEK_CURs. > > > */ > > > - guard(spinlock)(&file->f_lock); > > > - return vfs_setpos(file, file->f_pos + offset, maxsize); > > > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > > > + file->f_op->iterate_shared; > > > > As far as I understand the rationale this should match to > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious > > to me that's the case. After thinking about possibilities, I could convince > > myself that what you suggest is indeed safe but the condition being in two > > completely independent places and leading to subtle bugs if it gets out of > > sync seems a bit fragile to me. > > > > A debug-only assert that the lock is held when expected should sort it out? Good idea. Let me reuse the newly added infra: VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock)); > > > > + if (!locked) > > > + spin_lock(&file->f_lock); > > > + offset = vfs_setpos(file, file->f_pos + offset, maxsize); > > > + if (!locked) > > > + spin_unlock(&file->f_lock); > > > + return offset; > > > } > > > > > > return vfs_setpos(file, offset, maxsize); > > btw I ran this sucker over a kernel build: > > bpftrace -e 'kprobe:generic_file_llseek_size { @[((struct file > *)arg0)->f_mode & (1 << 15), ((struct file > *)arg0)->f_op->iterate_shared, arg2] = count(); } > ' > Attaching 1 probe... > ^C > > @[32768, 0x0, 2]: 9 > @[32768, 0x0, 1]: 171797 > @[32768, 0x0, 0]: 660866 > > SEEK_CUR is 1, so this *does* show up. Thanks for the perf.
On Mon 10-02-25 13:01:38, Christian Brauner wrote: > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote: > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote: > > > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote: > > > > Before 2011 there was no meaningful synchronization between > > > > read/readdir/write/seek. Only in commit > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > > > > synchronization was added for SEEK_CUR by taking f_lock around > > > > vfs_setpos(). > > > > > > > > Then in 2014 full synchronization between read/readdir/write/seek was > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > > > > for directories. At that point taking f_lock became unnecessary for such > > > > files. > > > > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > > > > acquired f_pos_lock if necessary. > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > > > ... > > > > > > > if (whence == SEEK_CUR) { > > > > + bool locked; > > > > + > > > > /* > > > > - * f_lock protects against read/modify/write race with > > > > - * other SEEK_CURs. Note that parallel writes and reads > > > > - * behave like SEEK_SET. > > > > + * If the file requires locking via f_pos_lock we know > > > > + * that mutual exclusion for SEEK_CUR on the same file > > > > + * is guaranteed. If the file isn't locked, we take > > > > + * f_lock to protect against f_pos races with other > > > > + * SEEK_CURs. > > > > */ > > > > - guard(spinlock)(&file->f_lock); > > > > - return vfs_setpos(file, file->f_pos + offset, maxsize); > > > > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > > > > + file->f_op->iterate_shared; > > > > > > As far as I understand the rationale this should match to > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious > > > to me that's the case. After thinking about possibilities, I could convince > > > myself that what you suggest is indeed safe but the condition being in two > > > completely independent places and leading to subtle bugs if it gets out of > > > sync seems a bit fragile to me. > > > > > > > A debug-only assert that the lock is held when expected should sort it out? > > Good idea. Let me reuse the newly added infra: > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock)); Fine, but won't this actually trigger? file_needs_f_pos_lock() is: return (file->f_mode & FMODE_ATOMIC_POS) && (file_count(file) > 1 || file->f_op->iterate_shared); and here you have: locked = (file->f_mode & FMODE_ATOMIC_POS) || file->f_op->iterate_shared; So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1), file_needs_f_pos_lock() returns false but locked is true? Honza
On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote: > On Mon 10-02-25 13:01:38, Christian Brauner wrote: > > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote: > > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote: > > > > > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote: > > > > > Before 2011 there was no meaningful synchronization between > > > > > read/readdir/write/seek. Only in commit > > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > > > > > synchronization was added for SEEK_CUR by taking f_lock around > > > > > vfs_setpos(). > > > > > > > > > > Then in 2014 full synchronization between read/readdir/write/seek was > > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > > > > > for directories. At that point taking f_lock became unnecessary for such > > > > > files. > > > > > > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > > > > > acquired f_pos_lock if necessary. > > > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > > > > > ... > > > > > > > > > if (whence == SEEK_CUR) { > > > > > + bool locked; > > > > > + > > > > > /* > > > > > - * f_lock protects against read/modify/write race with > > > > > - * other SEEK_CURs. Note that parallel writes and reads > > > > > - * behave like SEEK_SET. > > > > > + * If the file requires locking via f_pos_lock we know > > > > > + * that mutual exclusion for SEEK_CUR on the same file > > > > > + * is guaranteed. If the file isn't locked, we take > > > > > + * f_lock to protect against f_pos races with other > > > > > + * SEEK_CURs. > > > > > */ > > > > > - guard(spinlock)(&file->f_lock); > > > > > - return vfs_setpos(file, file->f_pos + offset, maxsize); > > > > > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > > > > > + file->f_op->iterate_shared; > > > > > > > > As far as I understand the rationale this should match to > > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious > > > > to me that's the case. After thinking about possibilities, I could convince > > > > myself that what you suggest is indeed safe but the condition being in two > > > > completely independent places and leading to subtle bugs if it gets out of > > > > sync seems a bit fragile to me. > > > > > > > > > > A debug-only assert that the lock is held when expected should sort it out? > > > > Good idea. Let me reuse the newly added infra: > > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock)); > > Fine, but won't this actually trigger? file_needs_f_pos_lock() is: > > return (file->f_mode & FMODE_ATOMIC_POS) && > (file_count(file) > 1 || file->f_op->iterate_shared); > > and here you have: > > locked = (file->f_mode & FMODE_ATOMIC_POS) || > file->f_op->iterate_shared; > > So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1), > file_needs_f_pos_lock() returns false but locked is true? Sorry, I got lost in other mails. Yes, you're right. I had changed the patch in my tree to fix that. I'll append it here. Sorry about that. Tell me if that looks sane ok to you now.
On Thu 13-02-25 16:10:55, Christian Brauner wrote: > On Mon, Feb 10, 2025 at 03:58:17PM +0100, Jan Kara wrote: > > On Mon 10-02-25 13:01:38, Christian Brauner wrote: > > > On Fri, Feb 07, 2025 at 05:42:17PM +0100, Mateusz Guzik wrote: > > > > On Fri, Feb 7, 2025 at 4:50 PM Jan Kara <jack@suse.cz> wrote: > > > > > > > > > > On Fri 07-02-25 15:10:33, Christian Brauner wrote: > > > > > > Before 2011 there was no meaningful synchronization between > > > > > > read/readdir/write/seek. Only in commit > > > > > > ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") > > > > > > synchronization was added for SEEK_CUR by taking f_lock around > > > > > > vfs_setpos(). > > > > > > > > > > > > Then in 2014 full synchronization between read/readdir/write/seek was > > > > > > added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") > > > > > > by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and > > > > > > for directories. At that point taking f_lock became unnecessary for such > > > > > > files. > > > > > > > > > > > > So only acquire f_lock for SEEK_CUR if this isn't a file that would have > > > > > > acquired f_pos_lock if necessary. > > > > > > > > > > > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > > > > > > > > > ... > > > > > > > > > > > if (whence == SEEK_CUR) { > > > > > > + bool locked; > > > > > > + > > > > > > /* > > > > > > - * f_lock protects against read/modify/write race with > > > > > > - * other SEEK_CURs. Note that parallel writes and reads > > > > > > - * behave like SEEK_SET. > > > > > > + * If the file requires locking via f_pos_lock we know > > > > > > + * that mutual exclusion for SEEK_CUR on the same file > > > > > > + * is guaranteed. If the file isn't locked, we take > > > > > > + * f_lock to protect against f_pos races with other > > > > > > + * SEEK_CURs. > > > > > > */ > > > > > > - guard(spinlock)(&file->f_lock); > > > > > > - return vfs_setpos(file, file->f_pos + offset, maxsize); > > > > > > + locked = (file->f_mode & FMODE_ATOMIC_POS) || > > > > > > + file->f_op->iterate_shared; > > > > > > > > > > As far as I understand the rationale this should match to > > > > > file_needs_f_pos_lock() (or it can possibly be weaker) but it isn't obvious > > > > > to me that's the case. After thinking about possibilities, I could convince > > > > > myself that what you suggest is indeed safe but the condition being in two > > > > > completely independent places and leading to subtle bugs if it gets out of > > > > > sync seems a bit fragile to me. > > > > > > > > > > > > > A debug-only assert that the lock is held when expected should sort it out? > > > > > > Good idea. Let me reuse the newly added infra: > > > VFS_WARN_ON_ONCE(locked && !mutex_is_locked(&file->f_pos_lock)); > > > > Fine, but won't this actually trigger? file_needs_f_pos_lock() is: > > > > return (file->f_mode & FMODE_ATOMIC_POS) && > > (file_count(file) > 1 || file->f_op->iterate_shared); > > > > and here you have: > > > > locked = (file->f_mode & FMODE_ATOMIC_POS) || > > file->f_op->iterate_shared; > > > > So if (file->f_mode & FMODE_ATOMIC_POS) and (file_count(file) == 1), > > file_needs_f_pos_lock() returns false but locked is true? > > Sorry, I got lost in other mails. > Yes, you're right. I had changed the patch in my tree to fix that. > I'll append it here. Sorry about that. Tell me if that looks sane ok to > you now. Looks good to me! Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza
diff --git a/fs/read_write.c b/fs/read_write.c index a6133241dfb8..816189f9c56d 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -168,13 +168,23 @@ generic_file_llseek_size(struct file *file, loff_t offset, int whence, return offset; if (whence == SEEK_CUR) { + bool locked; + /* - * f_lock protects against read/modify/write race with - * other SEEK_CURs. Note that parallel writes and reads - * behave like SEEK_SET. + * If the file requires locking via f_pos_lock we know + * that mutual exclusion for SEEK_CUR on the same file + * is guaranteed. If the file isn't locked, we take + * f_lock to protect against f_pos races with other + * SEEK_CURs. */ - guard(spinlock)(&file->f_lock); - return vfs_setpos(file, file->f_pos + offset, maxsize); + locked = (file->f_mode & FMODE_ATOMIC_POS) || + file->f_op->iterate_shared; + if (!locked) + spin_lock(&file->f_lock); + offset = vfs_setpos(file, file->f_pos + offset, maxsize); + if (!locked) + spin_unlock(&file->f_lock); + return offset; } return vfs_setpos(file, offset, maxsize);
Before 2011 there was no meaningful synchronization between read/readdir/write/seek. Only in commit ef3d0fd27e90 ("vfs: do (nearly) lockless generic_file_llseek") synchronization was added for SEEK_CUR by taking f_lock around vfs_setpos(). Then in 2014 full synchronization between read/readdir/write/seek was added in commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per POSIX") by introducing f_pos_lock for regular files with FMODE_ATOMIC_POS and for directories. At that point taking f_lock became unnecessary for such files. So only acquire f_lock for SEEK_CUR if this isn't a file that would have acquired f_pos_lock if necessary. Signed-off-by: Christian Brauner <brauner@kernel.org> --- fs/read_write.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)