Message ID | 20220814152322.569296-1-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | locks: fix TOCTOU race when granting write lease | expand |
On Sun, Aug 14, 2022 at 06:23:22PM +0300, Amir Goldstein wrote: > Thread A trying to acquire a write lease checks the value of i_readcount > and i_writecount in check_conflicting_open() to verify that its own fd > is the only fd referencing the file. > > Thread B trying to open the file for read will call break_lease() in > do_dentry_open() before incrementing i_readcount, which leaves a small > window where thread A can acquire the write lease and then thread B > completes the open of the file for read without breaking the write lease > that was acquired by thread A. > > Fix this race by incrementing i_readcount before checking for existing > leases, same as the case with i_writecount. > > Use a helper put_file_access() to decrement i_readcount or i_writecount > in do_dentry_open() and __fput(). > > Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") > Signed-off-by: Amir Goldstein <amir73il@gmail.com> Looks sane; I'd probably collapsed cleanup_file and cleanup_all while we are at it, but then I can do that in a followup as well. > +static inline void put_file_access(struct file *file) > +{ > + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > + i_readcount_dec(file->f_inode); > + } else if (file->f_mode & FMODE_WRITER) { > + put_write_access(file->f_inode); > + __mnt_drop_write(file->f_path.mnt); > + } > +} What's the point of having it in linux/fs.h instead of internal.h?
On Sun, Aug 14, 2022 at 8:57 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Sun, Aug 14, 2022 at 06:23:22PM +0300, Amir Goldstein wrote: > > Thread A trying to acquire a write lease checks the value of i_readcount > > and i_writecount in check_conflicting_open() to verify that its own fd > > is the only fd referencing the file. > > > > Thread B trying to open the file for read will call break_lease() in > > do_dentry_open() before incrementing i_readcount, which leaves a small > > window where thread A can acquire the write lease and then thread B > > completes the open of the file for read without breaking the write lease > > that was acquired by thread A. > > > > Fix this race by incrementing i_readcount before checking for existing > > leases, same as the case with i_writecount. > > > > Use a helper put_file_access() to decrement i_readcount or i_writecount > > in do_dentry_open() and __fput(). > > > > Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > Looks sane; I'd probably collapsed cleanup_file and cleanup_all while we are > at it, but then I can do that in a followup as well. > Not sure how you envision that cleanup, so I'll let you do it. > > +static inline void put_file_access(struct file *file) > > +{ > > + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > > + i_readcount_dec(file->f_inode); > > + } else if (file->f_mode & FMODE_WRITER) { > > + put_write_access(file->f_inode); > > + __mnt_drop_write(file->f_path.mnt); > > + } > > +} > > What's the point of having it in linux/fs.h instead of internal.h? No reason. Overlooked. Do you need me to re-send or will you move it to internal.h yourself? Thanks, Amir.
On Sun, 2022-08-14 at 18:23 +0300, Amir Goldstein wrote: > Thread A trying to acquire a write lease checks the value of i_readcount > and i_writecount in check_conflicting_open() to verify that its own fd > is the only fd referencing the file. > > Thread B trying to open the file for read will call break_lease() in > do_dentry_open() before incrementing i_readcount, which leaves a small > window where thread A can acquire the write lease and then thread B > completes the open of the file for read without breaking the write lease > that was acquired by thread A. > > Fix this race by incrementing i_readcount before checking for existing > leases, same as the case with i_writecount. > Nice catch. > Use a helper put_file_access() to decrement i_readcount or i_writecount > in do_dentry_open() and __fput(). > > Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > > Hi Jeff, > > This fixes a race I found during code audit - I do not have a reproducer > for it. > > I ran the fstests I found for locks and leases: > generic/131 generic/478 generic/504 generic/571 > and the LTP fcntl tests. > > Encountered this warning with generic/131, but I also see it on > current master: > > ============================= > WARNING: suspicious RCU usage > 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 Not tainted > ----------------------------- > include/net/sock.h:592 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > > rcu_scheduler_active = 2, debug_locks = 1 > 5 locks held by locktest/3996: > #0: ffff88800be1d7a0 (&sb->s_type->i_mutex_key#8){+.+.}-{3:3}, at: __sock_release+0x25/0x97 > #1: ffff88800909ce00 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x14/0x60 > #2: ffff888006847cc8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x3a/0xcf > #3: ffffffff82a8ac18 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x17/0xb8 > #4: ffff88800909d0b0 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x1b/0x85 > > stack backtrace: > CPU: 1 PID: 3996 Comm: locktest Not tainted 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 > Call Trace: > <TASK> > dump_stack_lvl+0x45/0x5d > bpf_sk_reuseport_detach+0x5c/0x85 > reuseport_detach_sock+0x65/0xb8 > inet_unhash+0x55/0xcf > tcp_set_state+0xb3/0x10d > ? mark_lock.part.0+0x30/0x101 > __tcp_close+0x26/0x32d > tcp_close+0x20/0x60 > inet_release+0x50/0x64 > __sock_release+0x32/0x97 > sock_close+0x14/0x1b > __fput+0x118/0x1eb > > > Let me know what you think. > > Thanks, > Amir. > > fs/file_table.c | 7 +------ > fs/open.c | 11 ++++------- > include/linux/fs.h | 10 ++++++++++ > 3 files changed, 15 insertions(+), 13 deletions(-) > > diff --git a/fs/file_table.c b/fs/file_table.c > index 99c6796c9f28..dd88701e54a9 100644 > --- a/fs/file_table.c > +++ b/fs/file_table.c > @@ -324,12 +324,7 @@ static void __fput(struct file *file) > } > fops_put(file->f_op); > put_pid(file->f_owner.pid); > - if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) > - i_readcount_dec(inode); > - if (mode & FMODE_WRITER) { > - put_write_access(inode); > - __mnt_drop_write(mnt); > - } > + put_file_access(file); > dput(dentry); > if (unlikely(mode & FMODE_NEED_UNMOUNT)) > dissolve_on_fput(mnt); > diff --git a/fs/open.c b/fs/open.c > index 8a813fa5ca56..a98572585815 100644 > --- a/fs/open.c > +++ b/fs/open.c > @@ -840,7 +840,9 @@ static int do_dentry_open(struct file *f, > return 0; > } > > - if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { > + if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > + i_readcount_inc(inode); > + } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { > error = get_write_access(inode); > if (unlikely(error)) > goto cleanup_file; > @@ -880,8 +882,6 @@ static int do_dentry_open(struct file *f, > goto cleanup_all; > } > f->f_mode |= FMODE_OPENED; > - if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) > - i_readcount_inc(inode); > if ((f->f_mode & FMODE_READ) && > likely(f->f_op->read || f->f_op->read_iter)) > f->f_mode |= FMODE_CAN_READ; > @@ -935,10 +935,7 @@ static int do_dentry_open(struct file *f, > if (WARN_ON_ONCE(error > 0)) > error = -EINVAL; > fops_put(f->f_op); > - if (f->f_mode & FMODE_WRITER) { > - put_write_access(inode); > - __mnt_drop_write(f->f_path.mnt); > - } > + put_file_access(f); > cleanup_file: > path_put(&f->f_path); > f->f_path.mnt = NULL; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 9eced4cc286e..8bc04852c3da 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -3000,6 +3000,16 @@ static inline void i_readcount_inc(struct inode *inode) > return; > } > #endif > +static inline void put_file_access(struct file *file) > +{ > + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > + i_readcount_dec(file->f_inode); > + } else if (file->f_mode & FMODE_WRITER) { > + put_write_access(file->f_inode); > + __mnt_drop_write(file->f_path.mnt); > + } > +} > + > extern int do_pipe_flags(int *, int); > > extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *); Looks good to me. I like the new helper. In addition to Al's comment about which header this should go in, it might also be good to put a kerneldoc comment over it. Al, did you want to take this via your tree or do you want me to take it via the filelocks tree? Thanks,
On Mon, Aug 15, 2022 at 10:18:21AM +0300, Amir Goldstein wrote: > > > +static inline void put_file_access(struct file *file) > > > +{ > > > + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > > > + i_readcount_dec(file->f_inode); > > > + } else if (file->f_mode & FMODE_WRITER) { > > > + put_write_access(file->f_inode); > > > + __mnt_drop_write(file->f_path.mnt); > > > + } > > > +} > > > > What's the point of having it in linux/fs.h instead of internal.h? > > No reason. Overlooked. > Do you need me to re-send or will you move it to internal.h yourself? Resend, please...
On Mon, Aug 15, 2022 at 2:21 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Sun, 2022-08-14 at 18:23 +0300, Amir Goldstein wrote: > > Thread A trying to acquire a write lease checks the value of i_readcount > > and i_writecount in check_conflicting_open() to verify that its own fd > > is the only fd referencing the file. > > > > Thread B trying to open the file for read will call break_lease() in > > do_dentry_open() before incrementing i_readcount, which leaves a small > > window where thread A can acquire the write lease and then thread B > > completes the open of the file for read without breaking the write lease > > that was acquired by thread A. > > > > Fix this race by incrementing i_readcount before checking for existing > > leases, same as the case with i_writecount. > > > > Nice catch. > > > Use a helper put_file_access() to decrement i_readcount or i_writecount > > in do_dentry_open() and __fput(). > > > > Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > > > Hi Jeff, > > > > This fixes a race I found during code audit - I do not have a reproducer > > for it. > > > > I ran the fstests I found for locks and leases: > > generic/131 generic/478 generic/504 generic/571 > > and the LTP fcntl tests. > > > > Encountered this warning with generic/131, but I also see it on > > current master: > > > > ============================= > > WARNING: suspicious RCU usage > > 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 Not tainted > > ----------------------------- > > include/net/sock.h:592 suspicious rcu_dereference_check() usage! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > 5 locks held by locktest/3996: > > #0: ffff88800be1d7a0 (&sb->s_type->i_mutex_key#8){+.+.}-{3:3}, at: __sock_release+0x25/0x97 > > #1: ffff88800909ce00 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x14/0x60 > > #2: ffff888006847cc8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x3a/0xcf > > #3: ffffffff82a8ac18 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x17/0xb8 > > #4: ffff88800909d0b0 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x1b/0x85 > > > > stack backtrace: > > CPU: 1 PID: 3996 Comm: locktest Not tainted 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 > > Call Trace: > > <TASK> > > dump_stack_lvl+0x45/0x5d > > bpf_sk_reuseport_detach+0x5c/0x85 > > reuseport_detach_sock+0x65/0xb8 > > inet_unhash+0x55/0xcf > > tcp_set_state+0xb3/0x10d > > ? mark_lock.part.0+0x30/0x101 > > __tcp_close+0x26/0x32d > > tcp_close+0x20/0x60 > > inet_release+0x50/0x64 > > __sock_release+0x32/0x97 > > sock_close+0x14/0x1b > > __fput+0x118/0x1eb > > > > > > Let me know what you think. > > > > Thanks, > > Amir. > > > > fs/file_table.c | 7 +------ > > fs/open.c | 11 ++++------- > > include/linux/fs.h | 10 ++++++++++ > > 3 files changed, 15 insertions(+), 13 deletions(-) > > > > diff --git a/fs/file_table.c b/fs/file_table.c > > index 99c6796c9f28..dd88701e54a9 100644 > > --- a/fs/file_table.c > > +++ b/fs/file_table.c > > @@ -324,12 +324,7 @@ static void __fput(struct file *file) > > } > > fops_put(file->f_op); > > put_pid(file->f_owner.pid); > > - if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) > > - i_readcount_dec(inode); > > - if (mode & FMODE_WRITER) { > > - put_write_access(inode); > > - __mnt_drop_write(mnt); > > - } > > + put_file_access(file); > > dput(dentry); > > if (unlikely(mode & FMODE_NEED_UNMOUNT)) > > dissolve_on_fput(mnt); > > diff --git a/fs/open.c b/fs/open.c > > index 8a813fa5ca56..a98572585815 100644 > > --- a/fs/open.c > > +++ b/fs/open.c > > @@ -840,7 +840,9 @@ static int do_dentry_open(struct file *f, > > return 0; > > } > > > > - if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { > > + if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > > + i_readcount_inc(inode); > > + } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { > > error = get_write_access(inode); > > if (unlikely(error)) > > goto cleanup_file; > > @@ -880,8 +882,6 @@ static int do_dentry_open(struct file *f, > > goto cleanup_all; > > } > > f->f_mode |= FMODE_OPENED; > > - if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) > > - i_readcount_inc(inode); > > if ((f->f_mode & FMODE_READ) && > > likely(f->f_op->read || f->f_op->read_iter)) > > f->f_mode |= FMODE_CAN_READ; > > @@ -935,10 +935,7 @@ static int do_dentry_open(struct file *f, > > if (WARN_ON_ONCE(error > 0)) > > error = -EINVAL; > > fops_put(f->f_op); > > - if (f->f_mode & FMODE_WRITER) { > > - put_write_access(inode); > > - __mnt_drop_write(f->f_path.mnt); > > - } > > + put_file_access(f); > > cleanup_file: > > path_put(&f->f_path); > > f->f_path.mnt = NULL; > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 9eced4cc286e..8bc04852c3da 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > @@ -3000,6 +3000,16 @@ static inline void i_readcount_inc(struct inode *inode) > > return; > > } > > #endif > > +static inline void put_file_access(struct file *file) > > +{ > > + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { > > + i_readcount_dec(file->f_inode); > > + } else if (file->f_mode & FMODE_WRITER) { > > + put_write_access(file->f_inode); > > + __mnt_drop_write(file->f_path.mnt); > > + } > > +} > > + > > extern int do_pipe_flags(int *, int); > > > > extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *); > > Looks good to me. I like the new helper. > > In addition to Al's comment about which header this should go in, it > might also be good to put a kerneldoc comment over it. > I'm sorry, I couldn't come up with a good description of this arbitrary helper and I don't think this is so important for an internal helper like this one. For now, I will post without the kerneldoc. If you disagree, please provide a description. Thanks, Amir.
diff --git a/fs/file_table.c b/fs/file_table.c index 99c6796c9f28..dd88701e54a9 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -324,12 +324,7 @@ static void __fput(struct file *file) } fops_put(file->f_op); put_pid(file->f_owner.pid); - if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) - i_readcount_dec(inode); - if (mode & FMODE_WRITER) { - put_write_access(inode); - __mnt_drop_write(mnt); - } + put_file_access(file); dput(dentry); if (unlikely(mode & FMODE_NEED_UNMOUNT)) dissolve_on_fput(mnt); diff --git a/fs/open.c b/fs/open.c index 8a813fa5ca56..a98572585815 100644 --- a/fs/open.c +++ b/fs/open.c @@ -840,7 +840,9 @@ static int do_dentry_open(struct file *f, return 0; } - if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { + if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { + i_readcount_inc(inode); + } else if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) { error = get_write_access(inode); if (unlikely(error)) goto cleanup_file; @@ -880,8 +882,6 @@ static int do_dentry_open(struct file *f, goto cleanup_all; } f->f_mode |= FMODE_OPENED; - if ((f->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) - i_readcount_inc(inode); if ((f->f_mode & FMODE_READ) && likely(f->f_op->read || f->f_op->read_iter)) f->f_mode |= FMODE_CAN_READ; @@ -935,10 +935,7 @@ static int do_dentry_open(struct file *f, if (WARN_ON_ONCE(error > 0)) error = -EINVAL; fops_put(f->f_op); - if (f->f_mode & FMODE_WRITER) { - put_write_access(inode); - __mnt_drop_write(f->f_path.mnt); - } + put_file_access(f); cleanup_file: path_put(&f->f_path); f->f_path.mnt = NULL; diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..8bc04852c3da 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3000,6 +3000,16 @@ static inline void i_readcount_inc(struct inode *inode) return; } #endif +static inline void put_file_access(struct file *file) +{ + if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) { + i_readcount_dec(file->f_inode); + } else if (file->f_mode & FMODE_WRITER) { + put_write_access(file->f_inode); + __mnt_drop_write(file->f_path.mnt); + } +} + extern int do_pipe_flags(int *, int); extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *);
Thread A trying to acquire a write lease checks the value of i_readcount and i_writecount in check_conflicting_open() to verify that its own fd is the only fd referencing the file. Thread B trying to open the file for read will call break_lease() in do_dentry_open() before incrementing i_readcount, which leaves a small window where thread A can acquire the write lease and then thread B completes the open of the file for read without breaking the write lease that was acquired by thread A. Fix this race by incrementing i_readcount before checking for existing leases, same as the case with i_writecount. Use a helper put_file_access() to decrement i_readcount or i_writecount in do_dentry_open() and __fput(). Fixes: 387e3746d01c ("locks: eliminate false positive conflicts for write lease") Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- Hi Jeff, This fixes a race I found during code audit - I do not have a reproducer for it. I ran the fstests I found for locks and leases: generic/131 generic/478 generic/504 generic/571 and the LTP fcntl tests. Encountered this warning with generic/131, but I also see it on current master: ============================= WARNING: suspicious RCU usage 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 Not tainted ----------------------------- include/net/sock.h:592 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 5 locks held by locktest/3996: #0: ffff88800be1d7a0 (&sb->s_type->i_mutex_key#8){+.+.}-{3:3}, at: __sock_release+0x25/0x97 #1: ffff88800909ce00 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x14/0x60 #2: ffff888006847cc8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x3a/0xcf #3: ffffffff82a8ac18 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x17/0xb8 #4: ffff88800909d0b0 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x1b/0x85 stack backtrace: CPU: 1 PID: 3996 Comm: locktest Not tainted 5.19.0-xfstests-14277-gbd6ab3ef4e93 #966 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x5d bpf_sk_reuseport_detach+0x5c/0x85 reuseport_detach_sock+0x65/0xb8 inet_unhash+0x55/0xcf tcp_set_state+0xb3/0x10d ? mark_lock.part.0+0x30/0x101 __tcp_close+0x26/0x32d tcp_close+0x20/0x60 inet_release+0x50/0x64 __sock_release+0x32/0x97 sock_close+0x14/0x1b __fput+0x118/0x1eb Let me know what you think. Thanks, Amir. fs/file_table.c | 7 +------ fs/open.c | 11 ++++------- include/linux/fs.h | 10 ++++++++++ 3 files changed, 15 insertions(+), 13 deletions(-)