Message ID | 20220706052130.16368-4-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | sysctl: Fix data-races around ipv4_table. |
On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> A sysctl variable is accessed concurrently, and there is always a chance of
> data-race. So, all readers and writers need some basic protection to avoid
> load/store-tearing.
>
> This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> internally to fix a data-race on the sysctl side. For now, proc_dointvec()
> itself is tolerant to a data-race, but we still need to add annotations on
> the other subsystem's side.
>
> In case we miss such fixes, this patch converts proc_dointvec() to a
> wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
> subsystem, we can explicitly set it as a handler.
>
> Also, this patch removes proc_dointvec()'s document and adds
> proc_dointvec_lockless()'s one so that no one will use proc_dointvec()
> anymore.
>
> While we are on it, we remove some trailing spaces.

I do not see why you add more functions.

Really all sysctls can change locklessly by nature, as I pointed out.

So I would simply add WRITE_ONCE() whenever they are written, and
READ_ONCE() when they are read.

If stable teams care enough, they will have to backport these changes,
so I would rather not have to change
proc_dointvec() to proc_dointvec_lockless() in many files, with many
conflicts, that ultimately will either
add bugs, or ask extra work for maintainers.

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 6 Jul 2022 09:00:11 +0200

> On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > A sysctl variable is accessed concurrently, and there is always a chance of
> > data-race. So, all readers and writers need some basic protection to avoid
> > load/store-tearing.
> >
> > This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> > internally to fix a data-race on the sysctl side. For now, proc_dointvec()
> > itself is tolerant to a data-race, but we still need to add annotations on
> > the other subsystem's side.
> >
> > In case we miss such fixes, this patch converts proc_dointvec() to a
> > wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
> > subsystem, we can explicitly set it as a handler.
> >
> > Also, this patch removes proc_dointvec()'s document and adds
> > proc_dointvec_lockless()'s one so that no one will use proc_dointvec()
> > anymore.
> >
> > While we are on it, we remove some trailing spaces.
>
> I do not see why you add more functions.

It was not to miss where we still need fixes and to be taken care of by
newly added sysctl knob.

> Really all sysctls can change locklessly by nature, as I pointed out.
>
> So I would simply add WRITE_ONCE() whenever they are written, and
> READ_ONCE() when they are read.
>
> If stable teams care enough, they will have to backport these changes,
> so I would rather not have to change
> proc_dointvec() to proc_dointvec_lockless() in many files, with many
> conflicts, that ultimately will either
> add bugs, or ask extra work for maintainers.

Indeed, I will drop such changes and just add annotations in *_conv().

Thank you!

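The outcome of the exchange above (keep proc_dointvec() as is and annotate only the accesses in the conversion helper) can be sketched outside the kernel. This is a minimal user-space sketch, not the kernel code: the `READ_ONCE()`/`WRITE_ONCE()` macros below are simplified stand-ins built on a volatile cast (assuming GCC or Clang for `__typeof__`), and `conv_int()` and `example_usage()` are hypothetical names that only mirror the shape of do_proc_dointvec_conv().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Simplified stand-ins for the kernel macros (assumption: GCC/Clang).
 * The volatile cast asks the compiler to emit exactly one access and
 * not to split or repeat it. No memory ordering is implied.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Hypothetical helper shaped like do_proc_dointvec_conv(): converts
 * between a sign flag plus magnitude and the int stored in the knob.
 */
static int conv_int(bool *negp, unsigned long *lvalp, int *valp, int write)
{
	if (write) {
		if (*negp) {
			if (*lvalp > (unsigned long)INT_MAX + 1)
				return -EINVAL;
			/* Relies on GCC/Clang wrap-around conversion to int. */
			WRITE_ONCE(*valp, -*lvalp);
		} else {
			if (*lvalp > (unsigned long)INT_MAX)
				return -EINVAL;
			WRITE_ONCE(*valp, *lvalp);
		}
	} else {
		/* Take one snapshot, then work only on the local copy. */
		int val = READ_ONCE(*valp);

		if (val < 0) {
			*negp = true;
			*lvalp = -(unsigned long)val;
		} else {
			*negp = false;
			*lvalp = (unsigned long)val;
		}
	}
	return 0;
}

void example_usage(void)
{
	int knob = 0;
	bool neg = true;
	unsigned long lval = 5;

	/* Write path: the parsed input "-5" arrives as neg=true, lval=5. */
	assert(conv_int(&neg, &lval, &knob, 1) == 0);
	assert(knob == -5);

	/* Read path: the stored value is split back into sign and magnitude. */
	neg = false;
	lval = 0;
	assert(conv_int(&neg, &lval, &knob, 0) == 0);
	assert(neg && lval == 5);

	/* An out-of-range positive value is rejected and the knob is untouched. */
	neg = false;
	lval = (unsigned long)INT_MAX + 1;
	assert(conv_int(&neg, &lval, &knob, 1) == -EINVAL);
	assert(knob == -5);
}
```

Because the annotations live in the one helper, every table entry that already uses proc_dointvec() would pick them up without being touched, which is the backporting concern raised in the review.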
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fcafc16abbad..cb87919b5508 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -84,6 +84,7 @@ PROC_HANDLER(proc_do_large_bitmap);
 PROC_HANDLER(proc_do_static_key);

 PROC_HANDLER(proc_dobool_lockless);
+PROC_HANDLER(proc_dointvec_lockless);

 /*
  * Register a set of sysctl names by calling register_sysctl_table
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bc6fcc64eeaf..50d9b78aa0b3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -445,14 +445,17 @@ static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
 		if (*negp) {
 			if (*lvalp > (unsigned long) INT_MAX + 1)
 				return -EINVAL;
-			*valp = -*lvalp;
+
+			WRITE_ONCE(*valp, -*lvalp);
 		} else {
 			if (*lvalp > (unsigned long) INT_MAX)
 				return -EINVAL;
-			*valp = *lvalp;
+
+			WRITE_ONCE(*valp, *lvalp);
 		}
 	} else {
-		int val = *valp;
+		int val = READ_ONCE(*valp);
+
 		if (val < 0) {
 			*negp = true;
 			*lvalp = -(unsigned long)val;
@@ -491,12 +494,12 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 	int *i, vleft, first = 1, err = 0;
 	size_t left;
 	char *p;
-	
+
 	if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-	
+
 	i = (int *) tbl_data;
 	vleft = table->maxlen / sizeof(*i);
 	left = *lenp;
@@ -726,7 +729,7 @@ int proc_dobool(struct ctl_table *table, int write, void *buffer,
 }

 /**
- * proc_dointvec - read a vector of integers
+ * proc_dointvec_lockless - read/write a vector of integers locklessly
  * @table: the sysctl table
  * @write: %TRUE if this is a write to the sysctl file
  * @buffer: the user buffer
@@ -734,14 +737,20 @@ int proc_dobool(struct ctl_table *table, int write, void *buffer,
  * @ppos: file position
  *
  * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string. 
+ * values from/to the user buffer, treated as an ASCII string.
  *
  * Returns 0 on success.
  */
+int proc_dointvec_lockless(struct ctl_table *table, int write, void *buffer,
+			   size_t *lenp, loff_t *ppos)
+{
+	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+}
+
 int proc_dointvec(struct ctl_table *table, int write, void *buffer,
 		  size_t *lenp, loff_t *ppos)
 {
-	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+	return proc_dointvec_lockless(table, write, buffer, lenp, ppos);
 }

 #ifdef CONFIG_COMPACTION
@@ -1503,6 +1512,7 @@ PROC_HANDLER_ENOSYS(proc_do_cad_pid);
 PROC_HANDLER_ENOSYS(proc_do_large_bitmap);

 PROC_HANDLER_ENOSYS(proc_dobool_lockless);
+PROC_HANDLER_ENOSYS(proc_dointvec_lockless);

 #endif /* CONFIG_PROC_SYSCTL */

@@ -2414,3 +2424,4 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_do_large_bitmap);

 EXPORT_SYMBOL(proc_dobool_lockless);
+EXPORT_SYMBOL(proc_dointvec_lockless);

A sysctl variable is accessed concurrently, and there is always a chance of
data-race. So, all readers and writers need some basic protection to avoid
load/store-tearing.

This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
internally to fix a data-race on the sysctl side. For now, proc_dointvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

In case we miss such fixes, this patch converts proc_dointvec() to a
wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
subsystem, we can explicitly set it as a handler.

Also, this patch removes proc_dointvec()'s document and adds
proc_dointvec_lockless()'s one so that no one will use proc_dointvec()
anymore.

While we are on it, we remove some trailing spaces.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/sysctl.h |  1 +
 kernel/sysctl.c        | 27 +++++++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)
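
The commit message notes that annotating the sysctl handler is only half of the fix, since readers in other subsystems need their own annotations. What such a reader might look like is sketched below in user space, under stated assumptions: `sysctl_example_limit`, `set_example_limit()`, `clamp_to_limit()` and `example_usage()` are hypothetical names, and the macros are simplified volatile-cast stand-ins for the kernel's `READ_ONCE()`/`WRITE_ONCE()` (assuming GCC or Clang for `__typeof__`).

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC/Clang). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical knob, standing in for an int exposed through sysctl. */
static int sysctl_example_limit = 64;

/* Writer side: what the sysctl handler would do when the knob is written. */
static void set_example_limit(int new_limit)
{
	WRITE_ONCE(sysctl_example_limit, new_limit);
}

/*
 * Reader side: load the knob once into a local and use only that copy,
 * so the comparison and the returned value are based on the same value
 * even if a writer changes the knob in between.
 */
static int clamp_to_limit(int requested)
{
	int limit = READ_ONCE(sysctl_example_limit);

	if (requested > limit)
		return limit;
	return requested;
}

void example_usage(void)
{
	assert(clamp_to_limit(100) == 64);

	set_example_limit(10);
	assert(clamp_to_limit(100) == 10);
	assert(clamp_to_limit(3) == 3);
}
```

Without the single snapshot, a reader written as `requested > knob ? knob : requested` could in principle compare against one value and return another if the compiler loads the knob twice.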