[v1,net,03/16] sysctl: Add proc_dointvec_lockless().

Message ID 20220706052130.16368-4-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series sysctl: Fix data-races around ipv4_table.

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net, async
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Series has a cover letter
netdev/patch_count              fail     Series longer than 15 patches (and no cover letter)
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 17369 this patch: 17369
netdev/cc_maintainers           warning  1 maintainers not CCed: linux-fsdevel@vger.kernel.org
netdev/build_clang              success  Errors and warnings before: 3291 this patch: 3291
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 16543 this patch: 16543
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 82 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Kuniyuki Iwashima July 6, 2022, 5:21 a.m. UTC
A sysctl variable can be read and written concurrently, so a data race is
always possible.  All readers and writers therefore need some basic
protection against load/store tearing.

This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
internally, which fixes the data race on the sysctl side.  With this
change, proc_dointvec() itself tolerates a data race, but the subsystems
that access the variable still need annotations on their side.

So that we do not miss such fixes, this patch converts proc_dointvec() into
a wrapper around proc_dointvec_lockless().  Once the data race in a
subsystem is fixed, we can explicitly set proc_dointvec_lockless() as its
handler.

This patch also moves the kernel-doc comment from proc_dointvec() to
proc_dointvec_lockless() so that no one uses proc_dointvec() in new code.

While at it, remove some trailing whitespace.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/sysctl.h |  1 +
 kernel/sysctl.c        | 27 +++++++++++++++++++--------
 2 files changed, 20 insertions(+), 8 deletions(-)
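
The load/store-tearing concern in the commit message can be illustrated in userspace. This is a minimal sketch, not the kernel's implementation: the READ_ONCE()/WRITE_ONCE() macros below are simplified volatile-cast stand-ins (they assume the GCC/Clang `__typeof__` extension), and `example_knob`, `knob_store()`, and `knob_load()` are hypothetical names used only for illustration.

```c
/*
 * Simplified userspace stand-ins for the kernel macros. The volatile
 * access asks the compiler to emit exactly one load or store and not to
 * merge or repeat it. The real kernel definitions are more involved.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical variable standing in for an int backed by a sysctl. */
static int example_knob = 10;

/* Writer side: roughly what a sysctl handler does when the file is written. */
static void knob_store(int v)
{
	WRITE_ONCE(example_knob, v);
}

/* Reader side: roughly what a subsystem does when it consults the knob. */
static int knob_load(void)
{
	return READ_ONCE(example_knob);
}
```

Both sides need the annotation: marking only the writer still leaves the reader's plain load open to compiler transformations.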

Comments

Eric Dumazet July 6, 2022, 7 a.m. UTC | #1
On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> A sysctl variable is accessed concurrently, and there is always a chance of
> data-race.  So, all readers and writers need some basic protection to avoid
> load/store-tearing.
>
> This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> internally to fix a data-race on the sysctl side.  For now, proc_dointvec()
> itself is tolerant to a data-race, but we still need to add annotations on
> the other subsystem's side.
>
> In case we miss such fixes, this patch converts proc_dointvec() to a
> wrapper of proc_dointvec_lockless().  When we fix a data-race in the other
> subsystem, we can explicitly set it as a handler.
>
> Also, this patch removes proc_dointvec()'s document and adds
> proc_dointvec_lockless()'s one so that no one will use proc_dointvec()
> anymore.
>
> While we are on it, we remove some trailing spaces.


I do not see why you add more functions.

Really all sysctls can change locklessly by nature, as I pointed out.

So I would simply add WRITE_ONCE() whenever they are written, and
READ_ONCE() when they are read.

If the stable teams care enough, they will have to backport these changes,
so I would rather not change proc_dointvec() to proc_dointvec_lockless()
in many files, with many conflicts, which would ultimately either add bugs
or create extra work for maintainers.
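
The reader-side annotation suggested in this reply might look like the following userspace sketch. It rests on assumptions: READ_ONCE() is a simplified volatile-cast stand-in for the kernel macro (using the GCC/Clang `__typeof__` extension), and `sysctl_example_max` and `clamp_to_limit()` are hypothetical names that do not appear in the series.

```c
/* Simplified userspace stand-in for the kernel macro. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical sysctl-backed limit that may change at any time. */
static int sysctl_example_max = 100;

/*
 * Take a single snapshot of the knob and use only the local copy.
 * With a plain read, the compiler may load the variable once for the
 * comparison and again for the result, so the two could disagree if
 * the knob is written in between.
 */
static int clamp_to_limit(int requested)
{
	int max = READ_ONCE(sysctl_example_max);

	return requested > max ? max : requested;
}
```

No handler needs to be renamed for this pattern, which is the backporting concern raised above.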
Kuniyuki Iwashima July 6, 2022, 4:15 p.m. UTC | #2
From:   Eric Dumazet <edumazet@google.com>
Date:   Wed, 6 Jul 2022 09:00:11 +0200
> On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > A sysctl variable is accessed concurrently, and there is always a chance of
> > data-race.  So, all readers and writers need some basic protection to avoid
> > load/store-tearing.
> >
> > This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> > internally to fix a data-race on the sysctl side.  For now, proc_dointvec()
> > itself is tolerant to a data-race, but we still need to add annotations on
> > the other subsystem's side.
> >
> > In case we miss such fixes, this patch converts proc_dointvec() to a
> > wrapper of proc_dointvec_lockless().  When we fix a data-race in the other
> > subsystem, we can explicitly set it as a handler.
> >
> > Also, this patch removes proc_dointvec()'s document and adds
> > proc_dointvec_lockless()'s one so that no one will use proc_dointvec()
> > anymore.
> >
> > While we are on it, we remove some trailing spaces.
> 
> 
> I do not see why you add more functions.

The intent was to avoid missing places that still need fixes, and to make
sure newly added sysctl knobs are covered as well.


> Really all sysctls can change locklessly by nature, as I pointed out.
> 
> So I would simply add WRITE_ONCE() whenever they are written, and
> READ_ONCE() when they are read.
> 
> If stable teams care enough, they will have to backport these changes,
> so I would rather not have to change
> proc_dointvec() to proc_dointvec_lockless() in many files, with many
> conflicts, that ultimately will either
> add bugs, or ask extra work for maintainers.

Indeed, I will drop such changes and just add annotations in *_conv().
Thank you!
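
The approach agreed on here, annotating only the conversion helper, can be modeled in userspace as follows. This is a sketch under assumptions: the macros are simplified volatile-cast stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() (using the GCC/Clang `__typeof__` extension), and `intvec_conv()` is a hypothetical name that mirrors the logic of do_proc_dointvec_conv() from the patch rather than reproducing the kernel function.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/*
 * Convert between a sign flag plus magnitude (negp, lvalp) and an int
 * (valp). On write, range-check the magnitude and store it with a single
 * WRITE_ONCE(). On read, load the int with a single READ_ONCE() and split
 * the local copy into sign and magnitude.
 */
static int intvec_conv(bool *negp, unsigned long *lvalp, int *valp, int write)
{
	if (write) {
		if (*negp) {
			if (*lvalp > (unsigned long)INT_MAX + 1)
				return -EINVAL;
			/* As in the patch, relies on wrapping conversion to int. */
			WRITE_ONCE(*valp, -*lvalp);
		} else {
			if (*lvalp > (unsigned long)INT_MAX)
				return -EINVAL;
			WRITE_ONCE(*valp, *lvalp);
		}
	} else {
		int val = READ_ONCE(*valp);

		if (val < 0) {
			*negp = true;
			*lvalp = -(unsigned long)val;
		} else {
			*negp = false;
			*lvalp = (unsigned long)val;
		}
	}
	return 0;
}
```

Because every proc_dointvec() user goes through the conversion helper, annotating it covers the sysctl side without adding a new handler.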

Patch

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fcafc16abbad..cb87919b5508 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -84,6 +84,7 @@  PROC_HANDLER(proc_do_large_bitmap);
 PROC_HANDLER(proc_do_static_key);
 
 PROC_HANDLER(proc_dobool_lockless);
+PROC_HANDLER(proc_dointvec_lockless);
 
 /*
  * Register a set of sysctl names by calling register_sysctl_table
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bc6fcc64eeaf..50d9b78aa0b3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -445,14 +445,17 @@  static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
 		if (*negp) {
 			if (*lvalp > (unsigned long) INT_MAX + 1)
 				return -EINVAL;
-			*valp = -*lvalp;
+
+			WRITE_ONCE(*valp, -*lvalp);
 		} else {
 			if (*lvalp > (unsigned long) INT_MAX)
 				return -EINVAL;
-			*valp = *lvalp;
+
+			WRITE_ONCE(*valp, *lvalp);
 		}
 	} else {
-		int val = *valp;
+		int val = READ_ONCE(*valp);
+
 		if (val < 0) {
 			*negp = true;
 			*lvalp = -(unsigned long)val;
@@ -491,12 +494,12 @@  static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
 	int *i, vleft, first = 1, err = 0;
 	size_t left;
 	char *p;
-	
+
 	if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
 		*lenp = 0;
 		return 0;
 	}
-	
+
 	i = (int *) tbl_data;
 	vleft = table->maxlen / sizeof(*i);
 	left = *lenp;
@@ -726,7 +729,7 @@  int proc_dobool(struct ctl_table *table, int write, void *buffer,
 }
 
 /**
- * proc_dointvec - read a vector of integers
+ * proc_dointvec_lockless - read/write a vector of integers locklessly
  * @table: the sysctl table
  * @write: %TRUE if this is a write to the sysctl file
  * @buffer: the user buffer
@@ -734,14 +737,20 @@  int proc_dobool(struct ctl_table *table, int write, void *buffer,
  * @ppos: file position
  *
  * Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string. 
+ * values from/to the user buffer, treated as an ASCII string.
  *
  * Returns 0 on success.
  */
+int proc_dointvec_lockless(struct ctl_table *table, int write, void *buffer,
+			   size_t *lenp, loff_t *ppos)
+{
+	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+}
+
 int proc_dointvec(struct ctl_table *table, int write, void *buffer,
 		  size_t *lenp, loff_t *ppos)
 {
-	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+	return proc_dointvec_lockless(table, write, buffer, lenp, ppos);
 }
 
 #ifdef CONFIG_COMPACTION
@@ -1503,6 +1512,7 @@  PROC_HANDLER_ENOSYS(proc_do_cad_pid);
 PROC_HANDLER_ENOSYS(proc_do_large_bitmap);
 
 PROC_HANDLER_ENOSYS(proc_dobool_lockless);
+PROC_HANDLER_ENOSYS(proc_dointvec_lockless);
 
 #endif /* CONFIG_PROC_SYSCTL */
 
@@ -2414,3 +2424,4 @@  EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_do_large_bitmap);
 
 EXPORT_SYMBOL(proc_dobool_lockless);
+EXPORT_SYMBOL(proc_dointvec_lockless);