
[v2,net-next,2/4] rtnetlink: Add per-netns RTNL.

Message ID 20241002151240.49813-3-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series rtnetlink: Per-netns RTNL.

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 46 this patch: 46
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 95 this patch: 95
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4300 this patch: 4300
netdev/checkpatch warning WARNING: Argument 'net' is not used in function-like macro; WARNING: line length of 82 exceeds 80 columns; WARNING: line length of 83 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 10 this patch: 10
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-04--18-00 (tests: 773)

Commit Message

Kuniyuki Iwashima Oct. 2, 2024, 3:12 p.m. UTC
The goal is to break RTNL down into a per-netns mutex.

This patch adds a per-netns mutex and its helper functions, rtnl_net_lock()
and rtnl_net_unlock().

rtnl_net_lock() acquires the global RTNL and the per-netns RTNL mutex, and
rtnl_net_unlock() releases them.

We will replace 800+ rtnl_lock() calls with rtnl_net_lock() and finally remove
the rtnl_lock() call from rtnl_net_lock().
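
As an illustration, a converted call site could look roughly like the sketch
below (the function is hypothetical; only rtnl_net_lock()/rtnl_net_unlock()
come from this patch):

	/* Hypothetical caller: serialize an operation whose state belongs
	 * to a single netns.  rtnl_net_lock() takes the global rtnl_mutex
	 * first, then net->rtnl_mutex; rtnl_net_unlock() releases them in
	 * the reverse order.
	 */
	static int example_change_netns_state(struct net *net)
	{
		int err = 0;

		rtnl_net_lock(net);
		/* ... modify RTNL-protected state of this netns ... */
		rtnl_net_unlock(net);

		return err;
	}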

When we need to nest the per-netns RTNL mutexes, we will use __rtnl_net_lock(),
and its locking order is defined by rtnl_net_lock_cmp_fn() as follows (a
nesting sketch follows the list):

  1. init_net is first
  2. netns address ascending order
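
The sketch (the helper below is hypothetical and not part of this patch; it
only illustrates the ordering rule above and assumes the two netns are
distinct):

	static void example_nest_two(struct net *net_a, struct net *net_b)
	{
		rtnl_lock();

		/* Lock init_net first, otherwise the lower netns address
		 * first, matching the order listed above.
		 */
		if (net_eq(net_b, &init_net) ||
		    (!net_eq(net_a, &init_net) && net_a > net_b))
			swap(net_a, net_b);

		__rtnl_net_lock(net_a);
		__rtnl_net_lock(net_b);

		/* ... operate on both netns under their mutexes ... */

		__rtnl_net_unlock(net_b);
		__rtnl_net_unlock(net_a);

		rtnl_unlock();
	}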

Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL
with LOCKDEP so that we can carefully add the extra mutex without slowing
down RTNL operations during conversion.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/rtnetlink.h   | 13 +++++++++
 include/net/net_namespace.h |  4 +++
 net/Kconfig.debug           | 15 ++++++++++
 net/core/net_namespace.c    |  6 ++++
 net/core/rtnetlink.c        | 58 +++++++++++++++++++++++++++++++++++++
 5 files changed, 96 insertions(+)

Comments

Eric Dumazet Oct. 4, 2024, 11:08 a.m. UTC | #1
On Wed, Oct 2, 2024 at 5:13 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> The goal is to break RTNL down into per-netns mutex.
>
> This patch adds per-netns mutex and its helper functions, rtnl_net_lock()
> and rtnl_net_unlock().
>
> rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and
> rtnl_net_unlock() releases them.
>
> We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes
> rtnl_lock() in rtnl_net_lock().
>
> When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(),
> and its locking order is defined by rtnl_net_lock_cmp_fn() as follows:
>
>   1. init_net is first
>   2. netns address ascending order
>
> Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL
> with LOCKDEP so that we can carefully add the extra mutex without slowing
> down RTNL operations during conversion.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>
Jakub Kicinski Oct. 4, 2024, 8:21 p.m. UTC | #2
On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote:
> +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
> +void __rtnl_net_lock(struct net *net);
> +void __rtnl_net_unlock(struct net *net);
> +void rtnl_net_lock(struct net *net);
> +void rtnl_net_unlock(struct net *net);
> +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
> +#else
> +#define __rtnl_net_lock(net)
> +#define __rtnl_net_unlock(net)
> +#define rtnl_net_lock(net) rtnl_lock()
> +#define rtnl_net_unlock(net) rtnl_unlock()

Let's make sure net is always evaluated?
At the very least make sure the preprocessor doesn't eat it completely
otherwise we may end up with config-dependent "unused variable"
warnings down the line.

> +#endif
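
To illustrate the concern with a hypothetical caller (not from the patch): with
the plain "#define rtnl_net_lock(net) rtnl_lock()" fallback, the only use of
'net' below disappears at preprocessing time, so builds without
CONFIG_DEBUG_NET_SMALL_RTNL could flag 'net' as set but unused:

	static void example_caller(struct net_device *dev)
	{
		struct net *net = dev_net(dev);	/* only used for locking */

		rtnl_net_lock(net);		/* expands to rtnl_lock() here */
		netif_carrier_off(dev);
		rtnl_net_unlock(net);		/* expands to rtnl_unlock() here */
	}
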
Kuniyuki Iwashima Oct. 4, 2024, 8:45 p.m. UTC | #3
From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 4 Oct 2024 13:21:45 -0700
> On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote:
> > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
> > +void __rtnl_net_lock(struct net *net);
> > +void __rtnl_net_unlock(struct net *net);
> > +void rtnl_net_lock(struct net *net);
> > +void rtnl_net_unlock(struct net *net);
> > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
> > +#else
> > +#define __rtnl_net_lock(net)
> > +#define __rtnl_net_unlock(net)
> > +#define rtnl_net_lock(net) rtnl_lock()
> > +#define rtnl_net_unlock(net) rtnl_unlock()
> 
> Let's make sure net is always evaluated?
> At the very least make sure the preprocessor doesn't eat it completely
> otherwise we may end up with config-dependent "unused variable"
> warnings down the line.

Sure, what comes to mind is void casting, which I guess is old-school
style?  Do you have any other idea, or is this acceptable?

#define __rtnl_net_lock(net) (void)(net)
#define __rtnl_net_unlock(net) (void)(net)
#define rtnl_net_lock(net)	\
	do {			\
		(void)(net);	\
		rtnl_lock();	\
	} while (0)
#define rtnl_net_unlock(net)	\
	do {			\
		(void)(net);	\
		rtnl_unlock();	\
	} while (0)
Kuniyuki Iwashima Oct. 4, 2024, 8:51 p.m. UTC | #4
From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Fri, 4 Oct 2024 13:45:26 -0700
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Fri, 4 Oct 2024 13:21:45 -0700
> > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote:
> > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
> > > +void __rtnl_net_lock(struct net *net);
> > > +void __rtnl_net_unlock(struct net *net);
> > > +void rtnl_net_lock(struct net *net);
> > > +void rtnl_net_unlock(struct net *net);
> > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
> > > +#else
> > > +#define __rtnl_net_lock(net)
> > > +#define __rtnl_net_unlock(net)
> > > +#define rtnl_net_lock(net) rtnl_lock()
> > > +#define rtnl_net_unlock(net) rtnl_unlock()
> > 
> > Let's make sure net is always evaluated?
> > At the very least make sure the preprocessor doesn't eat it completely
> > otherwise we may end up with config-dependent "unused variable"
> > warnings down the line.
> 
> Sure, what comes to mind is void casting, which I guess is old-school
> style ?  Do you have any other idea or is this acceptable ?
> 
> #define __rtnl_net_lock(net) (void)(net)
> #define __rtnl_net_unlock(net) (void)(net)
> #define rtnl_net_lock(net)	\
> 	do {			\
> 		(void)(net);	\
> 		rtnl_lock();	\
> 	} while (0)
> #define rtnl_net_unlock(net)	\
> 	do {			\
> 		(void)(net);	\
> 		rtnl_unlock();	\
> 	} while (0)

Or simply define these as static inline functions? This is probably
preferable:

static inline void __rtnl_net_lock(struct net *net) {}
static inline void __rtnl_net_unlock(struct net *net) {}
static inline void rtnl_net_lock(struct net *net)
{
	rtnl_lock();
}
static inline void rtnl_net_unlock(struct net *net)
{
	rtnl_unlock();
}
Eric Dumazet Oct. 4, 2024, 8:59 p.m. UTC | #5
On Fri, Oct 4, 2024 at 10:51 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Kuniyuki Iwashima <kuniyu@amazon.com>
> Date: Fri, 4 Oct 2024 13:45:26 -0700
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Fri, 4 Oct 2024 13:21:45 -0700
> > > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote:
> > > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
> > > > +void __rtnl_net_lock(struct net *net);
> > > > +void __rtnl_net_unlock(struct net *net);
> > > > +void rtnl_net_lock(struct net *net);
> > > > +void rtnl_net_unlock(struct net *net);
> > > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
> > > > +#else
> > > > +#define __rtnl_net_lock(net)
> > > > +#define __rtnl_net_unlock(net)
> > > > +#define rtnl_net_lock(net) rtnl_lock()
> > > > +#define rtnl_net_unlock(net) rtnl_unlock()
> > >
> > > Let's make sure net is always evaluated?
> > > At the very least make sure the preprocessor doesn't eat it completely
> > > otherwise we may end up with config-dependent "unused variable"
> > > warnings down the line.
> >
> > Sure, what comes to mind is void casting, which I guess is old-school
> > style ?  Do you have any other idea or is this acceptable ?
> >
> > #define __rtnl_net_lock(net) (void)(net)
> > #define __rtnl_net_unlock(net) (void)(net)
> > #define rtnl_net_lock(net)    \
> >       do {                    \
> >               (void)(net);    \
> >               rtnl_lock();    \
> >       } while (0)
> > #define rtnl_net_unlock(net)  \
> >       do {                    \
> >               (void)(net);    \
> >               rtnl_unlock();  \
> >       } while (0)
>
> or simply define these as static inline functions and
> probably this is more preferable ?
>
> static inline void __rtnl_net_lock(struct net *net) {}
> static inline void __rtnl_net_unlock(struct net *net) {}
> static inline void rtnl_net_lock(struct net *net)
> {
>         rtnl_lock();
> }
> static inline void rtnl_net_unlock(struct net *net)
> {
>         rtnl_unlock();
> }

static inline functions seem better to me.
Kuniyuki Iwashima Oct. 4, 2024, 9:22 p.m. UTC | #6
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 4 Oct 2024 22:59:32 +0200
> On Fri, Oct 4, 2024 at 10:51 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Kuniyuki Iwashima <kuniyu@amazon.com>
> > Date: Fri, 4 Oct 2024 13:45:26 -0700
> > > From: Jakub Kicinski <kuba@kernel.org>
> > > Date: Fri, 4 Oct 2024 13:21:45 -0700
> > > > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote:
> > > > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
> > > > > +void __rtnl_net_lock(struct net *net);
> > > > > +void __rtnl_net_unlock(struct net *net);
> > > > > +void rtnl_net_lock(struct net *net);
> > > > > +void rtnl_net_unlock(struct net *net);
> > > > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
> > > > > +#else
> > > > > +#define __rtnl_net_lock(net)
> > > > > +#define __rtnl_net_unlock(net)
> > > > > +#define rtnl_net_lock(net) rtnl_lock()
> > > > > +#define rtnl_net_unlock(net) rtnl_unlock()
> > > >
> > > > Let's make sure net is always evaluated?
> > > > At the very least make sure the preprocessor doesn't eat it completely
> > > > otherwise we may end up with config-dependent "unused variable"
> > > > warnings down the line.
> > >
> > > Sure, what comes to mind is void casting, which I guess is old-school
> > > style ?  Do you have any other idea or is this acceptable ?
> > >
> > > #define __rtnl_net_lock(net) (void)(net)
> > > #define __rtnl_net_unlock(net) (void)(net)
> > > #define rtnl_net_lock(net)    \
> > >       do {                    \
> > >               (void)(net);    \
> > >               rtnl_lock();    \
> > >       } while (0)
> > > #define rtnl_net_unlock(net)  \
> > >       do {                    \
> > >               (void)(net);    \
> > >               rtnl_unlock();  \
> > >       } while (0)
> >
> > or simply define these as static inline functions and
> > probably this is more preferable ?
> >
> > static inline void __rtnl_net_lock(struct net *net) {}
> > static inline void __rtnl_net_unlock(struct net *net) {}
> > static inline void rtnl_net_lock(struct net *net)
> > {
> >         rtnl_lock();
> > }
> > static inline void rtnl_net_unlock(struct net *net)
> > {
> >         rtnl_unlock();
> > }
> 
> static inline functions seem better to me.

Will use them.

Thanks!

Patch

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index cdfc897f1e3c..f743c4f678bf 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -46,6 +46,19 @@  extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
 extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void __rtnl_net_lock(struct net *net);
+void __rtnl_net_unlock(struct net *net);
+void rtnl_net_lock(struct net *net);
+void rtnl_net_unlock(struct net *net);
+int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b);
+#else
+#define __rtnl_net_lock(net)
+#define __rtnl_net_unlock(net)
+#define rtnl_net_lock(net) rtnl_lock()
+#define rtnl_net_unlock(net) rtnl_unlock()
+#endif
+
 extern wait_queue_head_t netdev_unregistering_wq;
 extern atomic_t dev_unreg_count;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e67b483cc8bb..873c0f9fdac6 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -188,6 +188,10 @@  struct net {
 #if IS_ENABLED(CONFIG_SMC)
 	struct netns_smc	smc;
 #endif
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+	/* Move to a better place when the config guard is removed. */
+	struct mutex		rtnl_mutex;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/net/Kconfig.debug b/net/Kconfig.debug
index 5e3fffe707dd..277fab8c4d77 100644
--- a/net/Kconfig.debug
+++ b/net/Kconfig.debug
@@ -24,3 +24,18 @@  config DEBUG_NET
 	help
 	  Enable extra sanity checks in networking.
 	  This is mostly used by fuzzers, but is safe to select.
+
+config DEBUG_NET_SMALL_RTNL
+	bool "Add extra per-netns mutex inside RTNL"
+	depends on DEBUG_KERNEL && NET && LOCK_DEBUGGING_SUPPORT
+	select PROVE_LOCKING
+	default n
+	help
+	  rtnl_lock() is being replaced with rtnl_net_lock() that
+	  acquires the global RTNL and a small per-netns RTNL mutex.
+
+	  During the conversion, rtnl_net_lock() just adds an extra
+	  mutex in every RTNL scope and slows down the operations.
+
+	  Once the conversion completes, rtnl_lock() will be removed
+	  and rtnetlink will gain per-netns scalability.
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index e39479f1c9a4..105e3cd26763 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -334,6 +334,12 @@  static __net_init void preinit_net(struct net *net, struct user_namespace *user_
 	idr_init(&net->netns_ids);
 	spin_lock_init(&net->nsid_lock);
 	mutex_init(&net->ipv4.ra_mutex);
+
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+	mutex_init(&net->rtnl_mutex);
+	lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL);
+#endif
+
 	preinit_net_sysctl(net);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f0a520987085..edf530441b65 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -179,6 +179,64 @@  bool lockdep_rtnl_is_held(void)
 EXPORT_SYMBOL(lockdep_rtnl_is_held);
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
+#ifdef CONFIG_DEBUG_NET_SMALL_RTNL
+void __rtnl_net_lock(struct net *net)
+{
+	ASSERT_RTNL();
+
+	mutex_lock(&net->rtnl_mutex);
+}
+EXPORT_SYMBOL(__rtnl_net_lock);
+
+void __rtnl_net_unlock(struct net *net)
+{
+	ASSERT_RTNL();
+
+	mutex_unlock(&net->rtnl_mutex);
+}
+EXPORT_SYMBOL(__rtnl_net_unlock);
+
+void rtnl_net_lock(struct net *net)
+{
+	rtnl_lock();
+	__rtnl_net_lock(net);
+}
+EXPORT_SYMBOL(rtnl_net_lock);
+
+void rtnl_net_unlock(struct net *net)
+{
+	__rtnl_net_unlock(net);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL(rtnl_net_unlock);
+
+static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b)
+{
+	if (net_eq(net_a, net_b))
+		return 0;
+
+	/* always init_net first */
+	if (net_eq(net_a, &init_net))
+		return -1;
+
+	if (net_eq(net_b, &init_net))
+		return 1;
+
+	/* otherwise lock in ascending order */
+	return net_a < net_b ? -1 : 1;
+}
+
+int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b)
+{
+	const struct net *net_a, *net_b;
+
+	net_a = container_of(a, struct net, rtnl_mutex.dep_map);
+	net_b = container_of(b, struct net, rtnl_mutex.dep_map);
+
+	return rtnl_net_cmp_locks(net_a, net_b);
+}
+#endif
+
 static struct rtnl_link __rcu *__rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1];
 
 static inline int rtm_msgindex(int msgtype)