Message ID | 20241002151240.49813-3-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | rtnetlink: Per-netns RTNL. | expand |
On Wed, Oct 2, 2024 at 5:13 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > The goal is to break RTNL down into per-netns mutex. > > This patch adds per-netns mutex and its helper functions, rtnl_net_lock() > and rtnl_net_unlock(). > > rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and > rtnl_net_unlock() releases them. > > We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes > rtnl_lock() in rtnl_net_lock(). > > When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(), > and its locking order is defined by rtnl_net_lock_cmp_fn() as follows: > > 1. init_net is first > 2. netns address ascending order > > Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL > with LOCKDEP so that we can carefully add the extra mutex without slowing > down RTNL operations during conversion. > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com>
On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote: > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL > +void __rtnl_net_lock(struct net *net); > +void __rtnl_net_unlock(struct net *net); > +void rtnl_net_lock(struct net *net); > +void rtnl_net_unlock(struct net *net); > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); > +#else > +#define __rtnl_net_lock(net) > +#define __rtnl_net_unlock(net) > +#define rtnl_net_lock(net) rtnl_lock() > +#define rtnl_net_unlock(net) rtnl_unlock() Let's make sure net is always evaluated? At the very least make sure the preprocessor doesn't eat it completely otherwise we may end up with config-dependent "unused variable" warnings down the line. > +#endif
From: Jakub Kicinski <kuba@kernel.org> Date: Fri, 4 Oct 2024 13:21:45 -0700 > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote: > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL > > +void __rtnl_net_lock(struct net *net); > > +void __rtnl_net_unlock(struct net *net); > > +void rtnl_net_lock(struct net *net); > > +void rtnl_net_unlock(struct net *net); > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); > > +#else > > +#define __rtnl_net_lock(net) > > +#define __rtnl_net_unlock(net) > > +#define rtnl_net_lock(net) rtnl_lock() > > +#define rtnl_net_unlock(net) rtnl_unlock() > > Let's make sure net is always evaluated? > At the very least make sure the preprocessor doesn't eat it completely > otherwise we may end up with config-dependent "unused variable" > warnings down the line. Sure, what comes to mind is void casting, which I guess is old-school style ? Do you have any other idea or is this acceptable ? #define __rtnl_net_lock(net) (void)(net) #define __rtnl_net_unlock(net) (void)(net) #define rtnl_net_lock(net) \ do { \ (void)(net); \ rtnl_lock(); \ } while (0) #define rtnl_net_unlock(net) \ do { \ (void)(net); \ rtnl_unlock(); \ } while (0)
From: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri, 4 Oct 2024 13:45:26 -0700 > From: Jakub Kicinski <kuba@kernel.org> > Date: Fri, 4 Oct 2024 13:21:45 -0700 > > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote: > > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL > > > +void __rtnl_net_lock(struct net *net); > > > +void __rtnl_net_unlock(struct net *net); > > > +void rtnl_net_lock(struct net *net); > > > +void rtnl_net_unlock(struct net *net); > > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); > > > +#else > > > +#define __rtnl_net_lock(net) > > > +#define __rtnl_net_unlock(net) > > > +#define rtnl_net_lock(net) rtnl_lock() > > > +#define rtnl_net_unlock(net) rtnl_unlock() > > > > Let's make sure net is always evaluated? > > At the very least make sure the preprocessor doesn't eat it completely > > otherwise we may end up with config-dependent "unused variable" > > warnings down the line. > > Sure, what comes to mind is void casting, which I guess is old-school > style ? Do you have any other idea or is this acceptable ? > > #define __rtnl_net_lock(net) (void)(net) > #define __rtnl_net_unlock(net) (void)(net) > #define rtnl_net_lock(net) \ > do { \ > (void)(net); \ > rtnl_lock(); \ > } while (0) > #define rtnl_net_unlock(net) \ > do { \ > (void)(net); \ > rtnl_unlock(); \ > } while (0) or simply define these as static inline functions and probably this is more preferable ? static inline void __rtnl_net_lock(struct net *net) {} static inline void __rtnl_net_unlock(struct net *net) {} static inline void rtnl_net_lock(struct net *net) { rtnl_lock(); } static inline void rtnl_net_unlock(struct net *net) { rtnl_unlock(); }
On Fri, Oct 4, 2024 at 10:51 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > From: Kuniyuki Iwashima <kuniyu@amazon.com> > Date: Fri, 4 Oct 2024 13:45:26 -0700 > > From: Jakub Kicinski <kuba@kernel.org> > > Date: Fri, 4 Oct 2024 13:21:45 -0700 > > > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote: > > > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL > > > > +void __rtnl_net_lock(struct net *net); > > > > +void __rtnl_net_unlock(struct net *net); > > > > +void rtnl_net_lock(struct net *net); > > > > +void rtnl_net_unlock(struct net *net); > > > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); > > > > +#else > > > > +#define __rtnl_net_lock(net) > > > > +#define __rtnl_net_unlock(net) > > > > +#define rtnl_net_lock(net) rtnl_lock() > > > > +#define rtnl_net_unlock(net) rtnl_unlock() > > > > > > Let's make sure net is always evaluated? > > > At the very least make sure the preprocessor doesn't eat it completely > > > otherwise we may end up with config-dependent "unused variable" > > > warnings down the line. > > > > Sure, what comes to mind is void casting, which I guess is old-school > > style ? Do you have any other idea or is this acceptable ? > > > > #define __rtnl_net_lock(net) (void)(net) > > #define __rtnl_net_unlock(net) (void)(net) > > #define rtnl_net_lock(net) \ > > do { \ > > (void)(net); \ > > rtnl_lock(); \ > > } while (0) > > #define rtnl_net_unlock(net) \ > > do { \ > > (void)(net); \ > > rtnl_unlock(); \ > > } while (0) > > or simply define these as static inline functions and > probably this is more preferable ? > > static inline void __rtnl_net_lock(struct net *net) {} > static inline void __rtnl_net_unlock(struct net *net) {} > static inline void rtnl_net_lock(struct net *net) > { > rtnl_lock(); > } > static inline void rtnl_net_unlock(struct net *net) > { > rtnl_unlock(); > } static inline functions seem better to me.
From: Eric Dumazet <edumazet@google.com> Date: Fri, 4 Oct 2024 22:59:32 +0200 > On Fri, Oct 4, 2024 at 10:51 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > From: Kuniyuki Iwashima <kuniyu@amazon.com> > > Date: Fri, 4 Oct 2024 13:45:26 -0700 > > > From: Jakub Kicinski <kuba@kernel.org> > > > Date: Fri, 4 Oct 2024 13:21:45 -0700 > > > > On Wed, 2 Oct 2024 08:12:38 -0700 Kuniyuki Iwashima wrote: > > > > > +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL > > > > > +void __rtnl_net_lock(struct net *net); > > > > > +void __rtnl_net_unlock(struct net *net); > > > > > +void rtnl_net_lock(struct net *net); > > > > > +void rtnl_net_unlock(struct net *net); > > > > > +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); > > > > > +#else > > > > > +#define __rtnl_net_lock(net) > > > > > +#define __rtnl_net_unlock(net) > > > > > +#define rtnl_net_lock(net) rtnl_lock() > > > > > +#define rtnl_net_unlock(net) rtnl_unlock() > > > > > > > > Let's make sure net is always evaluated? > > > > At the very least make sure the preprocessor doesn't eat it completely > > > > otherwise we may end up with config-dependent "unused variable" > > > > warnings down the line. > > > > > > Sure, what comes to mind is void casting, which I guess is old-school > > > style ? Do you have any other idea or is this acceptable ? > > > > > > #define __rtnl_net_lock(net) (void)(net) > > > #define __rtnl_net_unlock(net) (void)(net) > > > #define rtnl_net_lock(net) \ > > > do { \ > > > (void)(net); \ > > > rtnl_lock(); \ > > > } while (0) > > > #define rtnl_net_unlock(net) \ > > > do { \ > > > (void)(net); \ > > > rtnl_unlock(); \ > > > } while (0) > > > > or simply define these as static inline functions and > > probably this is more preferable ? > > > > static inline void __rtnl_net_lock(struct net *net) {} > > static inline void __rtnl_net_unlock(struct net *net) {} > > static inline void rtnl_net_lock(struct net *net) > > { > > rtnl_lock(); > > } > > static inline void rtnl_net_unlock(struct net *net) > > { > > rtnl_unlock(); > > } > > static inline functions seem better to me. Will use them. Thanks !
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index cdfc897f1e3c..f743c4f678bf 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -46,6 +46,19 @@ extern int rtnl_is_locked(void); extern int rtnl_lock_killable(void); extern bool refcount_dec_and_rtnl_lock(refcount_t *r); +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL +void __rtnl_net_lock(struct net *net); +void __rtnl_net_unlock(struct net *net); +void rtnl_net_lock(struct net *net); +void rtnl_net_unlock(struct net *net); +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b); +#else +#define __rtnl_net_lock(net) +#define __rtnl_net_unlock(net) +#define rtnl_net_lock(net) rtnl_lock() +#define rtnl_net_unlock(net) rtnl_unlock() +#endif + extern wait_queue_head_t netdev_unregistering_wq; extern atomic_t dev_unreg_count; extern struct rw_semaphore pernet_ops_rwsem; diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index e67b483cc8bb..873c0f9fdac6 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -188,6 +188,10 @@ struct net { #if IS_ENABLED(CONFIG_SMC) struct netns_smc smc; #endif +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL + /* Move to a better place when the config guard is removed. */ + struct mutex rtnl_mutex; +#endif } __randomize_layout; #include <linux/seq_file_net.h> diff --git a/net/Kconfig.debug b/net/Kconfig.debug index 5e3fffe707dd..277fab8c4d77 100644 --- a/net/Kconfig.debug +++ b/net/Kconfig.debug @@ -24,3 +24,18 @@ config DEBUG_NET help Enable extra sanity checks in networking. This is mostly used by fuzzers, but is safe to select. + +config DEBUG_NET_SMALL_RTNL + bool "Add extra per-netns mutex inside RTNL" + depends on DEBUG_KERNEL && NET && LOCK_DEBUGGING_SUPPORT + select PROVE_LOCKING + default n + help + rtnl_lock() is being replaced with rtnl_net_lock() that + acquires the global RTNL and a small per-netns RTNL mutex. + + During the conversion, rtnl_net_lock() just adds an extra + mutex in every RTNL scope and slows down the operations. + + Once the conversion completes, rtnl_lock() will be removed + and rtnetlink will gain per-netns scalability. diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index e39479f1c9a4..105e3cd26763 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -334,6 +334,12 @@ static __net_init void preinit_net(struct net *net, struct user_namespace *user_ idr_init(&net->netns_ids); spin_lock_init(&net->nsid_lock); mutex_init(&net->ipv4.ra_mutex); + +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL + mutex_init(&net->rtnl_mutex); + lock_set_cmp_fn(&net->rtnl_mutex, rtnl_net_lock_cmp_fn, NULL); +#endif + preinit_net_sysctl(net); } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index f0a520987085..edf530441b65 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -179,6 +179,64 @@ bool lockdep_rtnl_is_held(void) EXPORT_SYMBOL(lockdep_rtnl_is_held); #endif /* #ifdef CONFIG_PROVE_LOCKING */ +#ifdef CONFIG_DEBUG_NET_SMALL_RTNL +void __rtnl_net_lock(struct net *net) +{ + ASSERT_RTNL(); + + mutex_lock(&net->rtnl_mutex); +} +EXPORT_SYMBOL(__rtnl_net_lock); + +void __rtnl_net_unlock(struct net *net) +{ + ASSERT_RTNL(); + + mutex_unlock(&net->rtnl_mutex); +} +EXPORT_SYMBOL(__rtnl_net_unlock); + +void rtnl_net_lock(struct net *net) +{ + rtnl_lock(); + __rtnl_net_lock(net); +} +EXPORT_SYMBOL(rtnl_net_lock); + +void rtnl_net_unlock(struct net *net) +{ + __rtnl_net_unlock(net); + rtnl_unlock(); +} +EXPORT_SYMBOL(rtnl_net_unlock); + +static int rtnl_net_cmp_locks(const struct net *net_a, const struct net *net_b) +{ + if (net_eq(net_a, net_b)) + return 0; + + /* always init_net first */ + if (net_eq(net_a, &init_net)) + return -1; + + if (net_eq(net_b, &init_net)) + return 1; + + /* otherwise lock in ascending order */ + return net_a < net_b ? -1 : 1; +} + +int rtnl_net_lock_cmp_fn(const struct lockdep_map *a, const struct lockdep_map *b) +{ + const struct net *net_a, *net_b; + + net_a = container_of(a, struct net, rtnl_mutex.dep_map); + net_b = container_of(b, struct net, rtnl_mutex.dep_map); + + return rtnl_net_cmp_locks(net_a, net_b); +} +#endif + static struct rtnl_link __rcu *__rcu *rtnl_msg_handlers[RTNL_FAMILY_MAX + 1]; static inline int rtm_msgindex(int msgtype)
The goal is to break RTNL down into per-netns mutex. This patch adds per-netns mutex and its helper functions, rtnl_net_lock() and rtnl_net_unlock(). rtnl_net_lock() acquires the global RTNL and per-netns RTNL mutex, and rtnl_net_unlock() releases them. We will replace 800+ rtnl_lock() with rtnl_net_lock() and finally removes rtnl_lock() in rtnl_net_lock(). When we need to nest per-netns RTNL mutex, we will use __rtnl_net_lock(), and its locking order is defined by rtnl_net_lock_cmp_fn() as follows: 1. init_net is first 2. netns address ascending order Note that the conversion will be done under CONFIG_DEBUG_NET_SMALL_RTNL with LOCKDEP so that we can carefully add the extra mutex without slowing down RTNL operations during conversion. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- include/linux/rtnetlink.h | 13 +++++++++ include/net/net_namespace.h | 4 +++ net/Kconfig.debug | 15 ++++++++++ net/core/net_namespace.c | 6 ++++ net/core/rtnetlink.c | 58 +++++++++++++++++++++++++++++++++++++ 5 files changed, 96 insertions(+)