Message ID | 20250210133002.883422-7-shaw.leon@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Series | net: Improve netns handling in rtnetlink |
From: Xiao Liang <shaw.leon@gmail.com>
Date: Mon, 10 Feb 2025 21:29:57 +0800

> When link_net is set, use it as link netns instead of dev_net(). This
> prepares for rtnetlink core to create device in target netns directly,
> in which case the two namespaces may be different.
>
> Set correct netns in priv before registering device, and avoid
> overwriting it in ndo_init() path.
>
> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
> ---
>  net/ipv6/ip6_gre.c    | 20 ++++++++++----------
>  net/ipv6/ip6_tunnel.c | 13 ++++++++-----
>  net/ipv6/ip6_vti.c    | 10 ++++++----
>  net/ipv6/sit.c        | 11 +++++++----
>  4 files changed, 31 insertions(+), 23 deletions(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 863852abe8ea..108600dc716f 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
>  	tunnel = netdev_priv(dev);
>
>  	tunnel->dev = dev;
> -	tunnel->net = dev_net(dev);
> +	if (!tunnel->net)
> +		tunnel->net = dev_net(dev);

Same question as patch 5 for here and other parts.
Do we need this check and assignment ?

ip6gre_newlink_common
  -> nt->net = dev_net(dev)
  -> register_netdevice
    -> ndo_init / ip6gre_tunnel_init()
      -> ip6gre_tunnel_init_common
        -> tunnel->net = dev_net(dev)

On Thu, Feb 13, 2025 at 3:05 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> [...]
> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index 863852abe8ea..108600dc716f 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
> >  	tunnel = netdev_priv(dev);
> >
> >  	tunnel->dev = dev;
> > -	tunnel->net = dev_net(dev);
> > +	if (!tunnel->net)
> > +		tunnel->net = dev_net(dev);
>
> Same question as patch 5 for here and other parts.
> Do we need this check and assignment ?
>
> ip6gre_newlink_common
>   -> nt->net = dev_net(dev)
>   -> register_netdevice
>     -> ndo_init / ip6gre_tunnel_init()
>       -> ip6gre_tunnel_init_common
>         -> tunnel->net = dev_net(dev)

Will remove this line.

On Thu, Feb 13, 2025 at 4:37 PM Xiao Liang <shaw.leon@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 3:05 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> > [...]
> > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > > index 863852abe8ea..108600dc716f 100644
> > > --- a/net/ipv6/ip6_gre.c
> > > +++ b/net/ipv6/ip6_gre.c
> > > @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
> > >  	tunnel = netdev_priv(dev);
> > >
> > >  	tunnel->dev = dev;
> > > -	tunnel->net = dev_net(dev);
> > > +	if (!tunnel->net)
> > > +		tunnel->net = dev_net(dev);
> >
> > Same question as patch 5 for here and other parts.
> > Do we need this check and assignment ?
> >
> > ip6gre_newlink_common
> >   -> nt->net = dev_net(dev)
> >   -> register_netdevice
> >     -> ndo_init / ip6gre_tunnel_init()
> >       -> ip6gre_tunnel_init_common
> >         -> tunnel->net = dev_net(dev)
>
> Will remove this line.

However, fb tunnel of ip6_tunnel, ip6_vti and sit can have
tunnel->net == NULL here. Take ip6_tunnel for example:

ip6_tnl_init_net()
  -> ip6_fb_tnl_dev_init()
  -> register_netdev()
    -> register_netdevice()
      -> ip6_tnl_dev_init()

This code path (including ip6_fb_tnl_dev_init()) doesn't set
tunnel->net. But for ip6_gre, ip6gre_fb_tunnel_init() does.

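Editorial sketch: the point above is that the guarded assignment in the ndo_init() path matters only for creators that never set the private net pointer, such as the fb path described for ip6_tunnel. The following userspace mock is a minimal illustration under stated assumptions, not kernel code. Every type and function in it (struct tunnel_priv, mock_register_netdevice(), mock_newlink(), mock_fb_init(), and the simplified dev_net()) is a hypothetical stand-in that only mirrors the names used in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; all of this is illustrative. */
struct net { int id; };
struct net_device;

struct tunnel_priv {
	struct net_device *dev;
	struct net *net;		/* link netns; NULL until someone sets it */
};

struct net_device {
	struct net *nd_net;		/* namespace the device lives in */
	struct tunnel_priv priv;
};

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Models the guarded assignment the patch adds to the ndo_init() path. */
static int tunnel_ndo_init(struct net_device *dev)
{
	struct tunnel_priv *t = &dev->priv;

	t->dev = dev;
	if (!t->net)			/* keep a value set by the creator */
		t->net = dev_net(dev);
	return 0;
}

/* Models register_netdevice() calling ndo_init. */
static int mock_register_netdevice(struct net_device *dev)
{
	int err = tunnel_ndo_init(dev);

	assert(dev->priv.net != NULL);	/* always set once init has run */
	return err;
}

/* newlink-style path: the link netns is stored before registration. */
static int mock_newlink(struct net_device *dev, struct net *link_net)
{
	dev->priv.net = link_net ? link_net : dev_net(dev);
	return mock_register_netdevice(dev);
}

/* fb-style path as described above: nothing sets priv.net beforehand. */
static int mock_fb_init(struct net_device *dev)
{
	return mock_register_netdevice(dev);
}
```

In this model, mock_newlink() with a separate link netns keeps that netns through registration, while mock_fb_init() ends up with the device's own netns only because of the NULL check. Dropping the check and the assignment together would leave the fb device with a NULL pointer.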
From: Xiao Liang <shaw.leon@gmail.com>
Date: Thu, 13 Feb 2025 17:55:32 +0800

> On Thu, Feb 13, 2025 at 4:37 PM Xiao Liang <shaw.leon@gmail.com> wrote:
> >
> > On Thu, Feb 13, 2025 at 3:05 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > > [...]
> > > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > > > index 863852abe8ea..108600dc716f 100644
> > > > --- a/net/ipv6/ip6_gre.c
> > > > +++ b/net/ipv6/ip6_gre.c
> > > > @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
> > > >  	tunnel = netdev_priv(dev);
> > > >
> > > >  	tunnel->dev = dev;
> > > > -	tunnel->net = dev_net(dev);
> > > > +	if (!tunnel->net)
> > > > +		tunnel->net = dev_net(dev);
> > >
> > > Same question as patch 5 for here and other parts.
> > > Do we need this check and assignment ?
> > >
> > > ip6gre_newlink_common
> > >   -> nt->net = dev_net(dev)
> > >   -> register_netdevice
> > >     -> ndo_init / ip6gre_tunnel_init()
> > >       -> ip6gre_tunnel_init_common
> > >         -> tunnel->net = dev_net(dev)
> >
> > Will remove this line.
>
> However, fb tunnel of ip6_tunnel, ip6_vti and sit can have
> tunnel->net == NULL here. Take ip6_tunnel for example:
>
> ip6_tnl_init_net()
>   -> ip6_fb_tnl_dev_init()
>   -> register_netdev()
>     -> register_netdevice()
>       -> ip6_tnl_dev_init()
>
> This code path (including ip6_fb_tnl_dev_init()) doesn't set
> tunnel->net. But for ip6_gre, ip6gre_fb_tunnel_init() does.

Ah, okay.

Then, let's set net in a single place, which would
be better than spreading net assignment and adding null check
in ->ndo_init(), and maybe apply the same to IPv4 tunnels ?

On Thu, Feb 13, 2025 at 7:00 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Xiao Liang <shaw.leon@gmail.com>
> Date: Thu, 13 Feb 2025 17:55:32 +0800
> > On Thu, Feb 13, 2025 at 4:37 PM Xiao Liang <shaw.leon@gmail.com> wrote:
> > >
> > > On Thu, Feb 13, 2025 at 3:05 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > > [...]
> > > > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > > > > index 863852abe8ea..108600dc716f 100644
> > > > > --- a/net/ipv6/ip6_gre.c
> > > > > +++ b/net/ipv6/ip6_gre.c
> > > > > @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
> > > > >  	tunnel = netdev_priv(dev);
> > > > >
> > > > >  	tunnel->dev = dev;
> > > > > -	tunnel->net = dev_net(dev);
> > > > > +	if (!tunnel->net)
> > > > > +		tunnel->net = dev_net(dev);
> > > >
> > > > Same question as patch 5 for here and other parts.
> > > > Do we need this check and assignment ?
> > > >
> > > > ip6gre_newlink_common
> > > >   -> nt->net = dev_net(dev)
> > > >   -> register_netdevice
> > > >     -> ndo_init / ip6gre_tunnel_init()
> > > >       -> ip6gre_tunnel_init_common
> > > >         -> tunnel->net = dev_net(dev)
> > >
> > > Will remove this line.
> >
> > However, fb tunnel of ip6_tunnel, ip6_vti and sit can have
> > tunnel->net == NULL here. Take ip6_tunnel for example:
> >
> > ip6_tnl_init_net()
> >   -> ip6_fb_tnl_dev_init()
> >   -> register_netdev()
> >     -> register_netdevice()
> >       -> ip6_tnl_dev_init()
> >
> > This code path (including ip6_fb_tnl_dev_init()) doesn't set
> > tunnel->net. But for ip6_gre, ip6gre_fb_tunnel_init() does.
>
> Ah, okay.
>
> Then, let's set net in a single place, which would
> be better than spreading net assignment and adding null check
> in ->ndo_init(), and maybe apply the same to IPv4 tunnels ?

Tunnels are created in three ways: a) rtnetlink newlink,
b) ioctl SIOCADDTUNNEL and c) during per netns init (fb).
The code paths don't have much in common, and refactoring
to set net in a single place is somewhat beyond the scope
of this series. But for now I think we could put a general rule:
net should be set prior to register_netdevice().

For IPv4 tunnels, tunnel->net of a) is set in ip_tunnel_newlink().
b) and c) are set in __ip_tunnel_create():
    ip_tunnel_init_net() -> __ip_tunnel_create()
    ip_tunnel_ctl() -> ip_tunnel_create() -> __ip_tunnel_create()
So net has already been initialized when register_netdevice()
is called.

But it varies for IPv6 tunnels. Some set net for a) or c) while
some don't. This patch has "fixed" for a). As for c) we can
adopt the way of ip6_gre - setting net in *_fb_tunnel_init(),
then remove the check in ndo_init().

Is it reasonable?

Thanks.

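Editorial sketch: the general rule proposed above (net is set prior to register_netdevice(), so ndo_init() no longer has to assign or check it) can be modelled in userspace as follows. This is a sketch under stated assumptions, not the kernel implementation. The three create_* helpers are hypothetical stand-ins for the a), b) and c) creation paths, and the -EINVAL check in mock_register_netdevice() is invented here purely to make the rule observable; the real register_netdevice() performs no such check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel definitions. */
struct net { int id; };
struct net_device;
struct tunnel_priv { struct net_device *dev; struct net *net; };
struct net_device { struct net *nd_net; struct tunnel_priv priv; };

/* ndo_init under the proposed rule: it relies on priv.net being set. */
static int tunnel_ndo_init(struct net_device *dev)
{
	assert(dev->priv.net != NULL);	/* guaranteed by every creator */
	dev->priv.dev = dev;
	return 0;
}

/*
 * Mock of register_netdevice(). The -EINVAL check is hypothetical and
 * exists only to show what the rule forbids.
 */
static int mock_register_netdevice(struct net_device *dev)
{
	if (!dev->priv.net)
		return -EINVAL;
	return tunnel_ndo_init(dev);
}

/* a) rtnetlink newlink: prefer link_net, fall back to the device netns. */
static int create_via_newlink(struct net_device *dev, struct net *link_net)
{
	dev->priv.net = link_net ? link_net : dev->nd_net;
	return mock_register_netdevice(dev);
}

/* b) ioctl path: the caller's netns is stored before registration. */
static int create_via_ioctl(struct net_device *dev, struct net *net)
{
	dev->nd_net = net;
	dev->priv.net = net;
	return mock_register_netdevice(dev);
}

/* c) per-netns fb init: same rule, set net first, then register. */
static int create_fb(struct net_device *dev, struct net *net)
{
	dev->nd_net = net;
	dev->priv.net = net;
	return mock_register_netdevice(dev);
}
```

With every creator following the rule, the init callback can treat the pointer as an invariant instead of patching it up, which is the cleanup the thread converges on for the fb case.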
From: Xiao Liang <shaw.leon@gmail.com>
Date: Fri, 14 Feb 2025 17:22:28 +0800

> On Thu, Feb 13, 2025 at 7:00 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Xiao Liang <shaw.leon@gmail.com>
> > Date: Thu, 13 Feb 2025 17:55:32 +0800
> > > On Thu, Feb 13, 2025 at 4:37 PM Xiao Liang <shaw.leon@gmail.com> wrote:
> > > >
> > > > On Thu, Feb 13, 2025 at 3:05 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > > [...]
> > > > > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > > > > > index 863852abe8ea..108600dc716f 100644
> > > > > > --- a/net/ipv6/ip6_gre.c
> > > > > > +++ b/net/ipv6/ip6_gre.c
> > > > > > @@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
> > > > > >  	tunnel = netdev_priv(dev);
> > > > > >
> > > > > >  	tunnel->dev = dev;
> > > > > > -	tunnel->net = dev_net(dev);
> > > > > > +	if (!tunnel->net)
> > > > > > +		tunnel->net = dev_net(dev);
> > > > >
> > > > > Same question as patch 5 for here and other parts.
> > > > > Do we need this check and assignment ?
> > > > >
> > > > > ip6gre_newlink_common
> > > > >   -> nt->net = dev_net(dev)
> > > > >   -> register_netdevice
> > > > >     -> ndo_init / ip6gre_tunnel_init()
> > > > >       -> ip6gre_tunnel_init_common
> > > > >         -> tunnel->net = dev_net(dev)
> > > >
> > > > Will remove this line.
> > >
> > > However, fb tunnel of ip6_tunnel, ip6_vti and sit can have
> > > tunnel->net == NULL here. Take ip6_tunnel for example:
> > >
> > > ip6_tnl_init_net()
> > >   -> ip6_fb_tnl_dev_init()
> > >   -> register_netdev()
> > >     -> register_netdevice()
> > >       -> ip6_tnl_dev_init()
> > >
> > > This code path (including ip6_fb_tnl_dev_init()) doesn't set
> > > tunnel->net. But for ip6_gre, ip6gre_fb_tunnel_init() does.
> >
> > Ah, okay.
> >
> > Then, let's set net in a single place, which would
> > be better than spreading net assignment and adding null check
> > in ->ndo_init(), and maybe apply the same to IPv4 tunnels ?
>
> Tunnels are created in three ways: a) rtnetlink newlink,
> b) ioctl SIOCADDTUNNEL and c) during per netns init (fb).
> The code paths don't have much in common, and refactoring
> to set net in a single place is somewhat beyond the scope
> of this series. But for now I think we could put a general rule:
> net should be set prior to register_netdevice().
>
> For IPv4 tunnels, tunnel->net of a) is set in ip_tunnel_newlink().
> b) and c) are set in __ip_tunnel_create():
>     ip_tunnel_init_net() -> __ip_tunnel_create()
>     ip_tunnel_ctl() -> ip_tunnel_create() -> __ip_tunnel_create()
> So net has already been initialized when register_netdevice()
> is called.
>
> But it varies for IPv6 tunnels. Some set net for a) or c) while
> some don't. This patch has "fixed" for a). As for c) we can
> adopt the way of ip6_gre - setting net in *_fb_tunnel_init(),
> then remove the check in ndo_init().
>
> Is it reasonable?

Yes, fair enough.

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 863852abe8ea..108600dc716f 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1498,7 +1498,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
 	tunnel = netdev_priv(dev);
 
 	tunnel->dev = dev;
-	tunnel->net = dev_net(dev);
+	if (!tunnel->net)
+		tunnel->net = dev_net(dev);
 	strcpy(tunnel->parms.name, dev->name);
 
 	ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
@@ -1882,7 +1883,8 @@ static int ip6erspan_tap_init(struct net_device *dev)
 	tunnel = netdev_priv(dev);
 
 	tunnel->dev = dev;
-	tunnel->net = dev_net(dev);
+	if (!tunnel->net)
+		tunnel->net = dev_net(dev);
 	strcpy(tunnel->parms.name, dev->name);
 
 	ret = dst_cache_init(&tunnel->dst_cache, GFP_KERNEL);
@@ -1971,7 +1973,7 @@ static bool ip6gre_netlink_encap_parms(struct nlattr *data[],
 	return ret;
 }
 
-static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev,
+static int ip6gre_newlink_common(struct net *link_net, struct net_device *dev,
 				 struct nlattr *tb[], struct nlattr *data[],
 				 struct netlink_ext_ack *extack)
 {
@@ -1992,7 +1994,7 @@ static int ip6gre_newlink_common(struct net *src_net, struct net_device *dev,
 		eth_hw_addr_random(dev);
 
 	nt->dev = dev;
-	nt->net = dev_net(dev);
+	nt->net = link_net;
 
 	err = register_netdevice(dev);
 	if (err)
@@ -2009,11 +2011,10 @@ static int ip6gre_newlink(struct net_device *dev,
 			  struct rtnl_newlink_params *params,
 			  struct netlink_ext_ack *extack)
 {
+	struct net *net = params->link_net ? : dev_net(dev);
 	struct ip6_tnl *nt = netdev_priv(dev);
 	struct nlattr **data = params->data;
-	struct net *src_net = params->net;
 	struct nlattr **tb = params->tb;
-	struct net *net = dev_net(dev);
 	struct ip6gre_net *ign;
 	int err;
 
@@ -2028,7 +2029,7 @@ static int ip6gre_newlink(struct net_device *dev,
 		return -EEXIST;
 	}
 
-	err = ip6gre_newlink_common(src_net, dev, tb, data, extack);
+	err = ip6gre_newlink_common(net, dev, tb, data, extack);
 	if (!err) {
 		ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]);
 		ip6gre_tunnel_link_md(ign, nt);
@@ -2248,11 +2249,10 @@ static int ip6erspan_newlink(struct net_device *dev,
 			     struct rtnl_newlink_params *params,
 			     struct netlink_ext_ack *extack)
 {
+	struct net *net = params->link_net ? : dev_net(dev);
 	struct ip6_tnl *nt = netdev_priv(dev);
 	struct nlattr **data = params->data;
-	struct net *src_net = params->net;
 	struct nlattr **tb = params->tb;
-	struct net *net = dev_net(dev);
 	struct ip6gre_net *ign;
 	int err;
 
@@ -2268,7 +2268,7 @@ static int ip6erspan_newlink(struct net_device *dev,
 		return -EEXIST;
 	}
 
-	err = ip6gre_newlink_common(src_net, dev, tb, data, extack);
+	err = ip6gre_newlink_common(net, dev, tb, data, extack);
 	if (!err) {
 		ip6erspan_tnl_link_config(nt, !tb[IFLA_MTU]);
 		ip6erspan_tunnel_link_md(ign, nt);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 54b843d20870..2438dc627e02 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -253,8 +253,7 @@ static void ip6_dev_free(struct net_device *dev)
 static int ip6_tnl_create2(struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
-	struct net *net = dev_net(dev);
-	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+	struct ip6_tnl_net *ip6n = net_generic(t->net, ip6_tnl_net_id);
 	int err;
 
 	dev->rtnl_link_ops = &ip6_link_ops;
@@ -1878,7 +1877,8 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
 	int t_hlen;
 
 	t->dev = dev;
-	t->net = dev_net(dev);
+	if (!t->net)
+		t->net = dev_net(dev);
 
 	ret = dst_cache_init(&t->dst_cache, GFP_KERNEL);
 	if (ret)
@@ -2008,13 +2008,16 @@ static int ip6_tnl_newlink(struct net_device *dev,
 {
 	struct nlattr **data = params->data;
 	struct nlattr **tb = params->tb;
-	struct net *net = dev_net(dev);
-	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
 	struct ip_tunnel_encap ipencap;
+	struct ip6_tnl_net *ip6n;
 	struct ip6_tnl *nt, *t;
+	struct net *net;
 	int err;
 
+	net = params->link_net ? : dev_net(dev);
+	ip6n = net_generic(net, ip6_tnl_net_id);
 	nt = netdev_priv(dev);
+	nt->net = net;
 
 	if (ip_tunnel_netlink_encap_parms(data, &ipencap)) {
 		err = ip6_tnl_encap_setup(nt, &ipencap);
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 993f85aeb882..4aa1e7821951 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -177,8 +177,7 @@ vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t)
 static int vti6_tnl_create2(struct net_device *dev)
 {
 	struct ip6_tnl *t = netdev_priv(dev);
-	struct net *net = dev_net(dev);
-	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct vti6_net *ip6n = net_generic(t->net, vti6_net_id);
 	int err;
 
 	dev->rtnl_link_ops = &vti6_link_ops;
@@ -925,7 +924,8 @@ static inline int vti6_dev_init_gen(struct net_device *dev)
 	struct ip6_tnl *t = netdev_priv(dev);
 
 	t->dev = dev;
-	t->net = dev_net(dev);
+	if (!t->net)
+		t->net = dev_net(dev);
 	netdev_hold(dev, &t->dev_tracker, GFP_KERNEL);
 	netdev_lockdep_set_classes(dev);
 	return 0;
@@ -1002,13 +1002,15 @@ static int vti6_newlink(struct net_device *dev,
 			struct netlink_ext_ack *extack)
 {
 	struct nlattr **data = params->data;
-	struct net *net = dev_net(dev);
 	struct ip6_tnl *nt;
+	struct net *net;
 
+	net = params->link_net ? : dev_net(dev);
 	nt = netdev_priv(dev);
 	vti6_netlink_parms(data, &nt->parms);
 
 	nt->parms.proto = IPPROTO_IPV6;
+	nt->net = net;
 
 	if (vti6_locate(net, &nt->parms, 0))
 		return -EEXIST;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index e2bd52cabdee..e870271ed04a 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -201,8 +201,7 @@ static void ipip6_tunnel_clone_6rd(struct net_device *dev, struct sit_net *sitn)
 static int ipip6_tunnel_create(struct net_device *dev)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct net *net = dev_net(dev);
-	struct sit_net *sitn = net_generic(net, sit_net_id);
+	struct sit_net *sitn = net_generic(t->net, sit_net_id);
 	int err;
 
 	__dev_addr_set(dev, &t->parms.iph.saddr, 4);
@@ -270,6 +269,7 @@ static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
 	nt = netdev_priv(dev);
 
 	nt->parms = *parms;
+	nt->net = net;
 	if (ipip6_tunnel_create(dev) < 0)
 		goto failed_free;
 
@@ -1449,7 +1449,8 @@ static int ipip6_tunnel_init(struct net_device *dev)
 	int err;
 
 	tunnel->dev = dev;
-	tunnel->net = dev_net(dev);
+	if (!tunnel->net)
+		tunnel->net = dev_net(dev);
 	strcpy(tunnel->parms.name, dev->name);
 
 	ipip6_tunnel_bind_dev(dev);
@@ -1556,15 +1557,17 @@ static int ipip6_newlink(struct net_device *dev,
 {
 	struct nlattr **data = params->data;
 	struct nlattr **tb = params->tb;
-	struct net *net = dev_net(dev);
 	struct ip_tunnel *nt;
 	struct ip_tunnel_encap ipencap;
 #ifdef CONFIG_IPV6_SIT_6RD
 	struct ip_tunnel_6rd ip6rd;
 #endif
+	struct net *net;
 	int err;
 
+	net = params->link_net ? : dev_net(dev);
 	nt = netdev_priv(dev);
+	nt->net = net;
 
 	if (ip_tunnel_netlink_encap_parms(data, &ipencap)) {
 		err = ip_tunnel_encap_setup(nt, &ipencap);

When link_net is set, use it as link netns instead of dev_net(). This
prepares for rtnetlink core to create device in target netns directly,
in which case the two namespaces may be different.

Set correct netns in priv before registering device, and avoid
overwriting it in ndo_init() path.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
 net/ipv6/ip6_gre.c    | 20 ++++++++++----------
 net/ipv6/ip6_tunnel.c | 13 ++++++++-----
 net/ipv6/ip6_vti.c    | 10 ++++++----
 net/ipv6/sit.c        | 11 +++++++----
 4 files changed, 31 insertions(+), 23 deletions(-)
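
Editorial sketch: the patch selects the link netns with `params->link_net ? : dev_net(dev)`, which uses the GNU C conditional with the middle operand omitted (the first operand is returned when it is non-NULL). The helper below spells the same fallback in portable C. It is a minimal userspace illustration under stated assumptions: struct mock_newlink_params, tunnel_link_net() and the simplified dev_net() are hypothetical stand-ins, not the kernel's struct rtnl_newlink_params or helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; only the fields needed for the fallback exist. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct mock_newlink_params { struct net *link_net; };

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/*
 * Portable spelling of the patch's `params->link_net ? : dev_net(dev)`:
 * use link_net when the caller supplied one, else the device's own netns.
 */
static struct net *tunnel_link_net(const struct mock_newlink_params *params,
				   const struct net_device *dev)
{
	assert(params != NULL && dev != NULL);
	return params->link_net ? params->link_net : dev_net(dev);
}
```

The result of such a lookup is what the newlink handlers in the diff store in the private net field before calling register_netdevice().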