diff mbox series

[v2,net-next,10/14] ipv6: Factorise ip6_route_multipath_add().

Message ID 20250409011243.26195-11-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series ipv6: No RTNL for IPv6 routing table. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 234 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-04-11--03-00 (tests: 900)

Commit Message

Kuniyuki Iwashima April 9, 2025, 1:12 a.m. UTC
We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely
on RCU to guarantee dev and nexthop lifetime.

Then, the RCU section will start before ip6_route_info_create_nh()
in ip6_route_multipath_add(), but ip6_route_info_create() is called
in the same loop and will sleep.

Let's split the loop into ip6_route_mpath_info_create() and
ip6_route_mpath_info_create_nh().

Note that ip6_route_info_append() is now integrated into
ip6_route_mpath_info_create_nh() because we need to call different
free functions for nexthops that passed ip6_route_info_create_nh().

In case of failure, the remaining nexthops that ip6_route_info_create_nh()
has not been called for will be freed by ip6_route_mpath_info_cleanup().

OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked
to a local temporary list, which will be spliced back to rt6_nh_list.
In case of failure, these nexthops will be released by fib6_info_release()
in ip6_route_multipath_add().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/ipv6/route.c | 205 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 130 insertions(+), 75 deletions(-)

Comments

Simon Horman April 11, 2025, 10:34 a.m. UTC | #1
On Tue, Apr 08, 2025 at 06:12:18PM -0700, Kuniyuki Iwashima wrote:
> We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely
> on RCU to guarantee dev and nexthop lifetime.
> 
> Then, the RCU section will start before ip6_route_info_create_nh()
> in ip6_route_multipath_add(), but ip6_route_info_create() is called
> in the same loop and will sleep.
> 
> Let's split the loop into ip6_route_mpath_info_create() and
> ip6_route_mpath_info_create_nh().
> 
> Note that ip6_route_info_append() is now integrated into
> ip6_route_mpath_info_create_nh() because we need to call different
> free functions for nexthops that passed ip6_route_info_create_nh().
> 
> In case of failure, the remaining nexthops that ip6_route_info_create_nh()
> has not been called for will be freed by ip6_route_mpath_info_cleanup().
> 
> OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked
> to a local temporary list, which will be spliced back to rt6_nh_list.
> In case of failure, these nexthops will be released by fib6_info_release()
> in ip6_route_multipath_add().
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/ipv6/route.c | 205 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 130 insertions(+), 75 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c

...

> +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list,
> +					  struct netlink_ext_ack *extack)
> +{
> +	struct rt6_nh *nh, *nh_next, *nh_tmp;
> +	LIST_HEAD(tmp);
> +	int err;
> +
> +	list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) {
> +		struct fib6_info *rt = nh->fib6_info;
> +
> +		err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack);
> +		if (err) {
> +			nh->fib6_info = NULL;
> +			goto err;
> +		}
> +
> +		rt->fib6_nh->fib_nh_weight = nh->weight;
> +
> +		list_move_tail(&nh->next, &tmp);
> +
> +		list_for_each_entry(nh_tmp, rt6_nh_list, next) {
> +			/* check if fib6_info already exists */
> +			if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) {
> +				err = -EEXIST;
> +				goto err;
> +			}
> +		}
> +	}
> +out:
> +	list_splice(&tmp, rt6_nh_list);
> +	return err;

Hi Kuniyuki-san,

Perhaps it can't happen in practice, but if the loop above iterates zero
times then err will be used uninitialised. As it's expected that err is 0
here, perhaps it would be simplest to just:

	return 0;

> +err:
> +	ip6_route_mpath_info_cleanup(rt6_nh_list);
> +	goto out;
>  }

...
Simon Horman April 14, 2025, 2:52 p.m. UTC | #2
On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote:
> From: Simon Horman <horms@kernel.org>
> Date: Fri, 11 Apr 2025 11:34:04 +0100
> > > +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list,
> > > +					  struct netlink_ext_ack *extack)
> > > +{
> > > +	struct rt6_nh *nh, *nh_next, *nh_tmp;
> > > +	LIST_HEAD(tmp);
> > > +	int err;
> > > +
> > > +	list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) {
> > > +		struct fib6_info *rt = nh->fib6_info;
> > > +
> > > +		err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack);
> > > +		if (err) {
> > > +			nh->fib6_info = NULL;
> > > +			goto err;
> > > +		}
> > > +
> > > +		rt->fib6_nh->fib_nh_weight = nh->weight;
> > > +
> > > +		list_move_tail(&nh->next, &tmp);
> > > +
> > > +		list_for_each_entry(nh_tmp, rt6_nh_list, next) {
> > > +			/* check if fib6_info already exists */
> > > +			if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) {
> > > +				err = -EEXIST;
> > > +				goto err;
> > > +			}
> > > +		}
> > > +	}
> > > +out:
> > > +	list_splice(&tmp, rt6_nh_list);
> > > +	return err;
> > 
> > Hi Kuniyuki-san,
> > 
> > Perhaps it can't happen in practice,
> 
> Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config()
> returns an error in such a case.
> 
> 
> > but if the loop above iterates zero
> > times then err will be used uninitialised. As it's expected that err is 0
> > here, perhaps it would be simplest to just:
> > 
> > 	return 0;
> 
> If we want to return 0 above, we need to duplicate list_splice() at
> err: and return err; there.  Or initialise err = 0, but this looks
> worse to me.

Thanks. I should have dug a bit deeper to determine that this
is a false-positive.

> Btw, was this caught by Smatch, Coverity, or something ?  I don't
> see such a report at CI.
> https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/

Sorry for not mentioning that it was flagged by Smatch,
I certainly should have done so.


> 
> If so, I'm just curious if we have an official guideline for
> false-positives flagged by such tools, like we should care about it
> while writing a code and should try to be safer to make it happy.
> 
> We are also running Coverity for the mainline kernel and have tons
> of false-positive reports due to lack of contexts.

I think that the current non-guideline is that we don't change
code just to keep the tools happy. Perhaps we should add something
about that to the process document?
Kuniyuki Iwashima April 14, 2025, 6:06 p.m. UTC | #3
From: Simon Horman <horms@kernel.org>
Date: Mon, 14 Apr 2025 15:52:26 +0100
> On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote:
> > From: Simon Horman <horms@kernel.org>
> > Date: Fri, 11 Apr 2025 11:34:04 +0100
> > > > +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list,
> > > > +					  struct netlink_ext_ack *extack)
> > > > +{
> > > > +	struct rt6_nh *nh, *nh_next, *nh_tmp;
> > > > +	LIST_HEAD(tmp);
> > > > +	int err;
> > > > +
> > > > +	list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) {
> > > > +		struct fib6_info *rt = nh->fib6_info;
> > > > +
> > > > +		err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack);
> > > > +		if (err) {
> > > > +			nh->fib6_info = NULL;
> > > > +			goto err;
> > > > +		}
> > > > +
> > > > +		rt->fib6_nh->fib_nh_weight = nh->weight;
> > > > +
> > > > +		list_move_tail(&nh->next, &tmp);
> > > > +
> > > > +		list_for_each_entry(nh_tmp, rt6_nh_list, next) {
> > > > +			/* check if fib6_info already exists */
> > > > +			if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) {
> > > > +				err = -EEXIST;
> > > > +				goto err;
> > > > +			}
> > > > +		}
> > > > +	}
> > > > +out:
> > > > +	list_splice(&tmp, rt6_nh_list);
> > > > +	return err;
> > > 
> > > Hi Kuniyuki-san,
> > > 
> > > Perhaps it can't happen in practice,
> > 
> > Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config()
> > returns an error in such a case.
> > 
> > 
> > > but if the loop above iterates zero
> > > times then err will be used uninitialised. As it's expected that err is 0
> > > here, perhaps it would be simplest to just:
> > > 
> > > 	return 0;
> > 
> > If we want to return 0 above, we need to duplicate list_splice() at
> > err: and return err; there.  Or initialise err = 0, but this looks
> > worse to me.
> 
> Thanks. I should have dug a bit deeper to determine that this
> is a false-positive.
> 
> > Btw, was this caught by Smatch, Coverity, or something ?  I don't
> > see such a report at CI.
> > https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/
> 
> Sorry for not mentioning that it was flagged by Smatch,
> I certainly should have done so.

Thanks for confirming!

> 
> 
> > 
> > If so, I'm just curious if we have an official guideline for
> > false-positives flagged by such tools, like we should care about it
> > while writing a code and should try to be safer to make it happy.
> > 
> > We are also running Coverity for the mainline kernel and have tons
> > of false-positive reports due to lack of contexts.
> 
> I think that the current non-guideline is that we don't change
> code just to keep the tools happy. Perhaps we should add something
> about that to the process document?

Makes sense.

But looks like the series was marked Changes Requested, not sure
if it's accidental or intentional, so I'll resend v2 to see others'
opinion.

Thanks!
Simon Horman April 15, 2025, 6:38 p.m. UTC | #4
On Mon, Apr 14, 2025 at 11:06:58AM -0700, Kuniyuki Iwashima wrote:
> From: Simon Horman <horms@kernel.org>
> Date: Mon, 14 Apr 2025 15:52:26 +0100
> > On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote:
> > > From: Simon Horman <horms@kernel.org>
> > > Date: Fri, 11 Apr 2025 11:34:04 +0100

...

> > > > Hi Kuniyuki-san,
> > > > 
> > > > Perhaps it can't happen in practice,
> > > 
> > > Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config()
> > > returns an error in such a case.
> > > 
> > > 
> > > > but if the loop above iterates zero
> > > > times then err will be used uninitialised. As it's expected that err is 0
> > > > here, perhaps it would be simplest to just:
> > > > 
> > > > 	return 0;
> > > 
> > > If we want to return 0 above, we need to duplicate list_splice() at
> > > err: and return err; there.  Or initialise err = 0, but this looks
> > > worse to me.
> > 
> > Thanks. I should have dug a bit deeper to determine that this
> > is a false-positive.
> > 
> > > Btw, was this caught by Smatch, Coverity, or something ?  I don't
> > > see such a report at CI.
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/
> > 
> > Sorry for not mentioning that it was flagged by Smatch,
> > I certainly should have done so.
> 
> Thanks for confirming!
> 
> > 
> > 
> > > 
> > > If so, I'm just curious if we have an official guideline for
> > > false-positives flagged by such tools, like we should care about it
> > > while writing a code and should try to be safer to make it happy.
> > > 
> > > We are also running Coverity for the mainline kernel and have tons
> > > of false-positive reports due to lack of contexts.
> > 
> > I think that the current non-guideline is that we don't change
> > code just to keep the tools happy. Perhaps we should add something
> > about that to the process document?
> 
> Makes sense.
> 
> But looks like the series was marked Changes Requested, not sure
> if it's accidental or intentional, so I'll resend v2 to see others'
> opinion.

I'm not sure either.
But I agree that a v2 is a good way forward.
diff mbox series

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index adefabce985f..26a0be592632 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5315,29 +5315,131 @@  struct rt6_nh {
 	struct fib6_info *fib6_info;
 	struct fib6_config r_cfg;
 	struct list_head next;
+	int weight;
 };
 
-static int ip6_route_info_append(struct list_head *rt6_nh_list,
-				 struct fib6_info *rt,
-				 struct fib6_config *r_cfg)
+static void ip6_route_mpath_info_cleanup(struct list_head *rt6_nh_list)
 {
-	struct rt6_nh *nh;
-	int err = -EEXIST;
+	struct rt6_nh *nh, *nh_next;
 
-	list_for_each_entry(nh, rt6_nh_list, next) {
-		/* check if fib6_info already exists */
-		if (rt6_duplicate_nexthop(nh->fib6_info, rt))
-			return err;
+	list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) {
+		struct fib6_info *rt = nh->fib6_info;
+
+		if (rt) {
+			free_percpu(rt->fib6_nh->nh_common.nhc_pcpu_rth_output);
+			free_percpu(rt->fib6_nh->rt6i_pcpu);
+			ip_fib_metrics_put(rt->fib6_metrics);
+			kfree(rt);
+		}
+
+		list_del(&nh->next);
+		kfree(nh);
 	}
+}
 
-	nh = kzalloc(sizeof(*nh), GFP_KERNEL);
-	if (!nh)
-		return -ENOMEM;
-	nh->fib6_info = rt;
-	memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg));
-	list_add_tail(&nh->next, rt6_nh_list);
+static int ip6_route_mpath_info_create(struct list_head *rt6_nh_list,
+				       struct fib6_config *cfg,
+				       struct netlink_ext_ack *extack)
+{
+	struct rtnexthop *rtnh;
+	int remaining;
+	int err;
+
+	remaining = cfg->fc_mp_len;
+	rtnh = (struct rtnexthop *)cfg->fc_mp;
+
+	/* Parse a Multipath Entry and build a list (rt6_nh_list) of
+	 * fib6_info structs per nexthop
+	 */
+	while (rtnh_ok(rtnh, remaining)) {
+		struct fib6_config r_cfg;
+		struct fib6_info *rt;
+		struct rt6_nh *nh;
+		int attrlen;
+
+		nh = kzalloc(sizeof(*nh), GFP_KERNEL);
+		if (!nh) {
+			err = -ENOMEM;
+			goto err;
+		}
+
+		list_add_tail(&nh->next, rt6_nh_list);
+
+		memcpy(&r_cfg, cfg, sizeof(*cfg));
+		if (rtnh->rtnh_ifindex)
+			r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
+
+		attrlen = rtnh_attrlen(rtnh);
+		if (attrlen > 0) {
+			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
+
+			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+			if (nla) {
+				r_cfg.fc_gateway = nla_get_in6_addr(nla);
+				r_cfg.fc_flags |= RTF_GATEWAY;
+			}
+
+			r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
+			nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
+			if (nla)
+				r_cfg.fc_encap_type = nla_get_u16(nla);
+		}
+
+		r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK);
+
+		rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack);
+		if (IS_ERR(rt)) {
+			err = PTR_ERR(rt);
+			goto err;
+		}
+
+		nh->fib6_info = rt;
+		nh->weight = rtnh->rtnh_hops + 1;
+		memcpy(&nh->r_cfg, &r_cfg, sizeof(r_cfg));
+
+		rtnh = rtnh_next(rtnh, &remaining);
+	}
 
 	return 0;
+err:
+	ip6_route_mpath_info_cleanup(rt6_nh_list);
+	return err;
+}
+
+static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list,
+					  struct netlink_ext_ack *extack)
+{
+	struct rt6_nh *nh, *nh_next, *nh_tmp;
+	LIST_HEAD(tmp);
+	int err;
+
+	list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) {
+		struct fib6_info *rt = nh->fib6_info;
+
+		err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack);
+		if (err) {
+			nh->fib6_info = NULL;
+			goto err;
+		}
+
+		rt->fib6_nh->fib_nh_weight = nh->weight;
+
+		list_move_tail(&nh->next, &tmp);
+
+		list_for_each_entry(nh_tmp, rt6_nh_list, next) {
+			/* check if fib6_info already exists */
+			if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) {
+				err = -EEXIST;
+				goto err;
+			}
+		}
+	}
+out:
+	list_splice(&tmp, rt6_nh_list);
+	return err;
+err:
+	ip6_route_mpath_info_cleanup(rt6_nh_list);
+	goto out;
 }
 
 static void ip6_route_mpath_notify(struct fib6_info *rt,
@@ -5396,75 +5498,28 @@  static int ip6_route_multipath_add(struct fib6_config *cfg,
 {
 	struct fib6_info *rt_notif = NULL, *rt_last = NULL;
 	struct nl_info *info = &cfg->fc_nlinfo;
-	struct fib6_config r_cfg;
-	struct rtnexthop *rtnh;
-	struct fib6_info *rt;
-	struct rt6_nh *err_nh;
 	struct rt6_nh *nh, *nh_safe;
+	LIST_HEAD(rt6_nh_list);
+	struct rt6_nh *err_nh;
 	__u16 nlflags;
-	int remaining;
-	int attrlen;
-	int err = 1;
 	int nhn = 0;
-	int replace = (cfg->fc_nlinfo.nlh &&
-		       (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
-	LIST_HEAD(rt6_nh_list);
+	int replace;
+	int err;
+
+	replace = (cfg->fc_nlinfo.nlh &&
+		   (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
 
 	nlflags = replace ? NLM_F_REPLACE : NLM_F_CREATE;
 	if (info->nlh && info->nlh->nlmsg_flags & NLM_F_APPEND)
 		nlflags |= NLM_F_APPEND;
 
-	remaining = cfg->fc_mp_len;
-	rtnh = (struct rtnexthop *)cfg->fc_mp;
-
-	/* Parse a Multipath Entry and build a list (rt6_nh_list) of
-	 * fib6_info structs per nexthop
-	 */
-	while (rtnh_ok(rtnh, remaining)) {
-		memcpy(&r_cfg, cfg, sizeof(*cfg));
-		if (rtnh->rtnh_ifindex)
-			r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
-
-		attrlen = rtnh_attrlen(rtnh);
-		if (attrlen > 0) {
-			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
-
-			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
-			if (nla) {
-				r_cfg.fc_gateway = nla_get_in6_addr(nla);
-				r_cfg.fc_flags |= RTF_GATEWAY;
-			}
-
-			r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
-			nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
-			if (nla)
-				r_cfg.fc_encap_type = nla_get_u16(nla);
-		}
-
-		r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK);
-		rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack);
-		if (IS_ERR(rt)) {
-			err = PTR_ERR(rt);
-			rt = NULL;
-			goto cleanup;
-		}
-
-		err = ip6_route_info_create_nh(rt, &r_cfg, extack);
-		if (err) {
-			rt = NULL;
-			goto cleanup;
-		}
-
-		rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1;
-
-		err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
-		if (err) {
-			fib6_info_release(rt);
-			goto cleanup;
-		}
+	err = ip6_route_mpath_info_create(&rt6_nh_list, cfg, extack);
+	if (err)
+		return err;
 
-		rtnh = rtnh_next(rtnh, &remaining);
-	}
+	err = ip6_route_mpath_info_create_nh(&rt6_nh_list, extack);
+	if (err)
+		goto cleanup;
 
 	/* for add and replace send one notification with all nexthops.
 	 * Skip the notification in fib6_add_rt2node and send one with