Message ID | 20250409011243.26195-11-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | ipv6: No RTNL for IPv6 routing table. | expand |
On Tue, Apr 08, 2025 at 06:12:18PM -0700, Kuniyuki Iwashima wrote: > We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely > on RCU to guarantee dev and nexthop lifetime. > > Then, the RCU section will start before ip6_route_info_create_nh() > in ip6_route_multipath_add(), but ip6_route_info_create() is called > in the same loop and will sleep. > > Let's split the loop into ip6_route_mpath_info_create() and > ip6_route_mpath_info_create_nh(). > > Note that ip6_route_info_append() is now integrated into > ip6_route_mpath_info_create_nh() because we need to call different > free functions for nexthops that passed ip6_route_info_create_nh(). > > In case of failure, the remaining nexthops that ip6_route_info_create_nh() > has not been called for will be freed by ip6_route_mpath_info_cleanup(). > > OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked > to a local temporary list, which will be spliced back to rt6_nh_list. > In case of failure, these nexthops will be released by fib6_info_release() > in ip6_route_multipath_add(). > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > --- > net/ipv6/route.c | 205 ++++++++++++++++++++++++++++++----------------- > 1 file changed, 130 insertions(+), 75 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c ... > +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list, > + struct netlink_ext_ack *extack) > +{ > + struct rt6_nh *nh, *nh_next, *nh_tmp; > + LIST_HEAD(tmp); > + int err; > + > + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) { > + struct fib6_info *rt = nh->fib6_info; > + > + err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack); > + if (err) { > + nh->fib6_info = NULL; > + goto err; > + } > + > + rt->fib6_nh->fib_nh_weight = nh->weight; > + > + list_move_tail(&nh->next, &tmp); > + > + list_for_each_entry(nh_tmp, rt6_nh_list, next) { > + /* check if fib6_info already exists */ > + if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) { > + err = -EEXIST; > + goto err; > + } > + } > + } > +out: > + list_splice(&tmp, rt6_nh_list); > + return err; Hi Kuniyuki-san, Perhaps it can't happen in practice, but if the loop above iterates zero times then err will be used uninitialised. As it's expected that err is 0 here, perhaps it would be simplest to just: return 0; > +err: > + ip6_route_mpath_info_cleanup(rt6_nh_list); > + goto out; > } ...
On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote: > From: Simon Horman <horms@kernel.org> > Date: Fri, 11 Apr 2025 11:34:04 +0100 > > > +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list, > > > + struct netlink_ext_ack *extack) > > > +{ > > > + struct rt6_nh *nh, *nh_next, *nh_tmp; > > > + LIST_HEAD(tmp); > > > + int err; > > > + > > > + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) { > > > + struct fib6_info *rt = nh->fib6_info; > > > + > > > + err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack); > > > + if (err) { > > > + nh->fib6_info = NULL; > > > + goto err; > > > + } > > > + > > > + rt->fib6_nh->fib_nh_weight = nh->weight; > > > + > > > + list_move_tail(&nh->next, &tmp); > > > + > > > + list_for_each_entry(nh_tmp, rt6_nh_list, next) { > > > + /* check if fib6_info already exists */ > > > + if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) { > > > + err = -EEXIST; > > > + goto err; > > > + } > > > + } > > > + } > > > +out: > > > + list_splice(&tmp, rt6_nh_list); > > > + return err; > > > > Hi Kuniyuki-san, > > > > Perhaps it can't happen in practice, > > Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config() > returns an error in such a case. > > > > but if the loop above iterates zero > > times then err will be used uninitialised. As it's expected that err is 0 > > here, perhaps it would be simplest to just: > > > > return 0; > > If we want to return 0 above, we need to duplicate list_splice() at > err: and return err; there. Or initialise err = 0, but this looks > worse to me. Thanks. I should have dug a bit deeper to determine that this is a false-positive. > Btw, was this caught by Smatch, Coverity, or something ? I don't > see such a report at CI. > https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/ Sorry for not mentioning that it was flagged by Smatch, I certainly should have done so. > > If so, I'm just curious if we have an official guideline for > false-positives flagged by such tools, like we should care about it > while writing a code and should try to be safer to make it happy. > > We are also running Coverity for the mainline kernel and have tons > of false-positive reports due to lack of contexts. I think that the current non-guideline is that we don't change code just to keep the tools happy. Perhaps we should add something about that to the process document?
From: Simon Horman <horms@kernel.org> Date: Mon, 14 Apr 2025 15:52:26 +0100 > On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote: > > From: Simon Horman <horms@kernel.org> > > Date: Fri, 11 Apr 2025 11:34:04 +0100 > > > > +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list, > > > > + struct netlink_ext_ack *extack) > > > > +{ > > > > + struct rt6_nh *nh, *nh_next, *nh_tmp; > > > > + LIST_HEAD(tmp); > > > > + int err; > > > > + > > > > + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) { > > > > + struct fib6_info *rt = nh->fib6_info; > > > > + > > > > + err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack); > > > > + if (err) { > > > > + nh->fib6_info = NULL; > > > > + goto err; > > > > + } > > > > + > > > > + rt->fib6_nh->fib_nh_weight = nh->weight; > > > > + > > > > + list_move_tail(&nh->next, &tmp); > > > > + > > > > + list_for_each_entry(nh_tmp, rt6_nh_list, next) { > > > > + /* check if fib6_info already exists */ > > > > + if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) { > > > > + err = -EEXIST; > > > > + goto err; > > > > + } > > > > + } > > > > + } > > > > +out: > > > > + list_splice(&tmp, rt6_nh_list); > > > > + return err; > > > > > > Hi Kuniyuki-san, > > > > > > Perhaps it can't happen in practice, > > > > Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config() > > returns an error in such a case. > > > > > > > but if the loop above iterates zero > > > times then err will be used uninitialised. As it's expected that err is 0 > > > here, perhaps it would be simplest to just: > > > > > > return 0; > > > > If we want to return 0 above, we need to duplicate list_splice() at > > err: and return err; there. Or initialise err = 0, but this looks > > worse to me. > > Thanks. I should have dug a bit deeper to determine that this > is a false-positive. > > > Btw, was this caught by Smatch, Coverity, or something ? I don't > > see such a report at CI. > > https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/ > > Sorry for not mentioning that it was flagged by Smatch, > I certainly should have done so. Thanks for confirming! > > > > > > If so, I'm just curious if we have an official guideline for > > false-positives flagged by such tools, like we should care about it > > while writing a code and should try to be safer to make it happy. > > > > We are also running Coverity for the mainline kernel and have tons > > of false-positive reports due to lack of contexts. > > I think that the current non-guideline is that we don't change > code just to keep the tools happy. Perhaps we should add something > about that to the process document? Makes sense. But looks like the series was marked Changes Requested, not sure if it's accidental or intentional, so I'll resend v2 to see others' opinion. Thanks!
On Mon, Apr 14, 2025 at 11:06:58AM -0700, Kuniyuki Iwashima wrote: > From: Simon Horman <horms@kernel.org> > Date: Mon, 14 Apr 2025 15:52:26 +0100 > > On Fri, Apr 11, 2025 at 12:33:46PM -0700, Kuniyuki Iwashima wrote: > > > From: Simon Horman <horms@kernel.org> > > > Date: Fri, 11 Apr 2025 11:34:04 +0100 ... > > > > Hi Kuniyuki-san, > > > > > > > > Perhaps it can't happen in practice, > > > > > > Yes, it never happens by patch 1 as rtm_to_fib6_multipath_config() > > > returns an error in such a case. > > > > > > > > > > but if the loop above iterates zero > > > > times then err will be used uninitialised. As it's expected that err is 0 > > > > here, perhaps it would be simplest to just: > > > > > > > > return 0; > > > > > > If we want to return 0 above, we need to duplicate list_splice() at > > > err: and return err; there. Or initialise err = 0, but this looks > > > worse to me. > > > > Thanks. I should have dug a bit deeper to determine that this > > is a false-positive. > > > > > Btw, was this caught by Smatch, Coverity, or something ? I don't > > > see such a report at CI. > > > https://patchwork.kernel.org/project/netdevbpf/patch/20250409011243.26195-11-kuniyu@amazon.com/ > > > > Sorry for not mentioning that it was flagged by Smatch, > > I certainly should have done so. > > Thanks for confirming! > > > > > > > > > > > If so, I'm just curious if we have an official guideline for > > > false-positives flagged by such tools, like we should care about it > > > while writing a code and should try to be safer to make it happy. > > > > > > We are also running Coverity for the mainline kernel and have tons > > > of false-positive reports due to lack of contexts. > > > > I think that the current non-guideline is that we don't change > > code just to keep the tools happy. Perhaps we should add something > > about that to the process document? > > Makes sense. > > But looks like the series was marked Changes Requested, not sure > if it's accidental or intentional, so I'll resend v2 to see others' > opinion. I'm not sure either. But I agree that a v2 is a good way forward.
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index adefabce985f..26a0be592632 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5315,29 +5315,131 @@ struct rt6_nh { struct fib6_info *fib6_info; struct fib6_config r_cfg; struct list_head next; + int weight; }; -static int ip6_route_info_append(struct list_head *rt6_nh_list, - struct fib6_info *rt, - struct fib6_config *r_cfg) +static void ip6_route_mpath_info_cleanup(struct list_head *rt6_nh_list) { - struct rt6_nh *nh; - int err = -EEXIST; + struct rt6_nh *nh, *nh_next; - list_for_each_entry(nh, rt6_nh_list, next) { - /* check if fib6_info already exists */ - if (rt6_duplicate_nexthop(nh->fib6_info, rt)) - return err; + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) { + struct fib6_info *rt = nh->fib6_info; + + if (rt) { + free_percpu(rt->fib6_nh->nh_common.nhc_pcpu_rth_output); + free_percpu(rt->fib6_nh->rt6i_pcpu); + ip_fib_metrics_put(rt->fib6_metrics); + kfree(rt); + } + + list_del(&nh->next); + kfree(nh); } +} - nh = kzalloc(sizeof(*nh), GFP_KERNEL); - if (!nh) - return -ENOMEM; - nh->fib6_info = rt; - memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg)); - list_add_tail(&nh->next, rt6_nh_list); +static int ip6_route_mpath_info_create(struct list_head *rt6_nh_list, + struct fib6_config *cfg, + struct netlink_ext_ack *extack) +{ + struct rtnexthop *rtnh; + int remaining; + int err; + + remaining = cfg->fc_mp_len; + rtnh = (struct rtnexthop *)cfg->fc_mp; + + /* Parse a Multipath Entry and build a list (rt6_nh_list) of + * fib6_info structs per nexthop + */ + while (rtnh_ok(rtnh, remaining)) { + struct fib6_config r_cfg; + struct fib6_info *rt; + struct rt6_nh *nh; + int attrlen; + + nh = kzalloc(sizeof(*nh), GFP_KERNEL); + if (!nh) { + err = -ENOMEM; + goto err; + } + + list_add_tail(&nh->next, rt6_nh_list); + + memcpy(&r_cfg, cfg, sizeof(*cfg)); + if (rtnh->rtnh_ifindex) + r_cfg.fc_ifindex = rtnh->rtnh_ifindex; + + attrlen = rtnh_attrlen(rtnh); + if (attrlen > 0) { + struct nlattr *nla, *attrs = rtnh_attrs(rtnh); + + nla = nla_find(attrs, attrlen, RTA_GATEWAY); + if (nla) { + r_cfg.fc_gateway = nla_get_in6_addr(nla); + r_cfg.fc_flags |= RTF_GATEWAY; + } + + r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP); + nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); + if (nla) + r_cfg.fc_encap_type = nla_get_u16(nla); + } + + r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK); + + rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack); + if (IS_ERR(rt)) { + err = PTR_ERR(rt); + goto err; + } + + nh->fib6_info = rt; + nh->weight = rtnh->rtnh_hops + 1; + memcpy(&nh->r_cfg, &r_cfg, sizeof(r_cfg)); + + rtnh = rtnh_next(rtnh, &remaining); + } return 0; +err: + ip6_route_mpath_info_cleanup(rt6_nh_list); + return err; +} + +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list, + struct netlink_ext_ack *extack) +{ + struct rt6_nh *nh, *nh_next, *nh_tmp; + LIST_HEAD(tmp); + int err; + + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, next) { + struct fib6_info *rt = nh->fib6_info; + + err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack); + if (err) { + nh->fib6_info = NULL; + goto err; + } + + rt->fib6_nh->fib_nh_weight = nh->weight; + + list_move_tail(&nh->next, &tmp); + + list_for_each_entry(nh_tmp, rt6_nh_list, next) { + /* check if fib6_info already exists */ + if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) { + err = -EEXIST; + goto err; + } + } + } +out: + list_splice(&tmp, rt6_nh_list); + return err; +err: + ip6_route_mpath_info_cleanup(rt6_nh_list); + goto out; } static void ip6_route_mpath_notify(struct fib6_info *rt, @@ -5396,75 +5498,28 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, { struct fib6_info *rt_notif = NULL, *rt_last = NULL; struct nl_info *info = &cfg->fc_nlinfo; - struct fib6_config r_cfg; - struct rtnexthop *rtnh; - struct fib6_info *rt; - struct rt6_nh *err_nh; struct rt6_nh *nh, *nh_safe; + LIST_HEAD(rt6_nh_list); + struct rt6_nh *err_nh; __u16 nlflags; - int remaining; - int attrlen; - int err = 1; int nhn = 0; - int replace = (cfg->fc_nlinfo.nlh && - (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE)); - LIST_HEAD(rt6_nh_list); + int replace; + int err; + + replace = (cfg->fc_nlinfo.nlh && + (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE)); nlflags = replace ? NLM_F_REPLACE : NLM_F_CREATE; if (info->nlh && info->nlh->nlmsg_flags & NLM_F_APPEND) nlflags |= NLM_F_APPEND; - remaining = cfg->fc_mp_len; - rtnh = (struct rtnexthop *)cfg->fc_mp; - - /* Parse a Multipath Entry and build a list (rt6_nh_list) of - * fib6_info structs per nexthop - */ - while (rtnh_ok(rtnh, remaining)) { - memcpy(&r_cfg, cfg, sizeof(*cfg)); - if (rtnh->rtnh_ifindex) - r_cfg.fc_ifindex = rtnh->rtnh_ifindex; - - attrlen = rtnh_attrlen(rtnh); - if (attrlen > 0) { - struct nlattr *nla, *attrs = rtnh_attrs(rtnh); - - nla = nla_find(attrs, attrlen, RTA_GATEWAY); - if (nla) { - r_cfg.fc_gateway = nla_get_in6_addr(nla); - r_cfg.fc_flags |= RTF_GATEWAY; - } - - r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP); - nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); - if (nla) - r_cfg.fc_encap_type = nla_get_u16(nla); - } - - r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK); - rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack); - if (IS_ERR(rt)) { - err = PTR_ERR(rt); - rt = NULL; - goto cleanup; - } - - err = ip6_route_info_create_nh(rt, &r_cfg, extack); - if (err) { - rt = NULL; - goto cleanup; - } - - rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1; - - err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg); - if (err) { - fib6_info_release(rt); - goto cleanup; - } + err = ip6_route_mpath_info_create(&rt6_nh_list, cfg, extack); + if (err) + return err; - rtnh = rtnh_next(rtnh, &remaining); - } + err = ip6_route_mpath_info_create_nh(&rt6_nh_list, extack); + if (err) + goto cleanup; /* for add and replace send one notification with all nexthops. * Skip the notification in fib6_add_rt2node and send one with
We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely on RCU to guarantee dev and nexthop lifetime. Then, the RCU section will start before ip6_route_info_create_nh() in ip6_route_multipath_add(), but ip6_route_info_create() is called in the same loop and will sleep. Let's split the loop into ip6_route_mpath_info_create() and ip6_route_mpath_info_create_nh(). Note that ip6_route_info_append() is now integrated into ip6_route_mpath_info_create_nh() because we need to call different free functions for nexthops that passed ip6_route_info_create_nh(). In case of failure, the remaining nexthops that ip6_route_info_create_nh() has not been called for will be freed by ip6_route_mpath_info_cleanup(). OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked to a local temporary list, which will be spliced back to rt6_nh_list. In case of failure, these nexthops will be released by fib6_info_release() in ip6_route_multipath_add(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- net/ipv6/route.c | 205 ++++++++++++++++++++++++++++++----------------- 1 file changed, 130 insertions(+), 75 deletions(-)