From patchwork Fri Apr 18 00:03:42 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056427 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-80008.amazon.com (smtp-fw-80008.amazon.com [99.78.197.219]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD6C14C80 for ; Fri, 18 Apr 2025 00:06:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.219 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934791; cv=none; b=aoWZdEAqgxgVYpabwj2SrXpNGOoUg7VXHeewuU7LVEM5QKlXmBGId8Rd8do8mn+hMWTgonjQET9p32SScmpxJhVDW09QOWMHm/gvXSJ9Tqs/dL+ljYlC8HE6H/q9Hjasj81YO+8AUH2FEHOeTFTpbNTr317FKRkS/oRiP9pgCmk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934791; c=relaxed/simple; bh=N8uxOTRUw9pLZHNjU69D1FMyv8yjlDTJPBfVh0b4tD4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ii494RTXsKAVld1JmR05JjikxmYW9xcxv6leb3hZYesO3bgNoOq9GLcFBfDTNAYf1KA/o4cPl1nY8zium9QLlnz//UG4R0CA0ujyY6kReLt/2XlQ79WM9I1FkqNfxscdc2F5INPt7QmNg7oBEmBGndclHlB6rhX+li2G2rLZeAg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=myBjw2Cj; arc=none smtp.client-ip=99.78.197.219 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="myBjw2Cj" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; 
i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934789; x=1776470789; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=syCUoJTRDODfrogevKezlpcwY1DeV1c2TMvS+hQzchE=; b=myBjw2CjBGS5gfW25YNehtrJSMvOcOMukDjCffGUguGdi+fZMBKpCGFQ Yb6ilcZz2NoNUf3JxeLRJk/xvHGWa9Vatx/yctAvFz6TApWkI/EL+DXXu sRdxkC9Y85/qImV1vOqNtIstf6+HH/PXEVHolydw00lveqOed3hY4nC7o U=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="188416455" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80008.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:05:20 +0000 Received: from EX19MTAUWA001.ant.amazon.com [10.0.21.151:17811] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.47.20:2525] with esmtp (Farcaster) id 482fb004-6bfd-43d2-9615-11b4220022e0; Fri, 18 Apr 2025 00:05:20 +0000 (UTC) X-Farcaster-Flow-ID: 482fb004-6bfd-43d2-9615-11b4220022e0 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWA001.ant.amazon.com (10.250.64.217) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:05:19 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:05:17 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 01/15] ipv6: Validate RTA_GATEWAY of RTA_MULTIPATH in rtm_to_fib6_config(). 
Date: Thu, 17 Apr 2025 17:03:42 -0700
Message-ID: <20250418000443.43734-2-kuniyu@amazon.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D039UWB001.ant.amazon.com (10.13.138.119) To EX19D004ANA001.ant.amazon.com (10.37.240.138)
X-Patchwork-Delegate: kuba@kernel.org

We will perform RTM_NEWROUTE and RTM_DELROUTE under RCU, and then
we want to perform some validation outside the RCU scope.

When creating / removing an IPv6 route with RTA_MULTIPATH,
inet6_rtm_newroute() / inet6_rtm_delroute() validates RTA_GATEWAY
in each multipath entry.

Let's do that in rtm_to_fib6_config().

Note that RTM_DELROUTE now returns an error for RTA_MULTIPATH with
0 entries, which was accepted but should result in -EINVAL, as it
does for RTM_NEWROUTE.

Signed-off-by: Kuniyuki Iwashima
---
 net/ipv6/route.c | 82 +++++++++++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e2c6c0b0684b..51f693581b7c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5050,6 +5050,44 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 	[RTA_FLOWLABEL] = { .type = NLA_BE32 },
 };
 
+static int rtm_to_fib6_multipath_config(struct fib6_config *cfg,
+					struct netlink_ext_ack *extack)
+{
+	struct rtnexthop *rtnh;
+	int remaining;
+
+	remaining = cfg->fc_mp_len;
+	rtnh = (struct rtnexthop *)cfg->fc_mp;
+
+	if (!rtnh_ok(rtnh, remaining)) {
+		NL_SET_ERR_MSG(extack, "Invalid nexthop configuration - no valid nexthops");
+		return -EINVAL;
+	}
+
+	do {
+		int attrlen = rtnh_attrlen(rtnh);
+
+		if (attrlen > 0) {
+			struct nlattr *nla, *attrs;
+
+			attrs = rtnh_attrs(rtnh);
+			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
+			if (nla) {
+				if (nla_len(nla) < sizeof(cfg->fc_gateway)) {
+					NL_SET_ERR_MSG(extack,
+						       "Invalid IPv6 address in RTA_GATEWAY");
+					return -EINVAL;
+				}
+			}
+		}
+
+		rtnh = rtnh_next(rtnh, &remaining);
+	} while (rtnh_ok(rtnh, remaining));
+
+	return lwtunnel_valid_encap_type_attr(cfg->fc_mp, cfg->fc_mp_len,
+					      extack, true);
+}
+
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 			      struct fib6_config *cfg,
 			      struct netlink_ext_ack *extack)
@@ -5164,9 +5202,7 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 		cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
 		cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
 
-		err = lwtunnel_valid_encap_type_attr(cfg->fc_mp,
-						     cfg->fc_mp_len,
-						     extack, true);
+		err = rtm_to_fib6_multipath_config(cfg, extack);
 		if (err < 0)
 			goto errout;
 	}
@@ -5286,19 +5322,6 @@ static bool ip6_route_mpath_should_notify(const struct fib6_info *rt)
 	return should_notify;
 }
 
-static int fib6_gw_from_attr(struct in6_addr *gw, struct nlattr *nla,
-			     struct netlink_ext_ack *extack)
-{
-	if (nla_len(nla) < sizeof(*gw)) {
-		NL_SET_ERR_MSG(extack, "Invalid IPv6 address in RTA_GATEWAY");
-		return -EINVAL;
-	}
-
-	*gw = nla_get_in6_addr(nla);
-
-	return 0;
-}
-
 static int ip6_route_multipath_add(struct fib6_config *cfg,
 				   struct netlink_ext_ack *extack)
 {
@@ -5339,18 +5362,11 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
 			if (nla) {
-				err = fib6_gw_from_attr(&r_cfg.fc_gateway, nla,
-							extack);
-				if (err)
-					goto cleanup;
-
+				r_cfg.fc_gateway = nla_get_in6_addr(nla);
 				r_cfg.fc_flags |= RTF_GATEWAY;
 			}
-			r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
 
-			/* RTA_ENCAP_TYPE length checked in
-			 * lwtunnel_valid_encap_type_attr
-			 */
+			r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
 			nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
 			if (nla)
 				r_cfg.fc_encap_type = nla_get_u16(nla);
@@ -5383,12 +5399,6 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
 		rtnh = rtnh_next(rtnh, &remaining);
 	}
 
-	if (list_empty(&rt6_nh_list)) {
-		NL_SET_ERR_MSG(extack,
-			       "Invalid nexthop configuration - no valid nexthops");
-		return -EINVAL;
-	}
-
 	/* for add and replace send one notification with all nexthops.
 	 * Skip the notification in fib6_add_rt2node and send one with
 	 * the full route when done
@@ -5510,21 +5520,15 @@ static int ip6_route_multipath_del(struct fib6_config *cfg,
 
 			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
 			if (nla) {
-				err = fib6_gw_from_attr(&r_cfg.fc_gateway, nla,
-							extack);
-				if (err) {
-					last_err = err;
-					goto next_rtnh;
-				}
-
+				r_cfg.fc_gateway = nla_get_in6_addr(nla);
 				r_cfg.fc_flags |= RTF_GATEWAY;
 			}
 		}
+
 		err = ip6_route_del(&r_cfg, extack);
 		if (err)
 			last_err = err;
 
-next_rtnh:
 		rtnh = rtnh_next(rtnh, &remaining);
 	}
 

From patchwork Fri Apr 18 00:03:43 2025
X-Patchwork-Submitter: Kuniyuki Iwashima
X-Patchwork-Id: 14056424
X-Patchwork-Delegate: kuba@kernel.org
From: Kuniyuki Iwashima
To: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski, "Paolo Abeni"
CC: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima
Subject: [PATCH v3 net-next 02/15] ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.
Date: Thu, 17 Apr 2025 17:03:43 -0700
Message-ID: <20250418000443.43734-3-kuniyu@amazon.com>
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
X-Mailing-List: netdev@vger.kernel.org

Basically, removing an IPv6 route does not require RTNL because
the IPv6 routing tables are protected by a per-table lock.

inet6_rtm_delroute() calls nexthop_find_by_id() to check if the
nexthop specified by RTA_NH_ID exists.  nexthop uses rbtree and
the top-down walk can be safely performed under RCU.

ip6_route_del() already relies on RCU and the table lock, but we
need to extend the RCU critical section a bit more to cover
__ip6_del_rt().  For example, nexthop_for_each_fib6_nh() and
inet6_rt_notify() need RCU.

Let's call nexthop_find_by_id() and __ip6_del_rt() under RCU and
get rid of RTNL from inet6_rtm_delroute() and SIOCDELRT.

Even if the nexthop is removed after rcu_read_unlock() in
inet6_rtm_delroute(), __remove_nexthop_fib() cleans up the routes
tied to the nexthop, and ip6_route_del() returns -ESRCH.  So the
request was at least valid as of nexthop_find_by_id(), and it's just
a matter of timing.

Note that we need to pass false to lwtunnel_valid_encap_type_attr().
The following patches also use the newroute bool.

Note also that fib6_get_table() does not require RCU because, once
allocated, fib6_table is not freed until netns dismantle.  I will
post a follow-up series to convert such callers to the RCU-lockless
version.  [0]

Link: https://lore.kernel.org/netdev/20250417174557.65721-1-kuniyu@amazon.com/ #[0]
Signed-off-by: Kuniyuki Iwashima
---
v3: Add a note that fib6_get_table() does not require RCU
v2: Call __ip6_del_rt() under RCU
---
 net/ipv6/route.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 51f693581b7c..4de7abe5ee02 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4124,9 +4124,9 @@ static int ip6_route_del(struct fib6_config *cfg,
 			if (rt->nh) {
 				if (!fib6_info_hold_safe(rt))
 					continue;
-				rcu_read_unlock();
 
-				return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+				err = __ip6_del_rt(rt, &cfg->fc_nlinfo);
+				break;
 			}
 			if (cfg->fc_nh_id)
 				continue;
@@ -4141,13 +4141,13 @@ static int ip6_route_del(struct fib6_config *cfg,
 				continue;
 			if (!fib6_info_hold_safe(rt))
 				continue;
-			rcu_read_unlock();
 
 			/* if gateway was specified only delete the one hop */
 			if (cfg->fc_flags & RTF_GATEWAY)
-				return __ip6_del_rt(rt, &cfg->fc_nlinfo);
-
-			return __ip6_del_rt_siblings(rt, cfg);
+				err = __ip6_del_rt(rt, &cfg->fc_nlinfo);
+			else
+				err = __ip6_del_rt_siblings(rt, cfg);
+			break;
 		}
 	}
 	rcu_read_unlock();
@@ -4516,19 +4516,20 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, struct in6_rtmsg *rtmsg)
 
 	rtmsg_to_fib6_config(net, rtmsg, &cfg);
 
-	rtnl_lock();
 	switch (cmd) {
 	case SIOCADDRT:
+		rtnl_lock();
 		/* Only do the default setting of fc_metric in route adding */
 		if (cfg.fc_metric == 0)
 			cfg.fc_metric = IP6_RT_PRIO_USER;
 		err = ip6_route_add(&cfg, GFP_KERNEL, NULL);
+		rtnl_unlock();
 		break;
 	case SIOCDELRT:
 		err = ip6_route_del(&cfg, NULL);
 		break;
 	}
-	rtnl_unlock();
+
 	return err;
 }
 
@@ -5051,7 +5052,8 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 };
 
 static int rtm_to_fib6_multipath_config(struct fib6_config *cfg,
-					struct netlink_ext_ack *extack)
+					struct netlink_ext_ack *extack,
+					bool newroute)
 {
 	struct rtnexthop *rtnh;
 	int remaining;
@@ -5085,15 +5087,16 @@ static int rtm_to_fib6_multipath_config(struct fib6_config *cfg,
 	} while (rtnh_ok(rtnh, remaining));
 
 	return lwtunnel_valid_encap_type_attr(cfg->fc_mp, cfg->fc_mp_len,
-					      extack, true);
+					      extack, newroute);
 }
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 			      struct fib6_config *cfg,
 			      struct netlink_ext_ack *extack)
 {
-	struct rtmsg *rtm;
+	bool newroute = nlh->nlmsg_type == RTM_NEWROUTE;
 	struct nlattr *tb[RTA_MAX+1];
+	struct rtmsg *rtm;
 	unsigned int pref;
 	int err;
 
@@ -5202,7 +5205,7 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 		cfg->fc_mp = nla_data(tb[RTA_MULTIPATH]);
 		cfg->fc_mp_len = nla_len(tb[RTA_MULTIPATH]);
 
-		err = rtm_to_fib6_multipath_config(cfg, extack);
+		err = rtm_to_fib6_multipath_config(cfg, extack, newroute);
 		if (err < 0)
 			goto errout;
 	}
@@ -5222,7 +5225,7 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 		cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]);
 
 		err = lwtunnel_valid_encap_type(cfg->fc_encap_type,
-						extack, true);
+						extack, newroute);
 		if (err < 0)
 			goto errout;
 	}
@@ -5545,15 +5548,20 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err < 0)
 		return err;
 
-	if (cfg.fc_nh_id &&
-	    !nexthop_find_by_id(sock_net(skb->sk), cfg.fc_nh_id)) {
-		NL_SET_ERR_MSG(extack, "Nexthop id does not exist");
-		return -EINVAL;
+	if (cfg.fc_nh_id) {
+		rcu_read_lock();
+		err = !nexthop_find_by_id(sock_net(skb->sk), cfg.fc_nh_id);
+		rcu_read_unlock();
+
+		if (err) {
+			NL_SET_ERR_MSG(extack, "Nexthop id does not exist");
+			return -EINVAL;
+		}
 	}
 
-	if (cfg.fc_mp)
+	if (cfg.fc_mp) {
 		return ip6_route_multipath_del(&cfg, extack);
-	else {
+	} else {
 		cfg.fc_delete_all_nh = 1;
 		return ip6_route_del(&cfg, extack);
 	}
@@ -6765,7 +6773,7 @@ static const struct rtnl_msg_handler ip6_route_rtnl_msg_handlers[] __initconst_o
 	{.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_NEWROUTE,
 	 .doit = inet6_rtm_newroute},
 	{.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_DELROUTE,
-	 .doit = inet6_rtm_delroute},
+	 .doit = inet6_rtm_delroute, .flags = RTNL_FLAG_DOIT_UNLOCKED},
 	{.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETROUTE,
 	 .doit = inet6_rtm_getroute, .flags = RTNL_FLAG_DOIT_UNLOCKED},
 };

From patchwork Fri Apr 18 00:03:44 2025
X-Patchwork-Submitter: Kuniyuki Iwashima
X-Patchwork-Id: 14056426
X-Patchwork-Delegate: kuba@kernel.org
From: Kuniyuki Iwashima
To: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski, "Paolo Abeni"
CC: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima
Subject: [PATCH v3 net-next 03/15] ipv6: Move some validation from ip6_route_info_create() to rtm_to_fib6_config().
Date: Thu, 17 Apr 2025 17:03:44 -0700
Message-ID: <20250418000443.43734-4-kuniyu@amazon.com>
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
X-Mailing-List: netdev@vger.kernel.org

ip6_route_info_create() is called from 3 functions:

  * ip6_route_add()
  * ip6_route_multipath_add()
  * addrconf_f6i_alloc()

addrconf_f6i_alloc() does not need validation for struct fib6_config in
ip6_route_info_create().

ip6_route_multipath_add() calls ip6_route_info_create() for multiple
routes with slightly different fib6_config instances, which are copied
from the base config passed from userspace.

So, we need not validate the same config repeatedly.

Let's move such validation into rtm_to_fib6_config().
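
The idea of validating the base config once and then deriving per-nexthop
copies can be sketched in userspace C as below.  This is only an
illustrative model, not kernel code: struct model_cfg, model_validate(),
model_multipath_add() and the call counter are hypothetical names, and the
only check borrowed from the patch is the 128-bit prefix length limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct fib6_config. */
struct model_cfg {
	unsigned int dst_len;	/* destination prefix length */
	unsigned int src_len;	/* source prefix length */
	unsigned int gateway;	/* per-nexthop value, filled in later */
};

/* Counts how often the shared validation runs. */
static int model_validate_calls;

/* Checks that depend only on the base config passed from userspace. */
static int model_validate(const struct model_cfg *cfg)
{
	model_validate_calls++;

	if (cfg->dst_len > 128 || cfg->src_len > 128)
		return -EINVAL;

	return 0;
}

/* Validate the base config once, then derive one copy per nexthop. */
static int model_multipath_add(const struct model_cfg *base,
			       const unsigned int *gateways, size_t n,
			       struct model_cfg *out)
{
	size_t i;
	int err;

	assert(base && gateways && out);

	err = model_validate(base);
	if (err)
		return err;

	for (i = 0; i < n; i++) {
		out[i] = *base;			/* copy of the base config */
		out[i].gateway = gateways[i];	/* only this field differs */
	}

	return 0;
}
```

With three gateways, model_validate() runs once rather than three times,
which mirrors the motivation for hoisting the checks out of the
per-nexthop path.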
Signed-off-by: Kuniyuki Iwashima
Acked-by: Paolo Abeni
---
 net/ipv6/route.c | 79 +++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4de7abe5ee02..23102f37f220 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3739,38 +3739,6 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
 	int err = -EINVAL;
 	int addr_type;
 
-	/* RTF_PCPU is an internal flag; can not be set by userspace */
-	if (cfg->fc_flags & RTF_PCPU) {
-		NL_SET_ERR_MSG(extack, "Userspace can not set RTF_PCPU");
-		goto out;
-	}
-
-	/* RTF_CACHE is an internal flag; can not be set by userspace */
-	if (cfg->fc_flags & RTF_CACHE) {
-		NL_SET_ERR_MSG(extack, "Userspace can not set RTF_CACHE");
-		goto out;
-	}
-
-	if (cfg->fc_type > RTN_MAX) {
-		NL_SET_ERR_MSG(extack, "Invalid route type");
-		goto out;
-	}
-
-	if (cfg->fc_dst_len > 128) {
-		NL_SET_ERR_MSG(extack, "Invalid prefix length");
-		goto out;
-	}
-	if (cfg->fc_src_len > 128) {
-		NL_SET_ERR_MSG(extack, "Invalid source address length");
-		goto out;
-	}
-#ifndef CONFIG_IPV6_SUBTREES
-	if (cfg->fc_src_len) {
-		NL_SET_ERR_MSG(extack,
-			       "Specifying source address requires IPV6_SUBTREES to be enabled");
-		goto out;
-	}
-#endif
 	if (cfg->fc_nh_id) {
 		nh = nexthop_find_by_id(net, cfg->fc_nh_id);
 		if (!nh) {
@@ -3835,11 +3803,6 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
 	rt->fib6_src.plen = cfg->fc_src_len;
 #endif
 	if (nh) {
-		if (rt->fib6_src.plen) {
-			NL_SET_ERR_MSG(extack, "Nexthops can not be used with source routing");
-			err = -EINVAL;
-			goto out_free;
-		}
 		if (!nexthop_get(nh)) {
 			NL_SET_ERR_MSG(extack, "Nexthop has been deleted");
 			err = -ENOENT;
@@ -5239,6 +5202,48 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 	}
 
+	if (newroute) {
+		/* RTF_PCPU is an internal flag; can not be set by userspace */
+		if (cfg->fc_flags & RTF_PCPU) {
+			NL_SET_ERR_MSG(extack, "Userspace can not set RTF_PCPU");
+			goto errout;
+		}
+
+		/* RTF_CACHE is an internal flag; can not be set by userspace */
+		if (cfg->fc_flags & RTF_CACHE) {
+			NL_SET_ERR_MSG(extack, "Userspace can not set RTF_CACHE");
+			goto errout;
+		}
+
+		if (cfg->fc_type > RTN_MAX) {
+			NL_SET_ERR_MSG(extack, "Invalid route type");
+			goto errout;
+		}
+
+		if (cfg->fc_dst_len > 128) {
+			NL_SET_ERR_MSG(extack, "Invalid prefix length");
+			goto errout;
+		}
+
+#ifdef CONFIG_IPV6_SUBTREES
+		if (cfg->fc_src_len > 128) {
+			NL_SET_ERR_MSG(extack, "Invalid source address length");
+			goto errout;
+		}
+
+		if (cfg->fc_nh_id && cfg->fc_src_len) {
+			NL_SET_ERR_MSG(extack, "Nexthops can not be used with source routing");
+			goto errout;
+		}
+#else
+		if (cfg->fc_src_len) {
+			NL_SET_ERR_MSG(extack,
+				       "Specifying source address requires IPV6_SUBTREES to be enabled");
+			goto errout;
+		}
+#endif
+	}
+
 	err = 0;
 errout:
 	return err;

From patchwork Fri Apr 18 00:03:45 2025
X-Patchwork-Submitter: Kuniyuki Iwashima
X-Patchwork-Id: 14056428
X-Patchwork-Delegate: kuba@kernel.org
From: Kuniyuki Iwashima
To: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski, "Paolo Abeni"
CC: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima
Subject: [PATCH v3 net-next 04/15] ipv6: Check GATEWAY in rtm_to_fib6_multipath_config().
Date: Thu, 17 Apr 2025 17:03:45 -0700
Message-ID: <20250418000443.43734-5-kuniyu@amazon.com>
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
X-Mailing-List: netdev@vger.kernel.org

In ip6_route_multipath_add(), we call rt6_qualify_for_ecmp() for each
entry.  If it returns false, the request fails.

rt6_qualify_for_ecmp() returns false if any of the conditions below
is true:

  1. f6i->fib6_flags has RTF_ADDRCONF
  2. f6i->nh is not NULL
  3. f6i->fib6_nh->fib_nh_gw_family is AF_UNSPEC

1. is unnecessary because rtm_to_fib6_config() never sets RTF_ADDRCONF
in cfg->fc_flags.

2. is equivalent to checking cfg->fc_nh_id.

3. can be replaced by checking RTF_GATEWAY in the base and each multipath
entry because AF_INET6 is set to f6i->fib6_nh->fib_nh_gw_family only when
cfg.fc_is_fdb is true or RTF_GATEWAY is set, but the former is always
false.

These checks do not require RCU and can be done earlier.

Let's perform the equivalent checks in rtm_to_fib6_multipath_config().

Signed-off-by: Kuniyuki Iwashima
---
v3: Explain the checks do not require RCU.
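
As a rough illustration of the replacement check described above, the
following userspace sketch evaluates conditions 2 and 3 per multipath
entry.  It is a model under stated assumptions rather than the kernel
implementation: struct model_cfg, struct model_nexthop,
model_check_multipath() and MODEL_RTF_GATEWAY are hypothetical stand-ins
for fib6_config, the parsed rtnexthop attributes, and RTF_GATEWAY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RTF_GATEWAY; the value is illustrative. */
#define MODEL_RTF_GATEWAY 0x0002u

/* Hypothetical, simplified stand-in for struct fib6_config. */
struct model_cfg {
	unsigned int flags;	/* base flags parsed from the request */
	unsigned int nh_id;	/* non-zero if a nexthop object id was given */
};

/* One multipath entry; true if it carried a gateway attribute. */
struct model_nexthop {
	bool has_gateway_attr;
};

/*
 * An entry qualifies only if no nexthop id is used (condition 2) and a
 * gateway comes from either the base flags or the entry (condition 3).
 * The check applies to route creation only, mirroring the newroute bool.
 */
static int model_check_multipath(const struct model_cfg *cfg,
				 const struct model_nexthop *nhs, size_t n,
				 bool newroute)
{
	size_t i;

	assert(cfg && nhs);

	if (n == 0)
		return -EINVAL;	/* no valid nexthops */

	for (i = 0; i < n; i++) {
		bool has_gateway = cfg->flags & MODEL_RTF_GATEWAY;

		if (nhs[i].has_gateway_attr)
			has_gateway = true;

		if (newroute && (cfg->nh_id || !has_gateway))
			return -EINVAL;
	}

	return 0;
}
```

In this model a request with one gateway-less entry is rejected on
creation but still accepted on deletion, since the check is gated on
newroute.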
--- net/ipv6/route.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 23102f37f220..5f370c269e64 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5030,6 +5030,7 @@ static int rtm_to_fib6_multipath_config(struct fib6_config *cfg, } do { + bool has_gateway = cfg->fc_flags & RTF_GATEWAY; int attrlen = rtnh_attrlen(rtnh); if (attrlen > 0) { @@ -5043,9 +5044,17 @@ static int rtm_to_fib6_multipath_config(struct fib6_config *cfg, "Invalid IPv6 address in RTA_GATEWAY"); return -EINVAL; } + + has_gateway = true; } } + if (newroute && (cfg->fc_nh_id || !has_gateway)) { + NL_SET_ERR_MSG(extack, + "Device only routes can not be added for IPv6 using the multipath API."); + return -EINVAL; + } + rtnh = rtnh_next(rtnh, &remaining); } while (rtnh_ok(rtnh, remaining)); @@ -5387,13 +5396,6 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, rt = NULL; goto cleanup; } - if (!rt6_qualify_for_ecmp(rt)) { - err = -EINVAL; - NL_SET_ERR_MSG(extack, - "Device only routes can not be added for IPv6 using the multipath API."); - fib6_info_release(rt); - goto cleanup; - } rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1; From patchwork Fri Apr 18 00:03:46 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056432 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2FDF723DE for ; Fri, 18 Apr 2025 00:08:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934888; cv=none; 
Received: from EX19MTAUWC001.ant.amazon.com [10.0.7.35:5819] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.50.54:2525] with esmtp (Farcaster) id 62a842f2-fadf-4c55-9f48-cccaabb2ef78; Fri, 18 Apr 2025 00:06:56 +0000 (UTC)
X-Farcaster-Flow-ID: 62a842f2-fadf-4c55-9f48-cccaabb2ef78
Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC001.ant.amazon.com (10.250.64.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:06:56 +0000
Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:06:53 +0000
From: Kuniyuki Iwashima
To: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski, "Paolo Abeni"
CC: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima
Subject: [PATCH v3 net-next 05/15] ipv6: Move nexthop_find_by_id() after fib6_info_alloc().
Date: Thu, 17 Apr 2025 17:03:46 -0700
Message-ID: <20250418000443.43734-6-kuniyu@amazon.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D042UWB002.ant.amazon.com (10.13.139.175) To EX19D004ANA001.ant.amazon.com (10.37.240.138)
X-Patchwork-Delegate: kuba@kernel.org

We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT.

Then, we must perform two lookups for nexthop and dev under RCU
to guarantee their lifetime.

ip6_route_info_create() calls nexthop_find_by_id() first if
RTA_NH_ID is specified, and then allocates struct fib6_info.

nexthop_find_by_id() must be called under RCU, but we do not want
to use GFP_ATOMIC for memory allocation here, which is likely to
fail in ip6_route_multipath_add().
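
The ordering constraint described above (a sleepable allocation cannot happen inside the RCU section that protects the nexthop lookup) can be modeled in userspace C. This is a minimal sketch, not kernel code: the `sketch_in_rcu` flag, the `sketch_` functions, the fixed set of valid ids (1 to 3) and the -1/-2 return values are all hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Models "we are inside an RCU read-side section" (hypothetical flag). */
static bool sketch_in_rcu;

/* Models a sleepable (GFP_KERNEL-like) allocation: refused under RCU. */
static void *sketch_alloc_sleepable(size_t size)
{
	if (sketch_in_rcu)
		return NULL;
	return malloc(size);
}

/* Models a nexthop lookup; ids 1..3 are assumed to exist. */
static bool sketch_nh_exists(int id)
{
	return id >= 1 && id <= 3;
}

struct sketch_route {
	int nh_id;
};

/*
 * Allocate first, look up second.  Returns 0 on success, -1 if the
 * allocation fails, -2 if the nexthop id is unknown (the route that
 * was already allocated is freed again in that case).
 */
static int sketch_route_create(int nh_id, struct sketch_route **out)
{
	struct sketch_route *rt;
	int err = 0;

	/* Sleepable part: done before entering the read-side section. */
	rt = sketch_alloc_sleepable(sizeof(*rt));
	if (!rt)
		return -1;
	rt->nh_id = nh_id;

	/* RCU part: lookups only, no sleeping allocation. */
	sketch_in_rcu = true;
	if (nh_id && !sketch_nh_exists(nh_id))
		err = -2;
	sketch_in_rcu = false;

	if (err) {
		free(rt);
		return err;
	}

	*out = rt;
	return 0;
}
```

The cost of this ordering is visible in the error path: a failed lookup now has to free memory that the old ordering would never have allocated.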
Let's move nexthop_find_by_id() after the memory allocation so that we can later split ip6_route_info_create() into two parts: the sleepable part and the RCU part. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni --- net/ipv6/route.c | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 5f370c269e64..06c5414fc14e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3733,24 +3733,11 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, { struct net *net = cfg->fc_nlinfo.nl_net; struct fib6_info *rt = NULL; - struct nexthop *nh = NULL; struct fib6_table *table; struct fib6_nh *fib6_nh; - int err = -EINVAL; + int err = -ENOBUFS; int addr_type; - if (cfg->fc_nh_id) { - nh = nexthop_find_by_id(net, cfg->fc_nh_id); - if (!nh) { - NL_SET_ERR_MSG(extack, "Nexthop id does not exist"); - goto out; - } - err = fib6_check_nexthop(nh, cfg, extack); - if (err) - goto out; - } - - err = -ENOBUFS; if (cfg->fc_nlinfo.nlh && !(cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_CREATE)) { table = fib6_get_table(net, cfg->fc_table); @@ -3766,7 +3753,7 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, goto out; err = -ENOMEM; - rt = fib6_info_alloc(gfp_flags, !nh); + rt = fib6_info_alloc(gfp_flags, !cfg->fc_nh_id); if (!rt) goto out; @@ -3802,12 +3789,27 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, ipv6_addr_prefix(&rt->fib6_src.addr, &cfg->fc_src, cfg->fc_src_len); rt->fib6_src.plen = cfg->fc_src_len; #endif - if (nh) { + + if (cfg->fc_nh_id) { + struct nexthop *nh; + + nh = nexthop_find_by_id(net, cfg->fc_nh_id); + if (!nh) { + err = -EINVAL; + NL_SET_ERR_MSG(extack, "Nexthop id does not exist"); + goto out_free; + } + + err = fib6_check_nexthop(nh, cfg, extack); + if (err) + goto out_free; + if (!nexthop_get(nh)) { NL_SET_ERR_MSG(extack, "Nexthop has been deleted"); err = -ENOENT; goto out_free; } + rt->nh = nh; fib6_nh 
= nexthop_fib6_nh(rt->nh); } else { From patchwork Fri Apr 18 00:03:47 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056430 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08680372 for ; Fri, 18 Apr 2025 00:07:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934846; cv=none; b=Kv1EKg1ziXbmkSduCIU5fsg3jg7ytHaxWEZpL5hGHTXCdqLNizhVUUtbsAtoZbtBP/gwuAVCFsbmzJzRhxBKT2AcZtMSDzy+fRvMun1JHcayT+7jnnHUa/GqTmrKyFGVKmf4iaLBlMZAbvwptqMEq6Ww8kr+w+pcdAMs6+pmNsc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934846; c=relaxed/simple; bh=naN3x+y/H5dS/7zD9R/he30OEFrfAPg7DSgsAFR9EuA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=gn6JIhwgdFlkQLYLc+oB3Vuv1JGvk3sg0N1WzyNFbzgTVln6X1KCp5hp+2eHbGJ0SCawRIhELP6rVVibtEGK7cKPdGtg4BLC2Iu6611dfIBMDYfZ7WNLcD23o+IeJAKIzRSAGcUuRqsz07E/rzAxK9+CXlNiWWfMAhk9SX5RpQI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=Bpzj4K1+; arc=none smtp.client-ip=99.78.197.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="Bpzj4K1+" DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934841; x=1776470841; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=cZpXa3rh32OWvtEfAr8sx+3l0N4GmAu1NnfMF1Q2nq8=; b=Bpzj4K1+qtbihncT1MYi3dmUSd1qYEHuLzysDMZhev02gaPAaM5T9PP7 fjHIpK/Iw+JYkiub2yvJpUkz1kcpYiREVs4q/X3xylPOh2Bkv8RyeezzE 0y/09BGoUtNPgk2pzU4gIVljVHhwNmqJcoGi/HZxG35gebAloY4XcYGnC A=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="192136032" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80009.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:07:20 +0000 Received: from EX19MTAUWA002.ant.amazon.com [10.0.38.20:23530] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.42.20:2525] with esmtp (Farcaster) id 0f7071ea-65f6-44e0-88f9-1571aaabb56e; Fri, 18 Apr 2025 00:07:20 +0000 (UTC) X-Farcaster-Flow-ID: 0f7071ea-65f6-44e0-88f9-1571aaabb56e Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWA002.ant.amazon.com (10.250.64.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:07:19 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:07:17 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 06/15] ipv6: Split ip6_route_info_create(). 
Date: Thu, 17 Apr 2025 17:03:47 -0700 Message-ID: <20250418000443.43734-7-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D045UWC003.ant.amazon.com (10.13.139.198) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely on RCU to guarantee dev and nexthop lifetime. Then, we want to allocate as much as possible before entering the RCU section. The RCU section will start in the middle of ip6_route_info_create(), and this is problematic for ip6_route_multipath_add() that calls ip6_route_info_create() multiple times. Let's split ip6_route_info_create() into two parts; one for memory allocation and another for nexthop setup. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni --- v3: Update changelog s/everything as possible/as much as possible/ --- net/ipv6/route.c | 95 +++++++++++++++++++++++++++++++----------------- 1 file changed, 62 insertions(+), 33 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 06c5414fc14e..7328404c77c1 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3728,15 +3728,13 @@ void fib6_nh_release_dsts(struct fib6_nh *fib6_nh) } static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, - gfp_t gfp_flags, - struct netlink_ext_ack *extack) + gfp_t gfp_flags, + struct netlink_ext_ack *extack) { struct net *net = cfg->fc_nlinfo.nl_net; - struct fib6_info *rt = NULL; struct fib6_table *table; - struct fib6_nh *fib6_nh; - int err = -ENOBUFS; - int addr_type; + struct fib6_info *rt; + int err; if (cfg->fc_nlinfo.nlh && !(cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_CREATE)) { @@ -3748,22 +3746,22 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, } 
else { table = fib6_new_table(net, cfg->fc_table); } + if (!table) { + err = -ENOBUFS; + goto err; + } - if (!table) - goto out; - - err = -ENOMEM; rt = fib6_info_alloc(gfp_flags, !cfg->fc_nh_id); - if (!rt) - goto out; + if (!rt) { + err = -ENOMEM; + goto err; + } rt->fib6_metrics = ip_fib_metrics_init(cfg->fc_mx, cfg->fc_mx_len, extack); if (IS_ERR(rt->fib6_metrics)) { err = PTR_ERR(rt->fib6_metrics); - /* Do not leave garbage there. */ - rt->fib6_metrics = (struct dst_metrics *)&dst_default_metrics; - goto out_free; + goto free; } if (cfg->fc_flags & RTF_ADDRCONF) @@ -3771,12 +3769,12 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, if (cfg->fc_flags & RTF_EXPIRES) fib6_set_expires(rt, jiffies + - clock_t_to_jiffies(cfg->fc_expires)); + clock_t_to_jiffies(cfg->fc_expires)); if (cfg->fc_protocol == RTPROT_UNSPEC) cfg->fc_protocol = RTPROT_BOOT; - rt->fib6_protocol = cfg->fc_protocol; + rt->fib6_protocol = cfg->fc_protocol; rt->fib6_table = table; rt->fib6_metric = cfg->fc_metric; rt->fib6_type = cfg->fc_type ? 
: RTN_UNICAST; @@ -3789,6 +3787,20 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, ipv6_addr_prefix(&rt->fib6_src.addr, &cfg->fc_src, cfg->fc_src_len); rt->fib6_src.plen = cfg->fc_src_len; #endif + return rt; +free: + kfree(rt); +err: + return ERR_PTR(err); +} + +static int ip6_route_info_create_nh(struct fib6_info *rt, + struct fib6_config *cfg, + struct netlink_ext_ack *extack) +{ + struct net *net = cfg->fc_nlinfo.nl_net; + struct fib6_nh *fib6_nh; + int err; if (cfg->fc_nh_id) { struct nexthop *nh; @@ -3813,9 +3825,11 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, rt->nh = nh; fib6_nh = nexthop_fib6_nh(rt->nh); } else { - err = fib6_nh_init(net, rt->fib6_nh, cfg, gfp_flags, extack); + int addr_type; + + err = fib6_nh_init(net, rt->fib6_nh, cfg, GFP_ATOMIC, extack); if (err) - goto out; + goto out_release; fib6_nh = rt->fib6_nh; @@ -3834,21 +3848,20 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, if (!ipv6_chk_addr(net, &cfg->fc_prefsrc, dev, 0)) { NL_SET_ERR_MSG(extack, "Invalid source address"); err = -EINVAL; - goto out; + goto out_release; } rt->fib6_prefsrc.addr = cfg->fc_prefsrc; rt->fib6_prefsrc.plen = 128; - } else - rt->fib6_prefsrc.plen = 0; + } - return rt; -out: + return 0; +out_release: fib6_info_release(rt); - return ERR_PTR(err); + return err; out_free: ip_fib_metrics_put(rt->fib6_metrics); kfree(rt); - return ERR_PTR(err); + return err; } int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags, @@ -3861,6 +3874,10 @@ int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags, if (IS_ERR(rt)) return PTR_ERR(rt); + err = ip6_route_info_create_nh(rt, cfg, extack); + if (err) + return err; + err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, extack); fib6_info_release(rt); @@ -4584,6 +4601,7 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net, .fc_ignore_dev_down = true, }; struct fib6_info *f6i; + int err; if (anycast) { cfg.fc_type = RTN_ANYCAST; @@ -4594,14 +4612,19 @@ 
struct fib6_info *addrconf_f6i_alloc(struct net *net, } f6i = ip6_route_info_create(&cfg, gfp_flags, extack); - if (!IS_ERR(f6i)) { - f6i->dst_nocount = true; + if (IS_ERR(f6i)) + return f6i; - if (!anycast && - (READ_ONCE(net->ipv6.devconf_all->disable_policy) || - READ_ONCE(idev->cnf.disable_policy))) - f6i->dst_nopolicy = true; - } + err = ip6_route_info_create_nh(f6i, &cfg, extack); + if (err) + return ERR_PTR(err); + + f6i->dst_nocount = true; + + if (!anycast && + (READ_ONCE(net->ipv6.devconf_all->disable_policy) || + READ_ONCE(idev->cnf.disable_policy))) + f6i->dst_nopolicy = true; return f6i; } @@ -5399,6 +5422,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, goto cleanup; } + err = ip6_route_info_create_nh(rt, &r_cfg, extack); + if (err) { + rt = NULL; + goto cleanup; + } + rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1; err = ip6_route_info_append(info->nl_net, &rt6_nh_list, From patchwork Fri Apr 18 00:03:48 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056431 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-80009.amazon.com (smtp-fw-80009.amazon.com [99.78.197.220]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC924372 for ; Fri, 18 Apr 2025 00:07:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.220 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934868; cv=none; b=k1953+H+IEEcGNQZvjR3ZB5mC+SZTHWIZYh1hHM8GTBpSwaiYRoiJI9zR6nd/Bymu1HprtrAdXXUu7GEaR6P7zxDTqKfIcep7W4LXefsnQ0stqpASvkSvrnlUFE6QqcrEhYZeSzDmfnKjzJBU55M5srIxhBMnrrTAt8OAiyegEo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934868; c=relaxed/simple; bh=0VjY1ESLr76PmkuLVTZHldsHAtwKKI8bX3QbI6POrR0=; 
h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=MIcSOx5HZnBoNw4ozLRvmd7W/iIvV6mvPPz72hIa4ZJw3xCklPvR15pPN5JH0bw0blzllPy11l8eqNDmXZKEnocJInwZtXGynUzVDuMfGYNlOMr3U66Ttn1juDCa4tR2Sddl/jKUzkLw9AWNj7FZsNtY+gBQa9XmW8/Q6qvL1UY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=bfJ/h4UL; arc=none smtp.client-ip=99.78.197.220 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="bfJ/h4UL" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934866; x=1776470866; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=H/9Ib4uAPeswRKX89M/H+u2dIGhKz124xau90Arp5DY=; b=bfJ/h4ULLU2DKl5H8HAl2xJChgqeItIR57l4D/khMBsmxcIt+hgOBYcI f1TGtSSeI9cPkndtAfJfVAC+9Ve/257AWEeX8F6I5S6SLn+/9gKfI4S2s 3tlfflUbLglaFx3zHWmXAdY9QB/iGdZgY00ZPJD93bvSPRsbBZ6F24Pba Y=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="192136091" Received: from pdx4-co-svc-p1-lb2-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.210]) by smtp-border-fw-80009.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:07:45 +0000 Received: from EX19MTAUWB001.ant.amazon.com [10.0.21.151:40033] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.47.20:2525] with esmtp (Farcaster) id 38dd2e39-73db-440b-b4d6-6b67a6b66bf5; Fri, 18 Apr 2025 00:07:45 +0000 (UTC) X-Farcaster-Flow-ID: 38dd2e39-73db-440b-b4d6-6b67a6b66bf5 Received: from 
EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWB001.ant.amazon.com (10.250.64.248) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:07:44 +0000
Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:07:41 +0000
From: Kuniyuki Iwashima
To: "David S. Miller", David Ahern, Eric Dumazet, Jakub Kicinski, "Paolo Abeni"
CC: Simon Horman, Kuniyuki Iwashima, Kuniyuki Iwashima
Subject: [PATCH v3 net-next 07/15] ipv6: Preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().
Date: Thu, 17 Apr 2025 17:03:48 -0700
Message-ID: <20250418000443.43734-8-kuniyu@amazon.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com>
References: <20250418000443.43734-1-kuniyu@amazon.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-ClientProxiedBy: EX19D032UWB001.ant.amazon.com (10.13.139.152) To EX19D004ANA001.ant.amazon.com (10.37.240.138)
X-Patchwork-Delegate: kuba@kernel.org

ip6_route_info_create_nh() will be called under RCU.

fib6_nh_init() will then also run under RCU, but per-cpu memory
allocation with GFP_ATOMIC is very likely to fail while bulk-adding
IPv6 routes, and we would see many of these messages in dmesg:

  percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left
  percpu: allocation failed, size=8 align=8 atomic=1, atomic alloc failed, no space left

Let's preallocate rt->fib6_nh->rt6i_pcpu in ip6_route_info_create().

If something fails before the original memory allocation in
fib6_nh_init(), ip6_route_info_create_nh() calls fib6_info_release(),
which releases the preallocated per-cpu memory.
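
The preallocate-then-reuse pattern described above can be sketched in userspace C. This is an illustration under assumptions, not the kernel implementation: `struct sketch_nh`, the `sketch_` functions and their return values are hypothetical, and plain `calloc()` stands in for `alloc_percpu_gfp()`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct fib6_nh; only one slot is modeled. */
struct sketch_nh {
	void *pcpu;	/* models fib6_nh->rt6i_pcpu */
};

/* Sleepable phase: models the preallocation done at route creation. */
static int sketch_prealloc(struct sketch_nh *nh)
{
	nh->pcpu = calloc(1, sizeof(void *));
	return nh->pcpu ? 0 : -1;
}

/*
 * Later phase: models the allocation step of the init function.
 * Returns 0 if a preallocated buffer was reused, 1 if it had to
 * allocate here (the path still needed by callers that never
 * preallocate), and -1 on allocation failure.
 */
static int sketch_nh_init(struct sketch_nh *nh)
{
	if (nh->pcpu)
		return 0;

	nh->pcpu = calloc(1, sizeof(void *));
	return nh->pcpu ? 1 : -1;
}

/* Models the release path, which frees the buffer however it was obtained. */
static void sketch_release(struct sketch_nh *nh)
{
	free(nh->pcpu);
	nh->pcpu = NULL;
}
```

Because the init step only allocates when the pointer is still NULL, one release path covers both the preallocated and the late-allocated case.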
Note that rt->fib6_nh->rt6i_pcpu is not preallocated when called via ipv6_stub, so we still need alloc_percpu_gfp() in fib6_nh_init(). Signed-off-by: Kuniyuki Iwashima --- v3: Explain alloc_percpu_gfp() is still needed when called via ipv6_stub. --- net/ipv6/route.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 7328404c77c1..b0ddb73c732e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3664,10 +3664,12 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh, goto out; pcpu_alloc: - fib6_nh->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags); if (!fib6_nh->rt6i_pcpu) { - err = -ENOMEM; - goto out; + fib6_nh->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags); + if (!fib6_nh->rt6i_pcpu) { + err = -ENOMEM; + goto out; + } } fib6_nh->fib_nh_dev = dev; @@ -3727,6 +3729,15 @@ void fib6_nh_release_dsts(struct fib6_nh *fib6_nh) } } +static int fib6_nh_prealloc_percpu(struct fib6_nh *fib6_nh, gfp_t gfp_flags) +{ + fib6_nh->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags); + if (!fib6_nh->rt6i_pcpu) + return -ENOMEM; + + return 0; +} + static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, gfp_t gfp_flags, struct netlink_ext_ack *extack) @@ -3764,6 +3775,12 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, goto free; } + if (!cfg->fc_nh_id) { + err = fib6_nh_prealloc_percpu(&rt->fib6_nh[0], gfp_flags); + if (err) + goto free_metrics; + } + if (cfg->fc_flags & RTF_ADDRCONF) rt->dst_nocount = true; @@ -3788,6 +3805,8 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, rt->fib6_src.plen = cfg->fc_src_len; #endif return rt; +free_metrics: + ip_fib_metrics_put(rt->fib6_metrics); free: kfree(rt); err: From patchwork Fri Apr 18 00:03:49 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 
14056433 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-52005.amazon.com (smtp-fw-52005.amazon.com [52.119.213.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ABD8879EA for ; Fri, 18 Apr 2025 00:08:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.156 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934896; cv=none; b=C5TPfYDoGKTfzQ/EgrhufaUY3Aiu/4geic4XGD4M1KjWjY8EOLq6lR9NVJRpqujJqjKTCWQ11E7rSntebCZHjL0YNFDGVV/NVYDJEDHjvttv29qyVmZrXSjgzEMYH6F5MzGBQSsm0SE3ll+Nc8ciQGfABwv6m7JTiUS4Wy2Ntg8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934896; c=relaxed/simple; bh=k/8qX0BNwgh176hMRAi+XVQ3knseryLGwqugBHGY5Lo=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=b15L/haTR+aFAL1ktmmZJqDN4SruKy+T8ax6QLdITAq1YCsx9EzTfiwfcYi3lhdnXqruxjOkBjk6ePm9yJAqXENtSWsK3Sty5ALuZhpixae3a/jXw3sCcstmE/9I5dm2J58kLUR+0OHYoIaOQM644ixRo+fE9rb5Z7csoWNTgvo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=f037affS; arc=none smtp.client-ip=52.119.213.156 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="f037affS" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934894; x=1776470894; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=x/vIwFPrQWzVaKDGSkoh0hV7FgliOgaH5bk4sHYD+Sc=; b=f037affSWL2Jn/Sl9ltYIkf7+XDilNfnOu+CT55YygCEi7ukGUUDJe7/ IHEFEZ6Fw94euAEc+qkTw/zTxOOQhJ0ftY/UcMrOd3+MxpvBdVN94vRtP ktm7FewU+MjeSXhKMo5Fl6x1kHhAraYp2C1yQmkWDcPQGIQhK4uyMUhcs k=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="736588014" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52005.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:08:13 +0000 Received: from EX19MTAUWB002.ant.amazon.com [10.0.21.151:56926] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.42.20:2525] with esmtp (Farcaster) id 4df646f8-28ca-497a-8aa6-08d3e18d79ef; Fri, 18 Apr 2025 00:08:12 +0000 (UTC) X-Farcaster-Flow-ID: 4df646f8-28ca-497a-8aa6-08d3e18d79ef Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWB002.ant.amazon.com (10.250.64.231) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:08 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:05 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 08/15] ipv6: Preallocate nhc_pcpu_rth_output in ip6_route_info_create(). 
Date: Thu, 17 Apr 2025 17:03:49 -0700 Message-ID: <20250418000443.43734-9-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D032UWB001.ant.amazon.com (10.13.139.152) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org ip6_route_info_create_nh() will be called under RCU. It calls fib_nh_common_init() and allocates nhc->nhc_pcpu_rth_output. As with the reason for rt->fib6_nh->rt6i_pcpu, we want to avoid GFP_ATOMIC allocation for nhc->nhc_pcpu_rth_output under RCU. Let's preallocate it in ip6_route_info_create(). Signed-off-by: Kuniyuki Iwashima --- net/ipv4/fib_semantics.c | 10 ++++++---- net/ipv6/route.c | 9 +++++++++ 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index f68bb9e34c34..5326f1501af0 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -617,10 +617,12 @@ int fib_nh_common_init(struct net *net, struct fib_nh_common *nhc, { int err; - nhc->nhc_pcpu_rth_output = alloc_percpu_gfp(struct rtable __rcu *, - gfp_flags); - if (!nhc->nhc_pcpu_rth_output) - return -ENOMEM; + if (!nhc->nhc_pcpu_rth_output) { + nhc->nhc_pcpu_rth_output = alloc_percpu_gfp(struct rtable __rcu *, + gfp_flags); + if (!nhc->nhc_pcpu_rth_output) + return -ENOMEM; + } if (encap) { struct lwtunnel_state *lwtstate; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b0ddb73c732e..ea755027cf61 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3731,10 +3731,19 @@ void fib6_nh_release_dsts(struct fib6_nh *fib6_nh) static int fib6_nh_prealloc_percpu(struct fib6_nh *fib6_nh, gfp_t gfp_flags) { + struct fib_nh_common *nhc = &fib6_nh->nh_common; + fib6_nh->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags); if 
(!fib6_nh->rt6i_pcpu) return -ENOMEM; + nhc->nhc_pcpu_rth_output = alloc_percpu_gfp(struct rtable __rcu *, + gfp_flags); + if (!nhc->nhc_pcpu_rth_output) { + free_percpu(fib6_nh->rt6i_pcpu); + return -ENOMEM; + } + return 0; } From patchwork Fri Apr 18 00:03:50 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056434 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-80006.amazon.com (smtp-fw-80006.amazon.com [99.78.197.217]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D31583C30 for ; Fri, 18 Apr 2025 00:08:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=99.78.197.217 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934917; cv=none; b=rOY5aE/Jf3jCIX1nAs+6TP01FvOb7lLspwC2krkZ70U3lrn+OCmkwfKOi/EfyYq8rOq+WOIVwqjqNmjEepUfmVdXwnczwoqSddHTmQnvL76lopeeF24+bNdB5soHEn6Kv9H9wIX8oFL12tXx/1x/Nf77oQdYogJbSegYhCgsGgA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934917; c=relaxed/simple; bh=XC4DC1youASP2x8nFAqmoT4PbQsEIh5HauJs/aMf70s=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=U56n6Am/mEjwe9dXdwxu3CHaJm5KTBMk1tEExX5kO8I9DHBUtkC0NXBhX+XUl31BFGqOCTbxPl1yhDdyut1i7UdVcTuuSu0VvxJKOd0zYpaRmXvVLn6Oi91/WbhHA8+IMZsEK3b5QzIhIYYD8rp3wJ5MT2f3b7OuhEpdl8cjV/k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=b6ErQaSk; arc=none smtp.client-ip=99.78.197.217 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="b6ErQaSk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934916; x=1776470916; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LJZUXxmKaCTUQVVUwy5Cspv1ZWBy26pfo9q0xyLuGN8=; b=b6ErQaSkPw2FLecod1DrlD3G74nboPfmYnU0ADtW5TWkisvjlGuoP+vO xWmkyeGlqfourBBk15bGKA7SwAqEDRfvfaKaslSkQ0IucDpLdsUni1GLQ gZaFdz2tGLfHuxccGc6lIyDVrbEe73X3cZQhtIru71a+cZ69WounZBOQ2 g=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="41634107" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-80006.pdx80.corp.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:08:34 +0000 Received: from EX19MTAUWB001.ant.amazon.com [10.0.21.151:38407] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.13.240:2525] with esmtp (Farcaster) id 8de30197-680b-4b71-aca8-66aeba7cfabd; Fri, 18 Apr 2025 00:08:33 +0000 (UTC) X-Farcaster-Flow-ID: 8de30197-680b-4b71-aca8-66aeba7cfabd Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWB001.ant.amazon.com (10.250.64.248) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:32 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:29 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 09/15] ipv6: Don't pass net to ip6_route_info_append(). 
Date: Thu, 17 Apr 2025 17:03:50 -0700 Message-ID: <20250418000443.43734-10-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D039UWA004.ant.amazon.com (10.13.139.68) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org net is not used in ip6_route_info_append() after commit 36f19d5b4f99 ("net/ipv6: Remove extra call to ip6_convert_metrics for multipath case"). Let's remove the argument. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni --- net/ipv6/route.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index ea755027cf61..af11fcaa5cf3 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5317,8 +5317,7 @@ struct rt6_nh { struct list_head next; }; -static int ip6_route_info_append(struct net *net, - struct list_head *rt6_nh_list, +static int ip6_route_info_append(struct list_head *rt6_nh_list, struct fib6_info *rt, struct fib6_config *r_cfg) { @@ -5458,8 +5457,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1; - err = ip6_route_info_append(info->nl_net, &rt6_nh_list, - rt, &r_cfg); + err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg); if (err) { fib6_info_release(rt); goto cleanup; From patchwork Fri Apr 18 00:03:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056437 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
B63494A2D for ; Fri, 18 Apr 2025 00:10:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935015; cv=none; b=lIYxkMpcKrdsyY+f4Op482VR/wasRCntPQ76544sGHaXUSY5fvmDJZM8mjUouMPyhv8miczPdzOewmJE9Hue4NWc0U0Kl/4cWLSyAjO7JBOXJZ+vx1iLL2eF4iW7DYrXhEX55S+RXgIH9i8iX7+rRUn5OzLcF4dysghxuJTxvCY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935015; c=relaxed/simple; bh=ocxHcpFd/aXAiqEguL79Hk8g2DvC9kRI4VaXt7Pc0/M=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=G8DfQJ9GNF62mLSg6WuSRptWOvqfAc8khfXRtjSvtYgEoB6RBo5UoSRV6AWAxBqgMNoDrShX6EHnwqlR8ME9xFgUZ24VlGCk+PHov74KIG9jeTc0tn0jnEEPa3BUpEw8heRuNcPvyvXpigRiQNjJvGRuZTkyTowrQotmKCfQm84= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=lrcS/1nz; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="lrcS/1nz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744935010; x=1776471010; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=F8blhG/i8aGYh/3YoAtrjYC/S1xXErFQ4LA8ZKcBdbc=; b=lrcS/1nzBDsZUa1MQHBayee4xfy7O1GKP3WEAnqGeZ14Rzd541ku88vN Z8bgytqk7Ki6a28b/yKBApft5kx2nNdFqPse8wZDi+P1SmXM28fZSz07E V/d/wFTdmwF9jH9dZxUxvMvrl9MSlt9jZM6Frh4zrUMIsj0hhGgn60Hrk 0=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="289448017" Received: 
from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:08:59 +0000 Received: from EX19MTAUWC001.ant.amazon.com [10.0.7.35:7157] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.29.65:2525] with esmtp (Farcaster) id 689019fb-99e4-4766-b6b9-8fce679beab4; Fri, 18 Apr 2025 00:08:58 +0000 (UTC) X-Farcaster-Flow-ID: 689019fb-99e4-4766-b6b9-8fce679beab4 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC001.ant.amazon.com (10.250.64.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:56 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:08:53 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 10/15] ipv6: Rename rt6_nh.next to rt6_nh.list. Date: Thu, 17 Apr 2025 17:03:51 -0700 Message-ID: <20250418000443.43734-11-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D031UWA002.ant.amazon.com (10.13.139.96) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org ip6_route_multipath_add() allocates struct rt6_nh for each config of multipath routes to link them to a local list rt6_nh_list. struct rt6_nh.next is the list node of each config, so the name is quite misleading. Let's rename it to list. 
Suggested-by: Paolo Abeni Link: https://lore.kernel.org/netdev/c9bee472-c94e-4878-8cc2-1512b2c54db5@redhat.com/ Signed-off-by: Kuniyuki Iwashima --- net/ipv6/route.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index af11fcaa5cf3..05e33d319488 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5314,7 +5314,7 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh, struct rt6_nh { struct fib6_info *fib6_info; struct fib6_config r_cfg; - struct list_head next; + struct list_head list; }; static int ip6_route_info_append(struct list_head *rt6_nh_list, @@ -5324,7 +5324,7 @@ static int ip6_route_info_append(struct list_head *rt6_nh_list, struct rt6_nh *nh; int err = -EEXIST; - list_for_each_entry(nh, rt6_nh_list, next) { + list_for_each_entry(nh, rt6_nh_list, list) { /* check if fib6_info already exists */ if (rt6_duplicate_nexthop(nh->fib6_info, rt)) return err; @@ -5335,7 +5335,7 @@ static int ip6_route_info_append(struct list_head *rt6_nh_list, return -ENOMEM; nh->fib6_info = rt; memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg)); - list_add_tail(&nh->next, rt6_nh_list); + list_add_tail(&nh->list, rt6_nh_list); return 0; } @@ -5478,7 +5478,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, info->skip_notify_kernel = 1; err_nh = NULL; - list_for_each_entry(nh, &rt6_nh_list, next) { + list_for_each_entry(nh, &rt6_nh_list, list) { err = __ip6_ins_rt(nh->fib6_info, info, extack); if (err) { @@ -5546,16 +5546,16 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, ip6_route_mpath_notify(rt_notif, rt_last, info, nlflags); /* Delete routes that were already added */ - list_for_each_entry(nh, &rt6_nh_list, next) { + list_for_each_entry(nh, &rt6_nh_list, list) { if (err_nh == nh) break; ip6_route_del(&nh->r_cfg, extack); } cleanup: - list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) { + list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, list) { 
fib6_info_release(nh->fib6_info); - list_del(&nh->next); + list_del(&nh->list); kfree(nh); } From patchwork Fri Apr 18 00:03:52 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056435 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E12271C32 for ; Fri, 18 Apr 2025 00:09:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934964; cv=none; b=QmXwiO9HVUQRlo+qsxLdVUouBdqILo9qWdGD8dxzcUNx92/NdNGv8bR4LaOMF81eZp+J0ohdrQD2fz+KfeAHqoFl9hiGwtnOaLau+zFcn+lxJYQwtXZRxFF5ZdUg8dUIU4y5i041n/Bp3ohvrNLye0Tq3rnaLhOwwp1CqM7vbQQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744934964; c=relaxed/simple; bh=qPgP/W4r1aJzNQ84WpxIrIyoeZRQSoEDj/JPbsOxS5Q=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ha2+obPoRFxbt1P6jfymiVmFxQC5tOYCs/v2CZIBipLxIph27WpsHfpn7fNfVIzn83Zq2INoRgGbhcddjUZRTaX51ypPEChlRnItYjupw0oX3XdjZm/futFQNTRfwD3xr3kwWCJbzEovILlai7ueLtQYL3iIhXyxZgL8icD3ZZs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=Zdn2rHM0; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com 
header.b="Zdn2rHM0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744934963; x=1776470963; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JYRoj6FJbX7eMeeZIq41fOwsKA5MzHr6pxu8v4PaDRM=; b=Zdn2rHM0jJ26lIWryXilo9VGfnWh2SMxeAmHUNaCDMyxccsQQmgzg2e4 c14xKTkJckgGE6TdTdnNw6jNC6gi9tPYnAO8JPIGqdnhvVC9/7GVN6q9U B3kEGpXwNP/k8YkfuqHZkMSJQ2Y9maN+sMJHylR92l7lCcWLJ22r3LOBq c=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="289448084" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:09:21 +0000 Received: from EX19MTAUWB002.ant.amazon.com [10.0.21.151:50334] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.50.54:2525] with esmtp (Farcaster) id 901e5d79-fc7a-4429-9927-a0b08f39d77f; Fri, 18 Apr 2025 00:09:20 +0000 (UTC) X-Farcaster-Flow-ID: 901e5d79-fc7a-4429-9927-a0b08f39d77f Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWB002.ant.amazon.com (10.250.64.231) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:09:20 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:09:18 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 11/15] ipv6: Factorise ip6_route_multipath_add(). 
Date: Thu, 17 Apr 2025 17:03:52 -0700 Message-ID: <20250418000443.43734-12-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D035UWA003.ant.amazon.com (10.13.139.86) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT and rely on RCU to guarantee dev and nexthop lifetime. Then, the RCU section will start before ip6_route_info_create_nh() in ip6_route_multipath_add(), but ip6_route_info_create() is called in the same loop and will sleep. Let's split the loop into ip6_route_mpath_info_create() and ip6_route_mpath_info_create_nh(). Note that ip6_route_info_append() is now integrated into ip6_route_mpath_info_create_nh() because we need to call different free functions for nexthops that passed ip6_route_info_create_nh(). In case of failure, the remaining nexthops that ip6_route_info_create_nh() has not been called for will be freed by ip6_route_mpath_info_cleanup(). OTOH, if a nexthop passes ip6_route_info_create_nh(), it will be linked to a local temporary list, which will be spliced back to rt6_nh_list. In case of failure, these nexthops will be released by fib6_info_release() in ip6_route_multipath_add(). 
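The two-phase split described above can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions, not the code in this patch: struct node, struct item, items_create(), items_init(), items_cleanup() and items_release() are hypothetical stand-ins for list_head, struct rt6_nh, ip6_route_mpath_info_create(), ip6_route_mpath_info_create_nh(), ip6_route_mpath_info_cleanup() and the fib6_info_release() loop, and a negative value stands in for a failing ip6_route_info_create_nh(). It shows why entries that passed the second phase are parked on a temporary list: they need a different free function from the ones that did not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for struct rt6_nh; link must stay the first member. */
struct item {
	struct node link;
	int value;
	int initialised;	/* set once phase 2 succeeded for this item */
};

/* Free items that never completed phase 2 (the "raw" free path). */
static void items_cleanup(struct node *head)
{
	while (head->next != head) {
		struct item *it = (struct item *)head->next;

		list_del(&it->link);
		free(it);
	}
}

/* Phase 1: allocate one item per input value; this step is allowed to sleep. */
static int items_create(struct node *head, const int *vals, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct item *it = calloc(1, sizeof(*it));

		if (!it) {
			items_cleanup(head);
			return -1;
		}
		it->value = vals[i];
		list_add_tail(&it->link, head);
	}
	return 0;
}

/* Phase 2: initialise each item without sleeping. Items that pass move to a
 * temporary list; on failure the rest are freed, then the survivors are
 * spliced back so the caller can release them through the other path.
 */
static int items_init(struct node *head)
{
	struct node tmp;
	int err = 0;

	list_init(&tmp);
	while (head->next != head) {
		struct item *it = (struct item *)head->next;

		if (it->value < 0) {	/* hypothetical failure condition */
			err = -1;
			items_cleanup(head);
			break;
		}
		it->initialised = 1;
		list_del(&it->link);
		list_add_tail(&it->link, &tmp);
	}
	while (tmp.next != &tmp) {	/* splice tmp back onto head */
		struct node *n = tmp.next;

		list_del(n);
		list_add_tail(n, head);
	}
	return err;
}

/* Release path for fully initialised items; returns how many were released. */
static size_t items_release(struct node *head)
{
	size_t n = 0;

	while (head->next != head) {
		struct item *it = (struct item *)head->next;

		assert(it->initialised);
		list_del(&it->link);
		free(it);
		n++;
	}
	return n;
}
```

A caller would run items_create() first, then items_init(), and call items_release() on whatever is left on the list in either the success or the failure case, which mirrors how the cleanup label in ip6_route_multipath_add() only ever sees nexthops that completed the second phase.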
Signed-off-by: Kuniyuki Iwashima --- net/ipv6/route.c | 205 ++++++++++++++++++++++++++++++----------------- 1 file changed, 130 insertions(+), 75 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 05e33d319488..c8c1c75268e3 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5315,29 +5315,131 @@ struct rt6_nh { struct fib6_info *fib6_info; struct fib6_config r_cfg; struct list_head list; + int weight; }; -static int ip6_route_info_append(struct list_head *rt6_nh_list, - struct fib6_info *rt, - struct fib6_config *r_cfg) +static void ip6_route_mpath_info_cleanup(struct list_head *rt6_nh_list) { - struct rt6_nh *nh; - int err = -EEXIST; + struct rt6_nh *nh, *nh_next; - list_for_each_entry(nh, rt6_nh_list, list) { - /* check if fib6_info already exists */ - if (rt6_duplicate_nexthop(nh->fib6_info, rt)) - return err; + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, list) { + struct fib6_info *rt = nh->fib6_info; + + if (rt) { + free_percpu(rt->fib6_nh->nh_common.nhc_pcpu_rth_output); + free_percpu(rt->fib6_nh->rt6i_pcpu); + ip_fib_metrics_put(rt->fib6_metrics); + kfree(rt); + } + + list_del(&nh->list); + kfree(nh); } +} - nh = kzalloc(sizeof(*nh), GFP_KERNEL); - if (!nh) - return -ENOMEM; - nh->fib6_info = rt; - memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg)); - list_add_tail(&nh->list, rt6_nh_list); +static int ip6_route_mpath_info_create(struct list_head *rt6_nh_list, + struct fib6_config *cfg, + struct netlink_ext_ack *extack) +{ + struct rtnexthop *rtnh; + int remaining; + int err; + + remaining = cfg->fc_mp_len; + rtnh = (struct rtnexthop *)cfg->fc_mp; + + /* Parse a Multipath Entry and build a list (rt6_nh_list) of + * fib6_info structs per nexthop + */ + while (rtnh_ok(rtnh, remaining)) { + struct fib6_config r_cfg; + struct fib6_info *rt; + struct rt6_nh *nh; + int attrlen; + + nh = kzalloc(sizeof(*nh), GFP_KERNEL); + if (!nh) { + err = -ENOMEM; + goto err; + } + + list_add_tail(&nh->list, rt6_nh_list); + + memcpy(&r_cfg, cfg, sizeof(*cfg)); 
+ if (rtnh->rtnh_ifindex) + r_cfg.fc_ifindex = rtnh->rtnh_ifindex; + + attrlen = rtnh_attrlen(rtnh); + if (attrlen > 0) { + struct nlattr *nla, *attrs = rtnh_attrs(rtnh); + + nla = nla_find(attrs, attrlen, RTA_GATEWAY); + if (nla) { + r_cfg.fc_gateway = nla_get_in6_addr(nla); + r_cfg.fc_flags |= RTF_GATEWAY; + } + + r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP); + nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); + if (nla) + r_cfg.fc_encap_type = nla_get_u16(nla); + } + + r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK); + + rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack); + if (IS_ERR(rt)) { + err = PTR_ERR(rt); + goto err; + } + + nh->fib6_info = rt; + nh->weight = rtnh->rtnh_hops + 1; + memcpy(&nh->r_cfg, &r_cfg, sizeof(r_cfg)); + + rtnh = rtnh_next(rtnh, &remaining); + } return 0; +err: + ip6_route_mpath_info_cleanup(rt6_nh_list); + return err; +} + +static int ip6_route_mpath_info_create_nh(struct list_head *rt6_nh_list, + struct netlink_ext_ack *extack) +{ + struct rt6_nh *nh, *nh_next, *nh_tmp; + LIST_HEAD(tmp); + int err; + + list_for_each_entry_safe(nh, nh_next, rt6_nh_list, list) { + struct fib6_info *rt = nh->fib6_info; + + err = ip6_route_info_create_nh(rt, &nh->r_cfg, extack); + if (err) { + nh->fib6_info = NULL; + goto err; + } + + rt->fib6_nh->fib_nh_weight = nh->weight; + + list_move_tail(&nh->list, &tmp); + + list_for_each_entry(nh_tmp, rt6_nh_list, list) { + /* check if fib6_info already exists */ + if (rt6_duplicate_nexthop(nh_tmp->fib6_info, rt)) { + err = -EEXIST; + goto err; + } + } + } +out: + list_splice(&tmp, rt6_nh_list); + return err; +err: + ip6_route_mpath_info_cleanup(rt6_nh_list); + goto out; } static void ip6_route_mpath_notify(struct fib6_info *rt, @@ -5396,75 +5498,28 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, { struct fib6_info *rt_notif = NULL, *rt_last = NULL; struct nl_info *info = &cfg->fc_nlinfo; - struct fib6_config r_cfg; - struct rtnexthop *rtnh; - struct fib6_info *rt; - struct rt6_nh 
*err_nh; struct rt6_nh *nh, *nh_safe; + LIST_HEAD(rt6_nh_list); + struct rt6_nh *err_nh; __u16 nlflags; - int remaining; - int attrlen; - int err = 1; int nhn = 0; - int replace = (cfg->fc_nlinfo.nlh && - (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE)); - LIST_HEAD(rt6_nh_list); + int replace; + int err; + + replace = (cfg->fc_nlinfo.nlh && + (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE)); nlflags = replace ? NLM_F_REPLACE : NLM_F_CREATE; if (info->nlh && info->nlh->nlmsg_flags & NLM_F_APPEND) nlflags |= NLM_F_APPEND; - remaining = cfg->fc_mp_len; - rtnh = (struct rtnexthop *)cfg->fc_mp; - - /* Parse a Multipath Entry and build a list (rt6_nh_list) of - * fib6_info structs per nexthop - */ - while (rtnh_ok(rtnh, remaining)) { - memcpy(&r_cfg, cfg, sizeof(*cfg)); - if (rtnh->rtnh_ifindex) - r_cfg.fc_ifindex = rtnh->rtnh_ifindex; - - attrlen = rtnh_attrlen(rtnh); - if (attrlen > 0) { - struct nlattr *nla, *attrs = rtnh_attrs(rtnh); - - nla = nla_find(attrs, attrlen, RTA_GATEWAY); - if (nla) { - r_cfg.fc_gateway = nla_get_in6_addr(nla); - r_cfg.fc_flags |= RTF_GATEWAY; - } - - r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP); - nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); - if (nla) - r_cfg.fc_encap_type = nla_get_u16(nla); - } - - r_cfg.fc_flags |= (rtnh->rtnh_flags & RTNH_F_ONLINK); - rt = ip6_route_info_create(&r_cfg, GFP_KERNEL, extack); - if (IS_ERR(rt)) { - err = PTR_ERR(rt); - rt = NULL; - goto cleanup; - } - - err = ip6_route_info_create_nh(rt, &r_cfg, extack); - if (err) { - rt = NULL; - goto cleanup; - } - - rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1; - - err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg); - if (err) { - fib6_info_release(rt); - goto cleanup; - } + err = ip6_route_mpath_info_create(&rt6_nh_list, cfg, extack); + if (err) + return err; - rtnh = rtnh_next(rtnh, &remaining); - } + err = ip6_route_mpath_info_create_nh(&rt6_nh_list, extack); + if (err) + goto cleanup; /* for add and replace send one notification with all 
nexthops. * Skip the notification in fib6_add_rt2node and send one with From patchwork Fri Apr 18 00:03:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056440 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-52003.amazon.com (smtp-fw-52003.amazon.com [52.119.213.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8234D442C for ; Fri, 18 Apr 2025 00:10:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.152 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935068; cv=none; b=UuDfeYW08c1c8fpATn/nYtOcrpAE3sRiQaA8aPWGDfvmiXCI+MX17hcNFxi3POca5NSQoLjuYTUe1iExoIDH7AfjYhCNGAT5iTeIbm5wugLR3ePGk2V90qD8RODNnCIuIJWugXkOQrPc45tjhgye9Qdk5K4GB6EHfugOQGnN5Ss= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935068; c=relaxed/simple; bh=s7CHZqZcsXdyNoNhn9yRvT7Jur7Qmj60erO3CRxPwZs=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ivd69Eu3kbIt7roKBmYT+iFxARX6L0XTbEhYGZKSyy1ilewWNc+VeE60Mrxo929tMgSpxcgiT7GbKWMmSU3ZDuIKS/sjd5EO2TBTY0fwG33eWr85ImdoGgSXNlNjf9KkuXf7E+Z2D5TnVXiEXGbQtUKjn4b88nZZPxHSCLlUP60= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=RoQuTGcB; arc=none smtp.client-ip=52.119.213.152 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="RoQuTGcB" 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744935057; x=1776471057; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=eDBQgMa7jn0o+BB8P/Y+JwNfnTEUqRUfyQn4GsE6RkI=; b=RoQuTGcBWEwTJv15SFLct38IgyJLQBY64DhipRPC3EWoHJknDeUkFlmc SlWn+0EVxP3oIr5wZY7ofrgf5A5q0neWImJhKnDfbi+iwOTC+CzgdvL6j aDOQsGY3vrBjWl/bHEh+zLb4w+BSBZ3CrsIjsUIJmdLVkK8WCG2M6D4dF w=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="84707639" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-52003.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:09:47 +0000 Received: from EX19MTAUWB001.ant.amazon.com [10.0.21.151:9668] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.50.54:2525] with esmtp (Farcaster) id d79657d3-1bd1-46da-b7b6-d36d742e9b38; Fri, 18 Apr 2025 00:09:45 +0000 (UTC) X-Farcaster-Flow-ID: d79657d3-1bd1-46da-b7b6-d36d742e9b38 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWB001.ant.amazon.com (10.250.64.248) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:09:44 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:09:42 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 12/15] ipv6: Protect fib6_link_table() with spinlock. 
Date: Thu, 17 Apr 2025 17:03:53 -0700 Message-ID: <20250418000443.43734-13-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D039UWA002.ant.amazon.com (10.13.139.32) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. If the request specifies a new table ID, fib6_new_table() is called to create a new routing table. Two concurrent requests could specify the same table ID, so we need a lock to protect net->ipv6.fib_table_hash[h]. Let's add a spinlock to protect the hash bucket linkage. Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni --- include/net/netns/ipv6.h | 1 + net/ipv6/ip6_fib.c | 26 +++++++++++++++++++++----- 2 files changed, 22 insertions(+), 5 deletions(-) diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h index 5f2cfd84570a..47dc70d8100a 100644 --- a/include/net/netns/ipv6.h +++ b/include/net/netns/ipv6.h @@ -72,6 +72,7 @@ struct netns_ipv6 { struct rt6_statistics *rt6_stats; struct timer_list ip6_fib_timer; struct hlist_head *fib_table_hash; + spinlock_t fib_table_hash_lock; struct fib6_table *fib6_main_tbl; struct list_head fib6_walkers; rwlock_t fib6_walker_lock; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index bf727149fdec..79b672f3fc53 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -249,19 +249,33 @@ static struct fib6_table *fib6_alloc_table(struct net *net, u32 id) struct fib6_table *fib6_new_table(struct net *net, u32 id) { - struct fib6_table *tb; + struct fib6_table *tb, *new_tb; if (id == 0) id = RT6_TABLE_MAIN; + tb = fib6_get_table(net, id); if (tb) return tb; - tb = fib6_alloc_table(net, id); - if (tb) - fib6_link_table(net, tb); + new_tb = fib6_alloc_table(net, 
id); + if (!new_tb) + return NULL; + + spin_lock_bh(&net->ipv6.fib_table_hash_lock); + + tb = fib6_get_table(net, id); + if (unlikely(tb)) { + spin_unlock_bh(&net->ipv6.fib_table_hash_lock); + kfree(new_tb); + return tb; + } - return tb; + fib6_link_table(net, new_tb); + + spin_unlock_bh(&net->ipv6.fib_table_hash_lock); + + return new_tb; } EXPORT_SYMBOL_GPL(fib6_new_table); @@ -2423,6 +2437,8 @@ static int __net_init fib6_net_init(struct net *net) if (!net->ipv6.fib_table_hash) goto out_rt6_stats; + spin_lock_init(&net->ipv6.fib_table_hash_lock); + net->ipv6.fib6_main_tbl = kzalloc(sizeof(*net->ipv6.fib6_main_tbl), GFP_KERNEL); if (!net->ipv6.fib6_main_tbl) From patchwork Fri Apr 18 00:03:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056436 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-52004.amazon.com (smtp-fw-52004.amazon.com [52.119.213.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EDF31C32 for ; Fri, 18 Apr 2025 00:10:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.119.213.154 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935013; cv=none; b=b5q+JDVSkpDR2rxp3JV7gLbUS2cySi7rcGDZX0G4IDTCkVXGtSMslVBLcsBU2Hr5Trf0MIBbm5EYqIeDx7VNJikvNy6bbbBaR4l/30TxIDtR5Wba+obAxsd/z1koj65+alHJ3Xmj3o7jVpzH4Bk878W5eyg50JDZuViBtkXfaF0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935013; c=relaxed/simple; bh=YGkNDMqdZqchvESjOX5O3TZvDoIrMwsn3Bp1pWvp5rY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Ow48Ogi4c/pyhraycwvySewW76sKyNQzzyGBM359oBYgEudDie3RMkIuThtrYng6KcyjP9MRXIcbzO3BfcJECvkCPh4WK0wUG+A6FP7LDi0A0PpB8R2f8540Jh/k1AiOorztcxaG95wHbbJ/IhC6Y3jzbHgiIeAroVcSKoyrdko= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=tXkNngH5; arc=none smtp.client-ip=52.119.213.154 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="tXkNngH5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744935013; x=1776471013; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CWXCbuUzffdftajprNuLLbXgxV3sgU+WkNBT2DK66P8=; b=tXkNngH5SCGFZWBEfN8nJ97BRnkzIEl18kQ/M+w1zoBSU/UlNnYlijEH apc54Gpr8CYoEzENOBORiw1aBK5mfxtgx0T9HUd9H8dIyJNl2YOjMHvSz 5r3O1OEMQsF/Ie/wOn3D6wUdVi3N3gDfBmiJ4kqNqpzBYIKM1m0jj7g+S 8=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="289448190" Received: from iad12-co-svc-p1-lb1-vlan2.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.2]) by smtp-border-fw-52004.iad7.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:10:11 +0000 Received: from EX19MTAUWC001.ant.amazon.com [10.0.38.20:28021] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.47.20:2525] with esmtp (Farcaster) id b9bffe3d-43a4-42ff-8548-54006c614cbd; Fri, 18 Apr 2025 00:10:09 +0000 (UTC) X-Farcaster-Flow-ID: b9bffe3d-43a4-42ff-8548-54006c614cbd Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC001.ant.amazon.com (10.250.64.174) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:08 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by 
EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:06 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 13/15] ipv6: Defer fib6_purge_rt() in fib6_add_rt2node() to fib6_add(). Date: Thu, 17 Apr 2025 17:03:54 -0700 Message-ID: <20250418000443.43734-14-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D039UWB001.ant.amazon.com (10.13.138.119) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org The next patch adds a per-nexthop spinlock which protects nh->f6i_list. When rt->nh is not NULL, fib6_add_rt2node() will be called under the lock. fib6_add_rt2node() could call fib6_purge_rt() for another route, which could hold another nexthop lock. Then, a deadlock could happen between two nexthops. Let's defer fib6_purge_rt() until after fib6_add_rt2node().
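The deferral can be sketched in a few lines of userspace C. This is an illustration under stated assumptions rather than the kernel code: struct entry, purge(), unlink_matching() and replace_key() are hypothetical stand-ins for struct fib6_info, fib6_purge_rt(), fib6_add_rt2node() and fib6_add(), the lock_held flag only models the per-nexthop spinlock in a single thread, and it is assumed that the lock covers just the unlink step. Victims are queued on a local purge list while the lock is held and purged only after it is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fib6_info. */
struct entry {
	struct entry *next;		/* membership in the live list */
	struct entry *purge_next;	/* membership in the deferred-purge list */
	int key;
	int purged;
};

static int lock_held;	/* models the per-nexthop lock; single-threaded only */

/* Models fib6_purge_rt(): it may need another lock, so it must run unlocked. */
static void purge(struct entry *e)
{
	assert(!lock_held);
	e->purged = 1;
}

/* Models fib6_add_rt2node(): unlink entries matching key while "locked" and
 * queue them on *purge_list instead of purging them in place.
 */
static void unlink_matching(struct entry **live, int key,
			    struct entry **purge_list)
{
	struct entry **pp = live;

	assert(lock_held);
	while (*pp) {
		struct entry *e = *pp;

		if (e->key == key) {
			*pp = e->next;
			e->purge_next = *purge_list;
			*purge_list = e;
		} else {
			pp = &e->next;
		}
	}
}

/* Models fib6_add(): do the locked update first, then drain the purge list.
 * Returns the number of entries purged.
 */
static int replace_key(struct entry **live, int key)
{
	struct entry *purge_list = NULL;
	int n = 0;

	lock_held = 1;
	unlink_matching(live, key, &purge_list);
	lock_held = 0;

	while (purge_list) {
		struct entry *e = purge_list;

		purge_list = e->purge_next;
		purge(e);
		n++;
	}
	return n;
}
```

Because purge() asserts that the lock is not held, calling it directly from unlink_matching() would abort, which is the userspace analogue of the lock-ordering problem this patch avoids.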
Signed-off-by: Kuniyuki Iwashima Acked-by: Paolo Abeni --- include/net/ip6_fib.h | 1 + net/ipv6/ip6_fib.c | 21 ++++++++++++++------- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 7c87873ae211..88b0dd4d8e09 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -198,6 +198,7 @@ struct fib6_info { fib6_destroying:1, unused:4; + struct list_head purge_link; struct rcu_head rcu; struct nexthop *nh; struct fib6_nh fib6_nh[]; diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 79b672f3fc53..9e9db5470bbf 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1083,8 +1083,8 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn, */ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, - struct nl_info *info, - struct netlink_ext_ack *extack) + struct nl_info *info, struct netlink_ext_ack *extack, + struct list_head *purge_list) { struct fib6_info *leaf = rcu_dereference_protected(fn->leaf, lockdep_is_held(&rt->fib6_table->tb6_lock)); @@ -1308,10 +1308,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, } nsiblings = iter->fib6_nsiblings; iter->fib6_node = NULL; - fib6_purge_rt(iter, fn, info->nl_net); + list_add(&iter->purge_link, purge_list); if (rcu_access_pointer(fn->rr_ptr) == iter) fn->rr_ptr = NULL; - fib6_info_release(iter); if (nsiblings) { /* Replacing an ECMP route, remove all siblings */ @@ -1324,10 +1323,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, if (rt6_qualify_for_ecmp(iter)) { *ins = iter->fib6_next; iter->fib6_node = NULL; - fib6_purge_rt(iter, fn, info->nl_net); + list_add(&iter->purge_link, purge_list); if (rcu_access_pointer(fn->rr_ptr) == iter) fn->rr_ptr = NULL; - fib6_info_release(iter); nsiblings--; info->nl_net->ipv6.rt6_stats->fib_rt_entries--; } else { @@ -1397,6 +1395,7 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, struct nl_info *info, struct 
netlink_ext_ack *extack) { struct fib6_table *table = rt->fib6_table; + LIST_HEAD(purge_list); struct fib6_node *fn; #ifdef CONFIG_IPV6_SUBTREES struct fib6_node *pn = NULL; @@ -1499,8 +1498,16 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, } #endif - err = fib6_add_rt2node(fn, rt, info, extack); + err = fib6_add_rt2node(fn, rt, info, extack, &purge_list); if (!err) { + struct fib6_info *iter, *next; + + list_for_each_entry_safe(iter, next, &purge_list, purge_link) { + list_del(&iter->purge_link); + fib6_purge_rt(iter, fn, info->nl_net); + fib6_info_release(iter); + } + if (rt->nh) list_add(&rt->nh_list, &rt->nh->f6i_list); __fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net)); From patchwork Fri Apr 18 00:03:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056438 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79691442C for ; Fri, 18 Apr 2025 00:10:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=207.171.184.29 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935036; cv=none; b=qngAWtqQPtFKAXSZSHNXSZbV0rU0TzUWjEmQAbVbZPFeGzGiDlfikvgJ5KxW1nchR0/E31PTjRV7zjEqwKUzPMbXTJ6o7dvDKOth28KbFW6LHdgDVog+UH5fnvC1OsqfxuziIq4XDolNoeS2vW+N+HZAjmGoNX4R2/tqHpOhAmQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935036; c=relaxed/simple; bh=xX7FySFNBQ9DLZl87r733wZCDP2+PjZdxfaUFhHnlJQ=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=VOTtyBAUxyLTRGVfFp8jZcXWRVo2d+UqJy42KNfUzUjyyqFiyhyeqYnxO1wZj84lSzHPwy4+Ej8EYHQULEEs+uiJRbswRM7ShII75QjDO3l74gkUUb56sj2FY8bCPLKdV3lRS9bsYYikueHQ1AqKe0yRGP7xToMrxy2LBBikCMw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=FpSaRj0r; arc=none smtp.client-ip=207.171.184.29 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="FpSaRj0r" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1744935035; x=1776471035; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=o/YU7ev4mGLwhdCB8vprRxybPY4K79+jho6X9E3w1Uk=; b=FpSaRj0rGisbcMall5WHAlqZfyzj/x2+wkRV26JEjjx/NKpkkvhYW+LT cwynyNIsL1NbaCLFeucZWE+MuoaesKOH3Gd/s4XUSmGSj9cWEQpzxHrTs wkSmWKf6G9yZeVcTpj/hCiApQGA47ipTY5ct2yqvUrYwnP6hmEhp/OCSs g=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="512501547" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:10:34 +0000 Received: from EX19MTAUWA002.ant.amazon.com [10.0.38.20:35666] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.50.54:2525] with esmtp (Farcaster) id 645f41fe-a1bb-4402-812b-4fa159bac305; Fri, 18 Apr 2025 00:10:33 +0000 (UTC) X-Farcaster-Flow-ID: 645f41fe-a1bb-4402-812b-4fa159bac305 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWA002.ant.amazon.com (10.250.64.202) with Microsoft SMTP 
Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:33 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:30 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 14/15] ipv6: Protect nh->f6i_list with spinlock and flag. Date: Thu, 17 Apr 2025 17:03:55 -0700 Message-ID: <20250418000443.43734-15-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D044UWA002.ant.amazon.com (10.13.139.11) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org We will get rid of RTNL from RTM_NEWROUTE and SIOCADDRT. Then, we may end up adding a route tied to a dying nexthop. The nexthop itself is not freed during the RCU grace period, but if we link a route after __remove_nexthop_fib() is called for the nexthop, the route will be leaked. To avoid the race between IPv6 route addition under RCU vs nexthop deletion under RTNL, let's add a dead flag and protect it and nh->f6i_list with a spinlock. __remove_nexthop_fib() acquires the nexthop's spinlock and sets nh->dead to true, then calls ip6_del_rt() for each linked route, one by one, without the spinlock because fib6_purge_rt() acquires it later. While adding an IPv6 route, fib6_add() acquires the nexthop lock and checks the dead flag just before inserting the route.
Signed-off-by: Kuniyuki Iwashima --- v3: Bundle critical section for rt->nh as fib6_add_rt2node_nh() --- include/net/nexthop.h | 2 ++ net/ipv4/nexthop.c | 18 +++++++++++++++--- net/ipv6/ip6_fib.c | 39 ++++++++++++++++++++++++++++++++++----- 3 files changed, 51 insertions(+), 8 deletions(-) diff --git a/include/net/nexthop.h b/include/net/nexthop.h index d9fb44e8b321..572e69cda476 100644 --- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -152,6 +152,8 @@ struct nexthop { u8 protocol; /* app managing this nh */ u8 nh_flags; bool is_group; + bool dead; + spinlock_t lock; /* protect dead and f6i_list */ refcount_t refcnt; struct rcu_head rcu; diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index d9cf06b297d1..6ba6cb1340c1 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -541,6 +541,7 @@ static struct nexthop *nexthop_alloc(void) INIT_LIST_HEAD(&nh->f6i_list); INIT_LIST_HEAD(&nh->grp_list); INIT_LIST_HEAD(&nh->fdb_list); + spin_lock_init(&nh->lock); } return nh; } @@ -2118,7 +2119,7 @@ static void remove_nexthop_group(struct nexthop *nh, struct nl_info *nlinfo) /* not called for nexthop replace */ static void __remove_nexthop_fib(struct net *net, struct nexthop *nh) { - struct fib6_info *f6i, *tmp; + struct fib6_info *f6i; bool do_flush = false; struct fib_info *fi; @@ -2129,13 +2130,24 @@ static void __remove_nexthop_fib(struct net *net, struct nexthop *nh) if (do_flush) fib_flush(net); - /* ip6_del_rt removes the entry from this list hence the _safe */ - list_for_each_entry_safe(f6i, tmp, &nh->f6i_list, nh_list) { + spin_lock_bh(&nh->lock); + + nh->dead = true; + + while (!list_empty(&nh->f6i_list)) { + f6i = list_first_entry(&nh->f6i_list, typeof(*f6i), nh_list); + /* __ip6_del_rt does a release, so do a hold here */ fib6_info_hold(f6i); + + spin_unlock_bh(&nh->lock); ipv6_stub->ip6_del_rt(net, f6i, !READ_ONCE(net->ipv4.sysctl_nexthop_compat_mode)); + + spin_lock_bh(&nh->lock); } + + spin_unlock_bh(&nh->lock); } static void 
__remove_nexthop(struct net *net, struct nexthop *nh, diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 9e9db5470bbf..1f860340690c 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1048,8 +1048,14 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn, rt6_flush_exceptions(rt); fib6_drop_pcpu_from(rt, table); - if (rt->nh && !list_empty(&rt->nh_list)) - list_del_init(&rt->nh_list); + if (rt->nh) { + spin_lock(&rt->nh->lock); + + if (!list_empty(&rt->nh_list)) + list_del_init(&rt->nh_list); + + spin_unlock(&rt->nh->lock); + } if (refcount_read(&rt->fib6_ref) != 1) { /* This route is used as dummy address holder in some split @@ -1341,6 +1347,28 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, return 0; } +static int fib6_add_rt2node_nh(struct fib6_node *fn, struct fib6_info *rt, + struct nl_info *info, struct netlink_ext_ack *extack, + struct list_head *purge_list) +{ + int err; + + spin_lock(&rt->nh->lock); + + if (rt->nh->dead) { + NL_SET_ERR_MSG(extack, "Nexthop has been deleted"); + err = -EINVAL; + } else { + err = fib6_add_rt2node(fn, rt, info, extack, purge_list); + if (!err) + list_add(&rt->nh_list, &rt->nh->f6i_list); + } + + spin_unlock(&rt->nh->lock); + + return err; +} + static void fib6_start_gc(struct net *net, struct fib6_info *rt) { if (!timer_pending(&net->ipv6.ip6_fib_timer) && @@ -1498,7 +1526,10 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, } #endif - err = fib6_add_rt2node(fn, rt, info, extack, &purge_list); + if (rt->nh) + err = fib6_add_rt2node_nh(fn, rt, info, extack, &purge_list); + else + err = fib6_add_rt2node(fn, rt, info, extack, &purge_list); if (!err) { struct fib6_info *iter, *next; @@ -1508,8 +1539,6 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, fib6_info_release(iter); } - if (rt->nh) - list_add(&rt->nh_list, &rt->nh->f6i_list); __fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net)); if (rt->fib6_flags & RTF_EXPIRES) From 
patchwork Fri Apr 18 00:03:56 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kuniyuki Iwashima X-Patchwork-Id: 14056439 X-Patchwork-Delegate: kuba@kernel.org Received: from smtp-fw-6002.amazon.com (smtp-fw-6002.amazon.com [52.95.49.90]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E41064A28 for ; Fri, 18 Apr 2025 00:10:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=52.95.49.90 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935061; cv=none; b=XNZciwV+MOWnriCaiaxI0CMXYT9C058adsHVopzxJsT3VbY4QcKcHB/h+UyEGj0kh1I4GZuAGSkwqoKwTRAVJI6Mq4Rk9gBvhmWeCpN/IN1+P5QjgSK5h3vvUdWdjjjm9hLiXTuC+03xQ6POF6eLo/maVcsO2azvUOP4jwsn4RY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1744935061; c=relaxed/simple; bh=wCkyCKZDA2qFtOXwoxT0MpoO7W9rsomVx4NAYQYLLHE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=TaPGHLPff3QcB9x959qQ5X8Y+v6K07SOo4uSfwpGOQqGVYX30VIb7HNasfh9NXHCHgijjqD3EINKt8v9A+x20T0i7nBNb/ewb4tNhyy36Jogg8qYkrNcbBZx52R4JAfKbw1zwkKMA8/32/RthJAmIWIIJCO2TjEFRXH6R+HTu8g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.jp; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=FCYV1bC5; arc=none smtp.client-ip=52.95.49.90 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.jp Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="FCYV1bC5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; 
s=amazon201209; t=1744935060; x=1776471060; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XgyMnDVwBP4DHemFo3dGiC1WYM3h6jTM1r5Bw8Tf9Dg=; b=FCYV1bC54C9S6MM7/M3MBxFs9bWxL6hoPXmivPO7iQHx2ZW5UeAGmZ9v xqjf+aSRX4Z46Nu7/0Xm+AHT8ISrgiqd2NjF5MTtu+fahw6uvz/oA6iJj Sv84iypZSbs7Sq+LxkwCZ92UapmxJyev5kWhOCBFo67l4JLAym3F/FBte 4=; X-IronPort-AV: E=Sophos;i="6.15,220,1739836800"; d="scan'208";a="490408267" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev) ([10.43.8.6]) by smtp-border-fw-6002.iad6.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Apr 2025 00:10:59 +0000 Received: from EX19MTAUWC002.ant.amazon.com [10.0.38.20:5000] by smtpin.naws.us-west-2.prod.farcaster.email.amazon.dev [10.0.57.220:2525] with esmtp (Farcaster) id 2ad56887-35ac-4ec9-875c-beed4632b621; Fri, 18 Apr 2025 00:10:57 +0000 (UTC) X-Farcaster-Flow-ID: 2ad56887-35ac-4ec9-875c-beed4632b621 Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19MTAUWC002.ant.amazon.com (10.250.64.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:57 +0000 Received: from 6c7e67bfbae3.amazon.com (10.94.49.59) by EX19D004ANA001.ant.amazon.com (10.37.240.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1544.14; Fri, 18 Apr 2025 00:10:54 +0000 From: Kuniyuki Iwashima To: "David S. Miller" , David Ahern , Eric Dumazet , Jakub Kicinski , "Paolo Abeni" CC: Simon Horman , Kuniyuki Iwashima , Kuniyuki Iwashima , Subject: [PATCH v3 net-next 15/15] ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE. 
Date: Thu, 17 Apr 2025 17:03:56 -0700 Message-ID: <20250418000443.43734-16-kuniyu@amazon.com> X-Mailer: git-send-email 2.49.0 In-Reply-To: <20250418000443.43734-1-kuniyu@amazon.com> References: <20250418000443.43734-1-kuniyu@amazon.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: EX19D033UWC001.ant.amazon.com (10.13.139.218) To EX19D004ANA001.ant.amazon.com (10.37.240.138) X-Patchwork-Delegate: kuba@kernel.org Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE. The remaining things to do are 1. pass false to lwtunnel_valid_encap_type_attr() 2. use rcu_dereference_rtnl() in fib6_check_nexthop() 3. place rcu_read_lock() before ip6_route_info_create_nh(). Let's complete the RTNL-free conversion. When each CPU-X adds 100000 routes on table-X in a batch concurrently on c7a.metal-48xl EC2 instance with 192 CPUs, without this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). time elapsed: 191577 milliseconds. with this series: $ sudo ./route_test.sh ... added 19200000 routes (100000 routes * 192 tables). time elapsed: 62854 milliseconds. I changed the number of routes in each table (1000 ~ 100000) and consistently saw it finish 3x faster with this series. Note that now every caller of lwtunnel_valid_encap_type() passes false as the last argument, and this can be removed later. 
Signed-off-by: Kuniyuki Iwashima --- net/ipv4/nexthop.c | 4 ++-- net/ipv6/route.c | 18 ++++++++++++------ 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 6ba6cb1340c1..823e4a783d2b 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -1556,12 +1556,12 @@ int fib6_check_nexthop(struct nexthop *nh, struct fib6_config *cfg, if (nh->is_group) { struct nh_group *nhg; - nhg = rtnl_dereference(nh->nh_grp); + nhg = rcu_dereference_rtnl(nh->nh_grp); if (nhg->has_v4) goto no_v4_nh; is_fdb_nh = nhg->fdb_nh; } else { - nhi = rtnl_dereference(nh->nh_info); + nhi = rcu_dereference_rtnl(nh->nh_info); if (nhi->family == AF_INET) goto no_v4_nh; is_fdb_nh = nhi->fdb_nh; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index c8c1c75268e3..bb46e724db73 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3902,12 +3902,16 @@ int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags, if (IS_ERR(rt)) return PTR_ERR(rt); + rcu_read_lock(); + err = ip6_route_info_create_nh(rt, cfg, extack); if (err) - return err; + goto unlock; err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, extack); fib6_info_release(rt); +unlock: + rcu_read_unlock(); return err; } @@ -4528,12 +4532,10 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, struct in6_rtmsg *rtmsg) switch (cmd) { case SIOCADDRT: - rtnl_lock(); /* Only do the default setting of fc_metric in route adding */ if (cfg.fc_metric == 0) cfg.fc_metric = IP6_RT_PRIO_USER; err = ip6_route_add(&cfg, GFP_KERNEL, NULL); - rtnl_unlock(); break; case SIOCDELRT: err = ip6_route_del(&cfg, NULL); @@ -5112,7 +5114,7 @@ static int rtm_to_fib6_multipath_config(struct fib6_config *cfg, } while (rtnh_ok(rtnh, remaining)); return lwtunnel_valid_encap_type_attr(cfg->fc_mp, cfg->fc_mp_len, - extack, newroute); + extack, false); } static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh, @@ -5250,7 +5252,7 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh, 
cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]); err = lwtunnel_valid_encap_type(cfg->fc_encap_type, - extack, newroute); + extack, false); if (err < 0) goto errout; } @@ -5517,6 +5519,8 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, if (err) return err; + rcu_read_lock(); + err = ip6_route_mpath_info_create_nh(&rt6_nh_list, extack); if (err) goto cleanup; @@ -5608,6 +5612,8 @@ static int ip6_route_multipath_add(struct fib6_config *cfg, } cleanup: + rcu_read_unlock(); + list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, list) { fib6_info_release(nh->fib6_info); list_del(&nh->list); @@ -6890,7 +6896,7 @@ static void bpf_iter_unregister(void) static const struct rtnl_msg_handler ip6_route_rtnl_msg_handlers[] __initconst_or_module = { {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_NEWROUTE, - .doit = inet6_rtm_newroute}, + .doit = inet6_rtm_newroute, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_DELROUTE, .doit = inet6_rtm_delroute, .flags = RTNL_FLAG_DOIT_UNLOCKED}, {.owner = THIS_MODULE, .protocol = PF_INET6, .msgtype = RTM_GETROUTE,