From patchwork Wed Oct 23 10:53:42 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13846911 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 626BB19F128 for ; Wed, 23 Oct 2024 10:54:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680843; cv=none; b=BtrHr+jvcFk07qMDsh7zkposvYbZMuZ7T90jjYrSSigEGS29EIJSFt2SxAuOUxTsRIfQsP78OxxD9ynW8Ojx9XAfifQm2FlrLp9sxfvo1qOq3lavABMSzgvR5GqI2LDl3v8dBFCSKX/IV+p6NRSHhyN9tA4Rog0/wxe53u/r2Ik= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680843; c=relaxed/simple; bh=/AyFqKqEJOGDu2Z93pFoLXdMx4+3sTh4f2ljj1cVKjY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=BrFPI+GN6kyoOJUah4xx6YtyqB+fBL/HIYTgFO2hKkNWLg4k1EH7Qbi18a9gDs/Fd7NQ00vUS1K1WNb6sf2YprdN1+BoDQTaZukSsMHuz7tRA8L9zMMKUAZEEd2hIetkbmSYjBnywZ7Di2aa8fW01Z72RpNZt1sS9Mn0yBXSXtA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=gom7yqZf; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="gom7yqZf" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id D4EFA2084C; Wed, 23 Oct 2024 12:53:53 +0200 (CEST) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NrfIx-LJeFw7; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) Received: from cas-essen-02.secunet.de (rl2.secunet.de [10.53.40.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id 5A5AF201E5; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com 5A5AF201E5 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1729680832; bh=TU1ajqm+D1AOkMG7KVYxrYug0iWXiZKV+L05S5y9Nc8=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=gom7yqZfDUD8zHr1wm0EIquzdNn6d+pynJLSiVfBEApo8+VyKu/7HtkBDYKIMM9Yd h89l+GgrbG92QhqY2L/6Getc6H6jjXnZetETAnUuYgyimKv3G+DrvGwU5VJmpABBJ1 HRVfnP5gg7RH19OtXJX5CTvxQ6OKmMuo2NhOghXPbHwIfIZ/kJtkttjjYRR6dydYKc hIpQbw88BIPognovqoVt/jskXKKCTv1MdZkXDjUiJHiy4hLHdva0/6bV2IdlVPp+Ro uypBbR1nvBCnL+QQ/2LKnkhz1T+0hfLtHR4WgqkxlXscFkOsUA05/zwOZ/1dped9wL DIbuJTljOGuFw== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-02.secunet.de (10.53.40.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:52 +0200 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: by gauss2.secunet.de (Postfix, from userid 1000) id BDBDC318116E; Wed, 23 Oct 2024 12:53:50 +0200 (CEST) From: Steffen Klassert To: Tobias Brunner , Antony Antony , Daniel Xu , Paul Wouters , Simon Horman , Sabrina Dubroca CC: Steffen Klassert , , Subject: [PATCH v3 ipsec-next 1/4] xfrm: Add support for per cpu xfrm state handling. Date: Wed, 23 Oct 2024 12:53:42 +0200 Message-ID: <20241023105345.1376856-2-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241023105345.1376856-1-steffen.klassert@secunet.com> References: <20241023105345.1376856-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-01.secunet.de (10.53.40.201) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org Currently all flows for a certain SA must be processed by the same cpu to avoid packet reordering and lock contention of the xfrm state lock. To get rid of this limitation, the IETF standardized per cpu SAs in RFC 9611. This patch implements the xfrm part of it. We add the cpu as a lookup key for xfrm states and a config option to generate acquire messages for each cpu. With that, we can have on each cpu a SA with identical traffic selector so that flows can be processed in parallel on all cpus. Signed-off-by: Steffen Klassert --- include/net/xfrm.h | 5 ++-- include/uapi/linux/xfrm.h | 2 ++ net/key/af_key.c | 7 +++-- net/xfrm/xfrm_compat.c | 6 ++-- net/xfrm/xfrm_state.c | 58 +++++++++++++++++++++++++++++++-------- net/xfrm/xfrm_user.c | 56 ++++++++++++++++++++++++++++++++++--- 6 files changed, 112 insertions(+), 22 deletions(-) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index b6bfdc6416c7..e23ad52824e2 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -188,6 +188,7 @@ struct xfrm_state { refcount_t refcnt; spinlock_t lock; + u32 pcpu_num; struct xfrm_id id; struct xfrm_selector sel; struct xfrm_mark mark; @@ -1679,7 +1680,7 @@ struct xfrmk_spdinfo { u32 spdhmcnt; }; -struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq); +struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num); int xfrm_state_delete(struct xfrm_state *x); int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync); int xfrm_dev_state_flush(struct net *net, struct net_device *dev, bool task_valid); @@ -1794,7 +1795,7 @@ int verify_spi_info(u8 proto, u32 min, u32 max, struct netlink_ext_ack *extack); int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi, struct netlink_ext_ack *extack); struct xfrm_state *xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, - u8 mode, u32 reqid, u32 if_id, u8 proto, + u8 mode, u32 reqid, u32 if_id, u32 pcpu_num, u8 proto, const xfrm_address_t *daddr, const xfrm_address_t *saddr, int create, unsigned short family); diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h index f28701500714..d73a97e3030a 100644 --- a/include/uapi/linux/xfrm.h +++ b/include/uapi/linux/xfrm.h @@ -322,6 +322,7 @@ enum xfrm_attr_type_t { XFRMA_MTIMER_THRESH, /* __u32 in seconds for input SA */ XFRMA_SA_DIR, /* __u8 */ XFRMA_NAT_KEEPALIVE_INTERVAL, /* __u32 in seconds for NAT keepalive */ + XFRMA_SA_PCPU, /* __u32 */ __XFRMA_MAX #define XFRMA_OUTPUT_MARK XFRMA_SET_MARK /* Compatibility */ @@ -437,6 +438,7 @@ struct xfrm_userpolicy_info { #define XFRM_POLICY_LOCALOK 1 /* Allow user to override global policy */ /* Automatically expand selector to include matching ICMP payloads. */ #define XFRM_POLICY_ICMP 2 +#define XFRM_POLICY_CPU_ACQUIRE 4 __u8 share; }; diff --git a/net/key/af_key.c b/net/key/af_key.c index f79fb99271ed..c56bb4f451e6 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1354,7 +1354,7 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_ } if (hdr->sadb_msg_seq) { - x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq); + x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq, UINT_MAX); if (x && !xfrm_addr_equal(&x->id.daddr, xdaddr, family)) { xfrm_state_put(x); x = NULL; @@ -1362,7 +1362,8 @@ static int pfkey_getspi(struct sock *sk, struct sk_buff *skb, const struct sadb_ } if (!x) - x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, proto, xdaddr, xsaddr, 1, family); + x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, UINT_MAX, + proto, xdaddr, xsaddr, 1, family); if (x == NULL) return -ENOENT; @@ -1417,7 +1418,7 @@ static int pfkey_acquire(struct sock *sk, struct sk_buff *skb, const struct sadb if (hdr->sadb_msg_seq == 0 || hdr->sadb_msg_errno == 0) return 0; - x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq); + x = xfrm_find_acq_byseq(net, DUMMY_MARK, hdr->sadb_msg_seq, UINT_MAX); if (x == NULL) return 0; diff --git a/net/xfrm/xfrm_compat.c b/net/xfrm/xfrm_compat.c index 91357ccaf4af..5b9ee63e30b6 100644 --- a/net/xfrm/xfrm_compat.c +++ b/net/xfrm/xfrm_compat.c @@ -132,6 +132,7 @@ static const struct nla_policy compat_policy[XFRMA_MAX+1] = { [XFRMA_MTIMER_THRESH] = { .type = NLA_U32 }, [XFRMA_SA_DIR] = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT), [XFRMA_NAT_KEEPALIVE_INTERVAL] = { .type = NLA_U32 }, + [XFRMA_SA_PCPU] = { .type = NLA_U32 }, }; static struct nlmsghdr *xfrm_nlmsg_put_compat(struct sk_buff *skb, @@ -282,9 +283,10 @@ static int xfrm_xlate64_attr(struct sk_buff *dst, const struct nlattr *src) case XFRMA_MTIMER_THRESH: case XFRMA_SA_DIR: case XFRMA_NAT_KEEPALIVE_INTERVAL: + case XFRMA_SA_PCPU: return xfrm_nla_cpy(dst, src, nla_len(src)); default: - BUILD_BUG_ON(XFRMA_MAX != XFRMA_NAT_KEEPALIVE_INTERVAL); + BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_PCPU); pr_warn_once("unsupported nla_type %d\n", src->nla_type); return -EOPNOTSUPP; } @@ -439,7 +441,7 @@ static int xfrm_xlate32_attr(void *dst, const struct nlattr *nla, int err; if (type > XFRMA_MAX) { - BUILD_BUG_ON(XFRMA_MAX != XFRMA_NAT_KEEPALIVE_INTERVAL); + BUILD_BUG_ON(XFRMA_MAX != XFRMA_SA_PCPU); NL_SET_ERR_MSG(extack, "Bad attribute"); return -EOPNOTSUPP; } diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 37478d36a8df..ebef07b80afa 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -679,6 +679,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net) x->lft.hard_packet_limit = XFRM_INF; x->replay_maxage = 0; x->replay_maxdiff = 0; + x->pcpu_num = UINT_MAX; spin_lock_init(&x->lock); } return x; @@ -1155,6 +1156,12 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, struct xfrm_state **best, int *acq_in_progress, int *error) { + /* We need the cpu id just as a lookup key, + * we don't require it to be stable. + */ + unsigned int pcpu_id = get_cpu(); + put_cpu(); + /* Resolution logic: * 1. There is a valid state with matching selector. Done. * 2. Valid state with inappropriate selector. Skip. @@ -1174,13 +1181,18 @@ static void xfrm_state_look_at(struct xfrm_policy *pol, struct xfrm_state *x, &fl->u.__fl_common)) return; + if (x->pcpu_num != UINT_MAX && x->pcpu_num != pcpu_id) + return; + if (!*best || + ((*best)->pcpu_num == UINT_MAX && x->pcpu_num == pcpu_id) || (*best)->km.dying > x->km.dying || ((*best)->km.dying == x->km.dying && (*best)->curlft.add_time < x->curlft.add_time)) *best = x; } else if (x->km.state == XFRM_STATE_ACQ) { - *acq_in_progress = 1; + if (!*best || x->pcpu_num == pcpu_id) + *acq_in_progress = 1; } else if (x->km.state == XFRM_STATE_ERROR || x->km.state == XFRM_STATE_EXPIRED) { if ((!x->sel.family || @@ -1209,6 +1221,13 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned short encap_family = tmpl->encap_family; unsigned int sequence; struct km_event c; + unsigned int pcpu_id; + + /* We need the cpu id just as a lookup key, + * we don't require it to be stable. + */ + pcpu_id = get_cpu(); + put_cpu(); to_put = NULL; @@ -1282,7 +1301,10 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, } found: - x = best; + if (!(pol->flags & XFRM_POLICY_CPU_ACQUIRE) || + (best && (best->pcpu_num == pcpu_id))) + x = best; + if (!x && !error && !acquire_in_progress) { if (tmpl->id.spi && (x0 = __xfrm_state_lookup_all(net, mark, daddr, @@ -1314,6 +1336,8 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family); memcpy(&x->mark, &pol->mark, sizeof(x->mark)); x->if_id = if_id; + if ((pol->flags & XFRM_POLICY_CPU_ACQUIRE) && best) + x->pcpu_num = pcpu_id; error = security_xfrm_state_alloc_acquire(x, pol->security, fl->flowi_secid); if (error) { @@ -1392,6 +1416,11 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, x = NULL; error = -ESRCH; } + + /* Use the already installed 'fallback' while the CPU-specific + * SA acquire is handled*/ + if (best) + x = best; } out: if (x) { @@ -1524,12 +1553,14 @@ static void __xfrm_state_bump_genids(struct xfrm_state *xnew) unsigned int h; u32 mark = xnew->mark.v & xnew->mark.m; u32 if_id = xnew->if_id; + u32 cpu_id = xnew->pcpu_num; h = xfrm_dst_hash(net, &xnew->id.daddr, &xnew->props.saddr, reqid, family); hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) { if (x->props.family == family && x->props.reqid == reqid && x->if_id == if_id && + x->pcpu_num == cpu_id && (mark & x->mark.m) == x->mark.v && xfrm_addr_equal(&x->id.daddr, &xnew->id.daddr, family) && xfrm_addr_equal(&x->props.saddr, &xnew->props.saddr, family)) @@ -1552,7 +1583,7 @@ EXPORT_SYMBOL(xfrm_state_insert); static struct xfrm_state *__find_acq_core(struct net *net, const struct xfrm_mark *m, unsigned short family, u8 mode, - u32 reqid, u32 if_id, u8 proto, + u32 reqid, u32 if_id, u32 pcpu_num, u8 proto, const xfrm_address_t *daddr, const xfrm_address_t *saddr, int create) @@ -1569,6 +1600,7 @@ static struct xfrm_state *__find_acq_core(struct net *net, x->id.spi != 0 || x->id.proto != proto || (mark & x->mark.m) != x->mark.v || + x->pcpu_num != pcpu_num || !xfrm_addr_equal(&x->id.daddr, daddr, family) || !xfrm_addr_equal(&x->props.saddr, saddr, family)) continue; @@ -1602,6 +1634,7 @@ static struct xfrm_state *__find_acq_core(struct net *net, break; } + x->pcpu_num = pcpu_num; x->km.state = XFRM_STATE_ACQ; x->id.proto = proto; x->props.family = family; @@ -1630,7 +1663,7 @@ static struct xfrm_state *__find_acq_core(struct net *net, return x; } -static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq); +static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num); int xfrm_state_add(struct xfrm_state *x) { @@ -1656,7 +1689,7 @@ int xfrm_state_add(struct xfrm_state *x) } if (use_spi && x->km.seq) { - x1 = __xfrm_find_acq_byseq(net, mark, x->km.seq); + x1 = __xfrm_find_acq_byseq(net, mark, x->km.seq, x->pcpu_num); if (x1 && ((x1->id.proto != x->id.proto) || !xfrm_addr_equal(&x1->id.daddr, &x->id.daddr, family))) { to_put = x1; @@ -1666,7 +1699,7 @@ int xfrm_state_add(struct xfrm_state *x) if (use_spi && !x1) x1 = __find_acq_core(net, &x->mark, family, x->props.mode, - x->props.reqid, x->if_id, x->id.proto, + x->props.reqid, x->if_id, x->pcpu_num, x->id.proto, &x->id.daddr, &x->props.saddr, 0); __xfrm_state_bump_genids(x); @@ -1791,6 +1824,7 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, x->props.flags = orig->props.flags; x->props.extra_flags = orig->props.extra_flags; + x->pcpu_num = orig->pcpu_num; x->if_id = orig->if_id; x->tfcpad = orig->tfcpad; x->replay_maxdiff = orig->replay_maxdiff; @@ -2066,13 +2100,14 @@ EXPORT_SYMBOL(xfrm_state_lookup_byaddr); struct xfrm_state * xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, u8 mode, u32 reqid, - u32 if_id, u8 proto, const xfrm_address_t *daddr, + u32 if_id, u32 pcpu_num, u8 proto, const xfrm_address_t *daddr, const xfrm_address_t *saddr, int create, unsigned short family) { struct xfrm_state *x; spin_lock_bh(&net->xfrm.xfrm_state_lock); - x = __find_acq_core(net, mark, family, mode, reqid, if_id, proto, daddr, saddr, create); + x = __find_acq_core(net, mark, family, mode, reqid, if_id, pcpu_num, + proto, daddr, saddr, create); spin_unlock_bh(&net->xfrm.xfrm_state_lock); return x; @@ -2207,7 +2242,7 @@ xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **src, int n, /* Silly enough, but I'm lazy to build resolution list */ -static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq) +static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num) { unsigned int h = xfrm_seq_hash(net, seq); struct xfrm_state *x; @@ -2215,6 +2250,7 @@ static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 s hlist_for_each_entry_rcu(x, net->xfrm.state_byseq + h, byseq) { if (x->km.seq == seq && (mark & x->mark.m) == x->mark.v && + x->pcpu_num == pcpu_num && x->km.state == XFRM_STATE_ACQ) { xfrm_state_hold(x); return x; @@ -2224,12 +2260,12 @@ static struct xfrm_state *__xfrm_find_acq_byseq(struct net *net, u32 mark, u32 s return NULL; } -struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq) +struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num) { struct xfrm_state *x; spin_lock_bh(&net->xfrm.xfrm_state_lock); - x = __xfrm_find_acq_byseq(net, mark, seq); + x = __xfrm_find_acq_byseq(net, mark, seq, pcpu_num); spin_unlock_bh(&net->xfrm.xfrm_state_lock); return x; } diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 2b10a45ff124..a9d071c93836 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -456,6 +456,12 @@ static int verify_newsa_info(struct xfrm_usersa_info *p, } } + if (!sa_dir && attrs[XFRMA_SA_PCPU]) { + NL_SET_ERR_MSG(extack, "SA_PCPU only supported with SA_DIR"); + err = -EINVAL; + goto out; + } + out: return err; } @@ -837,6 +843,12 @@ static struct xfrm_state *xfrm_state_construct(struct net *net, x->nat_keepalive_interval = nla_get_u32(attrs[XFRMA_NAT_KEEPALIVE_INTERVAL]); + if (attrs[XFRMA_SA_PCPU]) { + x->pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]); + if (x->pcpu_num >= num_possible_cpus()) + goto error; + } + err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV], extack); if (err) goto error; @@ -1290,6 +1302,11 @@ static int copy_to_user_state_extra(struct xfrm_state *x, if (ret) goto out; } + if (x->pcpu_num != UINT_MAX) { + ret = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num); + if (ret) + goto out; + } if (x->dir) ret = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); @@ -1694,6 +1711,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh, u32 mark; struct xfrm_mark m; u32 if_id = 0; + u32 pcpu_num = UINT_MAX; p = nlmsg_data(nlh); err = verify_spi_info(p->info.id.proto, p->min, p->max, extack); @@ -1710,8 +1728,16 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh, if (attrs[XFRMA_IF_ID]) if_id = nla_get_u32(attrs[XFRMA_IF_ID]); + if (attrs[XFRMA_SA_PCPU]) { + pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]); + if (pcpu_num >= num_possible_cpus()) { + err = -EINVAL; + goto out_noput; + } + } + if (p->info.seq) { - x = xfrm_find_acq_byseq(net, mark, p->info.seq); + x = xfrm_find_acq_byseq(net, mark, p->info.seq, pcpu_num); if (x && !xfrm_addr_equal(&x->id.daddr, daddr, family)) { xfrm_state_put(x); x = NULL; @@ -1720,7 +1746,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh, if (!x) x = xfrm_find_acq(net, &m, p->info.mode, p->info.reqid, - if_id, p->info.id.proto, daddr, + if_id, pcpu_num, p->info.id.proto, daddr, &p->info.saddr, 1, family); err = -ENOENT; @@ -2520,7 +2546,8 @@ static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x) + nla_total_size(sizeof(struct xfrm_mark)) + nla_total_size(4) /* XFRM_AE_RTHR */ + nla_total_size(4) /* XFRM_AE_ETHR */ - + nla_total_size(sizeof(x->dir)); /* XFRMA_SA_DIR */ + + nla_total_size(sizeof(x->dir)) /* XFRMA_SA_DIR */ + + nla_total_size(4); /* XFRMA_SA_PCPU */ } static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c) @@ -2576,6 +2603,8 @@ static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, const struct err = xfrm_if_id_put(skb, x->if_id); if (err) goto out_cancel; + if (x->pcpu_num != UINT_MAX) + err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num); if (x->dir) { err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); @@ -2846,6 +2875,13 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh, xfrm_mark_get(attrs, &mark); + if (attrs[XFRMA_SA_PCPU]) { + x->pcpu_num = nla_get_u32(attrs[XFRMA_SA_PCPU]); + err = -EINVAL; + if (x->pcpu_num >= num_possible_cpus()) + goto free_state; + } + err = verify_newpolicy_info(&ua->policy, extack); if (err) goto free_state; @@ -3176,6 +3212,7 @@ const struct nla_policy xfrma_policy[XFRMA_MAX+1] = { [XFRMA_MTIMER_THRESH] = { .type = NLA_U32 }, [XFRMA_SA_DIR] = NLA_POLICY_RANGE(NLA_U8, XFRM_SA_DIR_IN, XFRM_SA_DIR_OUT), [XFRMA_NAT_KEEPALIVE_INTERVAL] = { .type = NLA_U32 }, + [XFRMA_SA_PCPU] = { .type = NLA_U32 }, }; EXPORT_SYMBOL_GPL(xfrma_policy); @@ -3342,7 +3379,8 @@ static inline unsigned int xfrm_expire_msgsize(void) { return NLMSG_ALIGN(sizeof(struct xfrm_user_expire)) + nla_total_size(sizeof(struct xfrm_mark)) + - nla_total_size(sizeof_field(struct xfrm_state, dir)); + nla_total_size(sizeof_field(struct xfrm_state, dir)) + + nla_total_size(4); /* XFRMA_SA_PCPU */ } static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c) @@ -3368,6 +3406,11 @@ static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct err = xfrm_if_id_put(skb, x->if_id); if (err) return err; + if (x->pcpu_num != UINT_MAX) { + err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num); + if (err) + return err; + } if (x->dir) { err = nla_put_u8(skb, XFRMA_SA_DIR, x->dir); @@ -3475,6 +3518,8 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x) } if (x->if_id) l += nla_total_size(sizeof(x->if_id)); + if (x->pcpu_num) + l += nla_total_size(sizeof(x->pcpu_num)); /* Must count x->lastused as it may become non-zero behind our back. */ l += nla_total_size_64bit(sizeof(u64)); @@ -3581,6 +3626,7 @@ static inline unsigned int xfrm_acquire_msgsize(struct xfrm_state *x, + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr) + nla_total_size(sizeof(struct xfrm_mark)) + nla_total_size(xfrm_user_sec_ctx_size(x->security)) + + nla_total_size(4) /* XFRMA_SA_PCPU */ + userpolicy_type_attrsize(); } @@ -3617,6 +3663,8 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x, err = xfrm_if_id_put(skb, xp->if_id); if (!err && xp->xdo.dev) err = copy_user_offload(&xp->xdo, skb); + if (!err && x->pcpu_num != UINT_MAX) + err = nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num); if (err) { nlmsg_cancel(skb, nlh); return err; From patchwork Wed Oct 23 10:53:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13846909 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C27DA55896 for ; Wed, 23 Oct 2024 10:53:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680842; cv=none; b=o3UEB5ptpikrHlEvCPKMnMn/tLzpfhWaEMSNykVaWn8gGpgCkMFjNbi1H8YUxHd8SmUT8BMfhIWQ04eIaSOpFgQsvGsvTqT/00/phQzsDacP2lK3UNVoha7E4sLVG5RF5UBvbrr+aOGAAyhK+RntzSi0h+4fdFL+f/Wk8sZJqmE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680842; c=relaxed/simple; bh=cpoX40IbbeLwIC9wg+kztKG9JS1jEJ/19ItkCjaN8ws=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=ljbvMqMoNYo0Ek/Ef1HVhG83UwY8iFmg+bG44vKfbdjJTsCtswGecgTacaaD9JF4enhkuzLjQRgNtMBaSiZEMDR9O/XX3NAj0oSH8tK2plKem0khvdJXmEffstLdEbDq9/EywYkFjm09+3Ah2yxThnRKPXP+P+XtCErKFZaBpKs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=AO5CodWH; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="AO5CodWH" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id 3B985207CA; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V2FuOwNbKm-8; Wed, 23 Oct 2024 12:53:51 +0200 (CEST) Received: from cas-essen-01.secunet.de (rl1.secunet.de [10.53.40.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id 6639B201E5; Wed, 23 Oct 2024 12:53:51 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com 6639B201E5 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1729680831; bh=Oag9hT3JqXn6OKuT6RcUCMlwJzNFkKqDVK/MhrCQ3g0=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=AO5CodWH7XJxBFRe6fPJ0ET89TX/Xc0J+dq4bVcx19yUErqsI5GMfzbGX+oJjrv0O FYWiOtWYQG90YjccvPSaayoGAxojwwP037gYTjxPIPr8DcJNCSa8Y50xiUYMrAlDHK J9ELh4o7y8LHpcq6ZXcecJwSbyCnzS/BeTATy/zxNdlQqN0hWjCJJxk+zQOS0rVOM0 LRyG0hSt3RZ8du8gR3W/R0178UKdDnXnq2fxf1gVNDi4X4pC4NaVHH+V5H/mcvSgO+ U71YIW3kvxM5id7IZkbdECOgYyJQUpaxx9q1CBslX1a2gKXQJ0DqYpFEzQRgYUqfOM AJKI2TCnUUHeg== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-01.secunet.de (10.53.40.201) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: by gauss2.secunet.de (Postfix, from userid 1000) id C461C3184D3B; Wed, 23 Oct 2024 12:53:50 +0200 (CEST) From: Steffen Klassert To: Tobias Brunner , Antony Antony , Daniel Xu , Paul Wouters , Simon Horman , Sabrina Dubroca CC: Steffen Klassert , , Subject: [PATCH v3 ipsec-next 2/4] xfrm: Cache used outbound xfrm states at the policy. Date: Wed, 23 Oct 2024 12:53:43 +0200 Message-ID: <20241023105345.1376856-3-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241023105345.1376856-1-steffen.klassert@secunet.com> References: <20241023105345.1376856-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-02.secunet.de (10.53.40.202) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we cache the used xfrm states at the policy for outbound IPsec traffic. Signed-off-by: Steffen Klassert --- include/net/xfrm.h | 4 +++ net/xfrm/xfrm_policy.c | 12 +++++++++ net/xfrm/xfrm_state.c | 55 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 71 insertions(+) diff --git a/include/net/xfrm.h b/include/net/xfrm.h index e23ad52824e2..0710efb8c143 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -184,6 +184,7 @@ struct xfrm_state { }; struct hlist_node byspi; struct hlist_node byseq; + struct hlist_node state_cache; refcount_t refcnt; spinlock_t lock; @@ -532,6 +533,7 @@ struct xfrm_policy_queue { * @xp_net: network namespace the policy lives in * @bydst: hlist node for SPD hash table or rbtree list * @byidx: hlist node for index hash table + * @state_cache_list: hlist head for policy cached xfrm states * @lock: serialize changes to policy structure members * @refcnt: reference count, freed once it reaches 0 * @pos: kernel internal tie-breaker to determine age of policy @@ -562,6 +564,8 @@ struct xfrm_policy { struct hlist_node bydst; struct hlist_node byidx; + struct hlist_head state_cache_list; + /* This lock only affects elements except for entry. */ rwlock_t lock; refcount_t refcnt; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 914bac03b52a..82d1e0b9be70 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -413,6 +413,7 @@ struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp) if (policy) { write_pnet(&policy->xp_net, net); INIT_LIST_HEAD(&policy->walk.all); + INIT_HLIST_HEAD(&policy->state_cache_list); INIT_HLIST_NODE(&policy->bydst); INIT_HLIST_NODE(&policy->byidx); rwlock_init(&policy->lock); @@ -454,6 +455,9 @@ EXPORT_SYMBOL(xfrm_policy_destroy); static void xfrm_policy_kill(struct xfrm_policy *policy) { + struct net *net = xp_net(policy); + struct xfrm_state *x; + xfrm_dev_policy_delete(policy); write_lock_bh(&policy->lock); @@ -469,6 +473,13 @@ static void xfrm_policy_kill(struct xfrm_policy *policy) if (del_timer(&policy->timer)) xfrm_pol_put(policy); + /* XXX: Flush state cache */ + spin_lock_bh(&net->xfrm.xfrm_state_lock); + hlist_for_each_entry_rcu(x, &policy->state_cache_list, state_cache) { + hlist_del_init_rcu(&x->state_cache); + } + spin_unlock_bh(&net->xfrm.xfrm_state_lock); + xfrm_pol_put(policy); } @@ -3249,6 +3260,7 @@ struct dst_entry *xfrm_lookup_with_ifid(struct net *net, dst_release(dst); dst = dst_orig; } + ok: xfrm_pols_put(pols, drop_pols); if (dst && dst->xfrm && diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index ebef07b80afa..a2047825f6c8 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -665,6 +665,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net) refcount_set(&x->refcnt, 1); atomic_set(&x->tunnel_users, 0); INIT_LIST_HEAD(&x->km.all); + INIT_HLIST_NODE(&x->state_cache); INIT_HLIST_NODE(&x->bydst); INIT_HLIST_NODE(&x->bysrc); INIT_HLIST_NODE(&x->byspi); @@ -744,12 +745,15 @@ int __xfrm_state_delete(struct xfrm_state *x) if (x->km.state != XFRM_STATE_DEAD) { x->km.state = XFRM_STATE_DEAD; + spin_lock(&net->xfrm.xfrm_state_lock); list_del(&x->km.all); hlist_del_rcu(&x->bydst); hlist_del_rcu(&x->bysrc); if (x->km.seq) hlist_del_rcu(&x->byseq); + if (!hlist_unhashed(&x->state_cache)) + hlist_del_rcu(&x->state_cache); if (x->id.spi) hlist_del_rcu(&x->byspi); net->xfrm.state_num--; @@ -1222,6 +1226,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned int sequence; struct km_event c; unsigned int pcpu_id; + bool cached = false; /* We need the cpu id just as a lookup key, * we don't require it to be stable. @@ -1234,6 +1239,46 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, sequence = read_seqcount_begin(&net->xfrm.xfrm_state_hash_generation); rcu_read_lock(); + hlist_for_each_entry_rcu(x, &pol->state_cache_list, state_cache) { + if (x->props.family == encap_family && + x->props.reqid == tmpl->reqid && + (mark & x->mark.m) == x->mark.v && + x->if_id == if_id && + !(x->props.flags & XFRM_STATE_WILDRECV) && + xfrm_state_addr_check(x, daddr, saddr, encap_family) && + tmpl->mode == x->props.mode && + tmpl->id.proto == x->id.proto && + (tmpl->id.spi == x->id.spi || !tmpl->id.spi)) + xfrm_state_look_at(pol, x, fl, encap_family, + &best, &acquire_in_progress, &error); + } + + if (best) + goto cached; + + hlist_for_each_entry_rcu(x, &pol->state_cache_list, state_cache) { + if (x->props.family == encap_family && + x->props.reqid == tmpl->reqid && + (mark & x->mark.m) == x->mark.v && + x->if_id == if_id && + !(x->props.flags & XFRM_STATE_WILDRECV) && + xfrm_addr_equal(&x->id.daddr, daddr, encap_family) && + tmpl->mode == x->props.mode && + tmpl->id.proto == x->id.proto && + (tmpl->id.spi == x->id.spi || !tmpl->id.spi)) + xfrm_state_look_at(pol, x, fl, family, + &best, &acquire_in_progress, &error); + } + +cached: + cached = true; + if (best) + goto found; + else if (error) + best = NULL; + else if (acquire_in_progress) /* XXX: acquire_in_progress should not happen */ + WARN_ON(1); + h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { #ifdef CONFIG_XFRM_OFFLOAD @@ -1383,6 +1428,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, XFRM_STATE_INSERT(bysrc, &x->bysrc, net->xfrm.state_bysrc + h, x->xso.type); + INIT_HLIST_NODE(&x->state_cache); if (x->id.spi) { h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family); XFRM_STATE_INSERT(byspi, &x->byspi, @@ -1431,6 +1477,15 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, } else { *err = acquire_in_progress ? -EAGAIN : error; } + + if (x && x->km.state == XFRM_STATE_VALID && !cached && + (!(pol->flags & XFRM_POLICY_CPU_ACQUIRE) || x->pcpu_num == pcpu_id)) { + spin_lock_bh(&net->xfrm.xfrm_state_lock); + if (hlist_unhashed(&x->state_cache)) + hlist_add_head_rcu(&x->state_cache, &pol->state_cache_list); + spin_unlock_bh(&net->xfrm.xfrm_state_lock); + } + rcu_read_unlock(); if (to_put) xfrm_state_put(to_put); From patchwork Wed Oct 23 10:53:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13846910 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D23551A2C32 for ; Wed, 23 Oct 2024 10:54:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680842; cv=none; b=TrB+HmOPbr8wXfjVhRdliSJ6HAjQALlu0H4VCWEZ1tsBoUhZwKAfvp4Lp2oo/AIh844njCAWOOwrtTAsUrS+1hL9DHZravFlnuRWtXyK8/FlLq05OFnu5nVJEnTBamm102j0Jr6hAz3EFRKcOdjdesdq6xZow/Clp8viFSHxN0I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680842; c=relaxed/simple; bh=/PrQvjIG2QoF3ppPcee53y6F/dKD2f+aSEl8YvEO+qE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=eGwfnfvKTC76aIv0UCnZfmj9b4s9obziU8yIQv0cSv+558FlbPvQjHRlA+8PEbWih9RIEojsV/TCPuo1w+LDHd91/nQ3j8M2QS+yKlmLyuP4UYzyN8tnhOltuYE2fPwDPpW4f+WcGS8GAw16SNWPsMPThASZP95jW5xcF14hfgM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=w2j+6baJ; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="w2j+6baJ" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id B862E20851; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MpjvSKXZFXo9; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) Received: from cas-essen-01.secunet.de (rl1.secunet.de [10.53.40.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id A98B8207D8; Wed, 23 Oct 2024 12:53:51 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com A98B8207D8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1729680831; bh=QYvGCHvIchh/8KWWRAZFd3CllSqLWhFosN/CcjWudjs=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=w2j+6baJmkiOUfpQFVKH4n4wzVbP4PnnmUgq/JciLE8W3df6u0/mpFGRO9R4QNdUt AUMPXdmW2cwmoC9CWPXrHSPxV4CjYqW950nVQqf72/1QlJDyf/H1G7HV++bo2xj6Pc GhhhM1SPONET7wCWFLjB49WE7ck37Bqf1xGtGR7efPUx7x2pkw/9rDlzSnYZXPgFiY Gqzc4c+nbx+r6/W9GaqsSvBP32oXesekcvtq7lU0HQAy119Z7x0bjHaZtt+ooU2LD4 8ZkOORuQwdxtq1Y9HZxzDHU+YWeIyeEYFLrbQHRe+PYDGmKQln8ubwqpWlNjgbPkpA mE7EHaGNx3HvA== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-01.secunet.de (10.53.40.201) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: by gauss2.secunet.de (Postfix, from userid 1000) id CA0053184D54; Wed, 23 Oct 2024 12:53:50 +0200 (CEST) From: Steffen Klassert To: Tobias Brunner , Antony Antony , Daniel Xu , Paul Wouters , Simon Horman , Sabrina Dubroca CC: Steffen Klassert , , Subject: [PATCH v3 ipsec-next 3/4] xfrm: Add an inbound percpu state cache. Date: Wed, 23 Oct 2024 12:53:44 +0200 Message-ID: <20241023105345.1376856-4-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241023105345.1376856-1-steffen.klassert@secunet.com> References: <20241023105345.1376856-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-02.secunet.de (10.53.40.202) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we add a percpu cache to cache the used inbound xfrm states. Signed-off-by: Steffen Klassert --- include/net/netns/xfrm.h | 1 + include/net/xfrm.h | 5 ++++ net/ipv4/esp4_offload.c | 6 ++--- net/ipv6/esp6_offload.c | 6 ++--- net/xfrm/xfrm_input.c | 2 +- net/xfrm/xfrm_state.c | 57 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 70 insertions(+), 7 deletions(-) diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h index d489d9250bff..4e0702598d52 100644 --- a/include/net/netns/xfrm.h +++ b/include/net/netns/xfrm.h @@ -43,6 +43,7 @@ struct netns_xfrm { struct hlist_head __rcu *state_bysrc; struct hlist_head __rcu *state_byspi; struct hlist_head __rcu *state_byseq; + struct hlist_head __percpu *state_cache_input; unsigned int state_hmask; unsigned int state_num; struct work_struct state_hash_work; diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 0710efb8c143..ecc35536853e 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -185,6 +185,7 @@ struct xfrm_state { struct hlist_node byspi; struct hlist_node byseq; struct hlist_node state_cache; + struct hlist_node state_cache_input; refcount_t refcnt; spinlock_t lock; @@ -1645,6 +1646,10 @@ int xfrm_state_update(struct xfrm_state *x); struct xfrm_state *xfrm_state_lookup(struct net *net, u32 mark, const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family); +struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, + const xfrm_address_t *daddr, + __be32 spi, u8 proto, + unsigned short family); struct xfrm_state *xfrm_state_lookup_byaddr(struct net *net, u32 mark, const xfrm_address_t *daddr, const xfrm_address_t *saddr, diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c index 80c4ea0e12f4..e0d94270da28 100644 --- a/net/ipv4/esp4_offload.c +++ b/net/ipv4/esp4_offload.c @@ -53,9 +53,9 @@ static struct sk_buff *esp4_gro_receive(struct list_head *head, if (sp->len == XFRM_MAX_DEPTH) goto out_reset; - x = xfrm_state_lookup(dev_net(skb->dev), skb->mark, - (xfrm_address_t *)&ip_hdr(skb)->daddr, - spi, IPPROTO_ESP, AF_INET); + x = xfrm_input_state_lookup(dev_net(skb->dev), skb->mark, + (xfrm_address_t *)&ip_hdr(skb)->daddr, + spi, IPPROTO_ESP, AF_INET); if (unlikely(x && x->dir && x->dir != XFRM_SA_DIR_IN)) { /* non-offload path will record the error and audit log */ diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c index 919ebfabbe4e..7b41fb4f00b5 100644 --- a/net/ipv6/esp6_offload.c +++ b/net/ipv6/esp6_offload.c @@ -80,9 +80,9 @@ static struct sk_buff *esp6_gro_receive(struct list_head *head, if (sp->len == XFRM_MAX_DEPTH) goto out_reset; - x = xfrm_state_lookup(dev_net(skb->dev), skb->mark, - (xfrm_address_t *)&ipv6_hdr(skb)->daddr, - spi, IPPROTO_ESP, AF_INET6); + x = xfrm_input_state_lookup(dev_net(skb->dev), skb->mark, + (xfrm_address_t *)&ipv6_hdr(skb)->daddr, + spi, IPPROTO_ESP, AF_INET6); if (unlikely(x && x->dir && x->dir != XFRM_SA_DIR_IN)) { /* non-offload path will record the error and audit log */ diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 749e7eea99e4..841a60a6fbfe 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -572,7 +572,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) goto drop; } - x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family); + x = xfrm_input_state_lookup(net, mark, daddr, spi, nexthdr, family); if (x == NULL) { secpath_reset(skb); XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES); diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index a2047825f6c8..e3266a5d4f90 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -754,6 +754,9 @@ int __xfrm_state_delete(struct xfrm_state *x) hlist_del_rcu(&x->byseq); if (!hlist_unhashed(&x->state_cache)) hlist_del_rcu(&x->state_cache); + if (!hlist_unhashed(&x->state_cache_input)) + hlist_del_rcu(&x->state_cache_input); + if (x->id.spi) hlist_del_rcu(&x->byspi); net->xfrm.state_num--; @@ -1106,6 +1109,52 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark, return NULL; } +struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, + const xfrm_address_t *daddr, + __be32 spi, u8 proto, + unsigned short family) +{ + struct hlist_head *state_cache_input; + struct xfrm_state *x = NULL; + int cpu = get_cpu(); + + state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu); + + rcu_read_lock(); + hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) { + if (x->props.family != family || + x->id.spi != spi || + x->id.proto != proto || + !xfrm_addr_equal(&x->id.daddr, daddr, family)) + continue; + + if ((mark & x->mark.m) != x->mark.v) + continue; + if (!xfrm_state_hold_rcu(x)) + continue; + goto out; + } + + x = __xfrm_state_lookup(net, mark, daddr, spi, proto, family); + + if (x && x->km.state == XFRM_STATE_VALID) { + spin_lock_bh(&net->xfrm.xfrm_state_lock); + if (hlist_unhashed(&x->state_cache_input)) { + hlist_add_head_rcu(&x->state_cache_input, state_cache_input); + } else { + hlist_del_rcu(&x->state_cache_input); + hlist_add_head_rcu(&x->state_cache_input, state_cache_input); + } + spin_unlock_bh(&net->xfrm.xfrm_state_lock); + } + +out: + rcu_read_unlock(); + put_cpu(); + return x; +} +EXPORT_SYMBOL(xfrm_input_state_lookup); + static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark, const xfrm_address_t *daddr, const xfrm_address_t *saddr, @@ -3079,6 +3128,11 @@ int __net_init xfrm_state_init(struct net *net) net->xfrm.state_byseq = xfrm_hash_alloc(sz); if (!net->xfrm.state_byseq) goto out_byseq; + + net->xfrm.state_cache_input = alloc_percpu(struct hlist_head); + if (!net->xfrm.state_cache_input) + goto out_state_cache_input; + net->xfrm.state_hmask = ((sz / sizeof(struct hlist_head)) - 1); net->xfrm.state_num = 0; @@ -3088,6 +3142,8 @@ int __net_init xfrm_state_init(struct net *net) &net->xfrm.xfrm_state_lock); return 0; +out_state_cache_input: + xfrm_hash_free(net->xfrm.state_byseq, sz); out_byseq: xfrm_hash_free(net->xfrm.state_byspi, sz); out_byspi: @@ -3117,6 +3173,7 @@ void xfrm_state_fini(struct net *net) xfrm_hash_free(net->xfrm.state_bysrc, sz); WARN_ON(!hlist_empty(net->xfrm.state_bydst)); xfrm_hash_free(net->xfrm.state_bydst, sz); + free_percpu(net->xfrm.state_cache_input); } #ifdef CONFIG_AUDITSYSCALL From patchwork Wed Oct 23 10:53:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13846907 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C287E1A2658 for ; Wed, 23 Oct 2024 10:53:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680841; cv=none; b=Hb7ruBIu0JM4p0DHdS3ft5Z2l5XPssT7t8pDX2lhPU/cbDujcGJ57qgRuG77fUElI0GkxB9isFL0WvbhZTTP9orzIGA+7Bx07DBuuq+2YFsg1GgVNRhgnwWUu0rRfGKUffdXyTBNIH7GawYgRVzINYGhwPgl0rLx0RqToGx5eVQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729680841; c=relaxed/simple; bh=eLd/sGRiS+pFCP6ne3bsworvj9oBXFUEGh4vG6Y3i0Q=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=GcjFvsBZTsvpTSARpdjwu0nF+hEmyYA7x3tFKjHEmOHww12k+E4yta9ugXSuhUbvRs5WKHZLLFfn+47E5nx8yo1LPXno7/K2Zg5EjORsXOFNfijjBX47U76Lru8pkZuz3khsUla9CLSzM3fdmtV60BRgf1A9ivbZoJXoxbniE0M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=M6qcBaUB; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="M6qcBaUB" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id A34EB2085B; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NbYJaY2a7lGi; Wed, 23 Oct 2024 12:53:52 +0200 (CEST) Received: from cas-essen-01.secunet.de (rl1.secunet.de [10.53.40.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id D20B420851; Wed, 23 Oct 2024 12:53:51 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com D20B420851 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1729680831; bh=BCuSOqqZ/r5jqkFe+vBmbyLNl/+cRzHkmTdEJNVTVtc=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=M6qcBaUBEaRpEBqHbOIxAVw4kZJcLf1mStzaAFoO+iJ3+glprWf8/UfVqilu9unMS Yj530MknJ94E2uGajIfrsFbd1YgmZNxHnfAXm9c50oIggExOVgG045BeOIp66NVxGH qTqpZWdrwyrtPO0emAs/R+5Q9e1EPMp4iXA0DMiP2PxpoWqx8bEmZ8xGbho63VIIhl HkPNV0r73aaSMC1gDW74A3mImSOa7OgzmTRm3E6JhnnOAUxY2P5vfIS1BbTq5tuJ+q 9PHwJye1PwfWQP0toPSVh9qX6eDHdznkBFZGC2N5CwjEI7oxud5ViRDnYoTR+AmTC0 9QJSoha083mPA== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-01.secunet.de (10.53.40.201) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 23 Oct 2024 12:53:51 +0200 Received: by gauss2.secunet.de (Postfix, from userid 1000) id CF7C63184DD3; Wed, 23 Oct 2024 12:53:50 +0200 (CEST) From: Steffen Klassert To: Tobias Brunner , Antony Antony , Daniel Xu , Paul Wouters , Simon Horman , Sabrina Dubroca CC: Steffen Klassert , , Subject: [PATCH v3 ipsec-next 4/4] xfrm: Restrict percpu SA attribute to specific netlink message types Date: Wed, 23 Oct 2024 12:53:45 +0200 Message-ID: <20241023105345.1376856-5-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20241023105345.1376856-1-steffen.klassert@secunet.com> References: <20241023105345.1376856-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-01.secunet.de (10.53.40.201) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when it's not applicable. Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_user.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index a9d071c93836..7bf7c870b851 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -3276,6 +3276,20 @@ static int xfrm_reject_unused_attr(int type, struct nlattr **attrs, } } + if (attrs[XFRMA_SA_PCPU]) { + switch (type) { + case XFRM_MSG_NEWSA: + case XFRM_MSG_UPDSA: + case XFRM_MSG_ALLOCSPI: + case XFRM_MSG_ACQUIRE: + + break; + default: + NL_SET_ERR_MSG(extack, "Invalid attribute SA_PCPU"); + return -EINVAL; + } + } + return 0; }