From patchwork Mon Jan 27 06:07:53 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13950994 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF3421FC7CE for ; Mon, 27 Jan 2025 06:08:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; cv=none; b=PFPheuq2ubHTOTl56qABunHEczWdQGHunSUmSbdHVlTiZ+jmFI43Q1F8XRxslayeJj2H812tvCeDAv58Cb5uUzUhIw2tvEDGnH0XEDJPvLv+5swolzhRVuO8HkP0WUIqLzXaGq5i0hdbyHmWB1f1wyriIekTypU67N6ehqiFmMc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; c=relaxed/simple; bh=aWWohmYw2H9P//PGTeMTRlcBDUV7hhrycorrD8bAnN4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hKgYpIzqbfPO/UIAFYz12kM95xUqud6CFg4WcNThZMH0Rds/dd2ip7dv3TONYSA6RkWnHrZ0pHJoaCI7H9fR5lzlmqDdl0sG2VyJnLskIAQSQJg/A2bTjDRyjYxM0a82qDntjXfsPqNaxsdFAP3aTfCT6R+eHtBqEiI04HloPbw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=sF4VNdi7; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="sF4VNdi7" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id DA39020518; 
Mon, 27 Jan 2025 07:08:02 +0100 (CET) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id TyRTom4vjm6a; Mon, 27 Jan 2025 07:08:02 +0100 (CET) Received: from cas-essen-02.secunet.de (rl2.secunet.de [10.53.40.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id D8092207AC; Mon, 27 Jan 2025 07:08:00 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com D8092207AC DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1737958080; bh=UeOskQ6csRAAogDjKy0FM27y5bnGyr9x/0vNP5OdCvA=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=sF4VNdi7NeCZ7ZZvgyHUxjFFQxwsO4EelFi357QYHRsNl1Ey6NcrZjGqBhMg5OUJJ hbZ0P5mZj8mekoqOFxI7Jxg8mUtF5Mcaw/ZfOTxKMrJjOIOX3bJ9B0k5KAP287mABF 6GHCD1zgCnWE5YLAoU5nKqqjefJisRGaNbMAyVsEzVzXqQgTcFNSTgCUA74R5Lrhtv Wmexh6RVaMpTCfXA/OG2CgTsq6PXDLw8wyrfhezJrO986brTr8dJ/LSdA0TczjbJWX jxUHDmc5acxralb/oB2PqxxR7+0RiRwMcg0Oy1vNRpK4TXXAVywDbivhWvUy4HVZaW isQhOp+mh6+0A== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-02.secunet.de (10.53.40.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:08:00 +0100 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:08:00 +0100 Received: by gauss2.secunet.de (Postfix, from userid 1000) id C7BFE3180585; Mon, 27 Jan 2025 07:07:59 +0100 (CET) From: Steffen Klassert To: David Miller , Jakub Kicinski CC: Herbert Xu , Steffen Klassert , Subject: [PATCH 1/5] xfrm: replay: Fix the update of replay_esn->oseq_hi for GSO Date: Mon, 27 Jan 2025 07:07:53 +0100 Message-ID: <20250127060757.3946314-2-steffen.klassert@secunet.com> 
X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250127060757.3946314-1-steffen.klassert@secunet.com> References: <20250127060757.3946314-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-01.secunet.de (10.53.40.201) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org

From: Jianbo Liu

When an skb needs GSO and a wrap-around happens, xo->seq.low (the seqno
of the first skb segment) can still be at or below the maximum 32-bit
sequence number while oseq (the seqno of the last segment) has already
wrapped past it. In that case xo->seq.low is still bigger than
replay_esn->oseq while oseq is smaller than it, so the update of
replay_esn->oseq_hi is missed for this wrap-around case because of the
change in the cited commit.

For example, if sending a packet with gso_segs=3 while the old
replay_esn->oseq=0xfffffffe, we calculate:

    xo->seq.low = 0xfffffffe + 1 = 0xffffffff
    oseq = 0xfffffffe + 3 = 0x1

(oseq < replay_esn->oseq) is true, but (xo->seq.low <
replay_esn->oseq) is false, so replay_esn->oseq_hi is not incremented.

To fix this issue, change the outer check back so that it governs the
update of replay_esn->oseq_hi, and add a new inner check for the update
of the packet's oseq_hi.

Fixes: 4b549ccce941 ("xfrm: replay: Fix ESN wrap around for GSO") Signed-off-by: Jianbo Liu Reviewed-by: Patrisious Haddad Signed-off-by: Leon Romanovsky Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_replay.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c index bc56c6305725..235bbefc2aba 100644 --- a/net/xfrm/xfrm_replay.c +++ b/net/xfrm/xfrm_replay.c @@ -714,10 +714,12 @@ static int xfrm_replay_overflow_offload_esn(struct xfrm_state *x, struct sk_buff oseq += skb_shinfo(skb)->gso_segs; } - if (unlikely(xo->seq.low < replay_esn->oseq)) { - XFRM_SKB_CB(skb)->seq.output.hi = ++oseq_hi; - xo->seq.hi = oseq_hi; - replay_esn->oseq_hi = oseq_hi; + if (unlikely(oseq < replay_esn->oseq)) { + replay_esn->oseq_hi = ++oseq_hi; + if (xo->seq.low < replay_esn->oseq) { + XFRM_SKB_CB(skb)->seq.output.hi = oseq_hi; + xo->seq.hi = oseq_hi; + } if (replay_esn->oseq_hi == 0) { replay_esn->oseq--; replay_esn->oseq_hi--; From patchwork Mon Jan 27 06:07:54 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13950996 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 428081FBEBB for ; Mon, 27 Jan 2025 06:08:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; cv=none; b=TwDq1Avb8I5tRu4Q/QZ3EmikQeyuz7CnfvjaqY3COjQDr/qon8G6PujlefxhBdiwyc/sAPRXBQhp0QLz8E69YxZAOZxe6/snBdSMJ6ETNKL/UmydFXW67lTLf099lCcAqITiQRkLitNOzrqTlv9JQgLW0uWcxXWicQpzlg5epBk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; c=relaxed/simple; 
bh=qvFhziT254o4L4NkTSjefSsh+TpcA4eNxqbb6kika78=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=umhsqcSo+binH77i13aewv0IKVraFCE6+sgaiSa8DMnqAkSvDX569Kb/srPqnSZyLDOJlifZBqXxVJjHDIa4WKgLlnu2eBfIxasXTh4LF0kC79u/glyeAOJGO4StJA77n4uFJFb8/sdRTg5Vv91eWlGyoqHkwrt1b2ZoUWf3Ffs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=SrCy34E9; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="SrCy34E9" Received: from localhost (localhost [127.0.0.1]) by a.mx.secunet.com (Postfix) with ESMTP id A4282201E2; Mon, 27 Jan 2025 07:08:02 +0100 (CET) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pDnnTeyHJdQL; Mon, 27 Jan 2025 07:08:01 +0100 (CET) Received: from cas-essen-02.secunet.de (rl2.secunet.de [10.53.40.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id B2F0120518; Mon, 27 Jan 2025 07:08:00 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com B2F0120518 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1737958080; bh=BUKKA63mDsYNpAQE70jIuYZ1vdfg+FbF+jLS+ZVlnDg=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=SrCy34E9jGAYwh6oRcpYooe+Nbj2YujMO9cAjkL2qvbsXgj8eIFeBZJ+doNOcw++o hr9FQJu+FxWL4luf6ElByUlkYXroZEDeXmKqfo/ykNa/zPcF8JtmwIjHAiLwY4E7C0 0HoIs4pkeoIrl0uoEvNqAIKGxplW5LryGS3Dcdx+f8uaEw3IOf51G7CWKU2CLbr945 
C9EgWsFWkAp+MfVd3eQRW4wL4DkF6Jmu+COKdjSiS1FA6e3bKUBBGObbw1z6zCdLVw UqmFhMiMMNFc+LuiH3bkplpSkPJp4NIG625dYIQYJH2W8KF2vlYJv+AzjKDav0R0UT RiuWtj+8uI9Zg== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-02.secunet.de (10.53.40.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:08:00 +0100 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:07:59 +0100 Received: by gauss2.secunet.de (Postfix, from userid 1000) id CD0123183FAF; Mon, 27 Jan 2025 07:07:59 +0100 (CET) From: Steffen Klassert To: David Miller , Jakub Kicinski CC: Herbert Xu , Steffen Klassert , Subject: [PATCH 2/5] xfrm: state: fix out-of-bounds read during lookup Date: Mon, 27 Jan 2025 07:07:54 +0100 Message-ID: <20250127060757.3946314-3-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250127060757.3946314-1-steffen.klassert@secunet.com> References: <20250127060757.3946314-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-02.secunet.de (10.53.40.202) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org From: Florian Westphal lookup and resize can run in parallel. The xfrm_state_hash_generation seqlock ensures a retry, but the hash functions can observe a hmask value that is too large for the new hlist array. rehash does: rcu_assign_pointer(net->xfrm.state_bydst, ndst) [..] 
net->xfrm.state_hmask = nhashmask;

While state lookup does:

    h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family);
    hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) {

This is only safe if the array published in state_bydst is large enough
for the net->xfrm.state_hmask value the reader observes (or if the
lookup function gets serialized via the state spinlock again).

Fix this by prefetching state_hmask and the associated pointers. The
xfrm_state_hash_generation seqlock retry ensures that the pointers and
the hmask are consistent.

The existing helpers, like xfrm_dst_hash(), are now unsafe for the RCU
side; add lockdep assertions to document that they are only safe for
the insert side.

xfrm_state_lookup_byaddr() uses the spinlock rather than RCU. AFAICS
this is an oversight from back when state lookup was converted to RCU;
this lock should be replaced with RCU in a future patch.

Reported-by: syzbot+5f9f31cb7d985f584d8e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/CACT4Y+azwfrE3uz6A5ZErov5YN2LYBN5KrsymBerT36VU8qzBA@mail.gmail.com/
Diagnosed-by: Dmitry Vyukov
Fixes: c2f672fc9464 ("xfrm: state lookup can be lockless")
Signed-off-by: Florian Westphal
Signed-off-by: Steffen Klassert
--- net/xfrm/xfrm_state.c | 89 ++++++++++++++++++++++++++++++++++--------- 1 file changed, 70 insertions(+), 19 deletions(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 67ca7ac955a3..1781728ca428 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -34,6 +34,8 @@ #define xfrm_state_deref_prot(table, net) \ rcu_dereference_protected((table), lockdep_is_held(&(net)->xfrm.xfrm_state_lock)) +#define xfrm_state_deref_check(table, net) \ + rcu_dereference_check((table), lockdep_is_held(&(net)->xfrm.xfrm_state_lock)) static void xfrm_state_gc_task(struct work_struct *work); @@ -62,6 +64,8 @@ static inline unsigned int xfrm_dst_hash(struct net *net, u32 reqid, unsigned short family) { + lockdep_assert_held(&net->xfrm.xfrm_state_lock); + return __xfrm_dst_hash(daddr, 
saddr, reqid, family, net->xfrm.state_hmask); } @@ -70,6 +74,8 @@ static inline unsigned int xfrm_src_hash(struct net *net, const xfrm_address_t *saddr, unsigned short family) { + lockdep_assert_held(&net->xfrm.xfrm_state_lock); + return __xfrm_src_hash(daddr, saddr, family, net->xfrm.state_hmask); } @@ -77,11 +83,15 @@ static inline unsigned int xfrm_spi_hash(struct net *net, const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family) { + lockdep_assert_held(&net->xfrm.xfrm_state_lock); + return __xfrm_spi_hash(daddr, spi, proto, family, net->xfrm.state_hmask); } static unsigned int xfrm_seq_hash(struct net *net, u32 seq) { + lockdep_assert_held(&net->xfrm.xfrm_state_lock); + return __xfrm_seq_hash(seq, net->xfrm.state_hmask); } @@ -1041,16 +1051,38 @@ xfrm_init_tempstate(struct xfrm_state *x, const struct flowi *fl, x->props.family = tmpl->encap_family; } -static struct xfrm_state *__xfrm_state_lookup_all(struct net *net, u32 mark, +struct xfrm_hash_state_ptrs { + const struct hlist_head *bydst; + const struct hlist_head *bysrc; + const struct hlist_head *byspi; + unsigned int hmask; +}; + +static void xfrm_hash_ptrs_get(const struct net *net, struct xfrm_hash_state_ptrs *ptrs) +{ + unsigned int sequence; + + do { + sequence = read_seqcount_begin(&net->xfrm.xfrm_state_hash_generation); + + ptrs->bydst = xfrm_state_deref_check(net->xfrm.state_bydst, net); + ptrs->bysrc = xfrm_state_deref_check(net->xfrm.state_bysrc, net); + ptrs->byspi = xfrm_state_deref_check(net->xfrm.state_byspi, net); + ptrs->hmask = net->xfrm.state_hmask; + } while (read_seqcount_retry(&net->xfrm.xfrm_state_hash_generation, sequence)); +} + +static struct xfrm_state *__xfrm_state_lookup_all(const struct xfrm_hash_state_ptrs *state_ptrs, + u32 mark, const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family, struct xfrm_dev_offload *xdo) { - unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family); + unsigned int h = __xfrm_spi_hash(daddr, spi, proto, family, 
state_ptrs->hmask); struct xfrm_state *x; - hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) { + hlist_for_each_entry_rcu(x, state_ptrs->byspi + h, byspi) { #ifdef CONFIG_XFRM_OFFLOAD if (xdo->type == XFRM_DEV_OFFLOAD_PACKET) { if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) @@ -1084,15 +1116,16 @@ static struct xfrm_state *__xfrm_state_lookup_all(struct net *net, u32 mark, return NULL; } -static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark, +static struct xfrm_state *__xfrm_state_lookup(const struct xfrm_hash_state_ptrs *state_ptrs, + u32 mark, const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family) { - unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family); + unsigned int h = __xfrm_spi_hash(daddr, spi, proto, family, state_ptrs->hmask); struct xfrm_state *x; - hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) { + hlist_for_each_entry_rcu(x, state_ptrs->byspi + h, byspi) { if (x->props.family != family || x->id.spi != spi || x->id.proto != proto || @@ -1114,6 +1147,7 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, __be32 spi, u8 proto, unsigned short family) { + struct xfrm_hash_state_ptrs state_ptrs; struct hlist_head *state_cache_input; struct xfrm_state *x = NULL; int cpu = get_cpu(); @@ -1135,7 +1169,9 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, goto out; } - x = __xfrm_state_lookup(net, mark, daddr, spi, proto, family); + xfrm_hash_ptrs_get(net, &state_ptrs); + + x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family); if (x && x->km.state == XFRM_STATE_VALID) { spin_lock_bh(&net->xfrm.xfrm_state_lock); @@ -1155,15 +1191,16 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, } EXPORT_SYMBOL(xfrm_input_state_lookup); -static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark, +static struct xfrm_state *__xfrm_state_lookup_byaddr(const struct xfrm_hash_state_ptrs *state_ptrs, + u32 
mark, const xfrm_address_t *daddr, const xfrm_address_t *saddr, u8 proto, unsigned short family) { - unsigned int h = xfrm_src_hash(net, daddr, saddr, family); + unsigned int h = __xfrm_src_hash(daddr, saddr, family, state_ptrs->hmask); struct xfrm_state *x; - hlist_for_each_entry_rcu(x, net->xfrm.state_bysrc + h, bysrc) { + hlist_for_each_entry_rcu(x, state_ptrs->bysrc + h, bysrc) { if (x->props.family != family || x->id.proto != proto || !xfrm_addr_equal(&x->id.daddr, daddr, family) || @@ -1183,14 +1220,17 @@ static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark, static inline struct xfrm_state * __xfrm_state_locate(struct xfrm_state *x, int use_spi, int family) { + struct xfrm_hash_state_ptrs state_ptrs; struct net *net = xs_net(x); u32 mark = x->mark.v & x->mark.m; + xfrm_hash_ptrs_get(net, &state_ptrs); + if (use_spi) - return __xfrm_state_lookup(net, mark, &x->id.daddr, + return __xfrm_state_lookup(&state_ptrs, mark, &x->id.daddr, x->id.spi, x->id.proto, family); else - return __xfrm_state_lookup_byaddr(net, mark, + return __xfrm_state_lookup_byaddr(&state_ptrs, mark, &x->id.daddr, &x->props.saddr, x->id.proto, family); @@ -1264,6 +1304,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned short family, u32 if_id) { static xfrm_address_t saddr_wildcard = { }; + struct xfrm_hash_state_ptrs state_ptrs; struct net *net = xp_net(pol); unsigned int h, h_wildcard; struct xfrm_state *x, *x0, *to_put; @@ -1328,8 +1369,10 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, else if (acquire_in_progress) /* XXX: acquire_in_progress should not happen */ WARN_ON(1); - h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family); - hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h, bydst) { + xfrm_hash_ptrs_get(net, &state_ptrs); + + h = __xfrm_dst_hash(daddr, saddr, tmpl->reqid, encap_family, state_ptrs.hmask); + hlist_for_each_entry_rcu(x, state_ptrs.bydst + h, bydst) { #ifdef 
CONFIG_XFRM_OFFLOAD if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) @@ -1362,8 +1405,9 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, if (best || acquire_in_progress) goto found; - h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family); - hlist_for_each_entry_rcu(x, net->xfrm.state_bydst + h_wildcard, bydst) { + h_wildcard = __xfrm_dst_hash(daddr, &saddr_wildcard, tmpl->reqid, + encap_family, state_ptrs.hmask); + hlist_for_each_entry_rcu(x, state_ptrs.bydst + h_wildcard, bydst) { #ifdef CONFIG_XFRM_OFFLOAD if (pol->xdo.type == XFRM_DEV_OFFLOAD_PACKET) { if (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) @@ -1401,7 +1445,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, if (!x && !error && !acquire_in_progress) { if (tmpl->id.spi && - (x0 = __xfrm_state_lookup_all(net, mark, daddr, + (x0 = __xfrm_state_lookup_all(&state_ptrs, mark, daddr, tmpl->id.spi, tmpl->id.proto, encap_family, &pol->xdo)) != NULL) { @@ -2180,10 +2224,13 @@ struct xfrm_state * xfrm_state_lookup(struct net *net, u32 mark, const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family) { + struct xfrm_hash_state_ptrs state_ptrs; struct xfrm_state *x; rcu_read_lock(); - x = __xfrm_state_lookup(net, mark, daddr, spi, proto, family); + xfrm_hash_ptrs_get(net, &state_ptrs); + + x = __xfrm_state_lookup(&state_ptrs, mark, daddr, spi, proto, family); rcu_read_unlock(); return x; } @@ -2194,10 +2241,14 @@ xfrm_state_lookup_byaddr(struct net *net, u32 mark, const xfrm_address_t *daddr, const xfrm_address_t *saddr, u8 proto, unsigned short family) { + struct xfrm_hash_state_ptrs state_ptrs; struct xfrm_state *x; spin_lock_bh(&net->xfrm.xfrm_state_lock); - x = __xfrm_state_lookup_byaddr(net, mark, daddr, saddr, proto, family); + + xfrm_hash_ptrs_get(net, &state_ptrs); + + x = __xfrm_state_lookup_byaddr(&state_ptrs, mark, daddr, saddr, proto, family); 
spin_unlock_bh(&net->xfrm.xfrm_state_lock); return x; } From patchwork Mon Jan 27 06:07:55 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steffen Klassert X-Patchwork-Id: 13950995 X-Patchwork-Delegate: kuba@kernel.org Received: from a.mx.secunet.com (a.mx.secunet.com [62.96.220.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 427A91547FE for ; Mon, 27 Jan 2025 06:08:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.96.220.36 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; cv=none; b=RWiuevZcX8GP28j6qkwIf0S+Q40F0eOmx5ntfLOQ4wwNXwpSaLoCrMsg/Yxe8hmG49fO+xb/QteFMR5zGbcGC8g58mmtT7jzCsyaKkSCG/u8ESojAIxF7KvEAeelo+Sw9EL+UYre3VLN8WrRc9/sNraXxbnMVp8UwejRZYY3luY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1737958086; c=relaxed/simple; bh=7O3OoG7dgTMC6b8FNlEeBVEMP9ZH/s+axbT5y6TpsQY=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=f4aM1eknbdv+SYtaVFpmIGcA6FQ/40aAFKCyem8G8XhC0/rMhZX6dLb6aJtB/4O/RJLUQ/Ttrxt5wjLunWvEkP1FqILG8anCRIdT4hYVlM1ywEy46hA4YW48/DnF9X8P2CIfb3U75onvQDx/iV/xfrg5OY2C9qRivakpRYpxJp4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com; spf=pass smtp.mailfrom=secunet.com; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b=qW2CDj3U; arc=none smtp.client-ip=62.96.220.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=secunet.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=secunet.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=secunet.com header.i=@secunet.com header.b="qW2CDj3U" Received: from localhost (localhost [127.0.0.1]) by 
a.mx.secunet.com (Postfix) with ESMTP id 5B2202074B; Mon, 27 Jan 2025 07:08:02 +0100 (CET) X-Virus-Scanned: by secunet Received: from a.mx.secunet.com ([127.0.0.1]) by localhost (a.mx.secunet.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id quLJU4-pB8iI; Mon, 27 Jan 2025 07:08:00 +0100 (CET) Received: from cas-essen-02.secunet.de (rl2.secunet.de [10.53.40.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by a.mx.secunet.com (Postfix) with ESMTPS id 53A3E201E2; Mon, 27 Jan 2025 07:08:00 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.11.0 a.mx.secunet.com 53A3E201E2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=secunet.com; s=202301; t=1737958080; bh=KfhSoTFyNphIRbJ4FkOafQH9Skam5N4rpGsYYCdAFHQ=; h=From:To:CC:Subject:Date:In-Reply-To:References:From; b=qW2CDj3UJKfa0+QJDU0X9odv9SkRSOxow5v6/OdzMFLOUkgRGSTGZIN6aE8VHw151 UV9j6uq5a7xZ4sf5E9jnQAdhM3l8TPdHxkAgJ6Nkj6Md25nlxpZ6n6/oe7cBDHBBr3 acRadQF0ZldMTzva5m+TC3i4fpVkHSME/00D21vTu86eFpxl/9oeB6MucpaU8zrHqH IeUY9+TieQvo1O7ggh+WxfQFZMvzRo2ccR/0sdFsnLOAzrvRcAraeEI5wxIy2sbWET U5tcPakf8IMJxBvU6c8WH4v0ryH3vIIz6Lrpa3Gc9LcS9CsLw9izpl62L1TbPnrmGI RfMRWGQszq2Bw== Received: from mbx-essen-02.secunet.de (10.53.40.198) by cas-essen-02.secunet.de (10.53.40.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:08:00 +0100 Received: from gauss2.secunet.de (10.182.7.193) by mbx-essen-02.secunet.de (10.53.40.198) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Mon, 27 Jan 2025 07:07:59 +0100 Received: by gauss2.secunet.de (Postfix, from userid 1000) id D12B63184110; Mon, 27 Jan 2025 07:07:59 +0100 (CET) From: Steffen Klassert To: David Miller , Jakub Kicinski CC: Herbert Xu , Steffen Klassert , Subject: [PATCH 3/5] xfrm: delete intermediate secpath entry in packet offload mode Date: Mon, 27 Jan 2025 07:07:55 +0100 Message-ID: 
<20250127060757.3946314-4-steffen.klassert@secunet.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250127060757.3946314-1-steffen.klassert@secunet.com> References: <20250127060757.3946314-1-steffen.klassert@secunet.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: cas-essen-02.secunet.de (10.53.40.202) To mbx-essen-02.secunet.de (10.53.40.198) X-EXCLAIMER-MD-CONFIG: 2c86f778-e09b-4440-8b15-867914633a10 X-Patchwork-Delegate: kuba@kernel.org

From: Alexandre Cassen

Packets handled by hardware have a secpath added as a way to inform the
XFRM core code that the packet was already handled. That secpath is not
needed at all after the policy is checked, and it is removed later in
the stack.

However, if IP forwarding is enabled (/proc/sys/net/ipv4/ip_forward),
that secpath is not removed, and packets which were already handled
re-enter the driver TX path with xfrm_offload set.

The following kernel panic is observed in mlx5 in such a case:

 mlx5_core 0000:04:00.0 enp4s0f0np0: Link up
 mlx5_core 0000:04:00.1 enp4s0f1np1: Link up
 Initializing XFRM netlink socket
 IPsec XFRM device driver
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor instruction fetch in kernel mode
 #PF: error_code(0x0010) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0010 [#1] PREEMPT SMP
 CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc1-alex #3
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffb87380003800 EFLAGS: 00010206 RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00 RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010 R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00 R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e FS: 0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0 Call Trace: ? show_regs+0x63/0x70 ? __die_body+0x20/0x60 ? __die+0x2b/0x40 ? page_fault_oops+0x15c/0x550 ? do_user_addr_fault+0x3ed/0x870 ? exc_page_fault+0x7f/0x190 ? asm_exc_page_fault+0x27/0x30 mlx5e_ipsec_handle_tx_skb+0xe7/0x2f0 [mlx5_core] mlx5e_xmit+0x58e/0x1980 [mlx5_core] ? __fib_lookup+0x6a/0xb0 dev_hard_start_xmit+0x82/0x1d0 sch_direct_xmit+0xfe/0x390 __dev_queue_xmit+0x6d8/0xee0 ? __fib_lookup+0x6a/0xb0 ? internal_add_timer+0x48/0x70 ? mod_timer+0xe2/0x2b0 neigh_resolve_output+0x115/0x1b0 __neigh_update+0x26a/0xc50 neigh_update+0x14/0x20 arp_process+0x2cb/0x8e0 ? __napi_build_skb+0x5e/0x70 arp_rcv+0x11e/0x1c0 ? dev_gro_receive+0x574/0x820 __netif_receive_skb_list_core+0x1cf/0x1f0 netif_receive_skb_list_internal+0x183/0x2a0 napi_complete_done+0x76/0x1c0 mlx5e_napi_poll+0x234/0x7a0 [mlx5_core] __napi_poll+0x2d/0x1f0 net_rx_action+0x1a6/0x370 ? atomic_notifier_call_chain+0x3b/0x50 ? irq_int_handler+0x15/0x20 [mlx5_core] handle_softirqs+0xb9/0x2f0 ? 
handle_irq_event+0x44/0x60 irq_exit_rcu+0xdb/0x100 common_interrupt+0x98/0xc0 asm_common_interrupt+0x27/0x40 RIP: 0010:pv_native_safe_halt+0xb/0x10 Code: 09 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 22 0f 1f 84 00 00 00 00 00 90 eb 07 0f 00 2d 7f e9 36 00 fb 40 00 83 ff 07 77 21 89 ff ff 24 fd 88 3d a1 bd 0f 21 f8 RSP: 0018:ffffffffbe603de8 EFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000f92f46680 RDX: 0000000000000037 RSI: 00000000ffffffff RDI: 00000000000518d4 RBP: ffffffffbe603df0 R08: 000000cd42e4dffb R09: ffffffffbe603d70 R10: 0000004d80d62680 R11: 0000000000000001 R12: ffffffffbe60bf40 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffbe60aff8 ? default_idle+0x9/0x20 arch_cpu_idle+0x9/0x10 default_idle_call+0x29/0xf0 do_idle+0x1f2/0x240 cpu_startup_entry+0x2c/0x30 rest_init+0xe7/0x100 start_kernel+0x76b/0xb90 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xc0/0x110 ? setup_ghcb+0xe/0x130 common_startup_64+0x13e/0x141 Modules linked in: esp4_offload esp4 xfrm_interface xfrm6_tunnel tunnel4 tunnel6 xfrm_user xfrm_algo binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd ccp kvm input_leds serio_raw qemu_fw_cfg sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx libcrc32c raid1 raid0 mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 ahci mlxfw i2c_i801 libahci i2c_mux i2c_smbus psample virtio_rng pci_hyperv_intf aesni_intel crypto_simd cryptd CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
RSP: 0018:ffffb87380003800 EFLAGS: 00010206
RAX: ffff8df004e02600 RBX: ffffb873800038d8 RCX: 00000000ffff98cf
RDX: ffff8df00733e108 RSI: ffff8df00521fb80 RDI: ffff8df001661f00
RBP: ffffb87380003850 R08: ffff8df013980000 R09: 0000000000000010
R10: 0000000000000002 R11: 0000000000000002 R12: ffff8df001661f00
R13: ffff8df00521fb80 R14: ffff8df00733e108 R15: ffff8df011faf04e
FS: 0000000000000000(0000) GS:ffff8df46b800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000106384000 CR4: 0000000000350ef0
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x3b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fixes: 5958372ddf62 ("xfrm: add RX datapath protection for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen
Signed-off-by: Leon Romanovsky
Signed-off-by: Steffen Klassert
---
 include/net/xfrm.h | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 32c09e85a64c..2c4eda6a8596 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1224,9 +1224,19 @@ static inline int __xfrm_policy_check2(struct sock *sk, int dir,
 
 	if (xo) {
 		x = xfrm_input_state(skb);
-		if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET)
-			return (xo->flags & CRYPTO_DONE) &&
-			       (xo->status & CRYPTO_SUCCESS);
+		if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET) {
+			bool check = (xo->flags & CRYPTO_DONE) &&
+				     (xo->status & CRYPTO_SUCCESS);
+
+			/* The packets here are plain ones and secpath was
+			 * needed to indicate that hardware already handled
+			 * them and there is no need to do nothing in addition.
+			 *
+			 * Consume secpath which was set by drivers.
+			 */
+			secpath_reset(skb);
+			return check;
+		}
 	}
 
 	return __xfrm_check_nopolicy(net, skb, dir) ||

From patchwork Mon Jan 27 06:07:56 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steffen Klassert
X-Patchwork-Id: 13950993
X-Patchwork-Delegate: kuba@kernel.org
From: Steffen Klassert
To: David Miller, Jakub Kicinski
CC: Herbert Xu, Steffen Klassert
Subject: [PATCH 4/5] xfrm: Fix the usage of skb->sk
Date: Mon, 27 Jan 2025 07:07:56 +0100
Message-ID: <20250127060757.3946314-5-steffen.klassert@secunet.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250127060757.3946314-1-steffen.klassert@secunet.com>
References: <20250127060757.3946314-1-steffen.klassert@secunet.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

xfrm assumed that skb->sk is always a full socket. This is not always
true, so fix it by converting to a full socket before it is used.

Signed-off-by: Steffen Klassert
Reviewed-by: Eric Dumazet
---
 net/ipv4/esp4.c                | 2 +-
 net/ipv6/esp6.c                | 2 +-
 net/ipv6/xfrm6_output.c        | 4 ++--
 net/xfrm/xfrm_interface_core.c | 2 +-
 net/xfrm/xfrm_output.c         | 7 ++++---
 net/xfrm/xfrm_policy.c         | 2 +-
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index f3281312eb5e..8cf5f6634775 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -279,7 +279,7 @@ static void esp_output_done(void *data, int err)
 		    x->encap && x->encap->encap_type == TCP_ENCAP_ESPINTCP)
 			esp_output_tail_tcp(x, skb);
 		else
-			xfrm_output_resume(skb->sk, skb, err);
+			xfrm_output_resume(skb_to_full_sk(skb), skb, err);
 	}
 }
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index b2400c226a32..fad4d7c9fa50 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -315,7 +315,7 @@ static void esp_output_done(void *data, int err)
 		    x->encap && x->encap->encap_type == TCP_ENCAP_ESPINTCP)
 			esp_output_tail_tcp(x, skb);
 		else
-			xfrm_output_resume(skb->sk, skb, err);
+			xfrm_output_resume(skb_to_full_sk(skb), skb, err);
 	}
 }
 
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 5f7b1fdbffe6..b3d5d1f266ee 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -82,14 +82,14 @@ static int __xfrm6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 	toobig = skb->len > mtu && !skb_is_gso(skb);
 
-	if (toobig && xfrm6_local_dontfrag(skb->sk)) {
+	if (toobig && xfrm6_local_dontfrag(sk)) {
 		xfrm6_local_rxpmtu(skb, mtu);
 		kfree_skb(skb);
 		return -EMSGSIZE;
 	} else if (toobig && xfrm6_noneed_fragment(skb)) {
 		skb->ignore_df = 1;
 		goto skip_frag;
-	} else if (!skb->ignore_df && toobig && skb->sk) {
+	} else if (!skb->ignore_df && toobig && sk) {
 		xfrm_local_error(skb, mtu);
 		kfree_skb(skb);
 		return -EMSGSIZE;
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index 98f1e2b67c76..c397eb99d867 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -506,7 +506,7 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 	skb_dst_set(skb, dst);
 	skb->dev = tdev;
 
-	err = dst_output(xi->net, skb->sk, skb);
+	err = dst_output(xi->net, skb_to_full_sk(skb), skb);
 	if (net_xmit_eval(err) == 0) {
 		dev_sw_netstats_tx_add(dev, 1, length);
 	} else {
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index e5722c95b8bb..1fb4dc0a76e1 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -796,7 +796,7 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 	     !skb_gso_validate_network_len(skb, ip_skb_dst_mtu(skb->sk, skb)))) {
 		skb->protocol = htons(ETH_P_IP);
 
-		if (skb->sk)
+		if (skb->sk && sk_fullsock(skb->sk))
 			xfrm_local_error(skb, mtu);
 		else
 			icmp_send(skb, ICMP_DEST_UNREACH,
@@ -832,6 +832,7 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 {
 	int mtu, ret = 0;
 	struct dst_entry *dst = skb_dst(skb);
+	struct sock *sk = skb_to_full_sk(skb);
 
 	if (skb->ignore_df)
 		goto out;
@@ -846,9 +847,9 @@ static int xfrm6_tunnel_check_size(struct sk_buff *skb)
 		skb->dev = dst->dev;
 		skb->protocol = htons(ETH_P_IPV6);
 
-		if (xfrm6_local_dontfrag(skb->sk))
+		if (xfrm6_local_dontfrag(sk))
 			ipv6_stub->xfrm6_local_rxpmtu(skb, mtu);
-		else if (skb->sk)
+		else if (sk)
 			xfrm_local_error(skb, mtu);
 		else
 			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4408c11c0835..c27da1fd070e 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2959,7 +2959,7 @@ static void xfrm_policy_queue_process(struct timer_list *t)
 		skb_dst_drop(skb);
 		skb_dst_set(skb, dst);
 
-		dst_output(net, skb->sk, skb);
+		dst_output(net, skb_to_full_sk(skb), skb);
 	}
 
 out:

From patchwork Mon Jan 27 06:07:57 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steffen Klassert
X-Patchwork-Id: 13950998
X-Patchwork-Delegate: kuba@kernel.org
From: Steffen Klassert
To: David Miller, Jakub Kicinski
CC: Herbert Xu, Steffen Klassert
Subject: [PATCH 5/5] xfrm: Don't disable preemption while looking up cache state.
Date: Mon, 27 Jan 2025 07:07:57 +0100
Message-ID: <20250127060757.3946314-6-steffen.klassert@secunet.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250127060757.3946314-1-steffen.klassert@secunet.com>
References: <20250127060757.3946314-1-steffen.klassert@secunet.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Sebastian Andrzej Siewior

For the state cache lookup, xfrm_input_state_lookup() first disables
preemption to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicitly disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.

Remaining on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.

Remove get_cpu() and use the state_cache_input on the current CPU.

Reported-by: Alexei Starovoitov
Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Steffen Klassert
---
 net/xfrm/xfrm_state.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1781728ca428..711e816fc404 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1150,9 +1150,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 	struct xfrm_hash_state_ptrs state_ptrs;
 	struct hlist_head *state_cache_input;
 	struct xfrm_state *x = NULL;
-	int cpu = get_cpu();
 
-	state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
@@ -1186,7 +1185,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 
 out:
 	rcu_read_unlock();
-	put_cpu();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_input_state_lookup);
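
Note on 5/5: the argument that staying on one CPU is only an optimisation can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: struct state, global_table, cpu_cache, state_add(), slow_lookup(), lookup() and demo() are hypothetical names, the per-CPU cache is reduced to a single pointer per CPU instead of an hlist, and no real locking is done. The cpu_hint argument plays the role of whichever CPU raw_cpu_ptr() happened to pick.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS    4	/* hypothetical CPU count for the model */
#define TABLE_SIZE 8	/* hypothetical size of the global table */

struct state {
	unsigned int spi;	/* lookup key */
	int valid;		/* non-zero once the slot is populated */
};

/* Authoritative table; stands in for the global state hash. */
static struct state global_table[TABLE_SIZE];

/* One cached entry per CPU; stands in for state_cache_input. */
static struct state *cpu_cache[NR_CPUS];

/* Install a state in the global table. Returns NULL if the table is full. */
struct state *state_add(unsigned int spi)
{
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (!global_table[i].valid) {
			global_table[i].spi = spi;
			global_table[i].valid = 1;
			return &global_table[i];
		}
	}
	return NULL;
}

/* Slow path: walk the global table. */
struct state *slow_lookup(unsigned int spi)
{
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (global_table[i].valid && global_table[i].spi == spi)
			return &global_table[i];
	}
	return NULL;
}

/*
 * cpu_hint is only a hint. If the task migrated after the hint was taken,
 * the worst case is a cache miss followed by the slow path; the result is
 * the same state either way.
 */
struct state *lookup(unsigned int spi, int cpu_hint)
{
	struct state *x = cpu_cache[cpu_hint];

	if (x && x->spi == spi)
		return x;			/* cache hit */

	x = slow_lookup(spi);
	if (x)
		cpu_cache[cpu_hint] = x;	/* the kernel updates the cache under xfrm_state_lock */
	return x;
}

/* Returns 0 when every lookup agrees, whichever cache slot was consulted. */
int demo(void)
{
	struct state *added = state_add(0x100);
	struct state *first = lookup(0x100, 0);	/* miss, fills the CPU 0 slot */
	struct state *again = lookup(0x100, 0);	/* hit in the CPU 0 slot */
	struct state *moved = lookup(0x100, 2);	/* as if migrated to CPU 2 */

	assert(added != NULL);
	return (first == added && again == added && moved == added &&
		lookup(0x200, 1) == NULL) ? 0 : 1;
}
```

Calling demo() from a main() and checking for a zero return is enough to exercise the model. In this model a stale hint costs one extra walk of the global table and never changes the returned state, which is the property the commit message relies on when it drops get_cpu()/put_cpu().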