From patchwork Mon Jun 26 15:08:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293038 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E9C9EB64DC for ; Mon, 26 Jun 2023 15:09:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230303AbjFZPJT (ORCPT ); Mon, 26 Jun 2023 11:09:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229488AbjFZPJR (ORCPT ); Mon, 26 Jun 2023 11:09:17 -0400 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11A18130 for ; Mon, 26 Jun 2023 08:09:15 -0700 (PDT) Received: by mail-wr1-x42e.google.com with SMTP id ffacd0b85a97d-312863a983fso3939129f8f.2 for ; Mon, 26 Jun 2023 08:09:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792153; x=1690384153; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=aFcQRDCnPx+ynCpnF2uFrs+TLwer8UbpK1ON+IZqGjg=; b=Bdh0fSQLic8GK24HDf3N/rYsDXp0g5q1AXTN6FF2vyHtYHXt1iEdesUewz6eGR40Ed 9QFHTXPmVxvxWjLFJO4riKrSrGvBsSy2Rw4MrLgYCTc4GuDKvtSZ6Jr2jYH44BwP0ewD uuQ3mh+NJQGkk5OyGdPSCiECAmJMPo8d7YoAn5GsINQN0VYx0iU7dxh2Qwtb8vP6LCQh VZ195sjvz5hSuhSGswffFJdVCftH2DnZi/geebX3tQSNCyZvJKPwaaqfGWBjf3TZ2HUt JLbVUSEEJSj0wLm3Ww9IwuENK+ch0qGYbRN3NC7ULa2rmv8kfwWEIdnxDcYrkXzx416g AmTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792153; x=1690384153; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aFcQRDCnPx+ynCpnF2uFrs+TLwer8UbpK1ON+IZqGjg=; b=LD+ipUyUI29VbSieE2gucw3yba9fz8lacRE5sT7C3TJfuXx9T+DTfdetnv0QA6k+jm l2YIzOqR8CVjfDk44I/olnmb6V6vTCPIN6gHNTYQHInHx5eDvvL3ECQlfU8v3GCZntoA 7yGiCAb8nsVvuod3cIWkQ8r0KYv4+90G+qwkI/0DRnk02bVZZfmYnRYX6w3CEjECccr5 KSFJVSyHyn9zdvC6/wH06PFWPofXbPIyxFT10Az2V45w3rcsG6soRByXRlZUlpvPqH90 ICfUkX3OoxoAMfwjyUmaR6KaT5oRbc9cGzmiM4PYMRpIcrWa9H7GR2ASaDT3WJ6yp5XN g8KQ== X-Gm-Message-State: AC+VfDyIahTWzLizzl3/0tDCeNE0C4W2UJvVBuWj4lg3cftAmuZgIoRp F/WLxG1X9rI6OuchKPB2w0oiEA== X-Google-Smtp-Source: ACHHUZ4tGG8fHK6H7JRwd23ypMKZA5H9umKPvcqfyKGjq7sH0yQK3kam6nswRSMg68RE5TI41cFW6A== X-Received: by 2002:adf:ec03:0:b0:306:36ef:2e3b with SMTP id x3-20020adfec03000000b0030636ef2e3bmr22381550wrn.70.1687792153406; Mon, 26 Jun 2023 08:09:13 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:13 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:08:58 +0100 Subject: [PATCH bpf-next v3 1/7] udp: re-score reuseport groups when connected sockets are present MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-1-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Contrary to TCP, UDP reuseport groups can contain TCP_ESTABLISHED sockets. To support these properly we remember whether a group has a connected socket and skip the fast reuseport early-return. In effect we continue scoring all reuseport sockets and then choose the one with the highest score. The current code fails to re-calculate the score for the result of lookup_reuseport. According to Kuniyuki Iwashima: 1) SO_INCOMING_CPU is set -> selected sk might have +1 score 2) BPF prog returns ESTABLISHED and/or SO_INCOMING_CPU sk -> selected sk will have more than 8 Using the old score could trigger more lookups depending on the order that sockets are created. sk -> sk (SO_INCOMING_CPU) -> sk (ESTABLISHED) | | `-> select the next SO_INCOMING_CPU sk | `-> select itself (We should save this lookup) Fixes: efc6b6f6c311 ("udp: Improve load balancing for SO_REUSEPORT.") Signed-off-by: Lorenz Bauer Reviewed-by: Kuniyuki Iwashima --- net/ipv4/udp.c | 20 +++++++++++++++----- net/ipv6/udp.c | 19 ++++++++++++++----- 2 files changed, 29 insertions(+), 10 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index fd3dae081f3a..5ef478d2c408 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -450,14 +450,24 @@ static struct sock *udp4_lib_lookup2(struct net *net, score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif); if (score > badness) { - result = lookup_reuseport(net, sk, skb, - saddr, sport, daddr, hnum); + badness = score; + result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + if (!result) { + result = sk; + continue; + } + /* Fall back to scoring if group has connections */ - if (result && !reuseport_has_conns(sk)) + if (!reuseport_has_conns(sk)) return result; - result = result ? : sk; - badness = score; + /* Reuseport logic returned an error, keep original score. */ + if (IS_ERR(result)) + continue; + + badness = compute_score(result, net, saddr, sport, + daddr, hnum, dif, sdif); + } } return result; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index e5a337e6b970..8b3cb1d7da7c 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -193,14 +193,23 @@ static struct sock *udp6_lib_lookup2(struct net *net, score = compute_score(sk, net, saddr, sport, daddr, hnum, dif, sdif); if (score > badness) { - result = lookup_reuseport(net, sk, skb, - saddr, sport, daddr, hnum); + badness = score; + result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + if (!result) { + result = sk; + continue; + } + /* Fall back to scoring if group has connections */ - if (result && !reuseport_has_conns(sk)) + if (!reuseport_has_conns(sk)) return result; - result = result ? : sk; - badness = score; + /* Reuseport logic returned an error, keep original score. */ + if (IS_ERR(result)) + continue; + + badness = compute_score(sk, net, saddr, sport, + daddr, hnum, dif, sdif); } } return result; From patchwork Mon Jun 26 15:08:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293039 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53AF7EB64DD for ; Mon, 26 Jun 2023 15:09:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230411AbjFZPJU (ORCPT ); Mon, 26 Jun 2023 11:09:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230384AbjFZPJT (ORCPT ); Mon, 26 Jun 2023 11:09:19 -0400 Received: from mail-wr1-x436.google.com (mail-wr1-x436.google.com [IPv6:2a00:1450:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB7FC10CC for ; Mon, 26 Jun 2023 08:09:15 -0700 (PDT) Received: by mail-wr1-x436.google.com with SMTP id ffacd0b85a97d-3112f256941so3060719f8f.1 for ; Mon, 26 Jun 2023 08:09:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792154; x=1690384154; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=N3EvG4bx9yG1/mYfSmQ2wVSFlKC8qVb7fVvvw4JD6B4=; b=jhALMCmtL/JaHrhryTc3eBE0Q2RRwUsvZr7MRZ77W9S5xrj+WqIy0xNGm1B1tpVp8r njuaF7oyof28ORATKR28vNGnJP2Fo3DEE0xfpsAY197I0OHtfODO9RtQEHMu2tpK26Xb 3+Z6Ga05qs5pOgI68gzTafE5fYf3lP5Yk/48Ue8Xil6NFsXD7uwH34EALF96jdHIg7gA yl5TDp80uX+VRKPyKIt/k5W7vjgoK/5/rjNqj16XQ5wyLPZy/0frU5pFGTWdmf070gsX 1qeZb4UfPx5X/6Ax/KXVRKKnmGh2ypkH5gtuW8KHqPUnq0Jdr5RfRgngVEEmigHyhi1z u6lQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792154; x=1690384154; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N3EvG4bx9yG1/mYfSmQ2wVSFlKC8qVb7fVvvw4JD6B4=; b=R/xJDrrJ7UNBjZicxurAEQMjF5yavAXNCwd3cBnuu90bqwKs4p4e2R4JNejXylZRMh BV8d7wfHTq89MLnXtbAeYu4KJotv+D/0dzZ+9GPk45AW69col5yLPFkeZV7Uq799y33d 2iU8YKjUNc7ATCaP4VRN/NtXhFG6xTU1E4hOxOMKGBoGx8zQqa1nDhqUq2WCi+EmeaRc NG0ViQQiT8SfnlU8Iq2EhqBwDhmNUDuZqigFaG7Htn3NpDx9DmUDWqLxVEYnJyrfhBDV EqpF/bAvSDGmiz92O+1XOjxe38ADt9yFhz646cNygVesY1y63stW+q58A1yyGs/ZCMXy 13OQ== X-Gm-Message-State: AC+VfDwvscrkiSECOHh4eRTsgevrlxeoGLLEL0g9ZnNeDH/VY00u/h1P ii8KHe9tQEYxqrxoBkRV2rAnBg== X-Google-Smtp-Source: ACHHUZ7srgTJvKqUF4Y/gpoI30LV+WDQy8dZhgL85NZvb5EJIktLKeX/dMzkykTGLxm947HY0zqwag== X-Received: by 2002:a5d:4e8f:0:b0:30f:bf2e:4b99 with SMTP id e15-20020a5d4e8f000000b0030fbf2e4b99mr19949550wru.49.1687792154414; Mon, 26 Jun 2023 08:09:14 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:14 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:08:59 +0100 Subject: [PATCH bpf-next v3 2/7] net: export inet_lookup_reuseport and inet6_lookup_reuseport MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-2-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Rename the existing reuseport helpers for IPv4 and IPv6 so that they can be invoked in the follow up commit. Export them so that DCCP which may be built as a module can access them. No change in functionality. Signed-off-by: Lorenz Bauer --- include/net/inet6_hashtables.h | 7 +++++++ include/net/inet_hashtables.h | 5 +++++ net/ipv4/inet_hashtables.c | 15 ++++++++------- net/ipv6/inet6_hashtables.c | 19 ++++++++++--------- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 56f1286583d3..032ddab48f8f 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,6 +48,13 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + __be16 sport, + const struct in6_addr *daddr, + unsigned short hnum); + struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 99bd823e97f6..8734f3488f5d 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -379,6 +379,11 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, unsigned short hnum); + static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index e7391bf310a7..920131e4a65d 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -332,10 +332,10 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, int doff, - __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, unsigned short hnum) { struct sock *reuse_sk = NULL; u32 phash; @@ -346,6 +346,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, } return reuse_sk; } +EXPORT_SYMBOL_GPL(inet_lookup_reuseport); /* * Here are some nice properties to exploit here. The BSD API @@ -369,8 +370,8 @@ static struct sock *inet_lhash2_lookup(struct net *net, sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { - result = lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + result = inet_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum); if (result) return result; @@ -399,7 +400,7 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index b64b49012655..b7c56867314e 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -111,12 +111,12 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, int doff, - const struct in6_addr *saddr, - __be16 sport, - const struct in6_addr *daddr, - unsigned short hnum) +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + __be16 sport, + const struct in6_addr *daddr, + unsigned short hnum) { struct sock *reuse_sk = NULL; u32 phash; @@ -127,6 +127,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, } return reuse_sk; } +EXPORT_SYMBOL_GPL(inet6_lookup_reuseport); /* called with rcu_read_lock() */ static struct sock *inet6_lhash2_lookup(struct net *net, @@ -143,8 +144,8 @@ static struct sock *inet6_lhash2_lookup(struct net *net, sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { - result = lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + result = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum); if (result) return result; @@ -175,7 +176,7 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); if (reuse_sk) sk = reuse_sk; return sk; From patchwork Mon Jun 26 15:09:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293040 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACAF2EB64D7 for ; Mon, 26 Jun 2023 15:09:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230384AbjFZPJV (ORCPT ); Mon, 26 Jun 2023 11:09:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230401AbjFZPJU (ORCPT ); Mon, 26 Jun 2023 11:09:20 -0400 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECFDE10D9 for ; Mon, 26 Jun 2023 08:09:16 -0700 (PDT) Received: by mail-wm1-x331.google.com with SMTP id 5b1f17b1804b1-3f9b0f139feso48440715e9.3 for ; Mon, 26 Jun 2023 08:09:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792155; x=1690384155; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=72bcXZ+J6bNWWGqskqAX/uK8x0vlcNvpBj1vGVpBWRk=; b=be4XVtzYf2A1sN1F/s8dUL8tiCabRGrnbrCe/IuDdNCjf6fxciARWire0VDt4RNpIP RS9ZIyqDsdcNRwqZKUnEh2cN/Ulbpa22Z5UzxZxkY6bwdzH1rw6nxim+q1Io04wDoQtW dUMSNeZGO7G4zWTcWtP8SudhAWylNU83ijO4aXjVAU48BJgKECGm34nhSlQiWYanbZ4Z UCUGr0rZCg2T9IgsGZp+ujoZkSlJaltDcPzZdR49KyPTCLHsu+F8b46NHRAAj64uuKa5 THPCEoVQxSSqnPTL/EQFnJ3Y5Rf7ziFklzO1I2EfNXHdsAErtw0G/TJHRfTt+0jN48RG u3BA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792155; x=1690384155; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=72bcXZ+J6bNWWGqskqAX/uK8x0vlcNvpBj1vGVpBWRk=; b=I/jKz/Xdhq0AyO8ZlVNQiXVaqb464Lc1YLBS6Ep4IjXwlGoxHU8xjuW26En+sz0/Gx Wigf6vIYq7Kjp46YcwizNyLF+Kp0qS11pN9xvgUAcYs/ZVogM84Xk0EbJrIl+xb7DQXF Alry1/LsZdVJQMV5DxB/INd6zpGA02U3g6HQpheYur4UEB/xr9LfgTltpthrdnCswuGA JbMAQCogOIfDgmipOWSeAFKiHQqiEIupy7N1rC/8HC1Nf92qnTHFcSGgXCak0LK/PA6t URIbgbC3F2NbK4tVnMh/nfm8MZE3aLK0eG6PlsQ4kqCyXAYC9agaJo2aNsJUzjmYwFJH b7uw== X-Gm-Message-State: AC+VfDyaz6F50zpeuHOsT/CPpf4JbicQeefiNOekprDF61JkRErZf1X+ 9syBY8NglE5vqYLyEeweqTE0lg== X-Google-Smtp-Source: ACHHUZ5b4EmrXT88MOGMrhxoFxujV63CClmWd0z736ZrcTwaNFTnqWyh3zkcMars5E63J05+i7zmVw== X-Received: by 2002:a05:600c:22c6:b0:3f9:b17a:cb61 with SMTP id 6-20020a05600c22c600b003f9b17acb61mr18927028wmg.13.1687792155453; Mon, 26 Jun 2023 08:09:15 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:15 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:09:00 +0100 Subject: [PATCH bpf-next v3 3/7] net: document inet[6]_lookup_reuseport sk_state requirements MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-3-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org The current implementation was extracted from inet[6]_lhash2_lookup in commit 80b373f74f9e ("inet: Extract helper for selecting socket from reuseport group") and commit 5df6531292b5 ("inet6: Extract helper for selecting socket from reuseport group"). In the original context, sk is always in TCP_LISTEN state and so did not have a separate check. Add documentation that specifies which sk_state are valid to pass to the function. Signed-off-by: Lorenz Bauer --- net/ipv4/inet_hashtables.c | 14 ++++++++++++++ net/ipv6/inet6_hashtables.c | 14 ++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 920131e4a65d..91f9210d4e83 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -332,6 +332,20 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +/** + * inet_lookup_reuseport() - execute reuseport logic on AF_INET socket if necessary. + * @net: network namespace. + * @sk: AF_INET socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP. + * @skb: context for a potential SK_REUSEPORT program. + * @doff: header offset. + * @saddr: source address. + * @sport: source port. + * @daddr: destination address. + * @hnum: destination port in host byte order. + * + * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to + * the selected sock or an error. + */ struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index b7c56867314e..208998694ae3 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -111,6 +111,20 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +/** + * inet6_lookup_reuseport() - execute reuseport logic on AF_INET6 socket if necessary. + * @net: network namespace. + * @sk: AF_INET6 socket, must be in TCP_LISTEN state for TCP or TCP_CLOSE for UDP. + * @skb: context for a potential SK_REUSEPORT program. + * @doff: header offset. + * @saddr: source address. + * @sport: source port. + * @daddr: destination address. + * @hnum: destination port in host byte order. + * + * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to + * the selected sock or an error. + */ struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, const struct in6_addr *saddr, From patchwork Mon Jun 26 15:09:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293042 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 532EAEB64DC for ; Mon, 26 Jun 2023 15:09:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230089AbjFZPJ0 (ORCPT ); Mon, 26 Jun 2023 11:09:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230454AbjFZPJX (ORCPT ); Mon, 26 Jun 2023 11:09:23 -0400 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D67A10CF for ; Mon, 26 Jun 2023 08:09:19 -0700 (PDT) Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-3fa99742af9so8810335e9.2 for ; Mon, 26 Jun 2023 08:09:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792157; x=1690384157; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=9ct7YsjVQJp9T465ku7hA1Ygl3bo6vDeEBz5TLiSCNk=; b=S9FNU0oCsDht1rIhRtbxX7erLiuoXEOahKKyuE6+Li1hdObRhBSN2KOAQR14YqBxbY Pa8prYbMSU53c3E0ZQ4qXRcr0u+NZzACTNJKEoJvehHsxvXN5SFQDsvkSFSriymdMCwR /Z3XDvcAg8pQyPUE9W3OD6jrLXVjtOpa53Z3cmTrZDlbO2Vb3GiQ1gwDQFnYrwp2rdLQ DRSbZSzW6vbSe15HGr+spDFP0KK6+dAV5INLhkmcm+bi6aNyfVTghG9hN+Wf7U8METez iDzSoZgKSLzhHxkPgX/XijwB4p/pKG7/lEH1CYdh6nc9dQk37xGBw/0rbiCS+Rpz2kQw fKJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792157; x=1690384157; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9ct7YsjVQJp9T465ku7hA1Ygl3bo6vDeEBz5TLiSCNk=; b=KFvJIecuC+yws8cqE0TIBAPhmwGyTb1mDecKrnCUST94HQLF8xKqX4nvDtj0bUZX0D Ueyl0ZcppKqqaqdSrMOPVKwVwrXjp8uW8URBEVuWiS9+2Bbm2v6gNXEjOZ8bA7nTqWLs 969doMxgeytgPyqSYJwYIW+WftaTlcXUg1Pw54DLprEJSqC82ZbTYooVh7KPPGSW9SRw Dzr8vby7EgEFfoheKHDz5iO1+hfTHNBevKbfTl49ofOJGs8b+fikdliB8hZuAE32qYvu V8Prz9KH90hUiS2ebDS2jnZxKcTuBEqIUXwch9CeGZTDHOwse662TDeG4E/L8D2OjV0u 7sog== X-Gm-Message-State: AC+VfDyYTG/GQINQqjQMmCJHGJiFySQHi8jgcFRsVy9dFSWDJ9FCtxyF pfAse219OJ2I96CgZWduCqjTLA== X-Google-Smtp-Source: ACHHUZ4Lg+llBzTicJWcDyowo9/Gu6e2WdUCvY9zMY5t+fAOExDSAp9gpVrqrFVNxy8Wk9x6ZjI5bA== X-Received: by 2002:a7b:cbd3:0:b0:3f9:b3ec:35d0 with SMTP id n19-20020a7bcbd3000000b003f9b3ec35d0mr13854968wmi.10.1687792156492; Mon, 26 Jun 2023 08:09:16 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:16 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:09:01 +0100 Subject: [PATCH bpf-next v3 4/7] net: remove duplicate reuseport_lookup functions MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-4-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org There are currently four copies of reuseport_lookup: one each for (TCP, UDP)x(IPv4, IPv6). This forces us to duplicate all callers of those functions as well. This is already the case for sk_lookup helpers (inet,inet6,udp4,udp6)_lookup_run_bpf. There are two differences between the reuseport_lookup helpers: 1. They call different hash functions depending on protocol 2. UDP reuseport_lookup checks that sk_state != TCP_ESTABLISHED Move the check for sk_state into the caller and use the INDIRECT_CALL infrastructure to cut down the helpers to one per IP version. Signed-off-by: Lorenz Bauer --- include/net/inet6_hashtables.h | 11 ++++++++++- include/net/inet_hashtables.h | 15 ++++++++++----- include/net/udp.h | 8 ++++++++ net/ipv4/inet_hashtables.c | 23 ++++++++++++++++------- net/ipv4/udp.c | 34 +++++++++++++--------------------- net/ipv6/inet6_hashtables.c | 17 +++++++++++++---- net/ipv6/udp.c | 41 ++++++++++++++++------------------------- 7 files changed, 86 insertions(+), 63 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 032ddab48f8f..49d586454287 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -48,12 +48,21 @@ struct sock *__inet6_lookup_established(struct net *net, const u16 hnum, const int dif, const int sdif); +typedef u32 (*inet6_ehashfn_t)(const struct net *net, + const struct in6_addr *laddr, const u16 lport, + const struct in6_addr *faddr, const __be16 fport); + +u32 inet6_ehashfn(const struct net *net, + const struct in6_addr *laddr, const u16 lport, + const struct in6_addr *faddr, const __be16 fport); + struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, - unsigned short hnum); + unsigned short hnum, + inet6_ehashfn_t ehashfn); struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 8734f3488f5d..51ab6a1a3601 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -379,10 +379,19 @@ struct sock *__inet_lookup_established(struct net *net, const __be32 daddr, const u16 hnum, const int dif, const int sdif); +typedef u32 (*inet_ehashfn_t)(const struct net *net, + const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport); + +u32 inet_ehashfn(const struct net *net, + const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport); + struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum); + __be32 daddr, unsigned short hnum, + inet_ehashfn_t ehashfn); static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, @@ -453,10 +462,6 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, refcounted); } -u32 inet6_ehashfn(const struct net *net, - const struct in6_addr *laddr, const u16 lport, - const struct in6_addr *faddr, const __be16 fport); - static inline void sk_daddr_set(struct sock *sk, __be32 addr) { sk->sk_daddr = addr; /* alias of inet_daddr */ diff --git a/include/net/udp.h b/include/net/udp.h index 5cad44318d71..3b404b429f88 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -317,6 +317,14 @@ struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); +INDIRECT_CALLABLE_DECLARE(u32 udp_ehashfn(const struct net *, + const __be32, const __u16, + const __be32, const __be16)); + +INDIRECT_CALLABLE_DECLARE(u32 udp6_ehashfn(const struct net *, + const struct in6_addr *, const u16, + const struct in6_addr *, const __be16)); + /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() */ diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 91f9210d4e83..0dd768ab22d9 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -28,9 +28,9 @@ #include #include -static u32 inet_ehashfn(const struct net *net, const __be32 laddr, - const __u16 lport, const __be32 faddr, - const __be16 fport) +u32 inet_ehashfn(const struct net *net, const __be32 laddr, + const __u16 lport, const __be32 faddr, + const __be16 fport) { static u32 inet_ehash_secret __read_mostly; @@ -39,6 +39,7 @@ static u32 inet_ehashfn(const struct net *net, const __be32 laddr, return __inet_ehashfn(laddr, lport, faddr, fport, inet_ehash_secret + net_hash_mix(net)); } +EXPORT_SYMBOL_GPL(inet_ehashfn); /* This function handles inet_sock, but also timewait and request sockets * for IPv4/IPv6. @@ -332,6 +333,10 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +INDIRECT_CALLABLE_DECLARE(u32 udp_ehashfn(const struct net *, + const __be32, const __u16, + const __be32, const __be16)); + /** * inet_lookup_reuseport() - execute reuseport logic on AF_INET socket if necessary. * @net: network namespace. @@ -342,6 +347,7 @@ static inline int compute_score(struct sock *sk, struct net *net, * @sport: source port. * @daddr: destination address. * @hnum: destination port in host byte order. + * @ehashfn: hash function used to generate the fallback hash. * * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to * the selected sock or an error. @@ -349,13 +355,15 @@ static inline int compute_score(struct sock *sk, struct net *net, struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, struct sk_buff *skb, int doff, __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) + __be32 daddr, unsigned short hnum, + inet_ehashfn_t ehashfn) { struct sock *reuse_sk = NULL; u32 phash; if (sk->sk_reuseport) { - phash = inet_ehashfn(net, daddr, hnum, saddr, sport); + phash = INDIRECT_CALL_2(ehashfn, udp_ehashfn, inet_ehashfn, + net, daddr, hnum, saddr, sport); reuse_sk = reuseport_select_sock(sk, phash, skb, doff); } return reuse_sk; @@ -385,7 +393,7 @@ static struct sock *inet_lhash2_lookup(struct net *net, score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { result = inet_lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + saddr, sport, daddr, hnum, inet_ehashfn); if (result) return result; @@ -414,7 +422,8 @@ static inline struct sock *inet_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum, + inet_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 5ef478d2c408..7258edece691 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -405,9 +405,9 @@ static int compute_score(struct sock *sk, struct net *net, return score; } -static u32 udp_ehashfn(const struct net *net, const __be32 laddr, - const __u16 lport, const __be32 faddr, - const __be16 fport) +INDIRECT_CALLABLE_SCOPE +u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport, + const __be32 faddr, const __be16 fport) { static u32 udp_ehash_secret __read_mostly; @@ -417,22 +417,6 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr, udp_ehash_secret + net_hash_mix(net)); } -static struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, - __be32 saddr, __be16 sport, - __be32 daddr, unsigned short hnum) -{ - struct sock *reuse_sk = NULL; - u32 hash; - - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { - hash = udp_ehashfn(net, daddr, hnum, saddr, sport); - reuse_sk = reuseport_select_sock(sk, hash, skb, - sizeof(struct udphdr)); - } - return reuse_sk; -} - /* called with rcu_read_lock() */ static struct sock *udp4_lib_lookup2(struct net *net, __be32 saddr, __be16 sport, @@ -451,7 +435,14 @@ static struct sock *udp4_lib_lookup2(struct net *net, daddr, hnum, dif, sdif); if (score > badness) { badness = score; - result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + + if (sk->sk_state == TCP_ESTABLISHED) { + result = sk; + continue; + } + + result = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp_ehashfn); if (!result) { result = sk; continue; @@ -490,7 +481,8 @@ static struct sock *udp4_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + reuse_sk = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 208998694ae3..b5de1642bc51 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -39,6 +39,7 @@ u32 inet6_ehashfn(const struct net *net, return __inet6_ehashfn(lhash, lport, fhash, fport, inet6_ehash_secret + net_hash_mix(net)); } +EXPORT_SYMBOL_GPL(inet6_ehashfn); /* * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so @@ -111,6 +112,10 @@ static inline int compute_score(struct sock *sk, struct net *net, return score; } +INDIRECT_CALLABLE_DECLARE(u32 udp6_ehashfn(const struct net *, + const struct in6_addr *, const u16, + const struct in6_addr *, const __be16)); + /** * inet6_lookup_reuseport() - execute reuseport logic on AF_INET6 socket if necessary. * @net: network namespace. @@ -121,6 +126,7 @@ static inline int compute_score(struct sock *sk, struct net *net, * @sport: source port. * @daddr: destination address. * @hnum: destination port in host byte order. + * @ehashfn: hash function used to generate the fallback hash. * * Return: NULL if sk doesn't have SO_REUSEPORT set, otherwise a pointer to * the selected sock or an error. @@ -130,13 +136,15 @@ struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, - unsigned short hnum) + unsigned short hnum, + inet6_ehashfn_t ehashfn) { struct sock *reuse_sk = NULL; u32 phash; if (sk->sk_reuseport) { - phash = inet6_ehashfn(net, daddr, hnum, saddr, sport); + phash = INDIRECT_CALL_INET(ehashfn, udp6_ehashfn, inet6_ehashfn, + net, daddr, hnum, saddr, sport); reuse_sk = reuseport_select_sock(sk, phash, skb, doff); } return reuse_sk; @@ -159,7 +167,7 @@ static struct sock *inet6_lhash2_lookup(struct net *net, score = compute_score(sk, net, hnum, daddr, dif, sdif); if (score > hiscore) { result = inet6_lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum); + saddr, sport, daddr, hnum, inet6_ehashfn); if (result) return result; @@ -190,7 +198,8 @@ static inline struct sock *inet6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, hnum, inet6_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 8b3cb1d7da7c..ebac9200b15c 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -70,11 +70,12 @@ int udpv6_init_sock(struct sock *sk) return 0; } -static u32 udp6_ehashfn(const struct net *net, - const struct in6_addr *laddr, - const u16 lport, - const struct in6_addr *faddr, - const __be16 fport) +INDIRECT_CALLABLE_SCOPE +u32 udp6_ehashfn(const struct net *net, + const struct in6_addr *laddr, + const u16 lport, + const struct in6_addr *faddr, + const __be16 fport) { static u32 udp6_ehash_secret __read_mostly; static u32 udp_ipv6_hash_secret __read_mostly; @@ -159,24 +160,6 @@ static int compute_score(struct sock *sk, struct net *net, return score; } -static struct sock *lookup_reuseport(struct net *net, struct sock *sk, - struct sk_buff *skb, - const struct in6_addr *saddr, - __be16 sport, - const struct in6_addr *daddr, - unsigned int hnum) -{ - struct sock *reuse_sk = NULL; - u32 hash; - - if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) { - hash = udp6_ehashfn(net, daddr, hnum, saddr, sport); - reuse_sk = reuseport_select_sock(sk, hash, skb, - sizeof(struct udphdr)); - } - return reuse_sk; -} - /* called with rcu_read_lock() */ static struct sock *udp6_lib_lookup2(struct net *net, const struct in6_addr *saddr, __be16 sport, @@ -194,7 +177,14 @@ static struct sock *udp6_lib_lookup2(struct net *net, daddr, hnum, dif, sdif); if (score > badness) { badness = score; - result = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + + if (sk->sk_state == TCP_ESTABLISHED) { + result = sk; + continue; + } + + result = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp6_ehashfn); if (!result) { result = sk; continue; @@ -234,7 +224,8 @@ static inline struct sock *udp6_lookup_run_bpf(struct net *net, if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; - reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum); + reuse_sk = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, udp6_ehashfn); if (reuse_sk) sk = reuse_sk; return sk; From patchwork Mon Jun 26 15:09:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293041 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E3BAEB64D7 for ; Mon, 26 Jun 2023 15:09:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230482AbjFZPJZ (ORCPT ); Mon, 26 Jun 2023 11:09:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230440AbjFZPJW (ORCPT ); Mon, 26 Jun 2023 11:09:22 -0400 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27F0E10C2 for ; Mon, 26 Jun 2023 08:09:19 -0700 (PDT) Received: by mail-wr1-x42f.google.com with SMTP id ffacd0b85a97d-313e1c27476so1811854f8f.1 for ; Mon, 26 Jun 2023 08:09:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792157; x=1690384157; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=r5hvZWbrd4Kzqcl+2Bmi5kP9FC3uv1kKB+RjaEZXL14=; b=aQfj69jbsyUHrywXTqq37w0hjak+YvblQhu8IG5xe+bedOWlweciFVBqgx+tpg9PiM XP/56gzcKSrMIrdyfSRWxHMYZ/ZcDgMxMY+lfv+7D91teDqnABbqloMA2aHdByYEAikf prINqqXV+QRF1f9aZWBEW9IYb4myj1XnzdSDXVv1QC4DjumGrzEQ22MfBm/4xl0O4uLJ So0E2It6ySURI0QrhTOQJibDkynLl0ySfs1/Wx3UKfH96Cp6PScpj66x/ufb0v2oFO8H XAyIrfpeo86waez8vWLfgTfT1myXRIWvV2q5efkwxUHzBeEqx6IZ48nxOqzn6wy7SV4X rHQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792157; x=1690384157; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r5hvZWbrd4Kzqcl+2Bmi5kP9FC3uv1kKB+RjaEZXL14=; b=W3uPOjAi0pbxsMzXh6jBX9wqKptYBgueECWh38Tc0y58aFBdKC9la4h02feEoBqJUS Kua/FEbKV0y3RzuVjyzHJA7yf/YKEvjQbQHsaRSHR6081wvcQh0RIEyP5hVI/MgdRjUM 3XxOpH77jZtub6KqCPGQHToVNtQkSU+SzZpl3cDd5uT4xV69+hJudMHm8rWZkO9+sZYn iwnXJkS83SGFCO24BOzWBFAf2UPPWQtYj39sNzW/CpQ6CmxXT4W1srZKouEW+XPSwAe8 koMp46Ytb3zT+Aj5hWxhLkoed01XEuuhnanGRj1efNvc4mZx2HU7R0HuMXiz8mQYUXEU pExw== X-Gm-Message-State: AC+VfDwBVbcbjkbKKD6N7Fyaprl9Eq2Yvdd6anHL+movAsqxVxGyfH98 PvbGe3IgO+K1wDT4EiLOJ6T2Dg== X-Google-Smtp-Source: ACHHUZ4ffZqPDmSHPF4zFCELjwWOxBcCOS8KxRDBBqlnTonCFtQ8Pi+XzTFWY17UcD+q73SSEWU2Yg== X-Received: by 2002:adf:f443:0:b0:313:f0a7:133a with SMTP id f3-20020adff443000000b00313f0a7133amr3416428wrp.13.1687792157677; Mon, 26 Jun 2023 08:09:17 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:17 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:09:02 +0100 Subject: [PATCH bpf-next v3 5/7] net: remove duplicate sk_lookup helpers MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-5-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Now that inet[6]_lookup_reuseport are parameterised on the ehashfn we can remove two sk_lookup helpers. Signed-off-by: Lorenz Bauer Reviewed-by: Kuniyuki Iwashima --- include/net/inet6_hashtables.h | 9 +++++++++ include/net/inet_hashtables.h | 7 +++++++ net/ipv4/inet_hashtables.c | 26 +++++++++++++------------- net/ipv4/udp.c | 32 +++++--------------------------- net/ipv6/inet6_hashtables.c | 31 ++++++++++++++++--------------- net/ipv6/udp.c | 34 +++++----------------------------- 6 files changed, 55 insertions(+), 84 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 49d586454287..4d2a1a3c0be7 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -73,6 +73,15 @@ struct sock *inet6_lookup_listener(struct net *net, const unsigned short hnum, const int dif, const int sdif); +struct sock *inet6_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + const __be16 sport, + const struct in6_addr *daddr, + const u16 hnum, const int dif, + inet6_ehashfn_t ehashfn); + static inline struct sock *__inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 51ab6a1a3601..aa02f1db1f86 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -393,6 +393,13 @@ struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, __be32 daddr, unsigned short hnum, inet_ehashfn_t ehashfn); +struct sock *inet_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, u16 hnum, const int dif, + inet_ehashfn_t ehashfn); + static inline struct sock * inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 0dd768ab22d9..34e44a096795 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -405,25 +405,23 @@ static struct sock *inet_lhash2_lookup(struct net *net, return result; } -static inline struct sock *inet_lookup_run_bpf(struct net *net, - struct inet_hashinfo *hashinfo, - struct sk_buff *skb, int doff, - __be32 saddr, __be16 sport, - __be32 daddr, u16 hnum, const int dif) +struct sock *inet_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + __be32 saddr, __be16 sport, + __be32 daddr, u16 hnum, const int dif, + inet_ehashfn_t ehashfn) { struct sock *sk, *reuse_sk; bool no_reuseport; - if (hashinfo != net->ipv4.tcp_death_row.hashinfo) - return NULL; /* only TCP is supported */ - - no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP, saddr, sport, + no_reuseport = bpf_sk_lookup_run_v4(net, protocol, saddr, sport, daddr, hnum, dif, &sk); if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum, - inet_ehashfn); + ehashfn); if (reuse_sk) sk = reuse_sk; return sk; @@ -441,9 +439,11 @@ struct sock *__inet_lookup_listener(struct net *net, unsigned int hash2; /* Lookup redirect from BPF */ - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { - result = inet_lookup_run_bpf(net, hashinfo, skb, doff, - saddr, sport, daddr, hnum, dif); + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && + hashinfo == net->ipv4.tcp_death_row.hashinfo) { + result = inet_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff, + saddr, sport, daddr, hnum, dif, + inet_ehashfn); if (result) goto done; } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 7258edece691..eb79268f216d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -464,30 +464,6 @@ static struct sock *udp4_lib_lookup2(struct net *net, return result; } -static struct sock *udp4_lookup_run_bpf(struct net *net, - struct udp_table *udptable, - struct sk_buff *skb, - __be32 saddr, __be16 sport, - __be32 daddr, u16 hnum, const int dif) -{ - struct sock *sk, *reuse_sk; - bool no_reuseport; - - if (udptable != net->ipv4.udp_table) - return NULL; /* only UDP is supported */ - - no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP, saddr, sport, - daddr, hnum, dif, &sk); - if (no_reuseport || IS_ERR_OR_NULL(sk)) - return sk; - - reuse_sk = inet_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), - saddr, sport, daddr, hnum, udp_ehashfn); - if (reuse_sk) - sk = reuse_sk; - return sk; -} - /* UDP is nearly always wildcards out the wazoo, it makes no sense to try * harder than this. -DaveM */ @@ -512,9 +488,11 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, goto done; /* Lookup redirect from BPF */ - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { - sk = udp4_lookup_run_bpf(net, udptable, skb, - saddr, sport, daddr, hnum, dif); + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && + udptable == net->ipv4.udp_table) { + sk = inet_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, dif, + udp_ehashfn); if (sk) { result = sk; goto done; diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index b5de1642bc51..e908aa3cb7ea 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -179,31 +179,30 @@ static struct sock *inet6_lhash2_lookup(struct net *net, return result; } -static inline struct sock *inet6_lookup_run_bpf(struct net *net, - struct inet_hashinfo *hashinfo, - struct sk_buff *skb, int doff, - const struct in6_addr *saddr, - const __be16 sport, - const struct in6_addr *daddr, - const u16 hnum, const int dif) +struct sock *inet6_lookup_run_sk_lookup(struct net *net, + int protocol, + struct sk_buff *skb, int doff, + const struct in6_addr *saddr, + const __be16 sport, + const struct in6_addr *daddr, + const u16 hnum, const int dif, + inet6_ehashfn_t ehashfn) { struct sock *sk, *reuse_sk; bool no_reuseport; - if (hashinfo != net->ipv4.tcp_death_row.hashinfo) - return NULL; /* only TCP is supported */ - - no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP, saddr, sport, + no_reuseport = bpf_sk_lookup_run_v6(net, protocol, saddr, sport, daddr, hnum, dif, &sk); if (no_reuseport || IS_ERR_OR_NULL(sk)) return sk; reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, - saddr, sport, daddr, hnum, inet6_ehashfn); + saddr, sport, daddr, hnum, ehashfn); if (reuse_sk) sk = reuse_sk; return sk; } +EXPORT_SYMBOL_GPL(inet6_lookup_run_sk_lookup); struct sock *inet6_lookup_listener(struct net *net, struct inet_hashinfo *hashinfo, @@ -217,9 +216,11 @@ struct sock *inet6_lookup_listener(struct net *net, unsigned int hash2; /* Lookup redirect from BPF */ - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { - result = inet6_lookup_run_bpf(net, hashinfo, skb, doff, - saddr, sport, daddr, hnum, dif); + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && + hashinfo == net->ipv4.tcp_death_row.hashinfo) { + result = inet6_lookup_run_sk_lookup(net, IPPROTO_TCP, skb, doff, + saddr, sport, daddr, hnum, dif, + inet6_ehashfn); if (result) goto done; } diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index ebac9200b15c..8a6d94cabee0 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -205,32 +205,6 @@ static struct sock *udp6_lib_lookup2(struct net *net, return result; } -static inline struct sock *udp6_lookup_run_bpf(struct net *net, - struct udp_table *udptable, - struct sk_buff *skb, - const struct in6_addr *saddr, - __be16 sport, - const struct in6_addr *daddr, - u16 hnum, const int dif) -{ - struct sock *sk, *reuse_sk; - bool no_reuseport; - - if (udptable != net->ipv4.udp_table) - return NULL; /* only UDP is supported */ - - no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP, saddr, sport, - daddr, hnum, dif, &sk); - if (no_reuseport || IS_ERR_OR_NULL(sk)) - return sk; - - reuse_sk = inet6_lookup_reuseport(net, sk, skb, sizeof(struct udphdr), - saddr, sport, daddr, hnum, udp6_ehashfn); - if (reuse_sk) - sk = reuse_sk; - return sk; -} - /* rcu_read_lock() must be held */ struct sock *__udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be16 sport, @@ -255,9 +229,11 @@ struct sock *__udp6_lib_lookup(struct net *net, goto done; /* Lookup redirect from BPF */ - if (static_branch_unlikely(&bpf_sk_lookup_enabled)) { - sk = udp6_lookup_run_bpf(net, udptable, skb, - saddr, sport, daddr, hnum, dif); + if (static_branch_unlikely(&bpf_sk_lookup_enabled) && + udptable == net->ipv4.udp_table) { + sk = inet6_lookup_run_sk_lookup(net, IPPROTO_UDP, skb, sizeof(struct udphdr), + saddr, sport, daddr, hnum, dif, + udp6_ehashfn); if (sk) { result = sk; goto done; From patchwork Mon Jun 26 15:09:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE973EB64D7 for ; Mon, 26 Jun 2023 15:09:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231154AbjFZPJ3 (ORCPT ); Mon, 26 Jun 2023 11:09:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230458AbjFZPJY (ORCPT ); Mon, 26 Jun 2023 11:09:24 -0400 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C27810C9 for ; Mon, 26 Jun 2023 08:09:20 -0700 (PDT) Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-3f9b4a71623so32289275e9.1 for ; Mon, 26 Jun 2023 08:09:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792159; x=1690384159; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=N1mUcNYvpna70Q37QvqW6u9sw5DCteX2GrzJY8MbnLo=; b=Ll4CsGUBHJMBf0AviFFx59ajuikfOQgBOZaEntBpok6OU8KTrSeV3z3+eXcCLWmTl+ 00WhdhAdtKOFGKGkUPmPOomstQtQfxuFnCKxAH9NCOzdOR9lAYdfHnbzoqvsTtE/7dYB J/VjFCEgroUiuOnb7+Q8JPH7YeLJVOGRB9qWDL+HGLe88znQ+0vsmLeMYvzSKcR7ivMN ISuHeTyJdy5UvaOrpDkyUYjVz4jDjAAj00LQooKfk4R7ikVLA6ebXH9zTahQATzWIuxn wEvXXEKJoQ8iOs4rRwNwwFtJq/KS9VF2p/BRxjsp+p2v52SigMP7qSkpI7+1CruJ+x8w SERw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792159; x=1690384159; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=N1mUcNYvpna70Q37QvqW6u9sw5DCteX2GrzJY8MbnLo=; b=dgLdvJpLn91vvniS0/cd+zhXVHr27LuUZ8CF3uJXc93uKsBcFVjazIic5W9z+E9S7n cvAyjSEa1ChJq+lLQm41u+X4k/7F3zJzxJu5uudSwgX4sf+7bCfXImQa1EltlIa9zgvK 1cpT8A8obIwSBs5kxD5PchrGaQ1qOwORVe+o50rcX+W2GBnB2TxcpSgcmcx7sTk+F2RN VAWAtMNRK5i2kCofR16BEicM42y3PV+aHtYH7+MBmGD970TlbCeYZfTKxiduMj2cQUWS VDwti5wHpYKmQCSXzb7ZsKxbXovjJA16I46JYG9S2TgM+ITb6B6p1wv86Cx3Da6C8jT/ SfKw== X-Gm-Message-State: AC+VfDzSMs4YfDIfNnnquqFawrLnpu7X/GOpGExTDRReIOPUAsazpFt2 OZodEzv94WLHkA9dkdbdnbb2XA== X-Google-Smtp-Source: ACHHUZ50GRszsBL5Odf9vxH9v0+l5XowGitRBYVdPQNJ8SaaHvX/tGIJaNHy45/Fhqtn9dbR5OkK1g== X-Received: by 2002:adf:fe44:0:b0:313:e36a:3494 with SMTP id m4-20020adffe44000000b00313e36a3494mr4232225wrs.45.1687792158801; Mon, 26 Jun 2023 08:09:18 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:18 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:09:03 +0100 Subject: [PATCH bpf-next v3 6/7] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-6-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer , Joe Stringer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT sockets. This means we can't use the helper to steer traffic to Envoy, which configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing TPROXY from our setup. The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead, one of the reuseport sockets is selected by hash. This could cause dispatch to the "wrong" socket: sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup helpers unfortunately. In the tc context, L2 headers are at the start of the skb, while SK_REUSEPORT expects L3 headers instead. Instead, we execute the SK_REUSEPORT program when the assigned socket is pulled out of the skb, further up the stack. This creates some trickiness with regards to refcounting as bpf_sk_assign will put both refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU freed. We can infer that the sk_assigned socket is RCU freed if the reuseport lookup succeeds, but convincing yourself of this fact isn't straight forward. Therefore we defensively check refcounting on the sk_assign sock even though it's probably not required in practice. Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign") Fixes: cf7fbe660f2d ("bpf: Add socket assign support") Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Signed-off-by: Lorenz Bauer Cc: Joe Stringer Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ --- include/net/inet6_hashtables.h | 59 ++++++++++++++++++++++++++++++++++++++---- include/net/inet_hashtables.h | 52 +++++++++++++++++++++++++++++++++++-- include/net/sock.h | 7 +++-- include/uapi/linux/bpf.h | 3 --- net/core/filter.c | 2 -- net/ipv4/udp.c | 8 ++++-- net/ipv6/udp.c | 10 ++++--- tools/include/uapi/linux/bpf.h | 3 --- 8 files changed, 122 insertions(+), 22 deletions(-) diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 4d2a1a3c0be7..4d300af6ccb6 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -103,6 +103,49 @@ static inline struct sock *__inet6_lookup(struct net *net, daddr, hnum, dif, sdif); } +static inline +struct sock *inet6_steal_sock(struct net *net, struct sk_buff *skb, int doff, + const struct in6_addr *saddr, const __be16 sport, + const struct in6_addr *daddr, const __be16 dport, + bool *refcounted, inet6_ehashfn_t ehashfn) +{ + struct sock *sk, *reuse_sk; + bool prefetched; + + sk = skb_steal_sock(skb, refcounted, &prefetched); + if (!sk) + return NULL; + + if (!prefetched) + return sk; + + if (sk->sk_protocol == IPPROTO_TCP) { + if (sk->sk_state != TCP_LISTEN) + return sk; + } else if (sk->sk_protocol == IPPROTO_UDP) { + if (sk->sk_state != TCP_CLOSE) + return sk; + } else { + return sk; + } + + reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, ntohs(dport), + ehashfn); + if (!reuse_sk || reuse_sk == sk) + return sk; + + /* We've chosen a new reuseport sock which is never refcounted. + * sk might be refcounted however, drop the reference if necessary. + */ + if (*refcounted) { + sock_put(sk); + *refcounted = false; + } + + return reuse_sk; +} + static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, const __be16 sport, @@ -110,14 +153,20 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, int iif, int sdif, bool *refcounted) { - struct sock *sk = skb_steal_sock(skb, refcounted); - + struct net *net = dev_net(skb_dst(skb)->dev); + const struct ipv6hdr *ip6h = ipv6_hdr(skb); + struct sock *sk; + + sk = inet6_steal_sock(net, skb, doff, &ip6h->saddr, sport, &ip6h->daddr, dport, + refcounted, inet6_ehashfn); + if (IS_ERR(sk)) + return NULL; if (sk) return sk; - return __inet6_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, - doff, &ipv6_hdr(skb)->saddr, sport, - &ipv6_hdr(skb)->daddr, ntohs(dport), + return __inet6_lookup(net, hashinfo, skb, + doff, &ip6h->saddr, sport, + &ip6h->daddr, ntohs(dport), iif, sdif, refcounted); } diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index aa02f1db1f86..2c405d9df300 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -449,6 +449,49 @@ static inline struct sock *inet_lookup(struct net *net, return sk; } +static inline +struct sock *inet_steal_sock(struct net *net, struct sk_buff *skb, int doff, + const __be32 saddr, const __be16 sport, + const __be32 daddr, const __be16 dport, + bool *refcounted, inet_ehashfn_t ehashfn) +{ + struct sock *sk, *reuse_sk; + bool prefetched; + + sk = skb_steal_sock(skb, refcounted, &prefetched); + if (!sk) + return NULL; + + if (!prefetched) + return sk; + + if (sk->sk_protocol == IPPROTO_TCP) { + if (sk->sk_state != TCP_LISTEN) + return sk; + } else if (sk->sk_protocol == IPPROTO_UDP) { + if (sk->sk_state != TCP_CLOSE) + return sk; + } else { + return sk; + } + + reuse_sk = inet_lookup_reuseport(net, sk, skb, doff, + saddr, sport, daddr, ntohs(dport), + ehashfn); + if (!reuse_sk || reuse_sk == sk) + return sk; + + /* We've chosen a new reuseport sock which is never refcounted. + * sk might be refcounted however, drop the reference if necessary. + */ + if (*refcounted) { + sock_put(sk); + *refcounted = false; + } + + return reuse_sk; +} + static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, struct sk_buff *skb, int doff, @@ -457,13 +500,18 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, const int sdif, bool *refcounted) { - struct sock *sk = skb_steal_sock(skb, refcounted); + struct net *net = dev_net(skb_dst(skb)->dev); const struct iphdr *iph = ip_hdr(skb); + struct sock *sk; + sk = inet_steal_sock(net, skb, doff, iph->saddr, sport, iph->daddr, dport, + refcounted, inet_ehashfn); + if (IS_ERR(sk)) + return NULL; if (sk) return sk; - return __inet_lookup(dev_net(skb_dst(skb)->dev), hashinfo, skb, + return __inet_lookup(net, hashinfo, skb, doff, iph->saddr, sport, iph->daddr, dport, inet_iif(skb), sdif, refcounted); diff --git a/include/net/sock.h b/include/net/sock.h index 656ea89f60ff..5645570c2a64 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2806,20 +2806,23 @@ sk_is_refcounted(struct sock *sk) * skb_steal_sock - steal a socket from an sk_buff * @skb: sk_buff to steal the socket from * @refcounted: is set to true if the socket is reference-counted + * @prefetched: is set to true if the socket was assigned from bpf */ static inline struct sock * -skb_steal_sock(struct sk_buff *skb, bool *refcounted) +skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched) { if (skb->sk) { struct sock *sk = skb->sk; *refcounted = true; - if (skb_sk_is_prefetched(skb)) + *prefetched = skb_sk_is_prefetched(skb); + if (*prefetched) *refcounted = sk_is_refcounted(sk); skb->destructor = NULL; skb->sk = NULL; return sk; } + *prefetched = false; *refcounted = false; return NULL; } diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a7b5e91dd768..d6fb6f43b0f3 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4158,9 +4158,6 @@ union bpf_attr { * **-EOPNOTSUPP** if the operation is not supported, for example * a call from outside of TC ingress. * - * **-ESOCKTNOSUPPORT** if the socket type is not supported - * (reuseport). - * * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) * Description * Helper is overloaded depending on BPF program type. This diff --git a/net/core/filter.c b/net/core/filter.c index 428df050d021..d4be0a1d754c 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7278,8 +7278,6 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags) return -EOPNOTSUPP; if (unlikely(dev_net(skb->dev) != sock_net(sk))) return -ENETUNREACH; - if (unlikely(sk_fullsock(sk) && sk->sk_reuseport)) - return -ESOCKTNOSUPPORT; if (sk_is_refcounted(sk) && unlikely(!refcount_inc_not_zero(&sk->sk_refcnt))) return -ENOENT; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index eb79268f216d..b256f1f73b4d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2388,7 +2388,11 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (udp4_csum_init(skb, uh, proto)) goto csum_error; - sk = skb_steal_sock(skb, &refcounted); + sk = inet_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest, + &refcounted, udp_ehashfn); + if (IS_ERR(sk)) + goto no_sk; + if (sk) { struct dst_entry *dst = skb_dst(skb); int ret; @@ -2409,7 +2413,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable); if (sk) return udp_unicast_rcv_skb(sk, skb, uh); - +no_sk: if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) goto drop; nf_reset_ct(skb); diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 8a6d94cabee0..2d4c05bc322a 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -923,9 +923,9 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; const struct in6_addr *saddr, *daddr; struct net *net = dev_net(skb->dev); + bool refcounted; struct udphdr *uh; struct sock *sk; - bool refcounted; u32 ulen = 0; if (!pskb_may_pull(skb, sizeof(struct udphdr))) @@ -962,7 +962,11 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, goto csum_error; /* Check if the socket is already available, e.g. due to early demux */ - sk = skb_steal_sock(skb, &refcounted); + sk = inet6_steal_sock(net, skb, sizeof(struct udphdr), saddr, uh->source, daddr, uh->dest, + &refcounted, udp6_ehashfn); + if (IS_ERR(sk)) + goto no_sk; + if (sk) { struct dst_entry *dst = skb_dst(skb); int ret; @@ -996,7 +1000,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, goto report_csum_error; return udp6_unicast_rcv_skb(sk, skb, uh); } - +no_sk: reason = SKB_DROP_REASON_NO_SOCKET; if (!uh->check) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index a7b5e91dd768..d6fb6f43b0f3 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4158,9 +4158,6 @@ union bpf_attr { * **-EOPNOTSUPP** if the operation is not supported, for example * a call from outside of TC ingress. * - * **-ESOCKTNOSUPPORT** if the socket type is not supported - * (reuseport). - * * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags) * Description * Helper is overloaded depending on BPF program type. This From patchwork Mon Jun 26 15:09:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lorenz Bauer X-Patchwork-Id: 13293044 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD376EB64DC for ; Mon, 26 Jun 2023 15:09:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230355AbjFZPJa (ORCPT ); Mon, 26 Jun 2023 11:09:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36518 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230491AbjFZPJZ (ORCPT ); Mon, 26 Jun 2023 11:09:25 -0400 Received: from mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BD6D10D7 for ; Mon, 26 Jun 2023 08:09:21 -0700 (PDT) Received: by mail-wr1-x430.google.com with SMTP id ffacd0b85a97d-313f18f5295so1347448f8f.3 for ; Mon, 26 Jun 2023 08:09:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1687792160; x=1690384160; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=Rrd+oVQwBiLqnLnpd+kxgQqY6NXFWO5dlCZVFiDqXJs=; b=BFsL6sINMVaWePlHs/jt/2afGXwYpZilnvJeihcnRJVUofXwASokKpt/6cHPYTQK09 Di0UgJOyrXrGwdGAQ9MdVEKKJ4XFg0XjY4hcLpyICrubwFZLw9WP7QKlJneJct8tzVN1 i3HPDBtlw8STN5WcRo2PwmNLecbyFlhSgE1Z2HcU2yozqYiN1tOWqnsWx8fob5nnLp3E X4zjYeJfcQy1Bt2poJh+o2fwR/mXl5ZAxprAs9BoXkwiw00MrIyvsk/7qDhyYGP+UKEY B9BTEerUsvU2isuMPfHqUm5FZ/BFROT3BazKGmIhTNJQSAlGbZ6nsh4ZI5lzLvTEtpg9 Aj6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687792160; x=1690384160; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Rrd+oVQwBiLqnLnpd+kxgQqY6NXFWO5dlCZVFiDqXJs=; b=JFcw5yiJPhpg1o5bWFODhc8yEyWvwqDN4JNL03LxRFLWrmVwuF6bOHz2AU3Hb9+6pI cnfCpKt7vZ2QqHmrMiB4Ep3ueoseDrMxg5HuSUrj64ypW/S4FWuleG/lCOv+V10DJpyh Wx93VeYzE2eGwXbxzlU+/7y8T9CuQRDbOsvWC89wXzH7o0IURt7l8OuVGf+IT3LOP1/K d2aVJZXFMtROUwCb9uJldiB8Xk5LjiqZL3deBh2mTmK5nj8vCHVjmnK4HaOGygxcyL1z /oGQ1tMNLd6g3CZLMKL064cKfIqz7SGLJix6Hjdm++v6v86yXv0EL1cKsIKPMzHP0plx KP/Q== X-Gm-Message-State: AC+VfDz85rdPMeNM4AWte3IebRZlWNPxEbXTEvreDBK9u0SWgUc1A8uk TwCjZPSDrxx+PMHz593e57fFkA== X-Google-Smtp-Source: ACHHUZ7F6XvEI28XrMPDv+oc/zhq1SZlEnRVR/xGZ41jRyPRs9eky8SiMGrxIH3MLazXZw0UOLxu5Q== X-Received: by 2002:a5d:63c7:0:b0:313:eee3:a4a4 with SMTP id c7-20020a5d63c7000000b00313eee3a4a4mr3392258wrw.64.1687792159839; Mon, 26 Jun 2023 08:09:19 -0700 (PDT) Received: from [192.168.1.193] (f.c.7.0.0.0.0.0.0.0.0.0.0.0.0.0.f.f.6.2.a.5.a.7.0.b.8.0.1.0.0.2.ip6.arpa. [2001:8b0:7a5a:26ff::7cf]) by smtp.gmail.com with ESMTPSA id v1-20020adfe281000000b00311299df211sm7668710wri.77.2023.06.26.08.09.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Jun 2023 08:09:19 -0700 (PDT) From: Lorenz Bauer Date: Mon, 26 Jun 2023 16:09:04 +0100 Subject: [PATCH bpf-next v3 7/7] selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper MIME-Version: 1.0 Message-Id: <20230613-so-reuseport-v3-7-907b4cbb7b99@isovalent.com> References: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> In-Reply-To: <20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent.com> To: "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Ahern , Willem de Bruijn , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Joe Stringer , Mykola Lysenko , Shuah Khan , Kuniyuki Iwashima Cc: Hemanth Malla , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-kselftest@vger.kernel.org, Lorenz Bauer , Joe Stringer X-Mailer: b4 0.12.2 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Daniel Borkmann We use two programs to check that the new reuseport logic is executed appropriately. The first is a TC clsact program which bpf_sk_assigns the skb to a UDP or TCP socket created by user space. Since the test communicates via lo we see both directions of packets in the eBPF. Traffic ingressing to the reuseport socket is identified by looking at the destination port. For TCP, we additionally need to make sure that we only assign the initial SYN packets towards our listening socket. The network stack then creates a request socket which transitions to ESTABLISHED after the 3WHS. The second is a reuseport program which shares the fact that it has been executed with user space. This tells us that the delayed lookup mechanism is working. Signed-off-by: Daniel Borkmann Co-developed-by: Lorenz Bauer Signed-off-by: Lorenz Bauer Cc: Joe Stringer --- tools/testing/selftests/bpf/network_helpers.c | 3 + .../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++ .../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++ 3 files changed, 342 insertions(+) diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c index a105c0cd008a..8a33bcea97de 100644 --- a/tools/testing/selftests/bpf/network_helpers.c +++ b/tools/testing/selftests/bpf/network_helpers.c @@ -423,6 +423,9 @@ struct nstoken *open_netns(const char *name) void close_netns(struct nstoken *token) { + if (!token) + return; + ASSERT_OK(setns(token->orig_netns_fd, CLONE_NEWNET), "setns"); close(token->orig_netns_fd); free(token); diff --git a/tools/testing/selftests/bpf/prog_tests/assign_reuse.c b/tools/testing/selftests/bpf/prog_tests/assign_reuse.c new file mode 100644 index 000000000000..622f123410f4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/assign_reuse.c @@ -0,0 +1,197 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Isovalent */ +#include +#include + +#include +#include + +#include "network_helpers.h" +#include "test_assign_reuse.skel.h" + +#define NS_TEST "assign_reuse" +#define LOOPBACK 1 +#define PORT 4443 + +static int attach_reuseport(int sock_fd, int prog_fd) +{ + return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF, + &prog_fd, sizeof(prog_fd)); +} + +static __u64 cookie(int fd) +{ + __u64 cookie = 0; + socklen_t cookie_len = sizeof(cookie); + int ret; + + ret = getsockopt(fd, SOL_SOCKET, SO_COOKIE, &cookie, &cookie_len); + ASSERT_OK(ret, "cookie"); + ASSERT_GT(cookie, 0, "cookie_invalid"); + + return cookie; +} + +static int echo_test_udp(int fd_sv) +{ + struct sockaddr_storage addr = {}; + socklen_t len = sizeof(addr); + char buff[1] = {}; + int fd_cl = -1, ret; + + fd_cl = connect_to_fd(fd_sv, 100); + ASSERT_GT(fd_cl, 0, "create_client"); + ASSERT_EQ(getsockname(fd_cl, (void *)&addr, &len), 0, "getsockname"); + + ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client"); + + ret = recv(fd_sv, buff, sizeof(buff), 0); + if (ret < 0) + return errno; + + ASSERT_EQ(ret, 1, "recv_server"); + ASSERT_EQ(sendto(fd_sv, buff, sizeof(buff), 0, (void *)&addr, len), 1, "send_server"); + ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client"); + close(fd_cl); + return 0; +} + +static int echo_test_tcp(int fd_sv) +{ + char buff[1] = {}; + int fd_cl = -1, fd_sv_cl = -1; + + fd_cl = connect_to_fd(fd_sv, 100); + if (fd_cl < 0) + return errno; + + fd_sv_cl = accept(fd_sv, NULL, NULL); + ASSERT_GE(fd_sv_cl, 0, "accept_fd"); + + ASSERT_EQ(send(fd_cl, buff, sizeof(buff), 0), 1, "send_client"); + ASSERT_EQ(recv(fd_sv_cl, buff, sizeof(buff), 0), 1, "recv_server"); + ASSERT_EQ(send(fd_sv_cl, buff, sizeof(buff), 0), 1, "send_server"); + ASSERT_EQ(recv(fd_cl, buff, sizeof(buff), 0), 1, "recv_client"); + close(fd_sv_cl); + close(fd_cl); + return 0; +} + +void run_assign_reuse(int family, int sotype, const char *ip, __u16 port) +{ + DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook, + .ifindex = LOOPBACK, + .attach_point = BPF_TC_INGRESS, + ); + DECLARE_LIBBPF_OPTS(bpf_tc_opts, tc_opts, + .handle = 1, + .priority = 1, + ); + bool hook_created = false, tc_attached = false; + int ret, fd_tc, fd_accept, fd_drop, fd_map; + int *fd_sv = NULL; + __u64 fd_val; + struct test_assign_reuse *skel; + const int zero = 0; + + skel = test_assign_reuse__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + goto cleanup; + + skel->rodata->dest_port = port; + + ret = test_assign_reuse__load(skel); + if (!ASSERT_OK(ret, "skel_load")) + goto cleanup; + + ASSERT_EQ(skel->bss->sk_cookie_seen, 0, "cookie_init"); + + fd_tc = bpf_program__fd(skel->progs.tc_main); + fd_accept = bpf_program__fd(skel->progs.reuse_accept); + fd_drop = bpf_program__fd(skel->progs.reuse_drop); + fd_map = bpf_map__fd(skel->maps.sk_map); + + fd_sv = start_reuseport_server(family, sotype, ip, port, 100, 1); + if (!ASSERT_NEQ(fd_sv, NULL, "start_reuseport_server")) + goto cleanup; + + ret = attach_reuseport(*fd_sv, fd_drop); + if (!ASSERT_OK(ret, "attach_reuseport")) + goto cleanup; + + fd_val = *fd_sv; + ret = bpf_map_update_elem(fd_map, &zero, &fd_val, BPF_NOEXIST); + if (!ASSERT_OK(ret, "bpf_sk_map")) + goto cleanup; + + ret = bpf_tc_hook_create(&tc_hook); + if (ret == 0) + hook_created = true; + ret = ret == -EEXIST ? 0 : ret; + if (!ASSERT_OK(ret, "bpf_tc_hook_create")) + goto cleanup; + + tc_opts.prog_fd = fd_tc; + ret = bpf_tc_attach(&tc_hook, &tc_opts); + if (!ASSERT_OK(ret, "bpf_tc_attach")) + goto cleanup; + tc_attached = true; + + if (sotype == SOCK_STREAM) + ASSERT_EQ(echo_test_tcp(*fd_sv), ECONNREFUSED, "drop_tcp"); + else + ASSERT_EQ(echo_test_udp(*fd_sv), EAGAIN, "drop_udp"); + ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once"); + + skel->bss->sk_cookie_seen = 0; + skel->bss->reuseport_executed = 0; + ASSERT_OK(attach_reuseport(*fd_sv, fd_accept), "attach_reuseport(accept)"); + + if (sotype == SOCK_STREAM) + ASSERT_EQ(echo_test_tcp(*fd_sv), 0, "echo_tcp"); + else + ASSERT_EQ(echo_test_udp(*fd_sv), 0, "echo_udp"); + + ASSERT_EQ(skel->bss->sk_cookie_seen, cookie(*fd_sv), + "cookie_mismatch"); + ASSERT_EQ(skel->bss->reuseport_executed, 1, "program executed once"); +cleanup: + if (tc_attached) { + tc_opts.flags = tc_opts.prog_fd = tc_opts.prog_id = 0; + ret = bpf_tc_detach(&tc_hook, &tc_opts); + ASSERT_OK(ret, "bpf_tc_detach"); + } + if (hook_created) { + tc_hook.attach_point = BPF_TC_INGRESS | BPF_TC_EGRESS; + bpf_tc_hook_destroy(&tc_hook); + } + test_assign_reuse__destroy(skel); + free_fds(fd_sv, 1); +} + +void test_assign_reuse(void) +{ + struct nstoken *tok = NULL; + + SYS(out, "ip netns add %s", NS_TEST); + SYS(cleanup, "ip -net %s link set dev lo up", NS_TEST); + + tok = open_netns(NS_TEST); + if (!ASSERT_OK_PTR(tok, "netns token")) + return; + + if (test__start_subtest("tcpv4")) + run_assign_reuse(AF_INET, SOCK_STREAM, "127.0.0.1", PORT); + if (test__start_subtest("tcpv6")) + run_assign_reuse(AF_INET6, SOCK_STREAM, "::1", PORT); + if (test__start_subtest("udpv4")) + run_assign_reuse(AF_INET, SOCK_DGRAM, "127.0.0.1", PORT); + if (test__start_subtest("udpv6")) + run_assign_reuse(AF_INET6, SOCK_DGRAM, "::1", PORT); + +cleanup: + close_netns(tok); + SYS_NOFAIL("ip netns delete %s", NS_TEST); +out: + return; +} diff --git a/tools/testing/selftests/bpf/progs/test_assign_reuse.c b/tools/testing/selftests/bpf/progs/test_assign_reuse.c new file mode 100644 index 000000000000..4f2e2321ea06 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_assign_reuse.c @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Isovalent */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +char LICENSE[] SEC("license") = "GPL"; + +__u64 sk_cookie_seen; +__u64 reuseport_executed; +union { + struct tcphdr tcp; + struct udphdr udp; +} headers; + +const volatile __u16 dest_port; + +struct { + __uint(type, BPF_MAP_TYPE_SOCKMAP); + __uint(max_entries, 1); + __type(key, __u32); + __type(value, __u64); +} sk_map SEC(".maps"); + +SEC("sk_reuseport") +int reuse_accept(struct sk_reuseport_md *ctx) +{ + reuseport_executed++; + + if (ctx->ip_protocol == IPPROTO_TCP) { + if (ctx->data + sizeof(headers.tcp) > ctx->data_end) + return SK_DROP; + + if (__builtin_memcmp(&headers.tcp, ctx->data, sizeof(headers.tcp)) != 0) + return SK_DROP; + } else if (ctx->ip_protocol == IPPROTO_UDP) { + if (ctx->data + sizeof(headers.udp) > ctx->data_end) + return SK_DROP; + + if (__builtin_memcmp(&headers.udp, ctx->data, sizeof(headers.udp)) != 0) + return SK_DROP; + } else { + return SK_DROP; + } + + sk_cookie_seen = bpf_get_socket_cookie(ctx->sk); + return SK_PASS; +} + +SEC("sk_reuseport") +int reuse_drop(struct sk_reuseport_md *ctx) +{ + reuseport_executed++; + sk_cookie_seen = 0; + return SK_DROP; +} + +static int +assign_sk(struct __sk_buff *skb) +{ + int zero = 0, ret = 0; + struct bpf_sock *sk; + + sk = bpf_map_lookup_elem(&sk_map, &zero); + if (!sk) + return TC_ACT_SHOT; + ret = bpf_sk_assign(skb, sk, 0); + bpf_sk_release(sk); + return ret ? TC_ACT_SHOT : TC_ACT_OK; +} + +static bool +maybe_assign_tcp(struct __sk_buff *skb, struct tcphdr *th) +{ + if (th + 1 > (void *)(long)(skb->data_end)) + return TC_ACT_SHOT; + + if (!th->syn || th->ack || th->dest != bpf_htons(dest_port)) + return TC_ACT_OK; + + __builtin_memcpy(&headers.tcp, th, sizeof(headers.tcp)); + return assign_sk(skb); +} + +static bool +maybe_assign_udp(struct __sk_buff *skb, struct udphdr *uh) +{ + if (uh + 1 > (void *)(long)(skb->data_end)) + return TC_ACT_SHOT; + + if (uh->dest != bpf_htons(dest_port)) + return TC_ACT_OK; + + __builtin_memcpy(&headers.udp, uh, sizeof(headers.udp)); + return assign_sk(skb); +} + +SEC("tc") +int tc_main(struct __sk_buff *skb) +{ + void *data_end = (void *)(long)skb->data_end; + void *data = (void *)(long)skb->data; + struct ethhdr *eth; + + eth = (struct ethhdr *)(data); + if (eth + 1 > data_end) + return TC_ACT_SHOT; + + if (eth->h_proto == bpf_htons(ETH_P_IP)) { + struct iphdr *iph = (struct iphdr *)(data + sizeof(*eth)); + + if (iph + 1 > data_end) + return TC_ACT_SHOT; + + if (iph->protocol == IPPROTO_TCP) + return maybe_assign_tcp(skb, (struct tcphdr *)(iph + 1)); + else if (iph->protocol == IPPROTO_UDP) + return maybe_assign_udp(skb, (struct udphdr *)(iph + 1)); + else + return TC_ACT_SHOT; + } else { + struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + sizeof(*eth)); + + if (ip6h + 1 > data_end) + return TC_ACT_SHOT; + + if (ip6h->nexthdr == IPPROTO_TCP) + return maybe_assign_tcp(skb, (struct tcphdr *)(ip6h + 1)); + else if (ip6h->nexthdr == IPPROTO_UDP) + return maybe_assign_udp(skb, (struct udphdr *)(ip6h + 1)); + else + return TC_ACT_SHOT; + } +}