From patchwork Fri Mar 5 01:56:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117371 X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer Subject: [Patch bpf-next v3 1/9] sock_map: introduce BPF_SK_SKB_VERDICT Date: Thu, 4 Mar 2021 17:56:47 -0800 Message-Id: <20210305015655.14249-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References:
<20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to set stream_verdict and skb_verdict at the same time. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 3 +++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + net/core/skmsg.c | 4 +++- net/core/sock_map.c | 23 ++++++++++++++++++++++- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + tools/include/uapi/linux/bpf.h | 1 + 8 files changed, 33 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6c09d94be2e9..451530d41af7 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -58,6 +58,7 @@ struct sk_psock_progs { struct bpf_prog *msg_parser; struct bpf_prog *stream_parser; struct bpf_prog *stream_verdict; + struct bpf_prog *skb_verdict; }; enum sk_psock_state_bits { @@ -442,6 +443,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs) psock_set_prog(&progs->msg_parser, NULL); psock_set_prog(&progs->stream_parser, NULL); psock_set_prog(&progs->stream_verdict, NULL); + psock_set_prog(&progs->skb_verdict, NULL); } int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb); @@ -489,5 +491,6 @@ static inline void skb_bpf_redirect_clear(struct sk_buff *skb) { skb->_sk_redir = 0; } + #endif /* CONFIG_NET_SOCK_MSG */ #endif /* _LINUX_SKMSG_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b89af20cfa19..1a08ab00a45e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -247,6 +247,7 @@ enum bpf_attach_type { 
BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c859bc46d06c..afa803a1553e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2941,6 +2941,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: + case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 07f54015238a..5efd790f1b47 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -693,7 +693,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) rcu_assign_sk_user_data(sk, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); - else if (psock->progs.stream_verdict) + else if (psock->progs.stream_verdict || psock->progs.skb_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); @@ -1010,6 +1010,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, } skb_set_owner_r(skb, sk); prog = READ_ONCE(psock->progs.stream_verdict); + if (!prog) + prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { skb_dst_drop(skb); skb_bpf_redirect_clear(skb); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dd53a7771d7e..3bddd9dd2da2 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -155,6 +155,8 @@ static void sock_map_del_link(struct sock *sk, strp_stop = true; if (psock->saved_data_ready && stab->progs.stream_verdict) verdict_stop = true; + if (psock->saved_data_ready && stab->progs.skb_verdict) + verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); } @@ -227,7 +229,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { - 
struct bpf_prog *msg_parser, *stream_parser, *stream_verdict; + struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict; struct sk_psock *psock; int ret; @@ -256,6 +258,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } } + skb_verdict = READ_ONCE(progs->skb_verdict); + if (skb_verdict) { + skb_verdict = bpf_prog_inc_not_zero(skb_verdict); + if (IS_ERR(skb_verdict)) { + ret = PTR_ERR(skb_verdict); + goto out_put_msg_parser; + } + } + psock = sock_map_psock_get_checked(sk); if (IS_ERR(psock)) { ret = PTR_ERR(psock); @@ -265,6 +276,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, if (psock) { if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) || (stream_parser && READ_ONCE(psock->progs.stream_parser)) || + (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) || (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) { sk_psock_put(sk, psock); ret = -EBUSY; @@ -296,6 +308,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); + } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); + sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); return 0; @@ -304,6 +319,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, out_drop: sk_psock_put(sk, psock); out_progs: + if (skb_verdict) + bpf_prog_put(skb_verdict); +out_put_msg_parser: if (msg_parser) bpf_prog_put(msg_parser); out_put_stream_parser: @@ -1468,6 +1486,9 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, case BPF_SK_SKB_STREAM_VERDICT: pprog = &progs->stream_verdict; break; + case BPF_SK_SKB_VERDICT: + pprog = &progs->skb_verdict; + break; default: return -EOPNOTSUPP; } diff 
--git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 65303664417e..1828bba19020 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", + [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", [BPF_SK_MSG_VERDICT] = "sk_msg_verdict", [BPF_LIRC_MODE2] = "lirc_mode2", [BPF_FLOW_DISSECTOR] = "flow_dissector", diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index f2b915b20546..3f067d2d7584 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -76,6 +76,7 @@ enum dump_mode { static const char * const attach_type_strings[] = { [BPF_SK_SKB_STREAM_PARSER] = "stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict", + [BPF_SK_SKB_VERDICT] = "skb_verdict", [BPF_SK_MSG_VERDICT] = "msg_verdict", [BPF_FLOW_DISSECTOR] = "flow_dissector", [__MAX_BPF_ATTACH_TYPE] = NULL, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b89af20cfa19..1a08ab00a45e 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -247,6 +247,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE };
From patchwork Fri Mar 5 01:56:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117373 X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer Subject: [Patch bpf-next v3 2/9] sock: introduce sk->sk_prot->psock_update_sk_prot() Date: Thu, 4 Mar 2021 17:56:48 -0800 Message-Id: <20210305015655.14249-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References: <20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET.
Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 18 +++--------------- include/net/sock.h | 3 +++ include/net/tcp.h | 1 + include/net/udp.h | 1 + net/core/skmsg.c | 5 ----- net/core/sock_map.c | 24 ++++-------------------- net/ipv4/tcp_bpf.c | 24 +++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_bpf.c | 15 +++++++++++++-- net/ipv6/tcp_ipv6.c | 3 +++ net/ipv6/udp.c | 3 +++ 12 files changed, 58 insertions(+), 45 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 451530d41af7..c2e2bdff7338 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -98,6 +98,7 @@ struct sk_psock { void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); + int (*psock_update_sk_prot)(struct sock *sk, bool restore); struct proto *sk_proto; struct sk_psock_work_state work_state; struct work_struct work; @@ -350,25 +351,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) } } -static inline void sk_psock_update_proto(struct sock *sk, - struct sk_psock *psock, - struct proto *ops) -{ - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, ops); -} - static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; - if (inet_csk_has_ulp(sk)) { - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); - } else { - sk->sk_write_space = psock->saved_write_space; - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); - } + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, true); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 636810ddcd9b..eda64fbd5e3d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1184,6 +1184,9 @@ 
struct proto { void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); +#ifdef CONFIG_BPF_SYSCALL + int (*psock_update_sk_prot)(struct sock *sk, bool restore); +#endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS diff --git a/include/net/tcp.h b/include/net/tcp.h index 075de26f449d..2efa4e5ea23d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2203,6 +2203,7 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int tcp_bpf_update_proto(struct sock *sk, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ diff --git a/include/net/udp.h b/include/net/udp.h index d4d064c59232..df7cc1edc200 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -518,6 +518,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, #ifdef CONFIG_BPF_SYSCALL struct sk_psock; struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int udp_bpf_update_proto(struct sock *sk, bool restore); #endif #endif /* _UDP_H */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 5efd790f1b47..7dbd8344ec89 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -563,11 +563,6 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (inet_csk_has_ulp(sk)) { - psock = ERR_PTR(-EINVAL); - goto out; - } - if (sk->sk_user_data) { psock = ERR_PTR(-EBUSY); goto out; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 3bddd9dd2da2..7346c93d0f71 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) { - struct proto *prot; - - switch (sk->sk_type) { - case SOCK_STREAM: - prot = tcp_bpf_get_proto(sk, psock); - break; - - case SOCK_DGRAM: - prot = 
udp_bpf_get_proto(sk, psock); - break; - - default: + if (!sk->sk_prot->psock_update_sk_prot) return -EINVAL; - } - - if (IS_ERR(prot)) - return PTR_ERR(prot); - - sk_psock_update_proto(sk, psock, prot); - return 0; + psock->psock_update_sk_prot = sk->sk_prot->psock_update_sk_prot; + return sk->sk_prot->psock_update_sk_prot(sk, false); } static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) @@ -570,7 +554,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk) static bool sock_map_sk_is_suitable(const struct sock *sk) { - return sk_is_tcp(sk) || sk_is_udp(sk); + return !!sk->sk_prot->psock_update_sk_prot; } static bool sock_map_sk_state_allowed(const struct sock *sk) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 17c322b875fd..2022de8b625a 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -601,20 +601,38 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; } -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int tcp_bpf_update_proto(struct sock *sk, bool restore) { + struct sk_psock *psock = sk_psock(sk); int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; int config = psock->progs.msg_parser ? 
TCP_BPF_TX : TCP_BPF_BASE; + if (restore) { + if (inet_csk_has_ulp(sk)) { + tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); + } else { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + } + return 0; + } + + if (inet_csk_has_ulp(sk)) + return -EINVAL; + if (sk->sk_family == AF_INET6) { if (tcp_bpf_assert_proto_ops(psock->sk_proto)) - return ERR_PTR(-EINVAL); + return -EINVAL; tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); } - return &tcp_bpf_prots[family][config]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + return 0; } +EXPORT_SYMBOL_GPL(tcp_bpf_update_proto); /* If a child got cloned from a listening socket that had tcp_bpf * protocol callbacks installed, we need to restore the callbacks to diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index daad4f99db32..dfc6d1c0e710 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2806,6 +2806,9 @@ struct proto tcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4a0478b17243..38952aaee3a1 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2849,6 +2849,9 @@ struct proto udp_prot = { .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, .get_port = udp_v4_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a94791efc1a..6001f93cd3a0 100644 --- a/net/ipv4/udp_bpf.c +++ 
b/net/ipv4/udp_bpf.c @@ -41,12 +41,23 @@ static int __init udp_bpf_v4_build_proto(void) } core_initcall(udp_bpf_v4_build_proto); -struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int udp_bpf_update_proto(struct sock *sk, bool restore) { int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; + struct sk_psock *psock = sk_psock(sk); + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + return 0; + } if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto); - return &udp_bpf_prots[family]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + return 0; } +EXPORT_SYMBOL_GPL(udp_bpf_update_proto); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index bd44ded7e50c..4fdc58a9e19e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2134,6 +2134,9 @@ struct proto tcpv6_prot = { .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index d25e5a9252fd..ef2c75bb4771 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1713,6 +1713,9 @@ struct proto udpv6_prot = { .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min),
From patchwork Fri Mar 5 01:56:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117377 X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer Subject: [Patch bpf-next v3 3/9] udp: implement ->sendmsg_locked() Date: Thu, 4 Mar 2021 17:56:49 -0800 Message-Id: <20210305015655.14249-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References: <20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang UDP already has udp_sendmsg() which
takes lock_sock() inside. We have to build ->sendmsg_locked() on top of it, by adding a new parameter for whether the sock has been locked. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 1 + net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 30 +++++++++++++++++++++++------- 3 files changed, 25 insertions(+), 7 deletions(-) diff --git a/include/net/udp.h b/include/net/udp.h index df7cc1edc200..5264ba1439f9 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -292,6 +292,7 @@ int udp_get_port(struct sock *sk, unsigned short snum, int udp_err(struct sk_buff *, u32); int udp_abort(struct sock *sk, int err); int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len); +int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); int udp_push_pending_frames(struct sock *sk); void udp_flush_pending_frames(struct sock *sk); int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a02ce89b56b5..d8c73a848c53 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1071,6 +1071,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, + .sendmsg_locked = udp_sendmsg_locked, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 38952aaee3a1..424231e910a9 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1024,7 +1024,7 @@ int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size) } EXPORT_SYMBOL_GPL(udp_cmsg_send); -int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked) { struct inet_sock *inet = inet_sk(sk); struct udp_sock *up = udp_sk(sk); @@ -1063,15 +1063,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) * There are 
pending frames. * The socket lock must be held while it's corked. */ - lock_sock(sk); + if (!locked) + lock_sock(sk); if (likely(up->pending)) { if (unlikely(up->pending != AF_INET)) { - release_sock(sk); + if (!locked) + release_sock(sk); return -EINVAL; } goto do_append_data; } - release_sock(sk); + if (!locked) + release_sock(sk); } ulen += sizeof(struct udphdr); @@ -1241,11 +1244,13 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } - lock_sock(sk); + if (!locked) + lock_sock(sk); if (unlikely(up->pending)) { /* The socket is already corked while preparing it. */ /* ... which is an evident application bug. --ANK */ - release_sock(sk); + if (!locked) + release_sock(sk); net_dbg_ratelimited("socket already corked\n"); err = -EINVAL; @@ -1272,7 +1277,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) err = udp_push_pending_frames(sk); else if (unlikely(skb_queue_empty(&sk->sk_write_queue))) up->pending = 0; - release_sock(sk); + if (!locked) + release_sock(sk); out: ip_rt_put(rt); @@ -1302,8 +1308,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) err = 0; goto out; } + +int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udp_sendmsg(sk, msg, len, false); +} EXPORT_SYMBOL(udp_sendmsg); +int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udp_sendmsg(sk, msg, len, true); +} + int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { From patchwork Fri Mar 5 01:56:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117375 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer
Subject: [Patch bpf-next v3 4/9] udp: implement ->read_sock() for sockmap
Date: Thu, 4 Mar 2021 17:56:50 -0800
Message-Id: <20210305015655.14249-5-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
References: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Cong Wang

This is similar to tcp_read_sock(), except that we do not need to worry about connections; we just need to retrieve skbs from the UDP receive queue.
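
For readers unfamiliar with the ->read_sock() contract, the control flow can be sketched in user space. This is only an illustrative model, not kernel code: struct datagram, struct read_desc, struct dgram_queue, queue_pop(), model_read_sock() and take_all() are hypothetical stand-ins invented here for sk_buff, read_descriptor_t, the UDP receive queue, __skb_recv_udp(), udp_read_sock() and a recv_actor. It assumes each dequeued datagram is handed to the actor exactly once, and that the loop stops when the queue is empty, when the actor returns 0 or an error, or when desc->count reaches 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: one datagram payload. */
struct datagram {
	const char *data;
	size_t len;
};

/* Hypothetical stand-in for read_descriptor_t. */
struct read_desc {
	size_t count;	/* remaining budget; 0 tells the loop to stop */
	size_t total;	/* bytes the actor has accepted so far */
};

typedef int (*read_actor_t)(struct read_desc *desc, const struct datagram *dg,
			    size_t offset, size_t len);

/* Hypothetical stand-in for the socket receive queue. */
struct dgram_queue {
	const struct datagram *items;
	size_t head, n;
};

/* Dequeue one datagram, or return NULL when the queue is empty. */
static const struct datagram *queue_pop(struct dgram_queue *q)
{
	return q->head < q->n ? &q->items[q->head++] : NULL;
}

/* Models the loop: no connection state, one actor call per datagram. */
static int model_read_sock(struct dgram_queue *q, struct read_desc *desc,
			   read_actor_t actor)
{
	int copied = 0;

	assert(q && desc && actor);
	for (;;) {
		const struct datagram *dg = queue_pop(q);
		size_t offset = 0;

		if (!dg)
			break;			/* queue drained */
		if (offset < dg->len) {
			size_t len = dg->len - offset;
			int used = actor(desc, dg, offset, len);

			if (used <= 0) {
				/* Report the error only if nothing was copied. */
				if (!copied)
					copied = used;
				break;
			} else if ((size_t)used <= len) {
				copied += used;
			}
		}
		if (!desc->count)
			break;			/* actor budget exhausted */
	}
	return copied;
}

/* Example actor: accepts at most desc->count bytes of each datagram. */
static int take_all(struct read_desc *desc, const struct datagram *dg,
		    size_t offset, size_t len)
{
	size_t n = len < desc->count ? len : desc->count;

	(void)dg;
	(void)offset;
	desc->count -= n;
	desc->total += n;
	return (int)n;
}
```

A caller would fill a struct dgram_queue with a few datagrams, set desc.count to its byte budget, and call model_read_sock(&q, &desc, take_all); the return value is the number of bytes the actor accepted, or the actor's error if nothing was accepted.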
Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/include/net/udp.h b/include/net/udp.h index 5264ba1439f9..44a94cfc63b5 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -330,6 +330,8 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index d8c73a848c53..df8e8e238756 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1072,6 +1072,7 @@ const struct proto_ops inet_dgram_ops = { .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, .sendmsg_locked = udp_sendmsg_locked, + .read_sock = udp_read_sock, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 424231e910a9..fd8f27ee5b4e 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1798,6 +1798,40 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + int copied = 0; + + while (1) { + int offset = 0, err; + struct sk_buff *skb; + + skb = __skb_recv_udp(sk, 0, 1, &offset, &err); + if (!skb) + break; + if (offset < skb->len) { + int used; + size_t len; + + len = skb->len - offset; + used = recv_actor(desc, skb, offset, len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } else if (used <= len) { + copied += used; + offset += used; + } + } + if (!desc->count) + break; + } + + return copied; +} 
+ /* * This should be easy, if there is something there we * return it, otherwise we block.
From patchwork Fri Mar 5 01:56:51 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12117379
X-Patchwork-Delegate: bpf@iogearbox.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer
Subject: [Patch bpf-next v3 5/9] udp: add ->read_sock() and ->sendmsg_locked() to ipv6
Date: Thu, 4 Mar 2021 17:56:51 -0800
Message-Id: <20210305015655.14249-6-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References: <20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Similarly, udpv6_sendmsg() takes lock_sock() inside too, we have to build ->sendmsg_locked() on top of it. For ->read_sock(), we can just use udp_read_sock(). Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/ipv6.h | 1 + net/ipv4/udp.c | 1 + net/ipv6/af_inet6.c | 2 ++ net/ipv6/udp.c | 27 +++++++++++++++++++++------ 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index bd1f396cc9c7..48b6850dae85 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1119,6 +1119,7 @@ int inet6_hash_connect(struct inet_timewait_death_row *death_row, int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size); int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); +int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); /* * reassembly.c diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index fd8f27ee5b4e..6658db231475 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1831,6 +1831,7 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc, return copied; } +EXPORT_SYMBOL(udp_read_sock); /* * This should be easy, if there is something there we diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 1fb75f01756c..634ab3a825d7 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -714,7 +714,9 @@ const struct proto_ops inet6_dgram_ops = { .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ + .sendmsg_locked = udpv6_sendmsg_locked, .recvmsg = inet6_recvmsg, /* retpoline's sake */ + .read_sock = udp_read_sock, .mmap = sock_no_mmap, .sendpage = 
sock_no_sendpage, .set_peek_off = sk_set_peek_off, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index ef2c75bb4771..124a316da410 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1272,7 +1272,7 @@ static int udp_v6_push_pending_frames(struct sock *sk) return err; } -int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +static int __udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked) { struct ipv6_txoptions opt_space; struct udp_sock *up = udp_sk(sk); @@ -1361,7 +1361,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) * There are pending frames. * The socket lock must be held while it's corked. */ - lock_sock(sk); + if (!locked) + lock_sock(sk); if (likely(up->pending)) { if (unlikely(up->pending != AF_INET6)) { release_sock(sk); @@ -1370,7 +1371,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) dst = NULL; goto do_append_data; } - release_sock(sk); + if (!locked) + release_sock(sk); } ulen += sizeof(struct udphdr); @@ -1533,11 +1535,13 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } - lock_sock(sk); + if (!locked) + lock_sock(sk); if (unlikely(up->pending)) { /* The socket is already corked while preparing it. */ /* ... which is an evident application bug. --ANK */ - release_sock(sk); + if (!locked) + release_sock(sk); net_dbg_ratelimited("udp cork app bug 2\n"); err = -EINVAL; @@ -1562,7 +1566,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (err > 0) err = np->recverr ? 
net_xmit_errno(err) : 0; - release_sock(sk); + if (!locked) + release_sock(sk); out: dst_release(dst); @@ -1593,6 +1598,16 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } +int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udpv6_sendmsg(sk, msg, len, false); +} + +int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udpv6_sendmsg(sk, msg, len, true); +} + void udpv6_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); From patchwork Fri Mar 5 01:56:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117381 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B99CC433E0 for ; Fri, 5 Mar 2021 01:57:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 72ABF6500D for ; Fri, 5 Mar 2021 01:57:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229791AbhCEB5Q (ORCPT ); Thu, 4 Mar 2021 20:57:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229801AbhCEB5O (ORCPT ); Thu, 4 Mar 2021 20:57:14 -0500 Received: from mail-oo1-xc30.google.com (mail-oo1-xc30.google.com [IPv6:2607:f8b0:4864:20::c30]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id E17BBC061574; Thu, 4 Mar 2021 17:57:13 -0800 (PST)
From: Cong Wang
To:
netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v3 6/9] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Date: Thu, 4 Mar 2021 17:56:52 -0800 Message-Id: <20210305015655.14249-7-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References: <20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 4 ++ include/net/tcp.h | 2 - net/core/skmsg.c | 104 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 106 +----------------------------------------- net/tls/tls_sw.c | 4 +- 5 files changed, 112 insertions(+), 108 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c2e2bdff7338..bb27b93aad95 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -126,6 +126,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err); +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags); static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes) { diff --git 
a/include/net/tcp.h b/include/net/tcp.h index 2efa4e5ea23d..31b1696c62ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2209,8 +2209,6 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, int flags); -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 7dbd8344ec89..fa10d869a728 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -399,6 +399,110 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, } EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int ret = 0; + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return 1; + + if (!timeo) + return ret; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = sk_wait_event(sk, &timeo, + !list_empty(&psock->ingress_msg) || + !skb_queue_empty(&sk->sk_receive_queue), &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} +EXPORT_SYMBOL_GPL(sk_msg_wait_data); + +/* Receive sk_msg from psock->ingress_msg to @msg. 
*/ +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags) +{ + struct iov_iter *iter = &msg->msg_iter; + int peek = flags & MSG_PEEK; + struct sk_msg *msg_rx; + int i, copied = 0; + + msg_rx = list_first_entry_or_null(&psock->ingress_msg, + struct sk_msg, list); + + while (copied != len) { + struct scatterlist *sge; + + if (unlikely(!msg_rx)) + break; + + i = msg_rx->sg.start; + do { + struct page *page; + int copy; + + sge = sk_msg_elem(msg_rx, i); + copy = sge->length; + page = sg_page(sge); + if (copied + copy > len) + copy = len - copied; + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; + + copied += copy; + if (likely(!peek)) { + sge->offset += copy; + sge->length -= copy; + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); + msg_rx->sg.size -= copy; + + if (!sge->length) { + sk_msg_iter_var_next(i); + if (!msg_rx->skb) + put_page(page); + } + } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. 
+ */ + if (copy != sge->length) + return copied; + sk_msg_iter_var_next(i); + } + + if (copied == len) + break; + } while (i != msg_rx->sg.end); + + if (unlikely(peek)) { + if (msg_rx == list_last_entry(&psock->ingress_msg, + struct sk_msg, list)) + break; + msg_rx = list_next_entry(msg_rx, list); + continue; + } + + msg_rx->sg.start = i; + if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { + list_del(&msg_rx->list); + if (msg_rx->skb) + consume_skb(msg_rx->skb); + kfree(msg_rx); + } + msg_rx = list_first_entry_or_null(&psock->ingress_msg, + struct sk_msg, list); + } + + return copied; +} +EXPORT_SYMBOL_GPL(sk_msg_recvmsg); + static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, struct sk_buff *skb) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 2022de8b625a..3d622a0d0753 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -10,86 +10,6 @@ #include #include -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags) -{ - struct iov_iter *iter = &msg->msg_iter; - int peek = flags & MSG_PEEK; - struct sk_msg *msg_rx; - int i, copied = 0; - - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - - while (copied != len) { - struct scatterlist *sge; - - if (unlikely(!msg_rx)) - break; - - i = msg_rx->sg.start; - do { - struct page *page; - int copy; - - sge = sk_msg_elem(msg_rx, i); - copy = sge->length; - page = sg_page(sge); - if (copied + copy > len) - copy = len - copied; - copy = copy_page_to_iter(page, sge->offset, copy, iter); - if (!copy) - return copied ? copied : -EFAULT; - - copied += copy; - if (likely(!peek)) { - sge->offset += copy; - sge->length -= copy; - if (!msg_rx->skb) - sk_mem_uncharge(sk, copy); - msg_rx->sg.size -= copy; - - if (!sge->length) { - sk_msg_iter_var_next(i); - if (!msg_rx->skb) - put_page(page); - } - } else { - /* Lets not optimize peek case if copy_page_to_iter - * didn't copy the entire length lets just break. 
- */ - if (copy != sge->length) - return copied; - sk_msg_iter_var_next(i); - } - - if (copied == len) - break; - } while (i != msg_rx->sg.end); - - if (unlikely(peek)) { - if (msg_rx == list_last_entry(&psock->ingress_msg, - struct sk_msg, list)) - break; - msg_rx = list_next_entry(msg_rx, list); - continue; - } - - msg_rx->sg.start = i; - if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - list_del(&msg_rx->list); - if (msg_rx->skb) - consume_skb(msg_rx->skb); - kfree(msg_rx); - } - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - } - - return copied; -} -EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); - static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -243,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk) return !empty; } -static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, - int flags, long timeo, int *err) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - int ret = 0; - - if (sk->sk_shutdown & RCV_SHUTDOWN) - return 1; - - if (!timeo) - return ret; - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - ret = sk_wait_event(sk, &timeo, - !list_empty(&psock->ingress_msg) || - !skb_queue_empty(&sk->sk_receive_queue), &wait); - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); - return ret; -} - static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -284,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } lock_sock(sk); msg_bytes_ready: - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { int data, err = 0; long timeo; timeo = sock_rcvtimeo(sk, nonblock); - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); if (data) { if 
(!sk_psock_queue_empty(psock)) goto msg_bytes_ready; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 01d933ae5f16..1dcb34dfd56b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk, skb = tls_wait_data(sk, psock, flags, timeo, &err); if (!skb) { if (psock) { - int ret = __tcp_bpf_recvmsg(sk, psock, - msg, len, flags); + int ret = sk_msg_recvmsg(sk, psock, msg, len, + flags); if (ret > 0) { decrypted += ret; From patchwork Fri Mar 5 01:56:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117383 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D76CAC433E0 for ; Fri, 5 Mar 2021 01:57:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9D2FD65005 for ; Fri, 5 Mar 2021 01:57:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230051AbhCEB5T (ORCPT ); Thu, 4 Mar 2021 20:57:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42586 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229843AbhCEB5P (ORCPT ); Thu, 4 Mar 2021 20:57:15 -0500 Received: from mail-oi1-x22f.google.com (mail-oi1-x22f.google.com [IPv6:2607:f8b0:4864:20::22f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E31FC061756; Thu, 
4 Mar 2021 17:57:15 -0800 (PST)
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org,
duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v3 7/9] udp: implement udp_bpf_recvmsg() for sockmap Date: Thu, 4 Mar 2021 17:56:53 -0800 Message-Id: <20210305015655.14249-8-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com> References: <20210305015655.14249-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 6001f93cd3a0..7d5c4ebf42fe 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -4,6 +4,68 @@ #include #include #include +#include + +#include "udp_impl.h" + +static struct proto *udpv6_prot_saved __read_mostly; + +static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int noblock, int flags, int *addr_len) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags, + addr_len); +#endif + return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len); +} + +static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied, ret; + + if (unlikely(flags & MSG_ERRQUEUE)) + return inet_recv_error(sk, msg, len, addr_len); + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + + lock_sock(sk); + if 
(sk_psock_queue_empty(psock)) { + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + +msg_bytes_ready: + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + if (!copied) { + int data, err = 0; + long timeo; + + timeo = sock_rcvtimeo(sk, nonblock); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); + if (data) { + if (!sk_psock_queue_empty(psock)) + goto msg_bytes_ready; + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + if (err) { + ret = err; + goto out; + } + copied = -EAGAIN; + } + ret = copied; +out: + release_sock(sk); + sk_psock_put(sk, psock); + return ret; +} enum { UDP_BPF_IPV4, @@ -11,7 +73,6 @@ enum { UDP_BPF_NUM_PROTS, }; -static struct proto *udpv6_prot_saved __read_mostly; static DEFINE_SPINLOCK(udpv6_prot_lock); static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS]; @@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->unhash = sock_map_unhash; prot->close = sock_map_close; + prot->recvmsg = udp_bpf_recvmsg; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) From patchwork Fri Mar 5 01:56:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12117385 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2957C433E9 for ; Fri, 5 Mar 2021 01:57:20 +0000 (UTC) 
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer
Subject: [Patch bpf-next v3 8/9] sock_map: update sock type checks for UDP
Date: Thu, 4 Mar 2021 17:56:54 -0800
Message-Id: <20210305015655.14249-9-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
References: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Cong Wang

Now that UDP supports sockmap and redirection, we can safely update
the sock type checks for it accordingly.

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Cong Wang
---
 net/core/sock_map.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 7346c93d0f71..64a5d5996669 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -549,7 +549,10 @@ static bool sk_is_udp(const struct sock *sk)
 
 static bool sock_map_redirect_allowed(const struct sock *sk)
 {
-	return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN;
+	if (sk_is_tcp(sk))
+		return sk->sk_state != TCP_LISTEN;
+	else
+		return sk->sk_state == TCP_ESTABLISHED;
 }
 
 static bool sock_map_sk_is_suitable(const struct sock *sk)

From patchwork Fri Mar 5 01:56:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12117387
X-Patchwork-Delegate: bpf@iogearbox.net
Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:95de:1d5:1b36:946a]) by smtp.gmail.com with ESMTPSA id r3sm224126oif.5.2021.03.04.17.57.15
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 Mar 2021 17:57:16 -0800 (PST)
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer
Subject: [Patch bpf-next v3 9/9] selftests/bpf: add a test case for udp sockmap
Date: Thu, 4 Mar 2021 17:56:55 -0800
Message-Id: <20210305015655.14249-10-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
References: <20210305015655.14249-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org
X-Patchwork-Delegate: bpf@iogearbox.net

From: Cong Wang

Add a test case to ensure redirection between two UDP sockets works.

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Cong Wang
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 140 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_listen.c | 22 +++
 2 files changed, 162 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index c26e6bf05e49..a549ebd3b5a6 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1563,6 +1563,142 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
 	}
 }
 
+static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
+				   int verd_mapfd, enum redir_mode mode)
+{
+	const char *log_prefix = redir_mode_str(mode);
+	struct sockaddr_storage addr;
+	int c0, c1, p0, p1;
+	unsigned int pass;
+	socklen_t len;
+	int err, n;
+	u64 value;
+	u32 key;
+	char b;
+
+	zero_verdict_count(verd_mapfd);
+
+	p0 = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	if (p0 < 0)
+		return;
+	len = sizeof(addr);
+	err = xgetsockname(p0, sockaddr(&addr), &len);
+	if (err)
+		goto close_peer0;
+
+	c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
+	if (c0 < 0)
+		goto close_peer0;
+	err = xconnect(c0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+	err = xgetsockname(c0, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli0;
+	err = xconnect(p0, sockaddr(&addr), len);
+	if (err)
+		goto close_cli0;
+
+	p1 = socket_loopback(family, sotype | SOCK_NONBLOCK);
+	if (p1 < 0)
+		goto close_cli0;
+	err = xgetsockname(p1, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli0;
+
+	c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0);
+	if (c1 < 0)
+		goto close_peer1;
+	err = xconnect(c1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+	err = xgetsockname(c1, sockaddr(&addr), &len);
+	if (err)
+		goto close_cli1;
+	err = xconnect(p1, sockaddr(&addr), len);
+	if (err)
+		goto close_cli1;
+
+	key = 0;
+	value = p0;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	key = 1;
+	value = p1;
+	err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST);
+	if (err)
+		goto close_cli1;
+
+	n = write(c1, "a", 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: write", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete write", log_prefix);
+	if (n < 1)
+		goto close_cli1;
+
+	key = SK_PASS;
+	err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass);
+	if (err)
+		goto close_cli1;
+	if (pass != 1)
+		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
+
+	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
+	if (n < 0)
+		FAIL_ERRNO("%s: read", log_prefix);
+	if (n == 0)
+		FAIL("%s: incomplete read", log_prefix);
+
+close_cli1:
+	xclose(c1);
+close_peer1:
+	xclose(p1);
+close_cli0:
+	xclose(c0);
+close_peer0:
+	xclose(p0);
+}
+
+static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel,
+				       struct bpf_map *inner_map, int family,
+				       int sotype)
+{
+	int verdict = bpf_program__fd(skel->progs.prog_skb_verdict);
+	int verdict_map = bpf_map__fd(skel->maps.verdict_map);
+	int sock_map = bpf_map__fd(inner_map);
+	int err;
+
+	err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0);
+	if (err)
+		return;
+
+	skel->bss->test_ingress = false;
+	udp_redir_to_connected(family, sotype, sock_map, verdict_map,
+			       REDIR_EGRESS);
+	skel->bss->test_ingress = true;
+	udp_redir_to_connected(family, sotype, sock_map, verdict_map,
+			       REDIR_INGRESS);
+
+	xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT);
+}
+
+static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map,
+			   int family)
+{
+	const char *family_name, *map_name;
+	char s[MAX_TEST_NAME];
+
+	family_name = family_str(family);
+	map_name = map_type_str(map);
+	snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__);
+	if (!test__start_subtest(s))
+		return;
+	udp_skb_redir_to_connected(skel, map, family, SOCK_DGRAM);
+}
+
 static void test_reuseport(struct test_sockmap_listen *skel,
 			   struct bpf_map *map, int family, int sotype)
 {
@@ -1626,10 +1762,14 @@ void test_sockmap_listen(void)
 	skel->bss->test_sockmap = true;
 	run_tests(skel, skel->maps.sock_map, AF_INET);
 	run_tests(skel, skel->maps.sock_map, AF_INET6);
+	test_udp_redir(skel, skel->maps.sock_map, AF_INET);
+	test_udp_redir(skel, skel->maps.sock_map, AF_INET6);
 
 	skel->bss->test_sockmap = false;
 	run_tests(skel, skel->maps.sock_hash, AF_INET);
 	run_tests(skel, skel->maps.sock_hash, AF_INET6);
+	test_udp_redir(skel, skel->maps.sock_hash, AF_INET);
+	test_udp_redir(skel, skel->maps.sock_hash, AF_INET6);
 
 	test_sockmap_listen__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
index fa221141e9c1..a39eba9f5201 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c
@@ -29,6 +29,7 @@ struct {
 } verdict_map SEC(".maps");
 
 static volatile bool test_sockmap; /* toggled by user-space */
+static volatile bool test_ingress; /* toggled by user-space */
 
 SEC("sk_skb/stream_parser")
 int prog_stream_parser(struct __sk_buff *skb)
@@ -55,6 +56,27 @@ int prog_stream_verdict(struct __sk_buff *skb)
 	return verdict;
 }
 
+SEC("sk_skb/skb_verdict")
+int prog_skb_verdict(struct __sk_buff *skb)
+{
+	unsigned int *count;
+	__u32 zero = 0;
+	int verdict;
+
+	if (test_sockmap)
+		verdict = bpf_sk_redirect_map(skb, &sock_map, zero,
+					      test_ingress ? BPF_F_INGRESS : 0);
+	else
+		verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero,
+					       test_ingress ? BPF_F_INGRESS : 0);
+
+	count = bpf_map_lookup_elem(&verdict_map, &verdict);
+	if (count)
+		(*count)++;
+
+	return verdict;
+}
+
 SEC("sk_msg")
 int prog_msg_verdict(struct sk_msg_md *msg)
 {
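
Note on the redirect target used by the selftest: patch 8/9 makes sock_map_redirect_allowed() accept a non-TCP socket only when its state is TCP_ESTABLISHED, which is presumably why udp_redir_to_connected() connects p0 and p1 to their clients before inserting them into the map. The check can be modelled in user space as below. This is only a sketch for illustration: the sketch_* names and both enums are made up here and are not kernel identifiers, and the numeric values merely mirror TCP_ESTABLISHED, TCP_CLOSE and TCP_LISTEN from the kernel's TCP state enum.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's protocol test and socket state;
 * these names do not exist in the kernel and are assumptions of this sketch.
 */
enum sketch_proto { SKETCH_TCP, SKETCH_UDP };

enum sketch_state {
	SKETCH_ESTABLISHED = 1,	/* mirrors TCP_ESTABLISHED */
	SKETCH_CLOSE = 7,	/* mirrors TCP_CLOSE, e.g. an unconnected UDP socket */
	SKETCH_LISTEN = 10,	/* mirrors TCP_LISTEN */
};

/* Same decision as the patched sock_map_redirect_allowed():
 * TCP sockets may be a redirect target unless they are listening,
 * everything else (UDP here) only when established, i.e. connected.
 */
bool sketch_redirect_allowed(enum sketch_proto proto, enum sketch_state state)
{
	if (proto == SKETCH_TCP)
		return state != SKETCH_LISTEN;
	return state == SKETCH_ESTABLISHED;
}
```

Under this model an unconnected UDP socket (SKETCH_CLOSE) is rejected as a redirect target, while a connected one (SKETCH_ESTABLISHED) is accepted.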