From patchwork Wed Mar 10 05:32:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127047 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E753DC4332E for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B0BE465005 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231483AbhCJFdW (ORCPT ); Wed, 10 Mar 2021 00:33:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56294 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229667AbhCJFcr (ORCPT ); Wed, 10 Mar 2021 00:32:47 -0500 Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCD3AC06174A; Tue, 9 Mar 2021 21:32:31 -0800 (PST) Received: by mail-qk1-x732.google.com with SMTP id d20so15728634qkc.2; Tue, 09 Mar 2021 21:32:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=jiBFQhb007udnKP3xSiYvQ4GEZgCLJgSEXMTlUQvTG8=; b=KzeNLpBGh8RO0L9Ad8dBQEg597xPaj0AEwvxXPP/Tb2+0lDeWRmn1SujNVLQHbz5+4 KaYgq3xfu/yilsqyqSOOdGW3LKth333i/U2AbGIRdJlgpYDvjjrDdeppifnsDqxSnwuC CQ8rx07WzlbpDz+D54FB0xVTIJbcToUNK4/Jh4Ni3NmQPS9oL6jYDdob2pb5WKmqokQc uF02PuBSwAWRVv290nbPIv1a/YyGjoIRA4JY41dkHUvYfvCxV+5/BxFuqxWiZbG75T6K Inpaek1a21crzHHTcGTVzhVUnUwtm0/dbfc9ES0D9gclaqBh0ZSXq1oXV4dqpMYv2Jm3 HNEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jiBFQhb007udnKP3xSiYvQ4GEZgCLJgSEXMTlUQvTG8=; b=eKzd+LAm2UI3KP5bXZw4W5Xn5cgmLoG0q0twP7aFg34oU9Crq82JWiuACWrBxbyPun ymYMl+49lv5OCOcOrp/sp3aw33mw2EsjbSEFVeie8uyUnblfYEyzuTREEJU1oN92lmSM Hex2uHjVn+wSkhLeCYB/2TI0sqIq+an6oWb+voxM1D5ExN4zypamQ2iAkL3OPz2cytI0 UOYUr7hPUmrgts3WWZv/yFz6zhD+OIAs/P6Qm6cXoJrYY+d2t4lm4G79WOnzkFNab4oZ jSyy1qZuiTKBxTqK5WvsDFnLkqCnUtNCEa609KZXicPD89+Th6oI95Lcg/2PP6NEQyia c3MA== X-Gm-Message-State: AOAM5336sZz/huXD5j/HZdQwXaKF7GVqolJeD2FBxWiC3YBY9MwXDBf5 J4HmyVD0g09zbsDD++i8ZoacxFhr0p9wxg== X-Google-Smtp-Source: ABdhPJxANgbY8NFYDuk1Sy3Lrc1UssiQaYXJ4thtbJcoBQp2njNeUbqBVMpcwuuA68Dw+l/m4/4Qig== X-Received: by 2002:a37:9bd1:: with SMTP id d200mr1153637qke.328.1615354350734; Tue, 09 Mar 2021 21:32:30 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:30 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 01/11] skmsg: lock ingress_skb when purging Date: Tue, 9 Mar 2021 21:32:12 -0800 Message-Id: <20210310053222.41371-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: Jakub Sitnicki --- net/core/skmsg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 07f54015238a..bebf84ed4e30 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -634,7 +634,7 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) { struct sk_buff *skb; - while ((skb = __skb_dequeue(&psock->ingress_skb)) != NULL) { + while ((skb = skb_dequeue(&psock->ingress_skb)) != NULL) { skb_bpf_redirect_clear(skb); kfree_skb(skb); } From patchwork Wed Mar 10 05:32:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127055 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 069CDC4332D for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E473B64F51 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229803AbhCJFdV (ORCPT ); Wed, 10 Mar 2021 00:33:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229732AbhCJFcr (ORCPT ); Wed, 10 Mar 2021 00:32:47 -0500 Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 881F0C061760; Tue, 9 Mar 2021 21:32:33 -0800 (PST) Received: by mail-qk1-x732.google.com with SMTP id b130so15663939qkc.10; Tue, 09 Mar 2021 21:32:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oR8vNPuyYW5ybMkD1YXMVIl84QjFioYi6NOhEadLhCo=; b=lyABdKQu1LDAvDr3UpWklQRKFQI6t5Ee5nELXCrlMitgKLT5zlcM8qD5k1in7ZF6gq 63wB1uAs9BPDl6gASoPnCaFeKbf82pvDHrgvMgDpAsztJZdo6MkqNITsvmwYKPenlRey g/ylJ0i7uTCqnMvIHE006sQixrddHb3U2SmYx+VsTp3K0/AjSjSKbA613SW/3EGn3w7k GagwAUGXayybeWHD7UKprZypeLu5nCV+BoMNGZ65nRvCjN+ynn/ox7IkEkDP/YJPsFKl Ji34SFFeLkJB9vf2iOvZz1MpiuOvIPYr32c4LUV/WXMl/IbC0M3uZ4R+xJxmfzyM5rRk yEYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oR8vNPuyYW5ybMkD1YXMVIl84QjFioYi6NOhEadLhCo=; b=OKXlI4o1/G3YjS27JQnHYZkl3FZucmMikJ01iJZYPYVYuw3GS/fACAQiaGqb2EkaWQ obOOIKK6X2rntDGzRiq2v4dt4Dkdsw/H7B9TKMZj8UvASqlmvXvmP6MdZPKhnCGTjNcE ucC5NLcG2oJ47GvtaCtHQkTFnmoVD7E+VQSZu/FbnIwMB3i2XnMV0njtoDZXOPcErhFM napc4PsOBjf2msomT7rFWR/ZsyEcAyYt6J0gEcrKv5RMUNQUvPckXt4DUOMR+01gGTEP /ZphAyjNWTez4UAOTOpEQx4lljM1nKINHRNJmZsUOH3L6vweoVss6EOOAigdYQOSif75 F9fg== X-Gm-Message-State: AOAM5338WX7rH6wOII7PSmXkYQp+VtU8UdQsaDBxi3ZhtsIktaqaY5B+ j7wA9yxjRdfP2J9TkqPoWv6a3DbocPQ6nw== X-Google-Smtp-Source: ABdhPJwvkQI32wjPgiHEftUwy7GlQS30bzD9r1cx2aHbkJy+8tfDgb/r1CGpuMHBK1yROTosD6kSFg== X-Received: by 2002:a37:44cc:: with SMTP id r195mr1151267qka.224.1615354352451; Tue, 09 Mar 2021 21:32:32 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:31 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 02/11] skmsg: introduce a spinlock to protect ingress_msg Date: Tue, 9 Mar 2021 21:32:13 -0800 Message-Id: <20210310053222.41371-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: Jakub Sitnicki --- include/linux/skmsg.h | 46 +++++++++++++++++++++++++++++++++++++++++++ net/core/skmsg.c | 3 +++ net/ipv4/tcp_bpf.c | 18 ++++++----------- 3 files changed, 55 insertions(+), 12 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6c09d94be2e9..7333bf881b81 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -89,6 +89,7 @@ struct sk_psock { #endif struct sk_buff_head ingress_skb; struct list_head ingress_msg; + spinlock_t ingress_lock; unsigned long state; struct list_head link; spinlock_t link_lock; @@ -284,7 +285,45 @@ static inline struct sk_psock *sk_psock(const struct sock *sk) static inline void sk_psock_queue_msg(struct sk_psock *psock, struct sk_msg *msg) { + spin_lock_bh(&psock->ingress_lock); list_add_tail(&msg->list, &psock->ingress_msg); + spin_unlock_bh(&psock->ingress_lock); +} + +static inline struct sk_msg *sk_psock_deque_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + if (msg) + list_del(&msg->list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_peek_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_next_msg(struct sk_psock *psock, + struct sk_msg *msg) +{ + struct sk_msg *ret; + + spin_lock_bh(&psock->ingress_lock); + if (list_is_last(&msg->list, &psock->ingress_msg)) + ret = NULL; + else + ret = list_next_entry(msg, list); + spin_unlock_bh(&psock->ingress_lock); + return ret; } static inline bool sk_psock_queue_empty(const struct sk_psock *psock) @@ -292,6 +331,13 @@ static inline bool sk_psock_queue_empty(const struct sk_psock *psock) return psock ? list_empty(&psock->ingress_msg) : true; } +static inline void kfree_sk_msg(struct sk_msg *msg) +{ + if (msg->skb) + consume_skb(msg->skb); + kfree(msg); +} + static inline void sk_psock_report_error(struct sk_psock *psock, int err) { struct sock *sk = psock->sk; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bebf84ed4e30..41a5f82c53e6 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -592,6 +592,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) INIT_WORK(&psock->work, sk_psock_backlog); INIT_LIST_HEAD(&psock->ingress_msg); + spin_lock_init(&psock->ingress_lock); skb_queue_head_init(&psock->ingress_skb); sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); @@ -623,11 +624,13 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock) { struct sk_msg *msg, *tmp; + spin_lock_bh(&psock->ingress_lock); list_for_each_entry_safe(msg, tmp, &psock->ingress_msg, list) { list_del(&msg->list); sk_msg_free(psock->sk, msg); kfree(msg); } + spin_unlock_bh(&psock->ingress_lock); } static void sk_psock_zap_ingress(struct sk_psock *psock) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 17c322b875fd..ad1261cdcdde 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -18,9 +18,7 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg_rx; int i, copied = 0; - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - + msg_rx = sk_psock_peek_msg(psock); while (copied != len) { struct scatterlist *sge; @@ -68,22 +66,18 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, } while (i != msg_rx->sg.end); if (unlikely(peek)) { - if (msg_rx == list_last_entry(&psock->ingress_msg, - struct sk_msg, list)) + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) break; - msg_rx = list_next_entry(msg_rx, list); continue; } msg_rx->sg.start = i; if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - list_del(&msg_rx->list); - if (msg_rx->skb) - consume_skb(msg_rx->skb); - kfree(msg_rx); + msg_rx = sk_psock_deque_msg(psock); + kfree_sk_msg(msg_rx); } - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); + msg_rx = sk_psock_peek_msg(psock); } return copied; From patchwork Wed Mar 10 05:32:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127049 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9791C43331 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C62AC64F59 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231431AbhCJFdV (ORCPT ); Wed, 10 Mar 2021 00:33:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229691AbhCJFcr (ORCPT ); Wed, 10 Mar 2021 00:32:47 -0500 Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33D8CC061761; Tue, 9 Mar 2021 21:32:35 -0800 (PST) Received: by mail-qk1-x72d.google.com with SMTP id 130so15672061qkh.11; Tue, 09 Mar 2021 21:32:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=CPLlyzobmDibvol8ZTJ80tV8ELbFVTyX9eFQXacFyC0=; b=CRRQgm54vzgKwUeWrPsWyMyxMmrKoaCtwV+H7VIafn7HsryMvxq5t1WNfuUSNC96ch wyO8nZYQWai+SFoE/NppSbkHZaLKdF2X+QcSZQ94kA3lh7nzhRtbBicFZQpzqpczlW8g CQ4fCjuhaDz23XfKFnu5FJ+dQvfxTBYwcC6CjVpC0d5Z9Bu+lDsGSCnlTXon0nqlw+mY lrsoAsJIl4dlucMXfrYosbUvQfULd3Oh/sqipLVr5MAj2bfssGLlsQuX5sfWoz1dl7wb G8lpJa3H4MjAEbSFh9FVNUsbqy0G2VqRyQi0SKKO92nmgLQ6+44qP/kBqEJprpVy5erQ RDjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=CPLlyzobmDibvol8ZTJ80tV8ELbFVTyX9eFQXacFyC0=; b=WK4Dhin/+9KOPP5rxO/Zd3EPF0PvW6tbv+DF1ZK/wQZSYqRWXD9ZebbZ35ukLp7qIH tVCX1Ia2tDz39Tm13zaC0Owe7NZxIG7zOMzWsFe7QTG3PASdDWrtNNoCHWrQuWQjWNJV cvs1JhaZAO77nUAEj59JUriuhRLtQj0bsiPjuVsmXT42yE21es9CFRjtei69xLcfKXmP VpZXPxw08lx2ZLTGm3mZgUcjjo98gTTpA1Guoltq4hoDYlCTgOltBWT5FFvqbdRSqLV+ Qkzx1kFORZtE/1Z7tc5MPAoiB1wMFtM1XoxZZefSNfHBqXkBMdri7hhmaafj+1hZg8Fp 93wA== X-Gm-Message-State: AOAM532uuDZ2S9zRb7qj3wW7mWIcg6B75AgU1ISJhZmroYnLcZHyVTNl e2DOpimtul8+QR2gY9p810oG/y0qDuWq5g== X-Google-Smtp-Source: ABdhPJwwcWj+WT6fz349sZQg49UJ3VoyyrI7WF0nwSOXYej+BY6csCohT4UJ2P4QDZSc2/7bOmaiVg== X-Received: by 2002:a37:5b43:: with SMTP id p64mr1130955qkb.131.1615354354078; Tue, 09 Mar 2021 21:32:34 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:33 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 03/11] skmsg: introduce skb_send_sock() for sock_map Date: Tue, 9 Mar 2021 21:32:14 -0800 Message-Id: <20210310053222.41371-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 52 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 46 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0503c917d773..2fc8c3657c53 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3626,6 +3626,7 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, unsigned int flags); int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, int len); +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len); void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); unsigned int skb_zerocopy_headlen(const struct sk_buff *from); int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 545a472273a5..396586bd6ae3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2500,9 +2500,12 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, } EXPORT_SYMBOL_GPL(skb_splice_bits); -/* Send skb data on a socket. Socket must be locked. */ -int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, - int len) +typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg, + struct kvec *vec, size_t num, size_t size); +typedef int (*sendpage_func)(struct sock *sk, struct page *page, int offset, + size_t size, int flags); +static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, + int len, sendmsg_func sendmsg, sendpage_func sendpage) { unsigned int orig_len = len; struct sk_buff *head = skb; @@ -2522,7 +2525,7 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, memset(&msg, 0, sizeof(msg)); msg.msg_flags = MSG_DONTWAIT; - ret = kernel_sendmsg_locked(sk, &msg, &kv, 1, slen); + ret = sendmsg(sk, &msg, &kv, 1, slen); if (ret <= 0) goto error; @@ -2553,9 +2556,9 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, slen = min_t(size_t, len, skb_frag_size(frag) - offset); while (slen) { - ret = kernel_sendpage_locked(sk, skb_frag_page(frag), - skb_frag_off(frag) + offset, - slen, MSG_DONTWAIT); + ret = sendpage(sk, skb_frag_page(frag), + skb_frag_off(frag) + offset, + slen, MSG_DONTWAIT); if (ret <= 0) goto error; @@ -2587,8 +2590,43 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, error: return orig_len == len ? ret : orig_len - len; } + +/* Send skb data on a socket. Socket must be locked. */ +int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, + int len) +{ + return __skb_send_sock(sk, skb, offset, len, kernel_sendmsg_locked, + kernel_sendpage_locked); +} EXPORT_SYMBOL_GPL(skb_send_sock_locked); +static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg, struct kvec *vec, + size_t num, size_t size) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendmsg(sock, msg, vec, num, size); +} + +static int sendpage_unlocked(struct sock *sk, struct page *page, int offset, + size_t size, int flags) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendpage(sock, page, offset, size, flags); +} + +/* Send skb data on a socket. Socket must be unlocked. */ +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len) +{ + return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked, + sendpage_unlocked); +} + /** * skb_store_bits - store bits from kernel buffer to skb * @skb: destination buffer From patchwork Wed Mar 10 05:32:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127043 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 966D7C433DB for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6A66964FF6 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231327AbhCJFdU (ORCPT ); Wed, 10 Mar 2021 00:33:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229784AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A340C061762; Tue, 9 Mar 2021 21:32:36 -0800 (PST) Received: by mail-qt1-x835.google.com with SMTP id l7so3704268qtq.7; Tue, 09 Mar 2021 21:32:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KytboVyhhl6Gmf2upv0pcKByMVhcWe4p37EfgN7ElTc=; b=kYO0bbVyP+HTx0BdMQV40Atr4R/ERxNROHn0owu8dCqIGX6nSEj0kDC7xy1rl8KMnD zbecGtGjTWEgDVpP5gfr5DXn8ceXoPgijZuNhuw9r2jsuWvoEzeRNx3o4QjTEP1kq5VB lPWzL6E23MKFrED76pUbVHr91aXCtNV8sDxHi99q2ETxZdho0DYQwxSJH9ZEZoxCdjlr IHAkzXZ+XOfjqjhYKDniUcg7zeJnVmDNpWU+8pZ5h7HAG1zXIyxkcEpUQPKBPWkewdt5 gRD2VuwgTXFM/LssuFh0XcmVAIvrUKbH4k1MZgx8bZDvpFTIQ0Gu2Ao0Ke6lRInp6Jpv 86Pg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KytboVyhhl6Gmf2upv0pcKByMVhcWe4p37EfgN7ElTc=; b=UEigbJPPyvTySJCSENY2br1bPzWIVPh7S4ATOAFdlvJHhGODRb7VgatXLS6jz3yxrS BiZXMTKc14sbRVit9yAyQZD6XxAh3WUZi6YmJO/u2ZmJfXnZ+efo1XlqzlldAAfW06rd 1uUsZPzWHoe0iFnnizWpmaYu95Tehg6t7OZL4YRIbTnixgGAdad14ZfH/Xfx41mJUGYZ f15j0OQmUK2IHagtkdzQd05WHMYLrRgwwPbkIl6U73Z1eE/VqOMk8/Lw4RoI/gqxi5rY 6M+I5/AhN9FDphZuik4l3MwHhKTZmgAycW4fKUfk6JeGr+MRWsJ+x6/SenieuwCvO7Ub 1xCg== X-Gm-Message-State: AOAM531onRGRM9IjDOkRiMIs2AC8nZwC5DX/p6gIGPrxcP2tbpquZ8Mq oS7CAP1lXMZ9F3+meoSES47YirxIzYzRYw== X-Google-Smtp-Source: ABdhPJyXJK9aH5XYWxySyqp66IHsfuqvwzVUqSLih96LOQKIaVsd5eNNP5mTZvJRhpPLhWOQD8h2mw== X-Received: by 2002:aed:2fa3:: with SMTP id m32mr1335835qtd.359.1615354355768; Tue, 09 Mar 2021 21:32:35 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:35 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 04/11] skmsg: avoid lock_sock() in sk_psock_backlog() Date: Tue, 9 Mar 2021 21:32:15 -0800 Message-Id: <20210310053222.41371-5-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches 0 but here we want to purge them explicitly in sock_map_close(). Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 1 + net/core/skmsg.c | 22 ++++++++++++++-------- net/core/sock_map.c | 1 + 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 7333bf881b81..91b357817bb8 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -347,6 +347,7 @@ static inline void sk_psock_report_error(struct sk_psock *psock, int err) } struct sk_psock *sk_psock_init(struct sock *sk, int node); +void sk_psock_purge(struct sk_psock *psock); #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock); diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 41a5f82c53e6..bf0f874780c1 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -497,7 +497,7 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, if (!ingress) { if (!sock_writeable(psock->sk)) return -EAGAIN; - return skb_send_sock_locked(psock->sk, skb, off, len); + return skb_send_sock(psock->sk, skb, off, len); } return sk_psock_skb_ingress(psock, skb); } @@ -511,8 +511,6 @@ static void sk_psock_backlog(struct work_struct *work) u32 len, off; int ret; - /* Lock sock to avoid losing sk_socket during loop. */ - lock_sock(psock->sk); if (state->skb) { skb = state->skb; len = state->len; @@ -529,7 +527,7 @@ static void sk_psock_backlog(struct work_struct *work) skb_bpf_redirect_clear(skb); do { ret = -EIO; - if (likely(psock->sk->sk_socket)) + if (!sock_flag(psock->sk, SOCK_DEAD)) ret = sk_psock_handle_skb(psock, skb, off, len, ingress); if (ret <= 0) { @@ -537,13 +535,13 @@ static void sk_psock_backlog(struct work_struct *work) state->skb = skb; state->len = len; state->off = off; - goto end; + return; } /* Hard errors break pipe and stop xmit. */ sk_psock_report_error(psock, ret ? -ret : EPIPE); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); kfree_skb(skb); - goto end; + return; } off += ret; len -= ret; @@ -552,8 +550,6 @@ static void sk_psock_backlog(struct work_struct *work) if (!ingress) kfree_skb(skb); } -end: - release_sock(psock->sk); } struct sk_psock *sk_psock_init(struct sock *sk, int node) @@ -654,6 +650,16 @@ static void sk_psock_link_destroy(struct sk_psock *psock) } } +void sk_psock_purge(struct sk_psock *psock) +{ + sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); + + cancel_work_sync(&psock->work); + + sk_psock_cork_free(psock); + sk_psock_zap_ingress(psock); +} + static void sk_psock_done_strp(struct sk_psock *psock); static void sk_psock_destroy_deferred(struct work_struct *gc) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dd53a7771d7e..26ba47b099f1 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1540,6 +1540,7 @@ void sock_map_close(struct sock *sk, long timeout) saved_close = psock->saved_close; sock_map_remove_links(sk, psock); rcu_read_unlock(); + sk_psock_purge(psock); release_sock(sk); saved_close(sk, timeout); } From patchwork Wed Mar 10 05:32:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127045 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9A07C4332B for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 90BD964FFF for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231135AbhCJFdU (ORCPT ); Wed, 10 Mar 2021 00:33:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229803AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qt1-x834.google.com (mail-qt1-x834.google.com [IPv6:2607:f8b0:4864:20::834]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A3DDC061763; Tue, 9 Mar 2021 21:32:38 -0800 (PST) Received: by mail-qt1-x834.google.com with SMTP id h26so7825692qtm.5; Tue, 09 Mar 2021 21:32:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=a/jdC5+7Ae88aU0g5PiZMCr6qnGekeQ7zuaeZZyy+r8=; b=atI0IBKADkYIPzcGTrA1HzSTUqnL9vzLOoWmcLO0TFsastYzCsf2YmZ1ecDLqS+9Kq tHe3LyDHa0G8cmDalnPoEW95vjyL0pzdKfkmSWNq7As5C6VbDw1YcTBR4UlC1oGhQafP sh3KcDsCZK8xdTcjeRC1LuI6K37GEeP55QjVmWMlZc4jpEPHvTUOgk+bdytQS8/gf1dx 5k/Qcc/D8oYI3H77P5ZMm5wSdNgVz7UfzkUQV6cawtLqTMR+gY8IU3CQew2PMoziXN6M RwnZV/TbqB2QqidhzqZipqGbLBZ3TCoQmGt7oNz39qKAImXv1kpUwKQWA0DOehUrTb/y n7Yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=a/jdC5+7Ae88aU0g5PiZMCr6qnGekeQ7zuaeZZyy+r8=; b=MggnP57gQN79gUJrfYSz8RHZD2yrFwQDobTLwn3z89E+nNUNDn5BPb+vQSXArZku5S QMSPKU8dPBV1ZNzjtkshdDOFTgzE8b/rwTtPZLTM5v8fXitAjpPwm0bOJ0HVZ3UumrNr myQH9dizOofNIpX/JWoYt4Pen7n7LTbHQTsn0di+vg8QvSxPIheyO7v+5onIrc+BzJH/ Mf1SZyJvu2zIBeiK9l30ww6fgDoPZ/iH1NMHmS0qOHWyag0E5MWiX/9ot3oEBKGq/rxj B1jR6fgJ16S6nH41yXl31lIw38jILMrFI6ek+d8HFcKlalsufcK1UzWEwPM/ToPxnzd3 9kRA== X-Gm-Message-State: AOAM533Qut6JccS88IeZ4zhGUOcCB2Uaw7Ow018KLRd57SoZ7S5KVrrG Z1GzSD10OiI3y9D0VNtzWkVaVhnzx2uqug== X-Google-Smtp-Source: ABdhPJx+9AqwYkYvZUndVpyFeWg3K1NCBLWux6sa9Y0J3gjLkV10pMRBvp8+n8xuSpQF7shfaEnoaw== X-Received: by 2002:ac8:5a0d:: with SMTP id n13mr1281632qta.345.1615354357689; Tue, 09 Mar 2021 21:32:37 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:36 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 05/11] sock_map: introduce BPF_SK_SKB_VERDICT Date: Tue, 9 Mar 2021 21:32:16 -0800 Message-Id: <20210310053222.41371-6-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to set stream_verdict and skb_verdict at the same time. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 2 ++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + net/core/skmsg.c | 4 +++- net/core/sock_map.c | 23 ++++++++++++++++++++++- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + tools/include/uapi/linux/bpf.h | 1 + 8 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 91b357817bb8..668f24993583 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -58,6 +58,7 @@ struct sk_psock_progs { struct bpf_prog *msg_parser; struct bpf_prog *stream_parser; struct bpf_prog *stream_verdict; + struct bpf_prog *skb_verdict; }; enum sk_psock_state_bits { @@ -489,6 +490,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs) psock_set_prog(&progs->msg_parser, NULL); psock_set_prog(&progs->stream_parser, NULL); psock_set_prog(&progs->stream_verdict, NULL); + psock_set_prog(&progs->skb_verdict, NULL); } int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2d3036e292a9..309a24582b18 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c859bc46d06c..afa803a1553e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2941,6 +2941,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: + case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bf0f874780c1..0f4a3bad51cd 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -702,7 +702,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) rcu_assign_sk_user_data(sk, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); - else if (psock->progs.stream_verdict) + else if (psock->progs.stream_verdict || psock->progs.skb_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); @@ -1019,6 +1019,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, } skb_set_owner_r(skb, sk); prog = READ_ONCE(psock->progs.stream_verdict); + if (!prog) + prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { skb_dst_drop(skb); skb_bpf_redirect_clear(skb); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 26ba47b099f1..7cf5481d9367 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -155,6 +155,8 @@ static void sock_map_del_link(struct sock *sk, strp_stop = true; if (psock->saved_data_ready && stab->progs.stream_verdict) verdict_stop = true; + if (psock->saved_data_ready && stab->progs.skb_verdict) + verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); } @@ -227,7 +229,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { - struct bpf_prog *msg_parser, *stream_parser, *stream_verdict; + struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict; struct sk_psock *psock; int ret; @@ -256,6 +258,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } } + skb_verdict = READ_ONCE(progs->skb_verdict); + if (skb_verdict) { + skb_verdict = bpf_prog_inc_not_zero(skb_verdict); + if (IS_ERR(skb_verdict)) { + ret = PTR_ERR(skb_verdict); + goto out_put_msg_parser; + } + } + psock = sock_map_psock_get_checked(sk); if (IS_ERR(psock)) { ret = PTR_ERR(psock); @@ -265,6 +276,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, if (psock) { if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) || (stream_parser && READ_ONCE(psock->progs.stream_parser)) || + (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) || (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) { sk_psock_put(sk, psock); ret = -EBUSY; @@ -296,6 +308,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); + } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); + sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); return 0; @@ -304,6 +319,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, out_drop: sk_psock_put(sk, psock); out_progs: + if (skb_verdict) + bpf_prog_put(skb_verdict); +out_put_msg_parser: if (msg_parser) bpf_prog_put(msg_parser); out_put_stream_parser: @@ -1468,6 +1486,9 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, case BPF_SK_SKB_STREAM_VERDICT: pprog = &progs->stream_verdict; break; + case BPF_SK_SKB_VERDICT: + pprog = &progs->skb_verdict; + break; default: return -EOPNOTSUPP; } diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 65303664417e..1828bba19020 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", + [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", [BPF_SK_MSG_VERDICT] = "sk_msg_verdict", [BPF_LIRC_MODE2] = "lirc_mode2", [BPF_FLOW_DISSECTOR] = "flow_dissector", diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index f2b915b20546..3f067d2d7584 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -76,6 +76,7 @@ enum dump_mode { static const char * const attach_type_strings[] = { [BPF_SK_SKB_STREAM_PARSER] = "stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict", + [BPF_SK_SKB_VERDICT] = "skb_verdict", [BPF_SK_MSG_VERDICT] = "msg_verdict", [BPF_FLOW_DISSECTOR] = "flow_dissector", [__MAX_BPF_ATTACH_TYPE] = NULL, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2d3036e292a9..309a24582b18 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; From patchwork Wed Mar 10 05:32:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127061 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86D63C43603 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5F86D64FEF for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232213AbhCJFdY (ORCPT ); Wed, 10 Mar 2021 00:33:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229804AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A482C061764; Tue, 9 Mar 2021 21:32:40 -0800 (PST) Received: by mail-qt1-x836.google.com with SMTP id d11so12166890qtx.9; Tue, 09 Mar 2021 21:32:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bRytcz5BUdZNp3ausP/aennK0WA6aA3jsJ8O457oy18=; b=jbMSGshnecRXqaa+anLZfTs06N1AoWyt9ejzSlNztwph5xbjgcvGXPGCBhJTtU+OZX 89E/uDBtu84va6DAk4LT/ltQtMfAAWP/EUmDlsD25s5/viIvWz8YByVjoP/zOPSwesFm X1zGqbrZK6YGIZTY8S61fmPrA3fCT/LpSPDZDU2piZ+Gi/G26kO7a1whvtFIvlULBPpL 8U98k/vjj78UXo5B0HpWzDAay9DojMazuAp0q8SQT918N66gh5wL5eSxBt8bqsVN0tTO vx7GWfsdFnn2UKBVB0FsDJpsJ8YsPAMxkW+moU+RIivUwpP0TsTq2n3697ougH5IuVSh S4BA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bRytcz5BUdZNp3ausP/aennK0WA6aA3jsJ8O457oy18=; b=JTbk25SzoBfD+7yzLJeLUT0lGsy2vwZZKNSzVA727Ok917cF396SmdBIVoncUJWq2L 56d48gS65sq5tVfzQuJkYAuk2nmYB9Z2agfoA6Leo8g7m7iJm58V9ZG4XnpHuamODag7 0rcX0cNLgh5sotLZZjTUvGhuDnNTgwW8lJQGDNeKMTPfPeYzqNfvtiOfm0Spg9JIQYhA /3AN7dg4AdbpgSCt48S5W5MdhI+BvfaqnMRiczJxWzM/w20K66dpb2tT+rpE1igJsq4k tUZFCdoeRrTE+ibEurYU6f7RjhCYr9i9kVPhjbiz45JrtFO216smIYko4/j25WCBnwWN GzaQ== X-Gm-Message-State: AOAM532XGhNTTPzIUJClee9tbPwhvawJz/rx3J229V8A0fy709qLhByL WY9BjWnA2+nocoqvFn1KC7G3qfEPjeV+Jg== X-Google-Smtp-Source: ABdhPJym/Pf1dCYN9ftOW/SGh33i6ERm09kYa8keSnR2HiXHk9dkPxfUZzteUd4JAdqj9JAcb1FV6Q== X-Received: by 2002:aed:2f25:: with SMTP id l34mr1419430qtd.152.1615354359425; Tue, 09 Mar 2021 21:32:39 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:38 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 06/11] sock: introduce sk->sk_prot->psock_update_sk_prot() Date: Tue, 9 Mar 2021 21:32:17 -0800 Message-Id: <20210310053222.41371-7-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 18 +++--------------- include/net/sock.h | 3 +++ include/net/tcp.h | 1 + include/net/udp.h | 1 + net/core/skmsg.c | 5 ----- net/core/sock_map.c | 24 ++++-------------------- net/ipv4/tcp_bpf.c | 24 +++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_bpf.c | 15 +++++++++++++-- net/ipv6/tcp_ipv6.c | 3 +++ net/ipv6/udp.c | 3 +++ 12 files changed, 58 insertions(+), 45 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 668f24993583..79458e5976cb 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -99,6 +99,7 @@ struct sk_psock { void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); + int (*psock_update_sk_prot)(struct sock *sk, bool restore); struct proto *sk_proto; struct sk_psock_work_state work_state; struct work_struct work; @@ -397,25 +398,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) } } -static inline void sk_psock_update_proto(struct sock *sk, - struct sk_psock *psock, - struct proto *ops) -{ - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, ops); -} - static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; - if (inet_csk_has_ulp(sk)) { - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); - } else { - sk->sk_write_space = psock->saved_write_space; - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); - } + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, true); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 636810ddcd9b..eda64fbd5e3d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1184,6 +1184,9 @@ struct proto { void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); +#ifdef CONFIG_BPF_SYSCALL + int (*psock_update_sk_prot)(struct sock *sk, bool restore); +#endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS diff --git a/include/net/tcp.h b/include/net/tcp.h index 075de26f449d..2efa4e5ea23d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2203,6 +2203,7 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int tcp_bpf_update_proto(struct sock *sk, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ diff --git a/include/net/udp.h b/include/net/udp.h index d4d064c59232..df7cc1edc200 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -518,6 +518,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, #ifdef CONFIG_BPF_SYSCALL struct sk_psock; struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int udp_bpf_update_proto(struct sock *sk, bool restore); #endif #endif /* _UDP_H */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 0f4a3bad51cd..77bf61415f30 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -559,11 +559,6 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (inet_csk_has_ulp(sk)) { - psock = ERR_PTR(-EINVAL); - goto out; - } - if (sk->sk_user_data) { psock = ERR_PTR(-EBUSY); goto out; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 7cf5481d9367..82e33622a020 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) { - struct proto *prot; - - switch (sk->sk_type) { - case SOCK_STREAM: - prot = tcp_bpf_get_proto(sk, psock); - break; - - case SOCK_DGRAM: - prot = udp_bpf_get_proto(sk, psock); - break; - - default: + if (!sk->sk_prot->psock_update_sk_prot) return -EINVAL; - } - - if (IS_ERR(prot)) - return PTR_ERR(prot); - - sk_psock_update_proto(sk, psock, prot); - return 0; + psock->psock_update_sk_prot = sk->sk_prot->psock_update_sk_prot; + return sk->sk_prot->psock_update_sk_prot(sk, false); } static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) @@ -570,7 +554,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk) static bool sock_map_sk_is_suitable(const struct sock *sk) { - return sk_is_tcp(sk) || sk_is_udp(sk); + return !!sk->sk_prot->psock_update_sk_prot; } static bool sock_map_sk_state_allowed(const struct sock *sk) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ad1261cdcdde..f1eeec5146dc 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -595,20 +595,38 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; } -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int tcp_bpf_update_proto(struct sock *sk, bool restore) { + struct sk_psock *psock = sk_psock(sk); int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE; + if (restore) { + if (inet_csk_has_ulp(sk)) { + tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); + } else { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + } + return 0; + } + + if (inet_csk_has_ulp(sk)) + return -EINVAL; + if (sk->sk_family == AF_INET6) { if (tcp_bpf_assert_proto_ops(psock->sk_proto)) - return ERR_PTR(-EINVAL); + return -EINVAL; tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); } - return &tcp_bpf_prots[family][config]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + return 0; } +EXPORT_SYMBOL_GPL(tcp_bpf_update_proto); /* If a child got cloned from a listening socket that had tcp_bpf * protocol callbacks installed, we need to restore the callbacks to diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index daad4f99db32..dfc6d1c0e710 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2806,6 +2806,9 @@ struct proto tcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4a0478b17243..38952aaee3a1 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2849,6 +2849,9 @@ struct proto udp_prot = { .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, .get_port = udp_v4_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a94791efc1a..6001f93cd3a0 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -41,12 +41,23 @@ static int __init udp_bpf_v4_build_proto(void) } core_initcall(udp_bpf_v4_build_proto); -struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int udp_bpf_update_proto(struct sock *sk, bool restore) { int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; + struct sk_psock *psock = sk_psock(sk); + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + return 0; + } if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto); - return &udp_bpf_prots[family]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + return 0; } +EXPORT_SYMBOL_GPL(udp_bpf_update_proto); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index bd44ded7e50c..4fdc58a9e19e 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2134,6 +2134,9 @@ struct proto tcpv6_prot = { .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index d25e5a9252fd..ef2c75bb4771 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1713,6 +1713,9 @@ struct proto udpv6_prot = { .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), From patchwork Wed Mar 10 05:32:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127051 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13D6CC43332 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0011564FF7 for ; Wed, 10 Mar 2021 05:33:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229732AbhCJFdX (ORCPT ); Wed, 10 Mar 2021 00:33:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229927AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A551C061765; Tue, 9 Mar 2021 21:32:42 -0800 (PST) Received: by mail-qt1-x82c.google.com with SMTP id l7so3704383qtq.7; Tue, 09 Mar 2021 21:32:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=djZ+GsYdOix1zOIqZyqyLKK8y1Y9aew3XZCxsSgYn+g=; b=KXv/2qXw/50H/V+qkOpG+aQBwUWzUVHz3qiWsIQac2jI9I1R/uSJwDebQvACHcNkXk gYdTCVOHX84++8rVzEnXVU5erWV9hzrRqSCVikhauRyYJ/1cjHXcr84jjsNE4z850W9F ulA++OVjbll8v3YLJ1ylZaxNTwIII3C0uUxgGxq/yDgEDc0bZlpDD6tRmL8cDP84RKSp AiRCfo5HiNuV8KgcDZMrZHz9ggSeox3wA1vvwbpa7UZpp+ReOvmn7V4iPJeBnFkPwlQw KtuprUxQrVexkUXvGSY+vWaBbXA2UFGmAODnqokp187t4T/pMFAKw/+v/7Qgoy+L5X1u XZwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=djZ+GsYdOix1zOIqZyqyLKK8y1Y9aew3XZCxsSgYn+g=; b=Fm+1HM//wuFy+zSHLlAYFqZdN3u0//UXNKqBzW6IXiB2gYyglyDWkaXsPREtrvbyfM hO+6L3xs8BLSf3X7Iq3d7ZRv8NESHpa3BG3meVhIOyezCTNNWGj9ufyFnXrwYRhs/t4o y0Qc1ldWhbPE0dd02lmgNt3WhL639hyKv9lt4IP6bEMSL+1KvSsjrK+2cNlUBAVWmtRs ifgz4UduRCN3BPdk6CjsG4LxyVi7etx30l87cbV38wtEJFmOIXKMTMovuTIIvNTrgvK2 o10oh17XcotyUlzr2Ghm2vr6vs+s6HDF7ZLxLjxJmApl8oYBrtq/A8FPFEODTjtsluuN wHQQ== X-Gm-Message-State: AOAM530huI7KZ4B5y0jGaksXbsHOv29S8O/gEuhHyQjq2B8VmfCS9rsD lAIw92Fg971rrWWgqlJDucdMWh70qO24oA== X-Google-Smtp-Source: ABdhPJzGb1eteR1aB8W45I5krGwTVmqI/zoGjDmLn2l1ITjqOevh78xc7KIq6GevetVTgW00PayFlg== X-Received: by 2002:ac8:12c8:: with SMTP id b8mr1423822qtj.2.1615354361175; Tue, 09 Mar 2021 21:32:41 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:40 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 07/11] udp: implement ->read_sock() for sockmap Date: Tue, 9 Mar 2021 21:32:18 -0800 Message-Id: <20210310053222.41371-8-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 35 +++++++++++++++++++++++++++++++++++ net/ipv6/af_inet6.c | 1 + 4 files changed, 39 insertions(+) diff --git a/include/net/udp.h b/include/net/udp.h index df7cc1edc200..347b62a753c3 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -329,6 +329,8 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a02ce89b56b5..915001f8000b 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1071,6 +1071,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, + .read_sock = udp_read_sock, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 38952aaee3a1..a0adee3b1af4 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1782,6 +1782,41 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + int copied = 0; + + while (1) { + int offset = 0, err; + struct sk_buff *skb; + + skb = __skb_recv_udp(sk, 0, 1, &offset, &err); + if (!skb) + break; + if (offset < skb->len) { + int used; + size_t len; + + len = skb->len - offset; + used = recv_actor(desc, skb, offset, len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } else if (used <= len) { + copied += used; + offset += used; + } + } + if (!desc->count) + break; + } + + return copied; +} +EXPORT_SYMBOL(udp_read_sock); + /* * This should be easy, if there is something there we * return it, otherwise we block. diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 1fb75f01756c..63218b0bcdb1 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -715,6 +715,7 @@ const struct proto_ops inet6_dgram_ops = { .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ + .read_sock = udp_read_sock, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, From patchwork Wed Mar 10 05:32:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127059 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 768D4C4360C for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 457A964FE7 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232187AbhCJFdX (ORCPT ); Wed, 10 Mar 2021 00:33:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229808AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qv1-xf33.google.com (mail-qv1-xf33.google.com [IPv6:2607:f8b0:4864:20::f33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A6A2C0613D7; Tue, 9 Mar 2021 21:32:44 -0800 (PST) Received: by mail-qv1-xf33.google.com with SMTP id i15so208635qvr.0; Tue, 09 Mar 2021 21:32:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=F8AFC3cGowiZuRL4hGZrbn9YGLJO+fksVA2uBtXd0+k=; b=YsWDKF6TNA8BZwv7g7bjdKD04MTnfpyKzI9vYeX/Gtwfi003Z51He5FC5m37LuuavD sGPwW+VCADqZwjqiWA0/kEopvfdKIl2IjAqEWjcFJ01F4UZmA61dZSkeTUmNDHvtDHn0 cOWZFgnCevVzvaQ6Pnj/eDXB6CR7kWK9hDumyJ8B3gZ3JsUht20YcGyUPh+AOK15k2c6 YSyuMmP8baU4CbkAAw3ArozJ+7CLKAnRGyjtMfXzv/3pWNkygwG4KcBC+oJz3DffkbbD B5P6e/UIXbgIWuyaHxwxnzZSi8U5S+kLvGTSc7fM5VzsXrKWUJSaTD4hbiZrRRTwHgup H5KQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=F8AFC3cGowiZuRL4hGZrbn9YGLJO+fksVA2uBtXd0+k=; b=iUoBTuNx1YS+sJ8NdAorafLAinuOtfqGwk2BCEcuFYHPMulIDaqrahLGLqA/8hrOWG awIoVv580KVbiLBMiDaxlb/RLwRG1ehlBpim9VRWzvbBYyKlQZ8ZAk7sLlJ5s6CCl8GK X5wnYDu0J/sIRoS2W7/7hzloQG2YMNavEBpgvFY0OqjJxUZVCNilYgHGFQJv+txFIl20 XnBLePOxlqXkKghw1xRMeRpQkeClWWfNrMzWPBbGeTtvy9fNIabXzDUHKeLBhn7ATSYX NnOihpD4No4d2o5QnW7JW2Nh8m8I8nyjruhXpzXZvBBza1yjFfoH5G1js8PWj8Vpe1DU Idng== X-Gm-Message-State: AOAM531Ta3uGN+z8jLXywRSxJHuBWs17ajsbWHzFSPObR3ebYEOzU0tm PH4ugwaM5TcQn5i0cCHkQ0O3RCgCj6kexg== X-Google-Smtp-Source: ABdhPJyCUwqQPi7Olu96Ti+/4HOunNVTrKVP9aemwIVrzmmo40yXhGJ7bPH/i8WYUekTt0mdSMBEjw== X-Received: by 2002:a0c:fecd:: with SMTP id z13mr1238471qvs.43.1615354363254; Tue, 09 Mar 2021 21:32:43 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:42 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 08/11] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Date: Tue, 9 Mar 2021 21:32:19 -0800 Message-Id: <20210310053222.41371-9-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 4 ++ include/net/tcp.h | 2 - net/core/skmsg.c | 98 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 100 +----------------------------------------- net/tls/tls_sw.c | 4 +- 5 files changed, 106 insertions(+), 102 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 79458e5976cb..d3b8c99da1fa 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -127,6 +127,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err); +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags); static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes) { diff --git a/include/net/tcp.h b/include/net/tcp.h index 2efa4e5ea23d..31b1696c62ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2209,8 +2209,6 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, int flags); -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 77bf61415f30..9200b021a0be 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -399,6 +399,104 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, } EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int ret = 0; + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return 1; + + if (!timeo) + return ret; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = sk_wait_event(sk, &timeo, + !list_empty(&psock->ingress_msg) || + !skb_queue_empty(&sk->sk_receive_queue), &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} +EXPORT_SYMBOL_GPL(sk_msg_wait_data); + +/* Receive sk_msg from psock->ingress_msg to @msg. */ +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags) +{ + struct iov_iter *iter = &msg->msg_iter; + int peek = flags & MSG_PEEK; + struct sk_msg *msg_rx; + int i, copied = 0; + + msg_rx = sk_psock_peek_msg(psock); + while (copied != len) { + struct scatterlist *sge; + + if (unlikely(!msg_rx)) + break; + + i = msg_rx->sg.start; + do { + struct page *page; + int copy; + + sge = sk_msg_elem(msg_rx, i); + copy = sge->length; + page = sg_page(sge); + if (copied + copy > len) + copy = len - copied; + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; + + copied += copy; + if (likely(!peek)) { + sge->offset += copy; + sge->length -= copy; + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); + msg_rx->sg.size -= copy; + + if (!sge->length) { + sk_msg_iter_var_next(i); + if (!msg_rx->skb) + put_page(page); + } + } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. + */ + if (copy != sge->length) + return copied; + sk_msg_iter_var_next(i); + } + + if (copied == len) + break; + } while (i != msg_rx->sg.end); + + if (unlikely(peek)) { + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) + break; + continue; + } + + msg_rx->sg.start = i; + if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { + msg_rx = sk_psock_deque_msg(psock); + kfree_sk_msg(msg_rx); + } + msg_rx = sk_psock_peek_msg(psock); + } + + return copied; +} +EXPORT_SYMBOL_GPL(sk_msg_recvmsg); + static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, struct sk_buff *skb) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index f1eeec5146dc..3d622a0d0753 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -10,80 +10,6 @@ #include #include -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags) -{ - struct iov_iter *iter = &msg->msg_iter; - int peek = flags & MSG_PEEK; - struct sk_msg *msg_rx; - int i, copied = 0; - - msg_rx = sk_psock_peek_msg(psock); - while (copied != len) { - struct scatterlist *sge; - - if (unlikely(!msg_rx)) - break; - - i = msg_rx->sg.start; - do { - struct page *page; - int copy; - - sge = sk_msg_elem(msg_rx, i); - copy = sge->length; - page = sg_page(sge); - if (copied + copy > len) - copy = len - copied; - copy = copy_page_to_iter(page, sge->offset, copy, iter); - if (!copy) - return copied ? copied : -EFAULT; - - copied += copy; - if (likely(!peek)) { - sge->offset += copy; - sge->length -= copy; - if (!msg_rx->skb) - sk_mem_uncharge(sk, copy); - msg_rx->sg.size -= copy; - - if (!sge->length) { - sk_msg_iter_var_next(i); - if (!msg_rx->skb) - put_page(page); - } - } else { - /* Lets not optimize peek case if copy_page_to_iter - * didn't copy the entire length lets just break. - */ - if (copy != sge->length) - return copied; - sk_msg_iter_var_next(i); - } - - if (copied == len) - break; - } while (i != msg_rx->sg.end); - - if (unlikely(peek)) { - msg_rx = sk_psock_next_msg(psock, msg_rx); - if (!msg_rx) - break; - continue; - } - - msg_rx->sg.start = i; - if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - msg_rx = sk_psock_deque_msg(psock); - kfree_sk_msg(msg_rx); - } - msg_rx = sk_psock_peek_msg(psock); - } - - return copied; -} -EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); - static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -237,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk) return !empty; } -static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, - int flags, long timeo, int *err) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - int ret = 0; - - if (sk->sk_shutdown & RCV_SHUTDOWN) - return 1; - - if (!timeo) - return ret; - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - ret = sk_wait_event(sk, &timeo, - !list_empty(&psock->ingress_msg) || - !skb_queue_empty(&sk->sk_receive_queue), &wait); - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); - return ret; -} - static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -278,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } lock_sock(sk); msg_bytes_ready: - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { int data, err = 0; long timeo; timeo = sock_rcvtimeo(sk, nonblock); - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); if (data) { if (!sk_psock_queue_empty(psock)) goto msg_bytes_ready; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 01d933ae5f16..1dcb34dfd56b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk, skb = tls_wait_data(sk, psock, flags, timeo, &err); if (!skb) { if (psock) { - int ret = __tcp_bpf_recvmsg(sk, psock, - msg, len, flags); + int ret = sk_msg_recvmsg(sk, psock, msg, len, + flags); if (ret > 0) { decrypted += ret; From patchwork Wed Mar 10 05:32:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127041 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0107C433E0 for ; Wed, 10 Mar 2021 05:33:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 966AE64F51 for ; Wed, 10 Mar 2021 05:33:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229920AbhCJFdT (ORCPT ); Wed, 10 Mar 2021 00:33:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229819AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qv1-xf36.google.com (mail-qv1-xf36.google.com [IPv6:2607:f8b0:4864:20::f36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A75EC0613D8; Tue, 9 Mar 2021 21:32:46 -0800 (PST) Received: by mail-qv1-xf36.google.com with SMTP id a14so207734qvj.7; Tue, 09 Mar 2021 21:32:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=Mwuqp8jAxSRewGz0Nhx/5Q27Jnmh5/nbrBu0LCcbFLPt6TuTFfZzHIOk9liph1hDvv HxT9D+3fVcOMMUAhAKktP/8udwzaSP4BSvUT7zOSsXwkZ+ndpUuDqNmznnLJLIwxDSep u3uGaXR43cvxIJb8cUTQ06TWLOib6CaeHj/E7Skt0RWmeKEciee/neg5lac+56dQC7P0 Cmxd5im82MZfcTWUbhu02DyQj/4R+WtMHCeizYIkt0tSUJ2Js0TsQOkp7AmCeBEs3XLJ lBFeZ6sFbtBbaBrJATk9Knm7LiCBFTjwmc0OnFQ/XnnjMDz/R0xH9ry3y6GYni+2XwpT g0Qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=ljoz3Lmh/rafaRobvUnqrfJ0WJMs54LFK8FgdxzEc5Kh/8gWt2ElGJOFdzmqAoNrKB GLENAubLmp+zE/jH2WEu8/tyEGlFIdVAwYzGhK3RjOcCeBX0+2bnf//LzNTw7bOrOaMj h4/QOPPv+NjSevYHARHH/7j0qAoPOIccgZHjQqUllN5Qs7g5SnM8TbOWp7YXk1oIyOvE jzZ9WFIzB1Lr8F5AYo6vmBmwkyweEyV/REpfCTs77Fxm5Q6j8bPBxdyv6jKZCJzZ28zi FO+WzGuKr6l0e/yNfzvRvyT/aP5LFVnyZB4GN8RvX9ZJv+x3VMUmTTwTDNDd3eA2cMOX 9UxA== X-Gm-Message-State: AOAM533O9yOqIPURVhzJFdTuk244Ha/Y14x3R4UiC/90iC3m3+zA6F4u blklvOwjcs6IxRFaEHvFEvy5+Y9fC9QpyQ== X-Google-Smtp-Source: ABdhPJy+mD/2lqmcUL0ICTB3KBNOCOHNLRLF0mV3Eab57GGL2rszzju41nHL7neX8f42q7uMFdQJ1Q== X-Received: by 2002:ad4:5ecc:: with SMTP id jm12mr1451922qvb.33.1615354365302; Tue, 09 Mar 2021 21:32:45 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:44 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 09/11] udp: implement udp_bpf_recvmsg() for sockmap Date: Tue, 9 Mar 2021 21:32:20 -0800 Message-Id: <20210310053222.41371-10-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 6001f93cd3a0..7d5c4ebf42fe 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -4,6 +4,68 @@ #include #include #include +#include + +#include "udp_impl.h" + +static struct proto *udpv6_prot_saved __read_mostly; + +static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int noblock, int flags, int *addr_len) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags, + addr_len); +#endif + return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len); +} + +static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied, ret; + + if (unlikely(flags & MSG_ERRQUEUE)) + return inet_recv_error(sk, msg, len, addr_len); + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + + lock_sock(sk); + if (sk_psock_queue_empty(psock)) { + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + +msg_bytes_ready: + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + if (!copied) { + int data, err = 0; + long timeo; + + timeo = sock_rcvtimeo(sk, nonblock); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); + if (data) { + if (!sk_psock_queue_empty(psock)) + goto msg_bytes_ready; + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + if (err) { + ret = err; + goto out; + } + copied = -EAGAIN; + } + ret = copied; +out: + release_sock(sk); + sk_psock_put(sk, psock); + return ret; +} enum { UDP_BPF_IPV4, @@ -11,7 +73,6 @@ enum { UDP_BPF_NUM_PROTS, }; -static struct proto *udpv6_prot_saved __read_mostly; static DEFINE_SPINLOCK(udpv6_prot_lock); static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS]; @@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->unhash = sock_map_unhash; prot->close = sock_map_close; + prot->recvmsg = udp_bpf_recvmsg; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) From patchwork Wed Mar 10 05:32:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127057 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49AA6C4321A for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2FD2764FF3 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229784AbhCJFdW (ORCPT ); Wed, 10 Mar 2021 00:33:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229931AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 167E3C0613D9; Tue, 9 Mar 2021 21:32:47 -0800 (PST) Received: by mail-qv1-xf2a.google.com with SMTP id a14so207758qvj.7; Tue, 09 Mar 2021 21:32:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LEg3npv0kkNxuAwGAdkyF3a2ih0UPGgcksP1uJl2DHk=; b=ky5HOPgg64POSkmeu+ucHhFbqZWHKCsWP2qeLLOP0fxuZGZVcAYSiX1yFK6ujztNd9 n+Gf1lm7+rI3JZTDmxU4IlN9TDCcnJ2PdFWPOvuFH0M3oS9Ws4hM66u9XoL9b2ztpNug 8OInkvaf72g9khID2j2aYQMOzEov5RXUf/8TbQ0nx58ajheEDHCbPI5JfXCSsL7iGgNH IKpTM9lHAbn2W0qbOuoAmXh9hvXMOKMMWUXMAAPg6Gv94Tz2XzymGoQEekJljHluJfE/ GRP9OdLZo0CyP3j7+i8k1fTwRASSsrjiI2GDn4l0ug69duRDA2mh9YqB94hFsA/E/EQL eR1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LEg3npv0kkNxuAwGAdkyF3a2ih0UPGgcksP1uJl2DHk=; b=rmGybaWl6MG4thEeaKgjS6ypuG76IsCmuEPZIUbOx00VcYBbriXNL3eQitQbcBJjG2 QWJ8OsA0ZePGpSIbyHkb5y+6HEaAnwjOBepwFvCwHibLpBKSyiqDONkr719pMI3Kuwmi Xtsn0Wc1wjPDjdGfrJnEKAGpg6Jbnx+QeGpXMy5negirZPj6eCTs7f3b7h+ipJClTWpv eS3pG9Q4CvJGVrnAE7EWZMeXXgjOZn+YP+N7u0b2u4C5yM/b459YsXL+iQk0TLHNUUpd jef/RTEFd9zMuq9mfwzX84MAvnV2DrzMElNQPTrgsoWjoO7EZUvPGaKCunfnLVCH6d3L fVGA== X-Gm-Message-State: AOAM533mp9yeqm+8HKObGLB6eshTQVNEaTmbPnVM0eu+cX4VIKdHdzoO YakWU2haEWbC/WIa/mw5mMXgyc6NUu+eGQ== X-Google-Smtp-Source: ABdhPJz0P6JFb6m9hbhQ6P2spc/IDy5u2Ol4nZEic2XDmJABrfnEPwvGEx0Uc26eAt7GL7VD6xi/Nw== X-Received: by 2002:a05:6214:1454:: with SMTP id b20mr1294910qvy.24.1615354366772; Tue, 09 Mar 2021 21:32:46 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:46 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 10/11] sock_map: update sock type checks for UDP Date: Tue, 9 Mar 2021 21:32:21 -0800 Message-Id: <20210310053222.41371-11-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Now UDP supports sockmap and redirection, we can safely update the sock type checks for it accordingly. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 82e33622a020..69293495c4a9 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -549,7 +549,10 @@ static bool sk_is_udp(const struct sock *sk) static bool sock_map_redirect_allowed(const struct sock *sk) { - return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN; + if (sk_is_tcp(sk)) + return sk->sk_state != TCP_LISTEN; + else + return sk->sk_state == TCP_ESTABLISHED; } static bool sock_map_sk_is_suitable(const struct sock *sk) From patchwork Wed Mar 10 05:32:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12127053 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28E56C432C3 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 110A965006 for ; Wed, 10 Mar 2021 05:33:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231744AbhCJFdW (ORCPT ); Wed, 10 Mar 2021 00:33:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229470AbhCJFdD (ORCPT ); Wed, 10 Mar 2021 00:33:03 -0500 Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5730CC0613DA; Tue, 9 Mar 2021 21:32:50 -0800 (PST) Received: by mail-qk1-x72a.google.com with SMTP id f124so15694680qkj.5; Tue, 09 Mar 2021 21:32:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=csfo5ny9vn031XTr0QE9HHX4i5g2UGrJhAVDLHHWHJt+orut2ZnSAqyUb23H2B02Zr PFz69KIbPrEAxSk08Fs9OzG94dM3NwiFP9EW1bYZgU/6QA1e4Uk/K0gklG4Z645sRhVZ JXyKIJlvtCWavocboX2xG0S5mblAgQqs1TGCq8+6BGidJKhK/RIU0X3VbRFQj1FMrdwi GNQ4NPTJd5qqp5N+VEEY6oroXNoMXo8oIjiZwuGfzz2olZd4+XtYY+EZsjgPQiDfQxZW nZ1WRrQjYNtSUV5VH+9eVP00XYw2wOjCRKZbb3igTW/se5pCCLx0NN3VvJjZMrQtYVaZ 0q1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=lmXi2Q/4+X2TGPEiw4F3xlJN4vC/opKM+iId/tcf0TpfKIwvnesnLW71FiXSE8+sWu rXe5WekbSybHLViZu/u+KxZTOK8bUvcvVVI+ggNBw65IauhVRo/u7MPPdkSNYeZksEY2 CPT4a3WBVRpYw9akRSwHuX1pwhwfzga0kjw3OiRxxZIGQ7dGY6VLGNjzrr5OyB5f9oJ4 8WfCsGDnruPMvNhyoArybpR3byRl4XPtkuLfaCwk/HDtQe0TvwmCDn+VFVWlPgbsGQDN mI/LN4U6YFxK0TpQ63SjB3IeE4hJwLAWgC637ynfxWPlBN5E4ATnVRQiHMNEc8j/W3Kc pHGQ== X-Gm-Message-State: AOAM530nz/qI/as8QEFfhUrlmYO0DVGUuZFJ6CBzyRvzEroJW3Yt6DjL ViJwMbAcvxgvOnn7d6n2CvsXphQw7S548w== X-Google-Smtp-Source: ABdhPJykyOB/de8QBgiE0zmwljU20PunVO/V01yp5Fp6+6Sc/jWqASB0xZdMiwrJa3GcfUl0LfUq9Q== X-Received: by 2002:ae9:eb57:: with SMTP id b84mr1109459qkg.271.1615354369428; Tue, 09 Mar 2021 21:32:49 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:91f3:e7ef:7f61:a131]) by smtp.gmail.com with ESMTPSA id g21sm12118739qkk.72.2021.03.09.21.32.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Mar 2021 21:32:48 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v4 11/11] selftests/bpf: add a test case for udp sockmap Date: Tue, 9 Mar 2021 21:32:22 -0800 Message-Id: <20210310053222.41371-12-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210310053222.41371-1-xiyou.wangcong@gmail.com> References: <20210310053222.41371-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Add a test case to ensure redirection between two UDP sockets work. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- .../selftests/bpf/prog_tests/sockmap_listen.c | 140 ++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_listen.c | 22 +++ 2 files changed, 162 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index c26e6bf05e49..a549ebd3b5a6 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1563,6 +1563,142 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map, } } +static void udp_redir_to_connected(int family, int sotype, int sock_mapfd, + int verd_mapfd, enum redir_mode mode) +{ + const char *log_prefix = redir_mode_str(mode); + struct sockaddr_storage addr; + int c0, c1, p0, p1; + unsigned int pass; + socklen_t len; + int err, n; + u64 value; + u32 key; + char b; + + zero_verdict_count(verd_mapfd); + + p0 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p0 < 0) + return; + len = sizeof(addr); + err = xgetsockname(p0, sockaddr(&addr), &len); + if (err) + goto close_peer0; + + c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c0 < 0) + goto close_peer0; + err = xconnect(c0, sockaddr(&addr), len); + if (err) + goto close_cli0; + err = xgetsockname(c0, sockaddr(&addr), &len); + if (err) + goto close_cli0; + err = xconnect(p0, sockaddr(&addr), len); + if (err) + goto close_cli0; + + p1 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p1 < 0) + goto close_cli0; + err = xgetsockname(p1, sockaddr(&addr), &len); + if (err) + goto close_cli0; + + c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c1 < 0) + goto close_peer1; + err = xconnect(c1, sockaddr(&addr), len); + if (err) + goto close_cli1; + err = xgetsockname(c1, sockaddr(&addr), &len); + if (err) + goto close_cli1; + err = xconnect(p1, sockaddr(&addr), len); + if (err) + goto close_cli1; + + key = 0; + value = p0; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + key = 1; + value = p1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + n = write(c1, "a", 1); + if (n < 0) + FAIL_ERRNO("%s: write", log_prefix); + if (n == 0) + FAIL("%s: incomplete write", log_prefix); + if (n < 1) + goto close_cli1; + + key = SK_PASS; + err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); + if (err) + goto close_cli1; + if (pass != 1) + FAIL("%s: want pass count 1, have %d", log_prefix, pass); + + n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1); + if (n < 0) + FAIL_ERRNO("%s: read", log_prefix); + if (n == 0) + FAIL("%s: incomplete read", log_prefix); + +close_cli1: + xclose(c1); +close_peer1: + xclose(p1); +close_cli0: + xclose(c0); +close_peer0: + xclose(p0); +} + +static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel, + struct bpf_map *inner_map, int family, + int sotype) +{ + int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + int verdict_map = bpf_map__fd(skel->maps.verdict_map); + int sock_map = bpf_map__fd(inner_map); + int err; + + err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); + if (err) + return; + + skel->bss->test_ingress = false; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_EGRESS); + skel->bss->test_ingress = true; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_INGRESS); + + xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); +} + +static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map, + int family) +{ + const char *family_name, *map_name; + char s[MAX_TEST_NAME]; + + family_name = family_str(family); + map_name = map_type_str(map); + snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); + if (!test__start_subtest(s)) + return; + udp_skb_redir_to_connected(skel, map, family, SOCK_DGRAM); +} + static void test_reuseport(struct test_sockmap_listen *skel, struct bpf_map *map, int family, int sotype) { @@ -1626,10 +1762,14 @@ void test_sockmap_listen(void) skel->bss->test_sockmap = true; run_tests(skel, skel->maps.sock_map, AF_INET); run_tests(skel, skel->maps.sock_map, AF_INET6); + test_udp_redir(skel, skel->maps.sock_map, AF_INET); + test_udp_redir(skel, skel->maps.sock_map, AF_INET6); skel->bss->test_sockmap = false; run_tests(skel, skel->maps.sock_hash, AF_INET); run_tests(skel, skel->maps.sock_hash, AF_INET6); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET6); test_sockmap_listen__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c index fa221141e9c1..a39eba9f5201 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c @@ -29,6 +29,7 @@ struct { } verdict_map SEC(".maps"); static volatile bool test_sockmap; /* toggled by user-space */ +static volatile bool test_ingress; /* toggled by user-space */ SEC("sk_skb/stream_parser") int prog_stream_parser(struct __sk_buff *skb) @@ -55,6 +56,27 @@ int prog_stream_verdict(struct __sk_buff *skb) return verdict; } +SEC("sk_skb/skb_verdict") +int prog_skb_verdict(struct __sk_buff *skb) +{ + unsigned int *count; + __u32 zero = 0; + int verdict; + + if (test_sockmap) + verdict = bpf_sk_redirect_map(skb, &sock_map, zero, + test_ingress ? BPF_F_INGRESS : 0); + else + verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero, + test_ingress ? BPF_F_INGRESS : 0); + + count = bpf_map_lookup_elem(&verdict_map, &verdict); + if (count) + (*count)++; + + return verdict; +} + SEC("sk_msg") int prog_msg_verdict(struct sk_msg_md *msg) {