From patchwork Sun Mar 28 20:20:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169109 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5CAABC433E2 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3A3B361951 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231434AbhC1UUq (ORCPT ); Sun, 28 Mar 2021 16:20:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229655AbhC1UUW (ORCPT ); Sun, 28 Mar 2021 16:20:22 -0400 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A197C061762; Sun, 28 Mar 2021 13:20:20 -0700 (PDT) Received: by mail-oi1-x22b.google.com with SMTP id n140so11217342oig.9; Sun, 28 Mar 2021 13:20:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=azpBQutchzCr3FplNtXl63QhozAq1NlwxEgz2eiaadY=; b=ovD8zUGLx3KDEpOyeqLkWSsJquWK/TxG/Jg0z1IX4kDG8RQ4p/nCNSuDqFulV/gASz Yd+qtq8UYoorLFn8NbYu7w0b6m7JKxzG7s8JJWh5qdquis1vl849DrgulJa60SfuY0sH 1QKVc+17y0Wo5vQ2ZPTv8anaz/T5x5mNuT1ohCS4JqAv1j7tDqdGWutjpbvd3/0KsKFA XHwdPNh/dTRO8DoZP/3UasSgnRV4etaNhVaYBjvr3kX/FVwvLKlAsXZsy4kTj4wnSmxI lG/9QE/TkmRWu6rrusBe8iso9wpVvlVb57yiULMqC9NHF7RtMRXYytG9lMFdZJROMagO GIlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=azpBQutchzCr3FplNtXl63QhozAq1NlwxEgz2eiaadY=; b=A4ZfD7skEANcIAhPKEflQMJY0JL5GUaKAX35rfrdTxEKZ6JTK22NAL3UTpDq3RuPAm u02yKiQ1R9HKWoN/7CClW0bQReBTcwE8SbSaayFn7UduCf8blQIz/y4xAfHifOzM/zDi VK4eSgm4TlOF18hBsaDBBffKL6u2LVgqrPf4IRine+88lDoXbSjv+UfLcRw5XDvs9Ixp j/LGKRR+HyfDGV9vNJ7t74EgxME00fX3ee41M4eCKfd0cw15Tcrxh8SUTQT0p8qHe4zM YOn7NHTqaUkh/AZbSNHNAbpDrd8ncBB2DcYpUdxGNz4eT+HvKawchfH8pEPtHy1mB44i UTxQ== X-Gm-Message-State: AOAM531+yssQE9WLzXKo2VCVWjxwj5ZK+Dz/S9ri5Pn9hGBW1uzyyE02 ++ACUIgDon+L16X9InVZVofkNi2o/S9GsA== X-Google-Smtp-Source: ABdhPJwu987Qf55cWQYIC3uYEaRJEF2/2X41Jxso05nEbV/VmR1bz3XQWRQkI94Jsdc7n+30AOzjYA== X-Received: by 2002:a05:6808:1413:: with SMTP id w19mr16658869oiv.20.1616962819371; Sun, 28 Mar 2021 13:20:19 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:19 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Lorenz Bauer , Jakub Sitnicki Subject: [Patch bpf-next v7 01/13] skmsg: lock ingress_skb when purging Date: Sun, 28 Mar 2021 13:20:01 -0700 Message-Id: <20210328202013.29223-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Cc: John Fastabend Cc: Daniel Borkmann Cc: Lorenz Bauer Acked-by: Jakub Sitnicki Signed-off-by: Cong Wang --- net/core/skmsg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 07f54015238a..bebf84ed4e30 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -634,7 +634,7 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) { struct sk_buff *skb; - while ((skb = __skb_dequeue(&psock->ingress_skb)) != NULL) { + while ((skb = skb_dequeue(&psock->ingress_skb)) != NULL) { skb_bpf_redirect_clear(skb); kfree_skb(skb); } From patchwork Sun Mar 28 20:20:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169107 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EF05C433E3 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4E4906195E for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231474AbhC1UUr (ORCPT ); Sun, 28 Mar 2021 16:20:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35252 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229955AbhC1UUW (ORCPT ); Sun, 28 Mar 2021 16:20:22 -0400 Received: from mail-oi1-x230.google.com (mail-oi1-x230.google.com [IPv6:2607:f8b0:4864:20::230]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61A02C0613B1; Sun, 28 Mar 2021 13:20:21 -0700 (PDT) Received: by mail-oi1-x230.google.com with SMTP id a8so11214102oic.11; Sun, 28 Mar 2021 13:20:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bkt2WplmLy1bOLMwQLt5MnCSbjUaxe7m0LNKG1XmAxE=; b=Py7vPw9OuXJ7hXh2eubqB1df0Qnf15H636/6gG/tQpsBuVGItU/qfXxhKx0bAZek8I sVABSA/SYmKlO+ZQnpFhZS07brRWTo/3sRv9U6iR4nzBIQTMcRC+OzJ+dRRaffX9bGy9 H3EVye1ZnkYOOoguLAa8hfbbASyR8AH4hLSA8U9NK6tWif4kMUtxe28jXHafykAR5bz7 yAQvGbc18IPH357nFQXK5xrf/tOKUSxcqu/THCWEpBuOk/CebAOGmRM10Sdz4NKVCoZ1 o6X1HkludKpmOVbKBt+Z60TWW2N+tcLmylz89i5XmlEB8jwKdz0pvokiboJ2mQm3y7tJ wqaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bkt2WplmLy1bOLMwQLt5MnCSbjUaxe7m0LNKG1XmAxE=; b=pb2VV02G42EDvfMLRm0z8972nxK1qcvqPYwiA3B/AShYc0vD8kqUmnLlR/iK2FJ4XG M7FJli7yMta70q6q267M8hwQW5qSpRcm7rc4gk7mjNIiMLY41hWaNHqp9GSLbCNlvHLd qGE5zeVDnJNxV2Qx2Ca/wJjuAgs02dc3/2Kx11r08IXdQ2pKvD46DgF83TRqB0X13HfX J+iPY8AXw3un0UW/A77c9jTD+ujkd4PTkxYdICX94oIvBMos2YsOay0/guMD31t4kzz4 TmwF6TisTEkvpFi1r8L9Js+JBl2JD9NEixeO4JWoLOVw9fWX7ta/cTlMF0HMFHBeOFzz Fahg== X-Gm-Message-State: AOAM532rqcU/LrGMn3r3FVT+PjwkJHt+TC02WLSPoalXuLPk+6K5fJEK uPqZUzfUeOkhIqzq7cvfMOs7u1CfONMuVw== X-Google-Smtp-Source: ABdhPJyONoNR1xjtxgqTORv7cOu3pUZPlFwiYjrplcAjO/F3mfSBYL8ew1zDZg8zMk5r1aB5ccDrJA== X-Received: by 2002:aca:dcc6:: with SMTP id t189mr16290047oig.9.1616962820564; Sun, 28 Mar 2021 13:20:20 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:20 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Lorenz Bauer , Jakub Sitnicki Subject: [Patch bpf-next v7 02/13] skmsg: introduce a spinlock to protect ingress_msg Date: Sun, 28 Mar 2021 13:20:02 -0700 Message-Id: <20210328202013.29223-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Cc: John Fastabend Cc: Daniel Borkmann Cc: Lorenz Bauer Acked-by: Jakub Sitnicki Signed-off-by: Cong Wang Acked-by: John Fastabend --- include/linux/skmsg.h | 46 +++++++++++++++++++++++++++++++++++++++++++ net/core/skmsg.c | 3 +++ net/ipv4/tcp_bpf.c | 18 ++++++----------- 3 files changed, 55 insertions(+), 12 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6c09d94be2e9..f2d45a73b2b2 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -89,6 +89,7 @@ struct sk_psock { #endif struct sk_buff_head ingress_skb; struct list_head ingress_msg; + spinlock_t ingress_lock; unsigned long state; struct list_head link; spinlock_t link_lock; @@ -284,7 +285,45 @@ static inline struct sk_psock *sk_psock(const struct sock *sk) static inline void sk_psock_queue_msg(struct sk_psock *psock, struct sk_msg *msg) { + spin_lock_bh(&psock->ingress_lock); list_add_tail(&msg->list, &psock->ingress_msg); + spin_unlock_bh(&psock->ingress_lock); +} + +static inline struct sk_msg *sk_psock_dequeue_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + if (msg) + list_del(&msg->list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_peek_msg(struct sk_psock *psock) +{ + struct sk_msg *msg; + + spin_lock_bh(&psock->ingress_lock); + msg = list_first_entry_or_null(&psock->ingress_msg, struct sk_msg, list); + spin_unlock_bh(&psock->ingress_lock); + return msg; +} + +static inline struct sk_msg *sk_psock_next_msg(struct sk_psock *psock, + struct sk_msg *msg) +{ + struct sk_msg *ret; + + spin_lock_bh(&psock->ingress_lock); + if (list_is_last(&msg->list, &psock->ingress_msg)) + ret = NULL; + else + ret = list_next_entry(msg, list); + spin_unlock_bh(&psock->ingress_lock); + return ret; } static inline bool sk_psock_queue_empty(const struct sk_psock *psock) @@ -292,6 +331,13 @@ static inline bool sk_psock_queue_empty(const struct sk_psock *psock) return psock ? list_empty(&psock->ingress_msg) : true; } +static inline void kfree_sk_msg(struct sk_msg *msg) +{ + if (msg->skb) + consume_skb(msg->skb); + kfree(msg); +} + static inline void sk_psock_report_error(struct sk_psock *psock, int err) { struct sock *sk = psock->sk; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bebf84ed4e30..305dddc51857 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -592,6 +592,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) INIT_WORK(&psock->work, sk_psock_backlog); INIT_LIST_HEAD(&psock->ingress_msg); + spin_lock_init(&psock->ingress_lock); skb_queue_head_init(&psock->ingress_skb); sk_psock_set_state(psock, SK_PSOCK_TX_ENABLED); @@ -638,7 +639,9 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) skb_bpf_redirect_clear(skb); kfree_skb(skb); } + spin_lock_bh(&psock->ingress_lock); __sk_psock_purge_ingress_msg(psock); + spin_unlock_bh(&psock->ingress_lock); } static void sk_psock_link_destroy(struct sk_psock *psock) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 17c322b875fd..ae980716d896 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -18,9 +18,7 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg_rx; int i, copied = 0; - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - + msg_rx = sk_psock_peek_msg(psock); while (copied != len) { struct scatterlist *sge; @@ -68,22 +66,18 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, } while (i != msg_rx->sg.end); if (unlikely(peek)) { - if (msg_rx == list_last_entry(&psock->ingress_msg, - struct sk_msg, list)) + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) break; - msg_rx = list_next_entry(msg_rx, list); continue; } msg_rx->sg.start = i; if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - list_del(&msg_rx->list); - if (msg_rx->skb) - consume_skb(msg_rx->skb); - kfree(msg_rx); + msg_rx = sk_psock_dequeue_msg(psock); + kfree_sk_msg(msg_rx); } - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); + msg_rx = sk_psock_peek_msg(psock); } return copied; From patchwork Sun Mar 28 20:20:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169103 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BB6AC433DB for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F3AA161951 for ; Sun, 28 Mar 2021 20:21:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229655AbhC1UUr (ORCPT ); Sun, 28 Mar 2021 16:20:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35254 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230294AbhC1UUW (ORCPT ); Sun, 28 Mar 2021 16:20:22 -0400 Received: from mail-ot1-x330.google.com (mail-ot1-x330.google.com [IPv6:2607:f8b0:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86C2BC0613B2; Sun, 28 Mar 2021 13:20:22 -0700 (PDT) Received: by mail-ot1-x330.google.com with SMTP id g8-20020a9d6c480000b02901b65ca2432cso10356504otq.3; Sun, 28 Mar 2021 13:20:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=8Wh0mzicKWG1utRZNIevS3oRZH+mXn5JWIkzgtioLig=; b=YTmQmlqDdXhCg7XR8SjN/0ejUzfeTFTfDMgbz4u2E5tV0DFl46Vm4JerQRt8eRm1PB BnPf6vweNk00PadnV0gulF7emJDu6k0mfIzqkNfqGwNjGXGelN2eYPQE0a08pHaXvey5 5QgxgWETTAhc1nS3OyKI39hAUOn41X7qRVC4JeMB6g3Zhjr5dEh5tLSlCYpo9ECVIOhT xadmzvttnHbLIMeagmSSzXPBNtpjUd12nkU5dx2STNauGBbFJmrMbpm1wTmbloE2/1ga ZqBrSLzd0/77sKXHPL2bf//09kTge5YGEK3dtlDziJmnXc6kMM3PnHWi3X1VHNnSzSmE qhQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8Wh0mzicKWG1utRZNIevS3oRZH+mXn5JWIkzgtioLig=; b=HYvflgj62eovxVnr65mzxK40nLeF4fX+vThhxtGh3Fe+GQ9BBe0NF+CWURyLG8M95E snAi2Es61Mb/spKiDpOFeoM+IWpqBuzwZJKbHkuEnt9vGQQkfWmLB5qM5ZZurPRkeJkA sp8McXm3rlSZXbaz6WKoQEN8R6jOrLN4cJnTeixzDn1G6sA/0esvNqIFeswrrNg4YMoL zVT1tENYsAKfI/XjW5KHr8NWjvamCajgQodz9ROCS/Lnk0XwmDhPJKaqXEn+u6N4DS2k iZ7SIVlcVYRG5rN9mgHzovhbv2uRa3651SsCsv6djuwUyu6iKbqvQGh3ttR/nt2EpWNc q1+g== X-Gm-Message-State: AOAM531qghv1siLhR32bB/nam/QbvC+iQJ7To+JIxAsolwK8hgcjGoK7 mHrohC1gM0jCJDIT/PVitLaBenewMVmTxQ== X-Google-Smtp-Source: ABdhPJzVWqy3QXlyPUxFwAg1Rm21qfBLIpxjSnUz5pYJzhsnFKV8QWJAt+OrYHFtaxTqHyVttEWTXg== X-Received: by 2002:a05:6830:91:: with SMTP id a17mr20683798oto.309.1616962821798; Sun, 28 Mar 2021 13:20:21 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:21 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 03/13] net: introduce skb_send_sock() for sock_map Date: Sun, 28 Mar 2021 13:20:03 -0700 Message-Id: <20210328202013.29223-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 55 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 49 insertions(+), 7 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index c8def85fcc22..dbf820a50a39 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3626,6 +3626,7 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, unsigned int flags); int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, int len); +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len); void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to); unsigned int skb_zerocopy_headlen(const struct sk_buff *from); int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e8320b5d651a..3ad9e8425ab2 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2500,9 +2500,32 @@ int skb_splice_bits(struct sk_buff *skb, struct sock *sk, unsigned int offset, } EXPORT_SYMBOL_GPL(skb_splice_bits); -/* Send skb data on a socket. Socket must be locked. */ -int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, - int len) +static int sendmsg_unlocked(struct sock *sk, struct msghdr *msg, + struct kvec *vec, size_t num, size_t size) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendmsg(sock, msg, vec, num, size); +} + +static int sendpage_unlocked(struct sock *sk, struct page *page, int offset, + size_t size, int flags) +{ + struct socket *sock = sk->sk_socket; + + if (!sock) + return -EINVAL; + return kernel_sendpage(sock, page, offset, size, flags); +} + +typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg, + struct kvec *vec, size_t num, size_t size); +typedef int (*sendpage_func)(struct sock *sk, struct page *page, int offset, + size_t size, int flags); +static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, + int len, sendmsg_func sendmsg, sendpage_func sendpage) { unsigned int orig_len = len; struct sk_buff *head = skb; @@ -2522,7 +2545,8 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, memset(&msg, 0, sizeof(msg)); msg.msg_flags = MSG_DONTWAIT; - ret = kernel_sendmsg_locked(sk, &msg, &kv, 1, slen); + ret = INDIRECT_CALL_2(sendmsg, kernel_sendmsg_locked, + sendmsg_unlocked, sk, &msg, &kv, 1, slen); if (ret <= 0) goto error; @@ -2553,9 +2577,11 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, slen = min_t(size_t, len, skb_frag_size(frag) - offset); while (slen) { - ret = kernel_sendpage_locked(sk, skb_frag_page(frag), - skb_frag_off(frag) + offset, - slen, MSG_DONTWAIT); + ret = INDIRECT_CALL_2(sendpage, kernel_sendpage_locked, + sendpage_unlocked, sk, + skb_frag_page(frag), + skb_frag_off(frag) + offset, + slen, MSG_DONTWAIT); if (ret <= 0) goto error; @@ -2587,8 +2613,23 @@ int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, error: return orig_len == len ? ret : orig_len - len; } + +/* Send skb data on a socket. Socket must be locked. */ +int skb_send_sock_locked(struct sock *sk, struct sk_buff *skb, int offset, + int len) +{ + return __skb_send_sock(sk, skb, offset, len, kernel_sendmsg_locked, + kernel_sendpage_locked); +} EXPORT_SYMBOL_GPL(skb_send_sock_locked); +/* Send skb data on a socket. Socket must be unlocked. */ +int skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset, int len) +{ + return __skb_send_sock(sk, skb, offset, len, sendmsg_unlocked, + sendpage_unlocked); +} + /** * skb_store_bits - store bits from kernel buffer to skb * @skb: destination buffer From patchwork Sun Mar 28 20:20:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169117 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D16EDC433EB for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BBE5761968 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229595AbhC1UUr (ORCPT ); Sun, 28 Mar 2021 16:20:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230397AbhC1UUY (ORCPT ); Sun, 28 Mar 2021 16:20:24 -0400 Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ADF62C061756; Sun, 28 Mar 2021 13:20:23 -0700 (PDT) Received: by mail-oi1-x22b.google.com with SMTP id x2so11264185oiv.2; Sun, 28 Mar 2021 13:20:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=chzddpAatZoL7UeA/glB0DgIXQP/ctiwnxswV0zD4Ns=; b=Z8bJAGqc9aqxqmmVxOHmcDbKi7jS+gLXh5rPf5Z8r8TAkES1xHJcPhOIvWLx0Tkplk gNYI2UTH37I7Tuif3fq3iaM91Cq7S/LbnXJ20ym6jHtl9vc+yhE92GI+9OEvVPvv0o7R AoTN6rwLCYzjlyrayyKZ0acgtKKzSMFpRNsoB2Naa1CnD4FjCjDwcf6q8bsJ1wVvbVTx dPCzJyTYdTR0TZsWdq+TqN4dvegJI1kY4N3DaUp1ag5+7iFyM50Ev2YYqCq40nsSzkUX GycYF9ePQs1LEJZXjm4cebynOFzv635R/0Zq/DL6cb9lidOvXPqUPOUUpTMMsUFOCb2c lvPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=chzddpAatZoL7UeA/glB0DgIXQP/ctiwnxswV0zD4Ns=; b=nFGIVU5MIxWOthm+RPTnT+ipxa+LG+MU5hmFo5DZ+at/exw6C8Skgn7aUeuR141wKq 9z5z6W7m+MWDvg67ChlbZl/VNaGF9SB1kmlmydY+CA4c/cx8ok3qI0Dmjqz1Qlf5H7LZ 4x+dBnQg0UcrThL3yx/H6qZqyRGC2XpR+mNjAjjuBLTRxTBMAahmLpnDP5qnGGf2SAXe +jbV+TjZgxpOqtrQyImD1BGNbrDYPYR1DzmwYLP5OEGJAc3MpgA2Sy/25FqVvSI9Ys0x DGhMbhql2/uD3hMqmx0jJ13+6HBy99v2N+hoTiSCzyDXxT9qiFZUQ/wVM3kZO+RpbdyX CUyg== X-Gm-Message-State: AOAM531x0dnG3iCyb4zXkoFC7Ex9ZW2A/aHLvuqwWYlD8uF0PnEco9sb Y/bMwVrepvXM+3uQH+BxOJwks/NsV01V1g== X-Google-Smtp-Source: ABdhPJxzuZOhXtHfJBnlqbNGcL/JiaPxstVZm5SwYnb7BTdk8aV10Vvzf0qwrPZLMNa0K2BRWZPJbg== X-Received: by 2002:aca:1c18:: with SMTP id c24mr16112798oic.7.1616962823009; Sun, 28 Mar 2021 13:20:23 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:22 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 04/13] skmsg: avoid lock_sock() in sk_psock_backlog() Date: Sun, 28 Mar 2021 13:20:04 -0700 Message-Id: <20210328202013.29223-5-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. As noticed by John, we still have to lock the psock->work, because the same work item could be running concurrently on different CPU's. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: John Fastabend --- include/linux/skmsg.h | 2 ++ net/core/skmsg.c | 50 +++++++++++++++++++++++++++++-------------- net/core/sock_map.c | 1 + 3 files changed, 37 insertions(+), 16 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index f2d45a73b2b2..7382c4b518d7 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -99,6 +99,7 @@ struct sk_psock { void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); struct proto *sk_proto; + struct mutex work_mutex; struct sk_psock_work_state work_state; struct work_struct work; union { @@ -347,6 +348,7 @@ static inline void sk_psock_report_error(struct sk_psock *psock, int err) } struct sk_psock *sk_psock_init(struct sock *sk, int node); +void sk_psock_stop(struct sk_psock *psock, bool wait); #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock); diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 305dddc51857..9c25020086a9 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -497,7 +497,7 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb, if (!ingress) { if (!sock_writeable(psock->sk)) return -EAGAIN; - return skb_send_sock_locked(psock->sk, skb, off, len); + return skb_send_sock(psock->sk, skb, off, len); } return sk_psock_skb_ingress(psock, skb); } @@ -511,8 +511,7 @@ static void sk_psock_backlog(struct work_struct *work) u32 len, off; int ret; - /* Lock sock to avoid losing sk_socket during loop. */ - lock_sock(psock->sk); + mutex_lock(&psock->work_mutex); if (state->skb) { skb = state->skb; len = state->len; @@ -529,7 +528,7 @@ static void sk_psock_backlog(struct work_struct *work) skb_bpf_redirect_clear(skb); do { ret = -EIO; - if (likely(psock->sk->sk_socket)) + if (!sock_flag(psock->sk, SOCK_DEAD)) ret = sk_psock_handle_skb(psock, skb, off, len, ingress); if (ret <= 0) { @@ -553,7 +552,7 @@ static void sk_psock_backlog(struct work_struct *work) kfree_skb(skb); } end: - release_sock(psock->sk); + mutex_unlock(&psock->work_mutex); } struct sk_psock *sk_psock_init(struct sock *sk, int node) @@ -591,6 +590,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) spin_lock_init(&psock->link_lock); INIT_WORK(&psock->work, sk_psock_backlog); + mutex_init(&psock->work_mutex); INIT_LIST_HEAD(&psock->ingress_msg); spin_lock_init(&psock->ingress_lock); skb_queue_head_init(&psock->ingress_skb); @@ -631,7 +631,7 @@ static void __sk_psock_purge_ingress_msg(struct sk_psock *psock) } } -static void sk_psock_zap_ingress(struct sk_psock *psock) +static void __sk_psock_zap_ingress(struct sk_psock *psock) { struct sk_buff *skb; @@ -639,9 +639,7 @@ static void sk_psock_zap_ingress(struct sk_psock *psock) skb_bpf_redirect_clear(skb); kfree_skb(skb); } - spin_lock_bh(&psock->ingress_lock); __sk_psock_purge_ingress_msg(psock); - spin_unlock_bh(&psock->ingress_lock); } static void sk_psock_link_destroy(struct sk_psock *psock) @@ -654,6 +652,18 @@ static void sk_psock_link_destroy(struct sk_psock *psock) } } +void sk_psock_stop(struct sk_psock *psock, bool wait) +{ + spin_lock_bh(&psock->ingress_lock); + sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); + sk_psock_cork_free(psock); + __sk_psock_zap_ingress(psock); + spin_unlock_bh(&psock->ingress_lock); + + if (wait) + cancel_work_sync(&psock->work); +} + static void sk_psock_done_strp(struct sk_psock *psock); static void sk_psock_destroy_deferred(struct work_struct *gc) @@ -665,12 +675,12 @@ static void sk_psock_destroy_deferred(struct work_struct *gc) sk_psock_done_strp(psock); cancel_work_sync(&psock->work); + mutex_destroy(&psock->work_mutex); psock_progs_drop(&psock->progs); sk_psock_link_destroy(psock); sk_psock_cork_free(psock); - sk_psock_zap_ingress(psock); if (psock->sk_redir) sock_put(psock->sk_redir); @@ -688,8 +698,7 @@ static void sk_psock_destroy(struct rcu_head *rcu) void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { - sk_psock_cork_free(psock); - sk_psock_zap_ingress(psock); + sk_psock_stop(psock, false); write_lock_bh(&sk->sk_callback_lock); sk_psock_restore_proto(sk, psock); @@ -699,7 +708,6 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) else if (psock->progs.stream_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); - sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); call_rcu(&psock->rcu, sk_psock_destroy); } @@ -770,14 +778,20 @@ static void sk_psock_skb_redirect(struct sk_buff *skb) * error that caused the pipe to break. We can't send a packet on * a socket that is in this state so we drop the skb. */ - if (!psock_other || sock_flag(sk_other, SOCK_DEAD) || - !sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { + if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) { + kfree_skb(skb); + return; + } + spin_lock_bh(&psock_other->ingress_lock); + if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) { + spin_unlock_bh(&psock_other->ingress_lock); kfree_skb(skb); return; } skb_queue_tail(&psock_other->ingress_skb, skb); schedule_work(&psock_other->work); + spin_unlock_bh(&psock_other->ingress_lock); } static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict) @@ -845,8 +859,12 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, err = sk_psock_skb_ingress_self(psock, skb); } if (err < 0) { - skb_queue_tail(&psock->ingress_skb, skb); - schedule_work(&psock->work); + spin_lock_bh(&psock->ingress_lock); + if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) { + skb_queue_tail(&psock->ingress_skb, skb); + schedule_work(&psock->work); + } + spin_unlock_bh(&psock->ingress_lock); } break; case __SK_REDIRECT: diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dd53a7771d7e..e564fdeaada1 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1540,6 +1540,7 @@ void sock_map_close(struct sock *sk, long timeout) saved_close = psock->saved_close; sock_map_remove_links(sk, psock); rcu_read_unlock(); + sk_psock_stop(psock, true); release_sock(sk); saved_close(sk, timeout); } From patchwork Sun Mar 28 20:20:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169113 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF4DAC433EA for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AA25B61974 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231479AbhC1UUs (ORCPT ); Sun, 28 Mar 2021 16:20:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35264 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230447AbhC1UUZ (ORCPT ); Sun, 28 Mar 2021 16:20:25 -0400 Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DE01EC061756; Sun, 28 Mar 2021 13:20:24 -0700 (PDT) Received: by mail-ot1-x332.google.com with SMTP id 31-20020a9d00220000b02901b64b9b50b1so10341162ota.9; Sun, 28 Mar 2021 13:20:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Kfpkxqp/EUWiWSJanqZp6gALxArnEJLF0PtroKlsYNQ=; b=USuYSB8Ud8CyG4glC78Z+vIyZ61cQoUDIsc5bbm8Jsf5ylubuHmwf/Ex201RKyEkPt k4hc8zyA6rNU7zt/3Pygcau8fh/iJjYPf83MZbOeENhnF0ipcOZTk4nxJIYK0Am/Biz6 R0MsoMGqs+BPomA+Hn21Z7u5NalhgPGCTlHt9Nd3W0qcIJ9HfpghnALHU+6q9umCNrtf rY1b/OoYtvmBQBOeuKVZR/E7FndmaciDNGSzuZJbZGDpl+CfE1GyF+G8E59OLIh/gKFs j3Uo/TMBSG2whdDiRx75xcOB/Y/HP0vslAX+N4JlNrPkPAoh86EANYzrd4tnGJz/wA8h T5ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Kfpkxqp/EUWiWSJanqZp6gALxArnEJLF0PtroKlsYNQ=; b=iK7ZxI7n/x5z78Vq9/GT3C0rGwEwg83wTPaj4BfVMRslRaOVbBj0O36Z2mcPd6ozr2 7pDxeE1EbWshK7S6jpggYcYaT3pXh2Wvky9bs1YGaJG52gQU11bbqBwPuYGsAerJF3kZ gQNdo4vm/P1KfxseZT0QDo2CALsFs8gBRUGpFCcvC8xOqpEkmubBOSdQDl9xlYpQOtw+ q3oXAf8y3rWisriLzXdjyGgj/kkNeuDtF1JCH+A8kxMfB/Fs5Eueu+NDylPVOA77aYy0 0GXVItj4g0XBjbW4d7+j1MtAvB/bHe8WnwlKItJKneKhL/5wNXp3PWsK1P3aTb6yla69 YkjQ== X-Gm-Message-State: AOAM530Kx/xz0216/zbuIyS1lJdDoG/HULkDLRcv0PH5IBgwWuqWrNTF UbqniH1xO9+YlymgSGK+icQvGspi8rmdbw== X-Google-Smtp-Source: ABdhPJzKRDMtYw4kPNlYOsTlvicjiSt9LXr88PlrSwBSKmLbf3PYMESULyMSb1EdNAd+Z//8GKlKTg== X-Received: by 2002:a05:6830:20c8:: with SMTP id z8mr20565626otq.14.1616962824198; Sun, 28 Mar 2021 13:20:24 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:23 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer , John Fastabend Subject: [Patch bpf-next v7 05/13] skmsg: use rcu work for destroying psock Date: Sun, 28 Mar 2021 13:20:05 -0700 Message-Id: <20210328202013.29223-6-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Acked-by: John Fastabend Signed-off-by: Cong Wang Acked-by: John Fastabend --- include/linux/skmsg.h | 5 +---- net/core/skmsg.c | 17 +++++------------ 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 7382c4b518d7..e7aba150539d 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -102,10 +102,7 @@ struct sk_psock { struct mutex work_mutex; struct sk_psock_work_state work_state; struct work_struct work; - union { - struct rcu_head rcu; - struct work_struct gc; - }; + struct rcu_work rwork; }; int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len, diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 9c25020086a9..d43d43905d2c 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -666,10 +666,10 @@ void sk_psock_stop(struct sk_psock *psock, bool wait) static void sk_psock_done_strp(struct sk_psock *psock); -static void sk_psock_destroy_deferred(struct work_struct *gc) +static void sk_psock_destroy(struct work_struct *work) { - struct sk_psock *psock = container_of(gc, struct sk_psock, gc); - + struct sk_psock *psock = container_of(to_rcu_work(work), + struct sk_psock, rwork); /* No sk_callback_lock since already detached. */ sk_psock_done_strp(psock); @@ -688,14 +688,6 @@ static void sk_psock_destroy_deferred(struct work_struct *gc) kfree(psock); } -static void sk_psock_destroy(struct rcu_head *rcu) -{ - struct sk_psock *psock = container_of(rcu, struct sk_psock, rcu); - - INIT_WORK(&psock->gc, sk_psock_destroy_deferred); - schedule_work(&psock->gc); -} - void sk_psock_drop(struct sock *sk, struct sk_psock *psock) { sk_psock_stop(psock, false); @@ -709,7 +701,8 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); - call_rcu(&psock->rcu, sk_psock_destroy); + INIT_RCU_WORK(&psock->rwork, sk_psock_destroy); + queue_rcu_work(system_wq, &psock->rwork); } EXPORT_SYMBOL_GPL(sk_psock_drop); From patchwork Sun Mar 28 20:20:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169115 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDFCAC433E6 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 96D196196E for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231493AbhC1UUs (ORCPT ); Sun, 28 Mar 2021 16:20:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230450AbhC1UU0 (ORCPT ); Sun, 28 Mar 2021 16:20:26 -0400 Received: from mail-oi1-x22d.google.com (mail-oi1-x22d.google.com [IPv6:2607:f8b0:4864:20::22d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15B87C061756; Sun, 28 Mar 2021 13:20:26 -0700 (PDT) Received: by mail-oi1-x22d.google.com with SMTP id f9so11237541oiw.5; Sun, 28 Mar 2021 13:20:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Vy5/4Xe03r6L+lMBgMmyiBEF4Wk2/4jX5bPzX7yT5rI=; b=P483kmrQ3hPUF4OTht4dqG0Z6pRHDLzc5oP9lJJGFY2L4ejtgUgn4UN/i3KxFRob66 VPN0VfTGI+wp+s/WuhStq7zMUXRQnJc9KWeodKpxVIg0ixbzwyF4Vt7qhOlivVNVW4Ia XtZnHmc7QCGtGxl+8z7DLLuhtT/dLcxfawFT1FXxTxZ1/qmxpvjlHdvh+oFd7eApBqI4 N/ZR9uhT6TKcsM1CPr74yPMrXCQhANsor8vyDcXIgnlLTHu4OEdMRNMC41zWf2UHyIDN BBFrc+4jcRy6YeuoXN4iJTjND/TrWBFWLZWgIFR9GIimDNFvA7qc0HMZpQaQ+g5KPc0K KZEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Vy5/4Xe03r6L+lMBgMmyiBEF4Wk2/4jX5bPzX7yT5rI=; b=ZLZotOI2fQLy2iZnnhFskHT5/RCn/CTiKUrtTiNBGI19veMvDS5IEk5SzGFhMQcbCh qZ3SNTaXhzyFFY0mYAnkOcNJPid0uu5REZ91tAS+Rk0EkIUGfJ9dfJujRAN1LZLCUsHp RXpRViW8vBBqqvuQpr7TW6XfKuFQjckutuR2JR4/rVQb/GOOkgLNyaOrmjv8dpIkT7+B BsLyPtzLp4Yi5xCXQFT+RhY4TYAoujyjtDJqa5rqTc8XUNlfuX3wXCwjeGSeCjyC6xRh 4UlYhjekDHjG62ad7pGLEg5I6xkDGvLvFV7dDZzPO03fOIm+nbEZdfTK4QyDz6norjjT nGuQ== X-Gm-Message-State: AOAM531ThPT9CduK+e/RcNVY4T5zzrDBIzyAckMHlKbJvmKOmFxjqt5H 8j2AXWvazCuk4RucDxEEYBLypBCHfHye2Q== X-Google-Smtp-Source: ABdhPJyot2nKFpKqC+iXRR7+VLJNuoXMNp5LWNiaTqRqv8hzgmAuzhfFE1QE/GcIET64Yp/XTmrDXg== X-Received: by 2002:a05:6808:2d0:: with SMTP id a16mr16158619oid.83.1616962825391; Sun, 28 Mar 2021 13:20:25 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:25 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 06/13] skmsg: use GFP_KERNEL in sk_psock_create_ingress_msg() Date: Sun, 28 Mar 2021 13:20:06 -0700 Message-Id: <20210328202013.29223-7-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang This function is only called in process context. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: John Fastabend --- net/core/skmsg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index d43d43905d2c..656eceab73bc 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -410,7 +410,7 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, if (!sk_rmem_schedule(sk, skb, skb->truesize)) return NULL; - msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); + msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_KERNEL); if (unlikely(!msg)) return NULL; From patchwork Sun Mar 28 20:20:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169111 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 88C01C433E4 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 713E661981 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231502AbhC1UUt (ORCPT ); Sun, 28 Mar 2021 16:20:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231265AbhC1UU1 (ORCPT ); Sun, 28 Mar 2021 16:20:27 -0400 Received: from mail-ot1-x330.google.com (mail-ot1-x330.google.com [IPv6:2607:f8b0:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4DE26C061756; Sun, 28 Mar 2021 13:20:27 -0700 (PDT) Received: by mail-ot1-x330.google.com with SMTP id v24-20020a9d69d80000b02901b9aec33371so10361539oto.2; Sun, 28 Mar 2021 13:20:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=txJ7hdcrfRxsh60yxdjf17Knn8BTbiq4ZEsyTzM78uo=; b=Oih5wQxoHlxb8x2ETcVztr5fFEyH+1dVjYvyqRSqsTiR9a/D5pYyl4ZxkQuKUDncQR SXS9FpO1PMl3tKDKuVm2Wm/KId4zkpRBvjji+KYoH4lS9bnjqcMJfU6A/Gg0DQpXZLIu Ufg9bI3hSUvjviJx9dzK0Lb0DIk4OACqi7UpjvKPwAR5ZrCZ9cvxGFh2ZDJzittzyHtU Ek1J+KVRhH8wqz2VcYZC0VCJLaPSeXsjxB/dNpoCXJ4Mp1DwDAZh7KEDt86NF0xQ0eYJ IB9q4JRkoX05bKOkiwu1RdYl/ylbHG7hMXaSeOVMKkRLY3CaVMMVqVi7Qpdb8WcoDrjz MZ9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=txJ7hdcrfRxsh60yxdjf17Knn8BTbiq4ZEsyTzM78uo=; b=I+zCWdwpz7H0pWevVeKbfmEzvFXYqx+Xlo6sEUOCwzs8SyMKUFt61igD2TVee7Kxan yoQoF9a5Ix9y9xYG9vYwQberD03yGVyvGt6/cF7Lh0nkH6WZ0F8lccag95Gvekvu4s5m GKDfRXc3SqLbn4SAOlUrKVN5ZmZvKiBEH3FB1dZptcphNH/JcTrL03upiDqDHZ6P0h8Y 7K5+2Q+nfeo5AbCd6qiE8fudmty0Gg7bAmirPfJtByhAwnjHgOjtBrQt8AVyHDD+/dv1 tjfMjjVeitbPA7NW0yxvYLXIhyQIxKHAH/B3JSONRix16UszrMiQ/w+vYfVlGXt07Zbw mExw== X-Gm-Message-State: AOAM530piPya13L8BHDl+nWCHquZ05xLtqFFtdwNnjeb8saSUkJWirZj nRUQKil/eaAYCQfaCWdNR9Rfu9RsNY+GQw== X-Google-Smtp-Source: ABdhPJys5YFhYrXBE4Gd8HUVjkCVYUbFnESX85OCg8qVgC4tCEJPyXoZonDczGzjwRnr6KrDmHoZ0w== X-Received: by 2002:a9d:12cb:: with SMTP id g69mr19615340otg.77.1616962826595; Sun, 28 Mar 2021 13:20:26 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:26 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 07/13] sock_map: introduce BPF_SK_SKB_VERDICT Date: Sun, 28 Mar 2021 13:20:07 -0700 Message-Id: <20210328202013.29223-8-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to set stream_verdict and skb_verdict at the same time. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 2 ++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + net/core/skmsg.c | 4 +++- net/core/sock_map.c | 23 ++++++++++++++++++++++- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + tools/include/uapi/linux/bpf.h | 1 + 8 files changed, 32 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index e7aba150539d..c83dbc2d81d9 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -58,6 +58,7 @@ struct sk_psock_progs { struct bpf_prog *msg_parser; struct bpf_prog *stream_parser; struct bpf_prog *stream_verdict; + struct bpf_prog *skb_verdict; }; enum sk_psock_state_bits { @@ -487,6 +488,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs) psock_set_prog(&progs->msg_parser, NULL); psock_set_prog(&progs->stream_parser, NULL); psock_set_prog(&progs->stream_verdict, NULL); + psock_set_prog(&progs->skb_verdict, NULL); } int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 598716742593..49371eba98ba 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 9603de81811a..6428634da57e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2948,6 +2948,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: + case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 656eceab73bc..a045812d7c78 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -697,7 +697,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) rcu_assign_sk_user_data(sk, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); - else if (psock->progs.stream_verdict) + else if (psock->progs.stream_verdict || psock->progs.skb_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); @@ -1024,6 +1024,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, } skb_set_owner_r(skb, sk); prog = READ_ONCE(psock->progs.stream_verdict); + if (!prog) + prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { skb_dst_drop(skb); skb_bpf_redirect_clear(skb); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index e564fdeaada1..c46709786a49 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -155,6 +155,8 @@ static void sock_map_del_link(struct sock *sk, strp_stop = true; if (psock->saved_data_ready && stab->progs.stream_verdict) verdict_stop = true; + if (psock->saved_data_ready && stab->progs.skb_verdict) + verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); } @@ -227,7 +229,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { - struct bpf_prog *msg_parser, *stream_parser, *stream_verdict; + struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict; struct sk_psock *psock; int ret; @@ -256,6 +258,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } } + skb_verdict = READ_ONCE(progs->skb_verdict); + if (skb_verdict) { + skb_verdict = bpf_prog_inc_not_zero(skb_verdict); + if (IS_ERR(skb_verdict)) { + ret = PTR_ERR(skb_verdict); + goto out_put_msg_parser; + } + } + psock = sock_map_psock_get_checked(sk); if (IS_ERR(psock)) { ret = PTR_ERR(psock); @@ -265,6 +276,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, if (psock) { if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) || (stream_parser && READ_ONCE(psock->progs.stream_parser)) || + (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) || (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) { sk_psock_put(sk, psock); ret = -EBUSY; @@ -296,6 +308,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); + } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); + sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); return 0; @@ -304,6 +319,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, out_drop: sk_psock_put(sk, psock); out_progs: + if (skb_verdict) + bpf_prog_put(skb_verdict); +out_put_msg_parser: if (msg_parser) bpf_prog_put(msg_parser); out_put_stream_parser: @@ -1468,6 +1486,9 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, case BPF_SK_SKB_STREAM_VERDICT: pprog = &progs->stream_verdict; break; + case BPF_SK_SKB_VERDICT: + pprog = &progs->skb_verdict; + break; default: return -EOPNOTSUPP; } diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 65303664417e..1828bba19020 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", + [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", [BPF_SK_MSG_VERDICT] = "sk_msg_verdict", [BPF_LIRC_MODE2] = "lirc_mode2", [BPF_FLOW_DISSECTOR] = "flow_dissector", diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index f2b915b20546..3f067d2d7584 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -76,6 +76,7 @@ enum dump_mode { static const char * const attach_type_strings[] = { [BPF_SK_SKB_STREAM_PARSER] = "stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict", + [BPF_SK_SKB_VERDICT] = "skb_verdict", [BPF_SK_MSG_VERDICT] = "msg_verdict", [BPF_FLOW_DISSECTOR] = "flow_dissector", [__MAX_BPF_ATTACH_TYPE] = NULL, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index ab9f2233607c..69902603012c 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -957,6 +957,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; From patchwork Sun Mar 28 20:20:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169127 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C44DC433E9 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F111161972 for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231516AbhC1UUt (ORCPT ); Sun, 28 Mar 2021 16:20:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231303AbhC1UU3 (ORCPT ); Sun, 28 Mar 2021 16:20:29 -0400 Received: from mail-oi1-x22a.google.com (mail-oi1-x22a.google.com [IPv6:2607:f8b0:4864:20::22a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 84FB2C061756; Sun, 28 Mar 2021 13:20:28 -0700 (PDT) Received: by mail-oi1-x22a.google.com with SMTP id k25so11246062oic.4; Sun, 28 Mar 2021 13:20:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zvaggGnRUD+zn2YaUId4cvDCkS8Ol7rCuAuVLFLyeI8=; b=PnFaIwb8hKHTjIk6WfVM2xNyfjr/cIaIwYeayr5po8++RCVzqThLkoF2afhY2XDjQ1 MfS/4qPfWKSLhggWJt5kDqKImOxXVqnmzuXpvex9DhUubd4bENzFRD9nSyHmds8xcoRe bQrvho7u8/Uy+SQWGLOYsmoc5HkywnDjrBdJ/A6WI6j1RzQhzWGXTqhRA2jYRp41b4XG rj2YM+nIwPDguf+qzmlLM/3C5qkLTgTLMdnedkYwsZ1ZZvuRcr1LrZFpLzXPn42rqZ+4 zXIzA0rxoDXFeU1STrp3XwAZiEiQg+CYfnDdvX5nMUkjptonWCeMmK3xLStNegSsa8Wy ihvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zvaggGnRUD+zn2YaUId4cvDCkS8Ol7rCuAuVLFLyeI8=; b=h2ZESDEBvLi0kyDKRkhaHWWUf+uGBJkx0IHQ4fRqcAxrxogSkjGBg4VyER5MixGRQ2 rQx8A4QNvXvO09vj2UWZDE0NJl3rQLK85M87w0PZGCrGGuJuh/pWI1xgQ7V66FYfvGPg O5w9opPhdjtiI0qBESAq11N78cnAyRx/8JTzWqQhLjiwtiiTI9WBzoJyG0LIcnryBei1 vnLXCx012VYW0CDZUPLzhCijTfLPfpFQZAU3Xh3ayED4K/RGBMgsUXYd4bg83p6sPeC1 1bhPDN82iCSQBu4tlnJ32iwNNk7B7x7oPrzwQ57y4dlQL2kErpb0jm8iNW6fhxxx5+qg QT+w== X-Gm-Message-State: AOAM532evyo0ynVAYml4D5Nw7CGfFi+Q01VTZDs2Mf3jfqIbr8VEcByO h+rrYsXy7T7S4KXLUhaWv3y2lX9GQlQ16g== X-Google-Smtp-Source: ABdhPJzrfoDDjC8B5dsxt1BtdKevYEeB4sL3F7Lpbuba+szbcCszIkRYzygW8M6EqmalUkyK7oL9lw== X-Received: by 2002:a05:6808:9ad:: with SMTP id e13mr2358709oig.63.1616962827818; Sun, 28 Mar 2021 13:20:27 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:27 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 08/13] sock: introduce sk->sk_prot->psock_update_sk_prot() Date: Sun, 28 Mar 2021 13:20:08 -0700 Message-Id: <20210328202013.29223-9-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 18 +++--------------- include/net/sock.h | 3 +++ include/net/tcp.h | 1 + include/net/udp.h | 1 + net/core/skmsg.c | 5 ----- net/core/sock_map.c | 24 ++++-------------------- net/ipv4/tcp_bpf.c | 24 +++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_bpf.c | 15 +++++++++++++-- net/ipv6/tcp_ipv6.c | 3 +++ net/ipv6/udp.c | 3 +++ 12 files changed, 58 insertions(+), 45 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c83dbc2d81d9..5e800ddc2dc6 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -99,6 +99,7 @@ struct sk_psock { void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); + int (*psock_update_sk_prot)(struct sock *sk, bool restore); struct proto *sk_proto; struct mutex work_mutex; struct sk_psock_work_state work_state; @@ -395,25 +396,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) } } -static inline void sk_psock_update_proto(struct sock *sk, - struct sk_psock *psock, - struct proto *ops) -{ - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, ops); -} - static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; - if (inet_csk_has_ulp(sk)) { - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); - } else { - sk->sk_write_space = psock->saved_write_space; - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); - } + if (psock->psock_update_sk_prot) + psock->psock_update_sk_prot(sk, true); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 0b6266fd6bf6..8b4155e756c2 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1184,6 +1184,9 @@ struct proto { void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); +#ifdef CONFIG_BPF_SYSCALL + int (*psock_update_sk_prot)(struct sock *sk, bool restore); +#endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS diff --git a/include/net/tcp.h b/include/net/tcp.h index 075de26f449d..2efa4e5ea23d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2203,6 +2203,7 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int tcp_bpf_update_proto(struct sock *sk, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ diff --git a/include/net/udp.h b/include/net/udp.h index d4d064c59232..df7cc1edc200 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -518,6 +518,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, #ifdef CONFIG_BPF_SYSCALL struct sk_psock; struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int udp_bpf_update_proto(struct sock *sk, bool restore); #endif #endif /* _UDP_H */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index a045812d7c78..9fc83f7cc1a0 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -562,11 +562,6 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (inet_csk_has_ulp(sk)) { - psock = ERR_PTR(-EINVAL); - goto out; - } - if (sk->sk_user_data) { psock = ERR_PTR(-EBUSY); goto out; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index c46709786a49..d04b98fc8104 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) { - struct proto *prot; - - switch (sk->sk_type) { - case SOCK_STREAM: - prot = tcp_bpf_get_proto(sk, psock); - break; - - case SOCK_DGRAM: - prot = udp_bpf_get_proto(sk, psock); - break; - - default: + if (!sk->sk_prot->psock_update_sk_prot) return -EINVAL; - } - - if (IS_ERR(prot)) - return PTR_ERR(prot); - - sk_psock_update_proto(sk, psock, prot); - return 0; + psock->psock_update_sk_prot = sk->sk_prot->psock_update_sk_prot; + return sk->sk_prot->psock_update_sk_prot(sk, false); } static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) @@ -570,7 +554,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk) static bool sock_map_sk_is_suitable(const struct sock *sk) { - return sk_is_tcp(sk) || sk_is_udp(sk); + return !!sk->sk_prot->psock_update_sk_prot; } static bool sock_map_sk_state_allowed(const struct sock *sk) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ae980716d896..ac8cfbaeacd2 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -595,20 +595,38 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; } -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int tcp_bpf_update_proto(struct sock *sk, bool restore) { + struct sk_psock *psock = sk_psock(sk); int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE; + if (restore) { + if (inet_csk_has_ulp(sk)) { + tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); + } else { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + } + return 0; + } + + if (inet_csk_has_ulp(sk)) + return -EINVAL; + if (sk->sk_family == AF_INET6) { if (tcp_bpf_assert_proto_ops(psock->sk_proto)) - return ERR_PTR(-EINVAL); + return -EINVAL; tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); } - return &tcp_bpf_prots[family][config]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + return 0; } +EXPORT_SYMBOL_GPL(tcp_bpf_update_proto); /* If a child got cloned from a listening socket that had tcp_bpf * protocol callbacks installed, we need to restore the callbacks to diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index daad4f99db32..dfc6d1c0e710 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2806,6 +2806,9 @@ struct proto tcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4a0478b17243..38952aaee3a1 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2849,6 +2849,9 @@ struct proto udp_prot = { .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, .get_port = udp_v4_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a94791efc1a..6001f93cd3a0 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -41,12 +41,23 @@ static int __init udp_bpf_v4_build_proto(void) } core_initcall(udp_bpf_v4_build_proto); -struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int udp_bpf_update_proto(struct sock *sk, bool restore) { int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; + struct sk_psock *psock = sk_psock(sk); + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + return 0; + } if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto); - return &udp_bpf_prots[family]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + return 0; } +EXPORT_SYMBOL_GPL(udp_bpf_update_proto); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index d0f007741e8e..bff22d6ef516 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2139,6 +2139,9 @@ struct proto tcpv6_prot = { .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index d25e5a9252fd..ef2c75bb4771 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1713,6 +1713,9 @@ struct proto udpv6_prot = { .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, +#ifdef CONFIG_BPF_SYSCALL + .psock_update_sk_prot = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), From patchwork Sun Mar 28 20:20:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169123 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B32DBC433F1 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8E5E861953 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231512AbhC1UUt (ORCPT ); Sun, 28 Mar 2021 16:20:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231305AbhC1UUa (ORCPT ); Sun, 28 Mar 2021 16:20:30 -0400 Received: from mail-oo1-xc29.google.com (mail-oo1-xc29.google.com [IPv6:2607:f8b0:4864:20::c29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE118C061762; Sun, 28 Mar 2021 13:20:29 -0700 (PDT) Received: by mail-oo1-xc29.google.com with SMTP id h3-20020a4ae8c30000b02901b68b39e2d3so2528237ooe.9; Sun, 28 Mar 2021 13:20:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=iyyf6p84HkUZhn7oEnpkYVECmHMe1tHTFg8XmIdMNhs=; b=YEw7HTgmy/IPze2HiaOE+9Dme9Y6qjM6+/ibiRThwHKTvwUAXPd49DkRpnKYLG/xKg I3kCxH2plZr8h5C5xo1mAMmdrT6F0zZ4KCjZX9nDulHOAn0MQxvm5xzyfADeun8CI+aa 0nkZ4WGqAdgGutHjI/NKQn/Q52IQAOKKOxTU6jQqx8cDfJxLYokk4uKI0P2wH4Z9EQML coOlyD7mOZHElIFlpX9sFT4Ryfbh78jy9BLV4M9KrMByGxNCL/WeBCxhhbk6mg0eWAsk Q1id8eesmHVCT3fmvcwH2aUnnGOKegYVc7cBrZCqfdkYImbxYSDIQHVd+ftdGXczivn1 FfbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iyyf6p84HkUZhn7oEnpkYVECmHMe1tHTFg8XmIdMNhs=; b=TtEMhBRxwqGp8FbXHsuLevJa/wHqvXg9NCy6Usmoth5PT8mKChiUZ63T+mDnsxSZLG W3wT2a8NbpsiA22vMtZydh23rfivdW17Zm21OH6mVczkSdYdku3PNw8u8UmMAtWUsEcl syj+aLy6QzwZOE8pGjPZvf+WPjbpae0HNXy+HYNfDaMKTW3U3g5fdx1n7Ot9/17frQyO jEo3Fl470W8yNj8zfDxCS3nOBl1abrQQlwFYIkIe8Z6yFbQ5oVDFWIXiEUpzEaubWLwi /kmcvM+pEgWD/OCfqAH1Yls/GG1/uOwU9tr5mr1lQrrClXQ66xO1D6zOjZO45M+Iueh7 8HdQ== X-Gm-Message-State: AOAM533L8XCn9t77AYqP8XSB/Q7LsrXAIicF1Nu9ipOj9YISxtmFahbc K5OgrUJGfN8NaVUhZSanYkBPq9LUZ1Dn9A== X-Google-Smtp-Source: ABdhPJw3BAM8W3nXhpYv/JVNKXo6fWG6NQYMXdYYt5qyW1JtUsL2CXdvRoox21Oiyu/WMSLAsMh5gg== X-Received: by 2002:a4a:9823:: with SMTP id y32mr18976075ooi.35.1616962829044; Sun, 28 Mar 2021 13:20:29 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:28 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 09/13] udp: implement ->read_sock() for sockmap Date: Sun, 28 Mar 2021 13:20:09 -0700 Message-Id: <20210328202013.29223-10-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue. Note, the return value of ->read_sock() is unused in sk_psock_verdict_data_ready(). Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 35 +++++++++++++++++++++++++++++++++++ net/ipv6/af_inet6.c | 1 + 4 files changed, 39 insertions(+) diff --git a/include/net/udp.h b/include/net/udp.h index df7cc1edc200..347b62a753c3 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -329,6 +329,8 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 1355e6c0d567..f17870ee558b 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1070,6 +1070,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, + .read_sock = udp_read_sock, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 38952aaee3a1..04620e4d64ab 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1782,6 +1782,41 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + int copied = 0; + + while (1) { + int offset = 0, err; + struct sk_buff *skb; + + skb = __skb_recv_udp(sk, 0, 1, &offset, &err); + if (!skb) + return err; + if (offset < skb->len) { + size_t len; + int used; + + len = skb->len - offset; + used = recv_actor(desc, skb, offset, len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } else if (used <= len) { + copied += used; + offset += used; + } + } + if (!desc->count) + break; + } + + return copied; +} +EXPORT_SYMBOL(udp_read_sock); + /* * This should be easy, if there is something there we * return it, otherwise we block. diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 802f5111805a..71de739b4a9e 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -714,6 +714,7 @@ const struct proto_ops inet6_dgram_ops = { .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ + .read_sock = udp_read_sock, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, From patchwork Sun Mar 28 20:20:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169121 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF2C7C433EC for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CC5EB6196E for ; Sun, 28 Mar 2021 20:21:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229955AbhC1UUu (ORCPT ); Sun, 28 Mar 2021 16:20:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231312AbhC1UUb (ORCPT ); Sun, 28 Mar 2021 16:20:31 -0400 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3FDBC061756; Sun, 28 Mar 2021 13:20:30 -0700 (PDT) Received: by mail-oi1-x22e.google.com with SMTP id k25so11246164oic.4; Sun, 28 Mar 2021 13:20:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=KLA6Vftyh0UcPve9MLTcr/Qlwo1Hmm6TlBTby3/AhU0=; b=I0WRdD//8pVntVRBKUYbqqGzew2IVAEuhY8c4FMCySRdxf+y3MOfbI02HWBZoCCjFw LetjnLTHBk2g7cxV3W1AZRP+jWZ5G+5Q7iVARNwDLDZGySkQ0wI7Gp9CZstgSUw2NkYQ R6t9go7OPXLApBLLLpcpmhD7z2jw9EzVj83s68/aPUXiIfSm8PHGTZcwJf9BIapUQKnv VOhNkn5SG53M11uzDaaMUunXypaG0nhZGWVw5Ofy0CFQs1+cSBgFIiDfGVwlOMEk2vxQ 7+ixnREth/97sHz03cR/NFOm0VeDwvIcmRgGH94tB+nqCac5mb4taYxE/uimPhCdhBZB 9oqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=KLA6Vftyh0UcPve9MLTcr/Qlwo1Hmm6TlBTby3/AhU0=; b=sOTJNQshlLvK/4nXthH8qkuzjWdIpktdDUKjCZ0T20bbtstSsBQJh6ar4qpudj8ttA AWtmN9wge8RyZY2pjN6hTu5DPqLjc+/MzMaDGF/uIvrEO4556qFBUD17aXBu1sB7FWRJ M44DA/F4ieRGA776dwZsChAO0AAe8MCy/dIZcejBMZjqN+kn+Jg4j4Pk6oqR5bjBWjl9 6AjnlZjfYmUlKO0ATGSFjOyyhMueuCoI5+jTaojGbVSMiFUFiH8cUswZ7m/9bUrorS/u wlPc9/9h6xwhuODYXQh0nWzi3hHJBH0DncDl6Xp5+kiocn3ckLgieGCVQYNsN/HUirIO QBSQ== X-Gm-Message-State: AOAM533fogg5NpWk+7BJLUgYMLTtwlL4mSdiGV+d0ia0aZsKzIbxACHX i38bw2I52zqDUdS0CORtOEtAAQ+727z0/Q== X-Google-Smtp-Source: ABdhPJxIjitrSMv21S67F5R9UNVB8f0CxIy4XYfdXogvvE01KwYh6JRLdc4YBzeTeG+WL6Uqi6xIiw== X-Received: by 2002:aca:5845:: with SMTP id m66mr16901423oib.0.1616962830259; Sun, 28 Mar 2021 13:20:30 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:29 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 10/13] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Date: Sun, 28 Mar 2021 13:20:10 -0700 Message-Id: <20210328202013.29223-11-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 4 ++ include/net/tcp.h | 2 - net/core/skmsg.c | 98 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 100 +----------------------------------------- net/tls/tls_sw.c | 4 +- 5 files changed, 106 insertions(+), 102 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 5e800ddc2dc6..f78e90a04a69 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -125,6 +125,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err); +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags); static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes) { diff --git a/include/net/tcp.h b/include/net/tcp.h index 2efa4e5ea23d..31b1696c62ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2209,8 +2209,6 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, int flags); -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 9fc83f7cc1a0..92a83c02562a 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -399,6 +399,104 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, } EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int ret = 0; + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return 1; + + if (!timeo) + return ret; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = sk_wait_event(sk, &timeo, + !list_empty(&psock->ingress_msg) || + !skb_queue_empty(&sk->sk_receive_queue), &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} +EXPORT_SYMBOL_GPL(sk_msg_wait_data); + +/* Receive sk_msg from psock->ingress_msg to @msg. */ +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags) +{ + struct iov_iter *iter = &msg->msg_iter; + int peek = flags & MSG_PEEK; + struct sk_msg *msg_rx; + int i, copied = 0; + + msg_rx = sk_psock_peek_msg(psock); + while (copied != len) { + struct scatterlist *sge; + + if (unlikely(!msg_rx)) + break; + + i = msg_rx->sg.start; + do { + struct page *page; + int copy; + + sge = sk_msg_elem(msg_rx, i); + copy = sge->length; + page = sg_page(sge); + if (copied + copy > len) + copy = len - copied; + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; + + copied += copy; + if (likely(!peek)) { + sge->offset += copy; + sge->length -= copy; + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); + msg_rx->sg.size -= copy; + + if (!sge->length) { + sk_msg_iter_var_next(i); + if (!msg_rx->skb) + put_page(page); + } + } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. + */ + if (copy != sge->length) + return copied; + sk_msg_iter_var_next(i); + } + + if (copied == len) + break; + } while (i != msg_rx->sg.end); + + if (unlikely(peek)) { + msg_rx = sk_psock_next_msg(psock, msg_rx); + if (!msg_rx) + break; + continue; + } + + msg_rx->sg.start = i; + if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { + msg_rx = sk_psock_dequeue_msg(psock); + kfree_sk_msg(msg_rx); + } + msg_rx = sk_psock_peek_msg(psock); + } + + return copied; +} +EXPORT_SYMBOL_GPL(sk_msg_recvmsg); + static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, struct sk_buff *skb) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index ac8cfbaeacd2..3d622a0d0753 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -10,80 +10,6 @@ #include #include -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags) -{ - struct iov_iter *iter = &msg->msg_iter; - int peek = flags & MSG_PEEK; - struct sk_msg *msg_rx; - int i, copied = 0; - - msg_rx = sk_psock_peek_msg(psock); - while (copied != len) { - struct scatterlist *sge; - - if (unlikely(!msg_rx)) - break; - - i = msg_rx->sg.start; - do { - struct page *page; - int copy; - - sge = sk_msg_elem(msg_rx, i); - copy = sge->length; - page = sg_page(sge); - if (copied + copy > len) - copy = len - copied; - copy = copy_page_to_iter(page, sge->offset, copy, iter); - if (!copy) - return copied ? copied : -EFAULT; - - copied += copy; - if (likely(!peek)) { - sge->offset += copy; - sge->length -= copy; - if (!msg_rx->skb) - sk_mem_uncharge(sk, copy); - msg_rx->sg.size -= copy; - - if (!sge->length) { - sk_msg_iter_var_next(i); - if (!msg_rx->skb) - put_page(page); - } - } else { - /* Lets not optimize peek case if copy_page_to_iter - * didn't copy the entire length lets just break. - */ - if (copy != sge->length) - return copied; - sk_msg_iter_var_next(i); - } - - if (copied == len) - break; - } while (i != msg_rx->sg.end); - - if (unlikely(peek)) { - msg_rx = sk_psock_next_msg(psock, msg_rx); - if (!msg_rx) - break; - continue; - } - - msg_rx->sg.start = i; - if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - msg_rx = sk_psock_dequeue_msg(psock); - kfree_sk_msg(msg_rx); - } - msg_rx = sk_psock_peek_msg(psock); - } - - return copied; -} -EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); - static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -237,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk) return !empty; } -static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, - int flags, long timeo, int *err) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - int ret = 0; - - if (sk->sk_shutdown & RCV_SHUTDOWN) - return 1; - - if (!timeo) - return ret; - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - ret = sk_wait_event(sk, &timeo, - !list_empty(&psock->ingress_msg) || - !skb_queue_empty(&sk->sk_receive_queue), &wait); - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); - return ret; -} - static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -278,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } lock_sock(sk); msg_bytes_ready: - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { int data, err = 0; long timeo; timeo = sock_rcvtimeo(sk, nonblock); - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); if (data) { if (!sk_psock_queue_empty(psock)) goto msg_bytes_ready; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 01d933ae5f16..1dcb34dfd56b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk, skb = tls_wait_data(sk, psock, flags, timeo, &err); if (!skb) { if (psock) { - int ret = __tcp_bpf_recvmsg(sk, psock, - msg, len, flags); + int ret = sk_msg_recvmsg(sk, psock, msg, len, + flags); if (ret > 0) { decrypted += ret; From patchwork Sun Mar 28 20:20:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169125 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3B08C433F4 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AE5166196E for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230397AbhC1UUu (ORCPT ); Sun, 28 Mar 2021 16:20:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35304 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231349AbhC1UUc (ORCPT ); Sun, 28 Mar 2021 16:20:32 -0400 Received: from mail-oi1-x235.google.com (mail-oi1-x235.google.com [IPv6:2607:f8b0:4864:20::235]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3707EC061756; Sun, 28 Mar 2021 13:20:32 -0700 (PDT) Received: by mail-oi1-x235.google.com with SMTP id m13so11199560oiw.13; Sun, 28 Mar 2021 13:20:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=AZvem2vjN6QCfupveqPPy/9VUQuH5BQmgmIA0KO2OqTkG/aGS8/Daa5U91lG5iZ24a D9d8IYHVUlsORcpJYnb7W5uKdeB/HPgOofOZepdo+opI0HTCuo5anESRk7lG1O/V+xSQ cHo2OwsFb1GXDysh1J3etX6FfTqBWyAodmsoCfM2Tn5RY1ngVWsARuTg/dSRCFhMwR9J YuluQaN/MmWuEVlc27TA5GiX6NdJovFfSgNGQfLHqYOJyrdvgWC5eM5e+HVY4JRbFnK7 f2iiK1keKCAUrEqoNTADUjuZfTAfsO5gwUOZJngk0FVacKaA6nKeziGACN+511Nqqa2i p2bQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ktDD3wFfhTgzuwFfQ6FhKlNOYR4srlqkDAegPxHtcbc=; b=ulchueEyN1iwK8iGfoxzzYFy/gZX5dNTZ3fXOP0EJcRrWgHiv3bHPIBDZqiCnLxgDe BBBPGcYTkFeZGh/pJbcVZanjrpduRmXKDp5qjq7vFqNmBN5SRNmFOwak+NWSLTJ99Bpb 3v3ANRTIApDcM4xTUV4TGfVVi+kXl9iLFtaOCNTY9OL8ETd6wTjpAWcqBbvpzNyrtqV/ ibFb5Y5Y6q1A1u3LkCYmSaX9hZuTgEMlGnbX4T1wGkrkYx8C9mXHfk9np9K3dwRX/Cj0 OMBYNUHTPEBsarDolPhejfLm1tlBEoGRJWUP+TBhz989subRlsS7RWzpQUiT+nBylfUQ 5JvQ== X-Gm-Message-State: AOAM532dHsG7MVEY8n3KLhgBQ1qjonqqVgYf5KH2XvFC+zpEBTAAiwxV 3Kn70IQ1HzQaK+MDTRi9XXOrhDpOKES+Ug== X-Google-Smtp-Source: ABdhPJxJ8GwEoVwRlGEvN5S0VR79hjXFo0Uyd6/fRJ05rF8+qL1Qf/buCMyJy3SJAKnGQFt874lw/w== X-Received: by 2002:aca:b506:: with SMTP id e6mr16127413oif.70.1616962831531; Sun, 28 Mar 2021 13:20:31 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:31 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 11/13] udp: implement udp_bpf_recvmsg() for sockmap Date: Sun, 28 Mar 2021 13:20:11 -0700 Message-Id: <20210328202013.29223-12-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 6001f93cd3a0..7d5c4ebf42fe 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -4,6 +4,68 @@ #include #include #include +#include + +#include "udp_impl.h" + +static struct proto *udpv6_prot_saved __read_mostly; + +static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int noblock, int flags, int *addr_len) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags, + addr_len); +#endif + return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len); +} + +static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied, ret; + + if (unlikely(flags & MSG_ERRQUEUE)) + return inet_recv_error(sk, msg, len, addr_len); + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + + lock_sock(sk); + if (sk_psock_queue_empty(psock)) { + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + +msg_bytes_ready: + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + if (!copied) { + int data, err = 0; + long timeo; + + timeo = sock_rcvtimeo(sk, nonblock); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); + if (data) { + if (!sk_psock_queue_empty(psock)) + goto msg_bytes_ready; + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + if (err) { + ret = err; + goto out; + } + copied = -EAGAIN; + } + ret = copied; +out: + release_sock(sk); + sk_psock_put(sk, psock); + return ret; +} enum { UDP_BPF_IPV4, @@ -11,7 +73,6 @@ enum { UDP_BPF_NUM_PROTS, }; -static struct proto *udpv6_prot_saved __read_mostly; static DEFINE_SPINLOCK(udpv6_prot_lock); static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS]; @@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->unhash = sock_map_unhash; prot->close = sock_map_close; + prot->recvmsg = udp_bpf_recvmsg; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) From patchwork Sun Mar 28 20:20:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169129 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3D70C433F8 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D43646197F for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231533AbhC1UUu (ORCPT ); Sun, 28 Mar 2021 16:20:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231318AbhC1UUd (ORCPT ); Sun, 28 Mar 2021 16:20:33 -0400 Received: from mail-oo1-xc29.google.com (mail-oo1-xc29.google.com [IPv6:2607:f8b0:4864:20::c29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FC77C061756; Sun, 28 Mar 2021 13:20:33 -0700 (PDT) Received: by mail-oo1-xc29.google.com with SMTP id p2-20020a4aa8420000b02901bc7a7148c4so2533508oom.11; Sun, 28 Mar 2021 13:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=cio+qgcDvmruwG8aqOjWedfunYy04m3r9xm+A6tj84M=; b=nuEEBeBFD1mY/wHM7bnhvnfwShfLwsU552vwpru1UzRPPXYzmT9IY2AQVd3OPWDqgV TP0tyR1nR6d5BKs4n5yJSAJ2PUmeE6dVc50kLpfOEaDq+qvYlxbhjjAhivXwC4JMeGsj 10iuXiJI+jZpt8iZnyoNinxK2d3o8tH/jtc7NVKv6CSwD9eZqMbioqpq+tG0UpHjO1WR B37QEcWwIv15+QW+V+Vc6wIPTGZPoR7vN8DP5NZjdtbWRdhiQCGSo2oYdkeC0KqfO7Ze Ml5rj/nRR3ttv3966pkE5HfQsZKOjO/6PE9BrcXlwJwVAsY8u9eUnNrsf6Nd53H67Vaz Mrqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cio+qgcDvmruwG8aqOjWedfunYy04m3r9xm+A6tj84M=; b=HE0Fb1EuNPQQtpM3ulhtXlJMewIrYVCQHHnc15KHzBKd+Oefmpg4GihVk7BVQZc2vu QYbqdNLYohuPQk40W1qULIYITnq7dWA33LW35ZnKqPVrlXshmG5gGkvP4uN/15WAvZ4n h2XEATl58bLz6ETJ09vO2cXGr9axj5fXXvzyYcXIyhaKszeNQUl/fSej8hZ6W6ZPiZ8w ZVTVnhYTJo7tVKRWNaMr6l6YKXHv/pxQw3wY6vJFi8srYpxTY6jZvNtFwWnTOB1BaH/G Z2AOtVzzvPMt+wbYWnC2httiyxTFVPYByWkabEkN3CAkmU9mLlRmNndqbDexg90FJsl0 c5aw== X-Gm-Message-State: AOAM530MoLpIGUgBvzpTp3Hesyi5hgRo74XL6osV8ulSNWQMVZYYXtml MpE8GV83Tze/S58K04cDRG3LxLscdmOzAg== X-Google-Smtp-Source: ABdhPJwuOlyL007rESEiWVt9+EGxmEGbjXGGYnTDOaHgH/8RfccD1p9HJjBSSUz4QIg33C0l1quf4g== X-Received: by 2002:a4a:4005:: with SMTP id n5mr19288444ooa.61.1616962832710; Sun, 28 Mar 2021 13:20:32 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:32 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 12/13] sock_map: update sock type checks for UDP Date: Sun, 28 Mar 2021 13:20:12 -0700 Message-Id: <20210328202013.29223-13-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Now UDP supports sockmap and redirection, we can safely update the sock type checks for it accordingly. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang Acked-by: John Fastabend --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index d04b98fc8104..9ed040c7d9e7 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -549,7 +549,10 @@ static bool sk_is_udp(const struct sock *sk) static bool sock_map_redirect_allowed(const struct sock *sk) { - return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN; + if (sk_is_tcp(sk)) + return sk->sk_state != TCP_LISTEN; + else + return sk->sk_state == TCP_ESTABLISHED; } static bool sock_map_sk_is_suitable(const struct sock *sk) From patchwork Sun Mar 28 20:20:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12169119 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C08B7C433F2 for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9FED66194A for ; Sun, 28 Mar 2021 20:21:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231543AbhC1UUv (ORCPT ); Sun, 28 Mar 2021 16:20:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231366AbhC1UUe (ORCPT ); Sun, 28 Mar 2021 16:20:34 -0400 Received: from mail-ot1-x335.google.com (mail-ot1-x335.google.com [IPv6:2607:f8b0:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A8B1C061756; Sun, 28 Mar 2021 13:20:34 -0700 (PDT) Received: by mail-ot1-x335.google.com with SMTP id v24-20020a9d69d80000b02901b9aec33371so10361871oto.2; Sun, 28 Mar 2021 13:20:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=gOC5unRxfv/K/WT+PzYY09V5EeMjRcHWr4Ygtd7dIvU=; b=JD1MEpFxsd38kLfDbU8C0ORD1oEkoE67u2ScKI/bhC3oBaZZFzT29Ug4wAonoXauuv LkAyuJ/doxb9RE52E4qwINZmQbMWvvlcFvY/YcVdeEegvihDgaVlE1TFJ6mSAUDkVQ9V xac6qHSDD+c4zkPM1EGVJVsBMr6MTy/comNvmmIQmvULwLzBg9u2qWx3Jor0uzGDcAWo 7V0nhp0PjakD2NsMEpoDC3zYCOJnodStzFowDxGCOE7G0q/6Z1/pB0tlEUAdMZKC1hUC cxp5UQ2LKAPPgV41A+hCsoL+JFo4qbuQiv4pvurK7PuhZOVVkmtHzHPfh5Z9AjoQo6nf 5xAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=gOC5unRxfv/K/WT+PzYY09V5EeMjRcHWr4Ygtd7dIvU=; b=QRYv4t1+UHIMEVj+Ipyd1acGGDjCPk9jcNKBJFXUytiKCltuFsjfaEB6i3Sre5JZwa 7691mRaKkS+J09jq4SDE8HcmmkZ25cF7Yr/4mOIZ/O7B2lG175wMVtRGYi4Muz96JAD0 OlrDYaZQumVbR38U2gG4lGfY+b4my2v33UgObffJLqXpTvlLIpqOxjZV84/kqLgqcESy S5ev8kHqtq9xEB9mmXcJP6wMT0S0WsxVh6wQUMs3N08lEEvVlvZ1AG9RD8jkZO35anSh b1Aps78v7JpkxY1Sk8qUdbX9S/1eKryVD/qYosfuhIk1rBt7I6f0MDzlGmPXhRCaN++3 Ye0A== X-Gm-Message-State: AOAM533AJhvRvVrINpeK2iPA6KstAk7poYjxoOvStYwn2gTLMWm0j2oV iwhS5M2nH+M5MpkMFeOZ0TDsMrBLafZwMw== X-Google-Smtp-Source: ABdhPJxjTiD3duCgNEjlRNZYyWr09XGiPGMjXNuTkY7BNOVUbZhegqjGAfZ3aV9dEtw/9/BdbwPalw== X-Received: by 2002:a9d:6959:: with SMTP id p25mr20235697oto.154.1616962833948; Sun, 28 Mar 2021 13:20:33 -0700 (PDT) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:bca7:4b69:be8f:21bb]) by smtp.gmail.com with ESMTPSA id v30sm3898240otb.23.2021.03.28.13.20.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 28 Mar 2021 13:20:33 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v7 13/13] selftests/bpf: add a test case for udp sockmap Date: Sun, 28 Mar 2021 13:20:13 -0700 Message-Id: <20210328202013.29223-14-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210328202013.29223-1-xiyou.wangcong@gmail.com> References: <20210328202013.29223-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Add a test case to ensure redirection between two UDP sockets work. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- .../selftests/bpf/prog_tests/sockmap_listen.c | 136 ++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_listen.c | 22 +++ 2 files changed, 158 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index c26e6bf05e49..648d9ae898d2 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1603,6 +1603,141 @@ static void test_reuseport(struct test_sockmap_listen *skel, } } +static void udp_redir_to_connected(int family, int sotype, int sock_mapfd, + int verd_mapfd, enum redir_mode mode) +{ + const char *log_prefix = redir_mode_str(mode); + struct sockaddr_storage addr; + int c0, c1, p0, p1; + unsigned int pass; + socklen_t len; + int err, n; + u64 value; + u32 key; + char b; + + zero_verdict_count(verd_mapfd); + + p0 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p0 < 0) + return; + len = sizeof(addr); + err = xgetsockname(p0, sockaddr(&addr), &len); + if (err) + goto close_peer0; + + c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c0 < 0) + goto close_peer0; + err = xconnect(c0, sockaddr(&addr), len); + if (err) + goto close_cli0; + err = xgetsockname(c0, sockaddr(&addr), &len); + if (err) + goto close_cli0; + err = xconnect(p0, sockaddr(&addr), len); + if (err) + goto close_cli0; + + p1 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p1 < 0) + goto close_cli0; + err = xgetsockname(p1, sockaddr(&addr), &len); + if (err) + goto close_cli0; + + c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c1 < 0) + goto close_peer1; + err = xconnect(c1, sockaddr(&addr), len); + if (err) + goto close_cli1; + err = xgetsockname(c1, sockaddr(&addr), &len); + if (err) + goto close_cli1; + err = xconnect(p1, sockaddr(&addr), len); + if (err) + goto close_cli1; + + key = 0; + value = p0; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + key = 1; + value = p1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + n = write(c1, "a", 1); + if (n < 0) + FAIL_ERRNO("%s: write", log_prefix); + if (n == 0) + FAIL("%s: incomplete write", log_prefix); + if (n < 1) + goto close_cli1; + + key = SK_PASS; + err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); + if (err) + goto close_cli1; + if (pass != 1) + FAIL("%s: want pass count 1, have %d", log_prefix, pass); + + n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1); + if (n < 0) + FAIL_ERRNO("%s: read", log_prefix); + if (n == 0) + FAIL("%s: incomplete read", log_prefix); + +close_cli1: + xclose(c1); +close_peer1: + xclose(p1); +close_cli0: + xclose(c0); +close_peer0: + xclose(p0); +} + +static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel, + struct bpf_map *inner_map, int family) +{ + int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + int verdict_map = bpf_map__fd(skel->maps.verdict_map); + int sock_map = bpf_map__fd(inner_map); + int err; + + err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); + if (err) + return; + + skel->bss->test_ingress = false; + udp_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, + REDIR_EGRESS); + skel->bss->test_ingress = true; + udp_redir_to_connected(family, SOCK_DGRAM, sock_map, verdict_map, + REDIR_INGRESS); + + xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); +} + +static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map, + int family) +{ + const char *family_name, *map_name; + char s[MAX_TEST_NAME]; + + family_name = family_str(family); + map_name = map_type_str(map); + snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); + if (!test__start_subtest(s)) + return; + udp_skb_redir_to_connected(skel, map, family); +} + static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map, int family) { @@ -1611,6 +1746,7 @@ static void run_tests(struct test_sockmap_listen *skel, struct bpf_map *map, test_redir(skel, map, family, SOCK_STREAM); test_reuseport(skel, map, family, SOCK_STREAM); test_reuseport(skel, map, family, SOCK_DGRAM); + test_udp_redir(skel, map, family); } void test_sockmap_listen(void) diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c index fa221141e9c1..a39eba9f5201 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c @@ -29,6 +29,7 @@ struct { } verdict_map SEC(".maps"); static volatile bool test_sockmap; /* toggled by user-space */ +static volatile bool test_ingress; /* toggled by user-space */ SEC("sk_skb/stream_parser") int prog_stream_parser(struct __sk_buff *skb) @@ -55,6 +56,27 @@ int prog_stream_verdict(struct __sk_buff *skb) return verdict; } +SEC("sk_skb/skb_verdict") +int prog_skb_verdict(struct __sk_buff *skb) +{ + unsigned int *count; + __u32 zero = 0; + int verdict; + + if (test_sockmap) + verdict = bpf_sk_redirect_map(skb, &sock_map, zero, + test_ingress ? BPF_F_INGRESS : 0); + else + verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero, + test_ingress ? BPF_F_INGRESS : 0); + + count = bpf_map_lookup_elem(&verdict_map, &verdict); + if (count) + (*count)++; + + return verdict; +} + SEC("sk_msg") int prog_msg_verdict(struct sk_msg_md *msg) {