From patchwork Thu Jun 2 01:21:02 2022
X-Patchwork-Id: 12867418
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, Eric Dumazet, John Fastabend, Daniel Borkmann, Jakub Sitnicki
Subject: [Patch bpf-next v3 1/4] tcp: introduce tcp_read_skb()
Date: Wed, 1 Jun 2022 18:21:02 -0700
Message-Id: <20220602012105.58853-2-xiyou.wangcong@gmail.com>
In-Reply-To: <20220602012105.58853-1-xiyou.wangcong@gmail.com>
References: <20220602012105.58853-1-xiyou.wangcong@gmail.com>

From: Cong Wang

This patch introduces tcp_read_skb(), based on tcp_read_sock(), as
preparation for the next patch, which actually introduces a new
proto_ops callback.

TCP is special here because it already has tcp_read_sock(), which is
mainly used by splice(). tcp_read_sock() supports partial reads and
arbitrary offsets, neither of which is needed for sockmap.
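The difference between the two read paths can be illustrated with a minimal userspace sketch in C. It is not kernel code: `struct buf`, `struct rxq`, `read_sock_style()`, `read_skb_style()` and `sink_actor()` are hypothetical stand-ins for an sk_buff, sk->sk_receive_queue, tcp_read_sock(), tcp_read_skb() and a recv actor, and the sketch assumes a single reader with no locking, sequence numbers, FIN handling or ACK generation. The first reader has to cope with an actor that stops part-way through a buffer, so it tracks an offset and leaves that buffer queued; the second dequeues one whole buffer and always offers it at offset 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not kernel code: struct buf stands in for an
 * sk_buff, struct rxq for sk->sk_receive_queue. */
struct buf {
	struct buf *next;
	size_t len;
	char data[64];
};

struct rxq {
	struct buf *head;
	size_t offset;	/* bytes of head already consumed (read_sock style only) */
};

/* Actor loosely shaped like sk_read_actor_t: offered len bytes of b
 * starting at off, it returns how many bytes it consumed. */
typedef size_t (*actor_t)(void *ctx, const struct buf *b, size_t off, size_t len);

/* Append a non-empty string as one buffer at the tail of the queue. */
static void rxq_push(struct rxq *q, const char *s)
{
	struct buf *b = calloc(1, sizeof(*b));
	struct buf **pp = &q->head;

	if (!b || *s == '\0' || strlen(s) >= sizeof(b->data))
		abort();
	b->len = strlen(s);
	memcpy(b->data, s, b->len);
	while (*pp)
		pp = &(*pp)->next;
	*pp = b;
}

static struct buf *rxq_unlink_head(struct rxq *q)
{
	struct buf *b = q->head;

	if (b)
		q->head = b->next;
	return b;
}

/* read_sock()-style: the actor may stop part-way through a buffer, so the
 * reader keeps that buffer queued and remembers the offset for next time. */
static size_t read_sock_style(struct rxq *q, actor_t actor, void *ctx)
{
	size_t copied = 0;

	while (q->head) {
		struct buf *b = q->head;
		size_t used;

		assert(q->offset < b->len);
		used = actor(ctx, b, q->offset, b->len - q->offset);
		if (used == 0)
			break;
		copied += used;
		q->offset += used;
		if (q->offset < b->len)	/* partial read: b stays queued */
			break;
		free(rxq_unlink_head(q));
		q->offset = 0;
	}
	return copied;
}

/* read_skb()-style: dequeue one whole buffer and offer all of it at
 * offset 0, with no offset bookkeeping; the buffer is freed afterwards. */
static size_t read_skb_style(struct rxq *q, actor_t actor, void *ctx)
{
	struct buf *b = rxq_unlink_head(q);
	size_t used;

	if (!b)
		return 0;
	used = actor(ctx, b, 0, b->len);
	free(b);
	return used;
}

/* Hypothetical demo actor: copies at most max_per_call bytes into out. */
struct sink {
	char out[128];
	size_t n;
	size_t max_per_call;
};

static size_t sink_actor(void *ctx, const struct buf *b, size_t off, size_t len)
{
	struct sink *s = ctx;
	size_t take = len < s->max_per_call ? len : s->max_per_call;

	if (s->n + take >= sizeof(s->out))
		take = sizeof(s->out) - 1 - s->n;
	memcpy(s->out + s->n, b->data + off, take);
	s->n += take;
	s->out[s->n] = '\0';
	return take;
}
```

With an actor that takes only 3 bytes per call, read_sock_style() stops inside "hello" and must remember offset 3; read_skb_style() never has that state, because each call hands over a complete buffer, which is the property the commit message says sockmap relies on.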

Cc: Eric Dumazet
Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Signed-off-by: Cong Wang
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp.c    | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1e99f5c61f84..878544d0f8f9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -669,6 +669,8 @@ void tcp_get_info(struct sock *, struct tcp_info *);
 /* Read 'sendfile()'-style from a TCP socket */
 int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		  sk_read_actor_t recv_actor);
+int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
+		 sk_read_actor_t recv_actor);
 
 void tcp_initialize_rcv_mss(struct sock *sk);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9984d23a7f3e..a18e9ababf54 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1709,6 +1709,53 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 }
 EXPORT_SYMBOL(tcp_read_sock);
 
+int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
+		 sk_read_actor_t recv_actor)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 seq = tp->copied_seq;
+	struct sk_buff *skb;
+	int copied = 0;
+	u32 offset;
+
+	if (sk->sk_state == TCP_LISTEN)
+		return -ENOTCONN;
+
+	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
+		int used;
+
+		__skb_unlink(skb, &sk->sk_receive_queue);
+		used = recv_actor(desc, skb, 0, skb->len);
+		if (used <= 0) {
+			if (!copied)
+				copied = used;
+			break;
+		}
+		seq += used;
+		copied += used;
+
+		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
+			kfree_skb(skb);
+			++seq;
+			break;
+		}
+		kfree_skb(skb);
+		if (!desc->count)
+			break;
+		WRITE_ONCE(tp->copied_seq, seq);
+	}
+	WRITE_ONCE(tp->copied_seq, seq);
+
+	tcp_rcv_space_adjust(sk);
+
+	/* Clean up data we have read: This will do ACK frames. */
+	if (copied > 0)
+		tcp_cleanup_rbuf(sk, copied);
+
+	return copied;
+}
+EXPORT_SYMBOL(tcp_read_skb);
+
 int tcp_peek_len(struct socket *sock)
 {
 	return tcp_inq(sock->sk);

From patchwork Thu Jun 2 01:21:03 2022
X-Patchwork-Id: 12867419
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, Eric Dumazet, John Fastabend, Daniel Borkmann, Jakub Sitnicki
Subject: [Patch bpf-next v3 2/4] net: introduce a new proto_ops ->read_skb()
Date: Wed, 1 Jun 2022 18:21:03 -0700
Message-Id: <20220602012105.58853-3-xiyou.wangcong@gmail.com>
In-Reply-To: <20220602012105.58853-1-xiyou.wangcong@gmail.com>
References: <20220602012105.58853-1-xiyou.wangcong@gmail.com>

From: Cong Wang

Currently both splice() and sockmap use ->read_sock() to read skbs from
the receive queue, but sockmap only ever reads one entire skb at a time,
so ->read_sock() is too conservative for it. Introduce a new proto_ops
->read_skb() that supports exactly this semantic; with it we can finally
pass ownership of the skb to recv actors.

For non-TCP protocols, every ->read_sock() can simply be converted to
->read_skb().

Cc: Eric Dumazet
Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Signed-off-by: Cong Wang
---
 include/linux/net.h |  4 ++++
 include/net/tcp.h   |  3 +--
 include/net/udp.h   |  3 +--
 net/core/skmsg.c    | 20 +++++---------------
 net/ipv4/af_inet.c  |  3 ++-
 net/ipv4/tcp.c      |  9 +++------
 net/ipv4/udp.c      | 10 ++++------
 net/ipv6/af_inet6.c |  3 ++-
 net/unix/af_unix.c  | 23 +++++++++--------------
 9 files changed, 31 insertions(+), 47 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 12093f4db50c..a03485e8cbb2 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -152,6 +152,8 @@ struct module;
 struct sk_buff;
 typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *,
 			       unsigned int, size_t);
+typedef int (*skb_read_actor_t)(struct sock *, struct sk_buff *);
+
 
 struct proto_ops {
 	int		family;
@@ -214,6 +216,8 @@ struct proto_ops {
 	 */
 	int		(*read_sock)(struct sock *sk, read_descriptor_t *desc,
 				     sk_read_actor_t recv_actor);
+	/* This is different from read_sock(), it reads an entire skb at a time. */
+	int		(*read_skb)(struct sock *sk, skb_read_actor_t recv_actor);
 	int		(*sendpage_locked)(struct sock *sk, struct page *page,
 					   int offset, size_t size, int flags);
 	int		(*sendmsg_locked)(struct sock *sk, struct msghdr *msg,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 878544d0f8f9..3aa859c9a0fb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -669,8 +669,7 @@ void tcp_get_info(struct sock *, struct tcp_info *);
 /* Read 'sendfile()'-style from a TCP socket */
 int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		  sk_read_actor_t recv_actor);
-int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
-		 sk_read_actor_t recv_actor);
+int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
 
 void tcp_initialize_rcv_mss(struct sock *sk);
 
diff --git a/include/net/udp.h b/include/net/udp.h
index b83a00330566..47a0e3359771 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -305,8 +305,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
-int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
-		  sk_read_actor_t recv_actor);
+int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
 
 /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
  * possibly multiple cache miss on dequeue()
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 7e03f96e441b..f7f63b7d990c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1160,21 +1160,17 @@ static void sk_psock_done_strp(struct sk_psock *psock)
 }
 #endif /* CONFIG_BPF_STREAM_PARSER */
 
-static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
-				 unsigned int offset, size_t orig_len)
+static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 {
-	struct sock *sk = (struct sock *)desc->arg.data;
 	struct sk_psock *psock;
 	struct bpf_prog *prog;
 	int ret = __SK_DROP;
-	int len = orig_len;
+	int len = skb->len;
 
 	/* clone here so sk_eat_skb() in tcp_read_sock does not drop our data */
 	skb = skb_clone(skb, GFP_ATOMIC);
-	if (!skb) {
-		desc->error = -ENOMEM;
+	if (!skb)
 		return 0;
-	}
 
 	rcu_read_lock();
 	psock = sk_psock(sk);
@@ -1204,16 +1200,10 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
 	struct socket *sock = sk->sk_socket;
-	read_descriptor_t desc;
 
-	if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
+	if (unlikely(!sock || !sock->ops || !sock->ops->read_skb))
 		return;
-
-	desc.arg.data = sk;
-	desc.error = 0;
-	desc.count = 1;
-
-	sock->ops->read_sock(sk, &desc, sk_psock_verdict_recv);
+	sock->ops->read_skb(sk, sk_psock_verdict_recv);
 }
 
 void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 93da9f783bec..f615263855d0 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1040,6 +1040,7 @@ const struct proto_ops inet_stream_ops = {
 	.sendpage	   = inet_sendpage,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
+	.read_skb	   = tcp_read_skb,
 	.sendmsg_locked    = tcp_sendmsg_locked,
 	.sendpage_locked   = tcp_sendpage_locked,
 	.peek_len	   = tcp_peek_len,
@@ -1067,7 +1068,7 @@ const struct proto_ops inet_dgram_ops = {
 	.setsockopt	   = sock_common_setsockopt,
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
-	.read_sock	   = udp_read_sock,
+	.read_skb	   = udp_read_skb,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a18e9ababf54..8b9327a0d0d5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1709,8 +1709,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 }
 EXPORT_SYMBOL(tcp_read_sock);
 
-int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
-		 sk_read_actor_t recv_actor)
+int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 seq = tp->copied_seq;
@@ -1725,7 +1724,7 @@ int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
 		int used;
 
 		__skb_unlink(skb, &sk->sk_receive_queue);
-		used = recv_actor(desc, skb, 0, skb->len);
+		used = recv_actor(sk, skb);
 		if (used <= 0) {
 			if (!copied)
 				copied = used;
@@ -1740,9 +1739,7 @@ int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
 			break;
 		}
 		kfree_skb(skb);
-		if (!desc->count)
-			break;
-		WRITE_ONCE(tp->copied_seq, seq);
+		break;
 	}
 	WRITE_ONCE(tp->copied_seq, seq);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa9f2ec3dc46..0a1e90b80e36 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1795,8 +1795,7 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags,
 }
 EXPORT_SYMBOL(__skb_recv_udp);
 
-int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
-		  sk_read_actor_t recv_actor)
+int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	int copied = 0;
 
@@ -1818,7 +1817,7 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			continue;
 		}
 
-		used = recv_actor(desc, skb, 0, skb->len);
+		used = recv_actor(sk, skb);
 		if (used <= 0) {
 			if (!copied)
 				copied = used;
@@ -1829,13 +1828,12 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		}
 
 		kfree_skb(skb);
-		if (!desc->count)
-			break;
+		break;
 	}
 
 	return copied;
 }
-EXPORT_SYMBOL(udp_read_sock);
+EXPORT_SYMBOL(udp_read_skb);
 
 /*
  * 	This should be easy, if there is something there we
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 70564ddccc46..1aea5ef9bdea 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -701,6 +701,7 @@ const struct proto_ops inet6_stream_ops = {
 	.sendpage_locked   = tcp_sendpage_locked,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
+	.read_skb	   = tcp_read_skb,
 	.peek_len	   = tcp_peek_len,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	   = inet6_compat_ioctl,
@@ -726,7 +727,7 @@ const struct proto_ops inet6_dgram_ops = {
 	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
 	.sendmsg	   = inet6_sendmsg,		/* retpoline's sake */
 	.recvmsg	   = inet6_recvmsg,		/* retpoline's sake */
-	.read_sock	   = udp_read_sock,
+	.read_skb	   = udp_read_skb,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = sock_no_sendpage,
 	.set_peek_off	   = sk_set_peek_off,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 654dcef7cfb3..3a96008ec331 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -741,10 +741,8 @@ static ssize_t unix_stream_splice_read(struct socket *, loff_t *ppos,
 				       unsigned int flags);
 static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int);
-static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
-			  sk_read_actor_t recv_actor);
-static int unix_stream_read_sock(struct sock *sk, read_descriptor_t *desc,
-				 sk_read_actor_t recv_actor);
+static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
+static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
 static int unix_dgram_connect(struct socket *, struct sockaddr *,
 			      int, int);
 static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);
@@ -798,7 +796,7 @@ static const struct proto_ops unix_stream_ops = {
 	.shutdown =	unix_shutdown,
 	.sendmsg =	unix_stream_sendmsg,
 	.recvmsg =	unix_stream_recvmsg,
-	.read_sock =	unix_stream_read_sock,
+	.read_skb =	unix_stream_read_skb,
 	.mmap =		sock_no_mmap,
 	.sendpage =	unix_stream_sendpage,
 	.splice_read =	unix_stream_splice_read,
@@ -823,7 +821,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.listen =	sock_no_listen,
 	.shutdown =	unix_shutdown,
 	.sendmsg =	unix_dgram_sendmsg,
-	.read_sock =	unix_read_sock,
+	.read_skb =	unix_read_skb,
 	.recvmsg =	unix_dgram_recvmsg,
 	.mmap =		sock_no_mmap,
 	.sendpage =	sock_no_sendpage,
@@ -2487,8 +2485,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si
 	return __unix_dgram_recvmsg(sk, msg, size, flags);
 }
 
-static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
-			  sk_read_actor_t recv_actor)
+static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	int copied = 0;
 
@@ -2503,7 +2500,7 @@ static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
 		if (!skb)
 			return err;
 
-		used = recv_actor(desc, skb, 0, skb->len);
+		used = recv_actor(sk, skb);
 		if (used <= 0) {
 			if (!copied)
 				copied = used;
@@ -2514,8 +2511,7 @@ static int unix_read_sock(struct sock *sk, read_descriptor_t *desc,
 		}
 
 		kfree_skb(skb);
-		if (!desc->count)
-			break;
+		break;
 	}
 
 	return copied;
@@ -2650,13 +2646,12 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
 }
 #endif
 
-static int unix_stream_read_sock(struct sock *sk, read_descriptor_t *desc,
-				 sk_read_actor_t recv_actor)
+static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
 	if (unlikely(sk->sk_state != TCP_ESTABLISHED))
 		return -ENOTCONN;
 
-	return unix_read_sock(sk, desc, recv_actor);
+	return unix_read_skb(sk, recv_actor);
 }
 
 static int unix_stream_read_generic(struct unix_stream_read_state *state,

From patchwork Thu Jun 2 01:21:04 2022
X-Patchwork-Id: 12867420
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, Eric Dumazet, John Fastabend, Daniel Borkmann, Jakub Sitnicki
Subject: [Patch bpf-next v3 3/4] skmsg: get rid of skb_clone()
Date: Wed, 1 Jun 2022 18:21:04 -0700
Message-Id: <20220602012105.58853-4-xiyou.wangcong@gmail.com>
In-Reply-To: <20220602012105.58853-1-xiyou.wangcong@gmail.com>
References: <20220602012105.58853-1-xiyou.wangcong@gmail.com>

From: Cong Wang

With ->read_skb() we now have an entire skb dequeued from the receive
queue, so we just need to grab an additional refcnt before passing its
ownership to recv actors. After that we should not touch the skb any
more, in particular skb->sk. Fortunately, skb->sk is already set for
most protocols; the exception is UDP, where skb->sk has been stolen, so
we have to fix it up for the UDP case.

Cc: Eric Dumazet
Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Signed-off-by: Cong Wang
---
 net/core/skmsg.c | 7 +------
 net/ipv4/udp.c   | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f7f63b7d990c..8b248d289c11 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1167,10 +1167,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 	int ret = __SK_DROP;
 	int len = skb->len;
 
-	/* clone here so sk_eat_skb() in tcp_read_sock does not drop our data */
-	skb = skb_clone(skb, GFP_ATOMIC);
-	if (!skb)
-		return 0;
+	skb_get(skb);
 
 	rcu_read_lock();
 	psock = sk_psock(sk);
@@ -1183,12 +1180,10 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 	if (!prog)
 		prog = READ_ONCE(psock->progs.skb_verdict);
 	if (likely(prog)) {
-		skb->sk = sk;
 		skb_dst_drop(skb);
 		skb_bpf_redirect_clear(skb);
 		ret = bpf_prog_run_pin_on_cpu(prog, skb);
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
-		skb->sk = NULL;
 	}
 	if (sk_psock_verdict_apply(psock, skb, ret) < 0)
 		len = 0;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0a1e90b80e36..b09936ccf709 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1817,6 +1817,7 @@ int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 			continue;
 		}
 
+		WARN_ON(!skb_set_owner_sk_safe(skb, sk));
 		used = recv_actor(sk, skb);
 		if (used <= 0) {
 			if (!copied)

From patchwork Thu Jun 2 01:21:05 2022
X-Patchwork-Id: 12867421
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki
Subject: [Patch bpf-next v3 4/4] skmsg: get rid of unnecessary memset()
Date: Wed, 1 Jun 2022 18:21:05 -0700
Message-Id: <20220602012105.58853-5-xiyou.wangcong@gmail.com>
In-Reply-To: <20220602012105.58853-1-xiyou.wangcong@gmail.com>
References: <20220602012105.58853-1-xiyou.wangcong@gmail.com>

From: Cong Wang

We always allocate skmsg with kzalloc(), so there is no need to call
memset(0) on it; the only thing we need from sk_msg_init() is
sg_init_marker(). Introduce a new helper that is just
kzalloc() + sg_init_marker(), which saves an unnecessary memset(0) for
skmsg on the fast path.

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Signed-off-by: Cong Wang
---
 net/core/skmsg.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 8b248d289c11..4b297d67edb7 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -497,23 +497,27 @@ bool sk_msg_is_readable(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(sk_msg_is_readable);
 
-static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
-						  struct sk_buff *skb)
+static struct sk_msg *alloc_sk_msg(void)
 {
 	struct sk_msg *msg;
 
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_KERNEL);
+	if (unlikely(!msg))
 		return NULL;
+	sg_init_marker(msg->sg.data, NR_MSG_FRAG_IDS);
+	return msg;
+}
 
-	if (!sk_rmem_schedule(sk, skb, skb->truesize))
+static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
+						  struct sk_buff *skb)
+{
+	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
 		return NULL;
 
-	msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_KERNEL);
-	if (unlikely(!msg))
+	if (!sk_rmem_schedule(sk, skb, skb->truesize))
 		return NULL;
 
-	sk_msg_init(msg);
-	return msg;
+	return alloc_sk_msg();
 }
 
 static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
@@ -590,13 +594,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb,
 static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb,
 				     u32 off, u32 len)
 {
-	struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC);
+	struct sk_msg *msg = alloc_sk_msg();
 	struct sock *sk = psock->sk;
 	int err;
 
 	if (unlikely(!msg))
 		return -EAGAIN;
-	sk_msg_init(msg);
 	skb_set_owner_r(skb, sk);
 	err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg);
 	if (err < 0)