From patchwork Tue Sep 28 00:22:09 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12521229
X-Patchwork-Delegate: bpf@iogearbox.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer
Subject: [Patch bpf v2 1/4] skmsg: introduce sk_psock_get_checked()
Date: Mon, 27 Sep 2021 17:22:09 -0700
Message-Id: <20210928002212.14498-2-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
References: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

From: Cong Wang

Although we have sk_psock_get(), it assumes the psock retrieved from
sk_user_data is for sockmap. This is not sufficient if we call it
outside of sockmap, for example, from reuseport_array.

Fortunately, sock_map_psock_get_checked() is stricter and checks for
sock_map_close before using the psock. So we can refactor it and rename
it to sk_psock_get_checked(), which can be safely called outside of
sockmap.

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Cong Wang
---
 include/linux/skmsg.h | 20 ++++++++++++++++++++
 net/core/sock_map.c   | 22 +---------------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 14ab0c0bc924..8f577739fc36 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -452,6 +452,26 @@ static inline struct sk_psock *sk_psock_get(struct sock *sk)
 	return psock;
 }
 
+static inline struct sk_psock *sk_psock_get_checked(struct sock *sk)
+{
+	struct sk_psock *psock;
+
+	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (psock) {
+#if defined(CONFIG_BPF_SYSCALL)
+		if (sk->sk_prot->close != sock_map_close) {
+			rcu_read_unlock();
+			return ERR_PTR(-EBUSY);
+		}
+#endif
+		if (!refcount_inc_not_zero(&psock->refcnt))
+			psock = ERR_PTR(-EBUSY);
+	}
+	rcu_read_unlock();
+	return psock;
+}
+
 void sk_psock_drop(struct sock *sk, struct sk_psock *psock);
 
 static inline void sk_psock_put(struct sock *sk, struct sk_psock *psock)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index e252b8ec2b85..6612bb0b95b5 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -191,26 +191,6 @@ static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock)
 	return sk->sk_prot->psock_update_sk_prot(sk, psock, false);
 }
 
-static struct sk_psock *sock_map_psock_get_checked(struct sock *sk)
-{
-	struct sk_psock *psock;
-
-	rcu_read_lock();
-	psock = sk_psock(sk);
-	if (psock) {
-		if (sk->sk_prot->close != sock_map_close) {
-			psock = ERR_PTR(-EBUSY);
-			goto out;
-		}
-
-		if (!refcount_inc_not_zero(&psock->refcnt))
-			psock = ERR_PTR(-EBUSY);
-	}
-out:
-	rcu_read_unlock();
-	return psock;
-}
-
 static int sock_map_link(struct bpf_map *map, struct sock *sk)
 {
 	struct sk_psock_progs *progs = sock_map_progs(map);
@@ -255,7 +235,7 @@ static int sock_map_link(struct bpf_map *map, struct sock *sk)
 		}
 	}
 
-	psock = sock_map_psock_get_checked(sk);
+	psock = sk_psock_get_checked(sk);
 	if (IS_ERR(psock)) {
 		ret = PTR_ERR(psock);
 		goto out_progs;

From patchwork Tue Sep 28 00:22:10 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12521231
X-Patchwork-Delegate: bpf@iogearbox.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer
Subject: [Patch bpf v2 2/4] net: rename ->stream_memory_read to ->sock_is_readable
Date: Mon, 27 Sep 2021 17:22:10 -0700
Message-Id: <20210928002212.14498-3-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
References: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

From: Cong Wang

The proto op ->stream_memory_read is currently only used by TCP to
check whether the psock queue is empty or not. We need to rename it
before reusing it for non-TCP protocols, and adjust the existing TCP
functions accordingly.

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Cong Wang
---
 include/net/sock.h | 8 +++++++-
 include/net/tls.h  | 2 +-
 net/ipv4/tcp.c     | 5 +----
 net/ipv4/tcp_bpf.c | 4 ++--
 net/tls/tls_main.c | 4 ++--
 net/tls/tls_sw.c   | 2 +-
 6 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 66a9a90f9558..5c1dcc4a2284 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1205,7 +1205,7 @@ struct proto {
 #endif
 
 	bool (*stream_memory_free)(const struct sock *sk, int wake);
-	bool (*stream_memory_read)(const struct sock *sk);
+	bool (*sock_is_readable)(struct sock *sk);
 	/* Memory pressure */
 	void (*enter_memory_pressure)(struct sock *sk);
 	void (*leave_memory_pressure)(struct sock *sk);
@@ -2787,4 +2787,10 @@ void sock_set_sndtimeo(struct sock *sk, s64 secs);
 
 int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len);
 
+static inline bool sk_is_readable(struct sock *sk)
+{
+	if (sk->sk_prot->sock_is_readable)
+		return sk->sk_prot->sock_is_readable(sk);
+	return false;
+}
 #endif /* _SOCK_H */
diff --git a/include/net/tls.h b/include/net/tls.h
index be4b3e1cac46..01d2e3744393 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -375,7 +375,7 @@ void tls_sw_release_resources_rx(struct sock *sk);
 void tls_sw_free_ctx_rx(struct tls_context *tls_ctx);
 int tls_sw_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		   int nonblock, int flags, int *addr_len);
-bool tls_sw_stream_read(const struct sock *sk);
+bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e8b48df73c85..f5c336f8b0c8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -486,10 +486,7 @@ static bool tcp_stream_is_readable(struct sock *sk, int target)
 {
 	if (tcp_epollin_ready(sk, target))
 		return true;
-
-	if (sk->sk_prot->stream_memory_read)
-		return sk->sk_prot->stream_memory_read(sk);
-	return false;
+	return sk_is_readable(sk);
 }
 
 /*
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index d3e9386b493e..0175dbcb7722 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -150,7 +150,7 @@ int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg,
 EXPORT_SYMBOL_GPL(tcp_bpf_sendmsg_redir);
 
 #ifdef CONFIG_BPF_SYSCALL
-static bool tcp_bpf_stream_read(const struct sock *sk)
+static bool tcp_bpf_sock_is_readable(struct sock *sk)
 {
 	struct sk_psock *psock;
 	bool empty = true;
@@ -479,7 +479,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 	prot[TCP_BPF_BASE].unhash = sock_map_unhash;
 	prot[TCP_BPF_BASE].close = sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg;
-	prot[TCP_BPF_BASE].stream_memory_read = tcp_bpf_stream_read;
+	prot[TCP_BPF_BASE].sock_is_readable = tcp_bpf_sock_is_readable;
 
 	prot[TCP_BPF_TX] = prot[TCP_BPF_BASE];
 	prot[TCP_BPF_TX].sendmsg = tcp_bpf_sendmsg;
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fde56ff49163..9ab81db8a654 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -681,12 +681,12 @@ static void build_protos(struct proto prot[TLS_NUM_CONFIG][TLS_NUM_CONFIG],
 
 	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
 	prot[TLS_BASE][TLS_SW].recvmsg = tls_sw_recvmsg;
-	prot[TLS_BASE][TLS_SW].stream_memory_read = tls_sw_stream_read;
+	prot[TLS_BASE][TLS_SW].sock_is_readable = tls_sw_sock_is_readable;
 	prot[TLS_BASE][TLS_SW].close = tls_sk_proto_close;
 
 	prot[TLS_SW][TLS_SW] = prot[TLS_SW][TLS_BASE];
 	prot[TLS_SW][TLS_SW].recvmsg = tls_sw_recvmsg;
-	prot[TLS_SW][TLS_SW].stream_memory_read = tls_sw_stream_read;
+	prot[TLS_SW][TLS_SW].sock_is_readable = tls_sw_sock_is_readable;
 	prot[TLS_SW][TLS_SW].close = tls_sk_proto_close;
 
 #ifdef CONFIG_TLS_DEVICE
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 4feb95e34b64..d5d09bd817b7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2026,7 +2026,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 	return copied ? : err;
 }
 
-bool tls_sw_stream_read(const struct sock *sk)
+bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);

From patchwork Tue Sep 28 00:22:11 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12521233
X-Patchwork-Delegate: bpf@iogearbox.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Cong Wang, Yucong Sun, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer
Subject: [Patch bpf v2 3/4] net: implement ->sock_is_readable for UDP and AF_UNIX
Date: Mon, 27 Sep 2021 17:22:11 -0700
Message-Id: <20210928002212.14498-4-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
References: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

From: Cong Wang

Yucong noticed we can't poll() sockets in sockmap even when they are
the destination sockets of redirections. This is because we never poll
any psock queues in ->poll(), except for TCP. Now we can override
->sock_is_readable(), implementing it for UDP and AF_UNIX sockets and
invoking it from their ->poll() handlers.

Reported-by: Yucong Sun
Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Cong Wang
---
 include/linux/skmsg.h |  1 +
 net/core/skmsg.c      | 14 ++++++++++++++
 net/ipv4/udp.c        |  2 ++
 net/ipv4/udp_bpf.c    |  1 +
 net/unix/af_unix.c    |  4 ++++
 net/unix/unix_bpf.c   |  2 ++
 6 files changed, 24 insertions(+)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 8f577739fc36..a25434207dca 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -128,6 +128,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags);
+bool sk_msg_is_readable(struct sock *sk);
 
 static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes)
 {
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2d6249b28928..93ae48581ad2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -474,6 +474,20 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 }
 EXPORT_SYMBOL_GPL(sk_msg_recvmsg);
 
+bool sk_msg_is_readable(struct sock *sk)
+{
+	struct sk_psock *psock;
+	bool empty = true;
+
+	psock = sk_psock_get_checked(sk);
+	if (IS_ERR_OR_NULL(psock))
+		return false;
+	empty = sk_psock_queue_empty(psock);
+	sk_psock_put(sk, psock);
+	return !empty;
+}
+EXPORT_SYMBOL_GPL(sk_msg_is_readable);
+
 static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
 						  struct sk_buff *skb)
 {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8851c9463b4b..9f49c0967504 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 	return mask;
 
 }
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 7a1d5f473878..bbe6569c9ad3 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -114,6 +114,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 	*prot = *base;
 	prot->close = sock_map_close;
 	prot->recvmsg = udp_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void udp_bpf_check_v6_needs_rebuild(struct proto *ops)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 92345c9bb60c..f1cbaa0ccf6b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3014,6 +3014,8 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
@@ -3053,6 +3055,8 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if (sk->sk_type == SOCK_SEQPACKET) {
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index b927e2baae50..452376c6f419 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -102,6 +102,7 @@ static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto
 	*prot = *base;
 	prot->close = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void unix_stream_bpf_rebuild_protos(struct proto *prot,
@@ -110,6 +111,7 @@ static void unix_stream_bpf_rebuild_protos(struct proto *prot,
 	*prot = *base;
 	prot->close = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 	prot->unhash = sock_map_unhash;
 }
 

From patchwork Tue Sep 28 00:22:12 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 12521235
X-Patchwork-Delegate: bpf@iogearbox.net
From: Cong Wang
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, Yucong Sun, John Fastabend, Daniel Borkmann, Jakub Sitnicki, Lorenz Bauer, Cong Wang
Subject: [Patch bpf v2 4/4] selftests/bpf: use recv_timeout() instead of retries
Date: Mon, 27 Sep 2021 17:22:12 -0700
Message-Id: <20210928002212.14498-5-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
References: <20210928002212.14498-1-xiyou.wangcong@gmail.com>
X-Mailing-List: bpf@vger.kernel.org

From: Yucong Sun

We use non-blocking sockets in those tests. Retrying on EAGAIN is ugly
because there is no upper bound on the packet arrival time, at least in
theory. Now that poll() is fixed on sockmap sockets, we can switch to
select()+recv().

Cc: John Fastabend
Cc: Daniel Borkmann
Cc: Jakub Sitnicki
Cc: Lorenz Bauer
Signed-off-by: Yucong Sun
Signed-off-by: Cong Wang
---
 .../selftests/bpf/prog_tests/sockmap_listen.c | 75 +++++--------------
 1 file changed, 20 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 5c5979046523..d88bb65b74cc 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -949,7 +949,6 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 	int err, n;
 	u32 key;
 	char b;
-	int retries = 100;
 
 	zero_verdict_count(verd_mapfd);
 
@@ -1002,17 +1001,11 @@ static void redir_to_connected(int family, int sotype, int sock_mapfd,
 		goto close_peer1;
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
-again:
-	n = read(c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_peer1:
 	xclose(p1);
@@ -1571,7 +1564,6 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	int sfd[2];
 	u32 key;
@@ -1606,17 +1598,11 @@ static void unix_redir_to_connected(int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close:
 	xclose(c1);
@@ -1748,7 +1734,6 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	u32 key;
 	char b;
@@ -1781,17 +1766,11 @@ static void udp_redir_to_connected(int family, int sock_mapfd, int verd_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_cli1:
 	xclose(c1);
@@ -1841,7 +1820,6 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd,
 	const char *log_prefix = redir_mode_str(mode);
 	int c0, c1, p0, p1;
 	unsigned int pass;
-	int retries = 100;
 	int err, n;
 	int sfd[2];
 	u32 key;
@@ -1876,17 +1854,11 @@ static void inet_unix_redir_to_connected(int family, int type, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close_cli1:
 	xclose(c1);
@@ -1932,7 +1904,6 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
 	int sfd[2];
 	u32 key;
 	char b;
-	int retries = 100;
 
 	zero_verdict_count(verd_mapfd);
 
@@ -1963,17 +1934,11 @@ static void unix_inet_redir_to_connected(int family, int type, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
-again:
-	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0) {
-		if (errno == EAGAIN && retries--) {
-			usleep(1000);
-			goto again;
-		}
-		FAIL_ERRNO("%s: read", log_prefix);
-	}
+	n = recv_timeout(mode == REDIR_INGRESS ? p0 : c0, &b, 1, 0, IO_TIMEOUT_SEC);
+	if (n < 0)
+		FAIL_ERRNO("%s: recv_timeout", log_prefix);
 	if (n == 0)
-		FAIL("%s: incomplete read", log_prefix);
+		FAIL("%s: incomplete recv", log_prefix);
 
 close:
 	xclose(c1);