From patchwork Mon Sep 4 16:24:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374182 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 08F97C2F2; Mon, 4 Sep 2023 16:25:24 +0000 (UTC) Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D60E1184; Mon, 4 Sep 2023 09:25:22 -0700 (PDT) Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-9a1de3417acso542923866b.0; Mon, 04 Sep 2023 09:25:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844721; x=1694449521; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SqiuOvcv1MccJsjW00UyaCQ3aIlm0yVUcGJFzbQK0k8=; b=j0Zc2yWMEWuQVAF8/c7sbXl0R+QsdqnFLu0YC+NUOxhRwlVwfM+D2DMksAaGOt7W/D tE1y0dZapcCj5QpxYZ0vcygtCeDI90erIdoN6E5QmtWrB+AbZILzi/g0ecL8FOSTjo+p 9HoqCSmu/lffm9ezgoZQns0M4VQHVgTIkm2EYjkQHkYm6vADfL/dsAIYZKeNgUOwfTQG q/z771EibTJ6R4/A+jJXV/PnmqFOJq5jm9qnqjQIVIt+pgLgIvyGMw0U3JZMC+1QGmiL RrMKn3CoUVQcudq1Rfm5qL7kuYIr8hcGhDILSAxLxtXU/E9hkfwKq1/MS1+GkJSjRwTG JLrg== X-Gm-Message-State: AOJu0YxVO9Pjz6oSrqeslR9ofLA4ILPVZl63fsZ9/ATVczWOU24HVG4W UALkQWV60ipLzueOpNO1plU= X-Google-Smtp-Source: AGHT+IE0zM3MHfyNsy+4bdBdNbhQGTTsTad6iANbAQA59NTlTt3uDAD/ahrt4Yvh54VN9iK5ejR7ug== X-Received: by 2002:a17:907:72c9:b0:9a5:b247:3ab with SMTP id du9-20020a17090772c900b009a5b24703abmr15234852ejc.19.1693844721093; Mon, 04 Sep 2023 09:25:21 -0700 (PDT) Received: from localhost (fwdproxy-cln-023.fbsv.net. [2a03:2880:31ff:17::face:b00c]) by smtp.gmail.com with ESMTPSA id e7-20020a170906248700b009920a690cd9sm6356382ejb.59.2023.09.04.09.25.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:20 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Song Liu , Yonghong Song , KP Singh , Hao Luo , Jiri Olsa , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook Date: Mon, 4 Sep 2023 09:24:54 -0700 Message-Id: <20230904162504.1356068-2-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Leverage sockptr_t structure to pass universal pointer as argument, that holds either a userspace pointer, or, a kernel pointer. This makes this function flexible, so, we can mix and match user and kernel space pointers. The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a kernel value for optlen. Signed-off-by: Breno Leitao --- include/linux/bpf-cgroup.h | 5 +++-- kernel/bpf/cgroup.c | 20 +++++++++++--------- net/socket.c | 5 +++-- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 8506690dbb9c..f5b4fb6ed8c6 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -139,9 +139,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, int *optname, char __user *optval, int *optlen, char **kernel_optval); + int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval); int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 5b2741aa0d9b..ebc8c58f7e46 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1875,8 +1875,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, } int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1903,8 +1903,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, * one that kernel returned as well to let * BPF programs inspect the value. 
*/ - - if (get_user(ctx.optlen, optlen)) { + if (copy_from_sockptr(&ctx.optlen, optlen, + sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } @@ -1915,8 +1915,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } orig_optlen = ctx.optlen; - if (copy_from_user(ctx.optval, optval, - min(ctx.optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(ctx.optlen, max_optlen))) { ret = -EFAULT; goto out; } @@ -1930,7 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, if (ret < 0) goto out; - if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { + if (!sockptr_is_null(optval) && + (ctx.optlen > max_optlen || ctx.optlen < 0)) { if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) { pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n", ctx.optlen, max_optlen); @@ -1942,11 +1943,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } if (ctx.optlen != 0) { - if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) { + if (!sockptr_is_null(optval) && + copy_to_sockptr(optval, ctx.optval, ctx.optlen)) { ret = -EFAULT; goto out; } - if (put_user(ctx.optlen, optlen)) { + if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 77f28328e387..6fda5d011521 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2355,8 +2355,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, - optval, optlen, max_optlen, - err); + USER_SOCKPTR(optval), + USER_SOCKPTR(optlen), + max_optlen, err); out_put: fput_light(sock->file, fput_needed); return err; From patchwork Mon Sep 4 16:24:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374183 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A22B2C8EB; Mon, 4 Sep 2023 16:25:25 +0000 (UTC) Received: from mail-ed1-f45.google.com (mail-ed1-f45.google.com [209.85.208.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5324A18B; Mon, 4 Sep 2023 09:25:24 -0700 (PDT) Received: by mail-ed1-f45.google.com with SMTP id 4fb4d7f45d1cf-52a4737a08fso2149587a12.3; Mon, 04 Sep 2023 09:25:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844723; x=1694449523; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IUL6OhPSlmc9b0ojpccZ5UMJYWzfKfwD3kXIKD6Vw5c=; b=Ip4ojanxIMbB1us7R3axNK4aY8r2jFrdH3KO1U2WrgTZEST0mBqE9/wL2fkE5VbEBc U4X5dB1B/ErVtAPMeNWcXdRDiTFMVY7npxlVYHlW76zyGfofdtQqhdO94L1cxfBBoI/D Hl5QXnNqg2ovQny1aJS8A1Ff0lJiOfM5/QOjCrAND42qWxjmRwGjDJrCK8THFaoWglvp kbo9c/8vYPRWZOnqmVML/F4Z+kkCQsmB+5fFOTXbl6SMMNGF1m+wjgTtPPvNkaBzb1hu YwXehJuhOGol70f50B16HWbYbo6Il+yXV15tyq/pThVsxeefqC9JptTm27lzbQ4F3Q1I 35mQ== X-Gm-Message-State: AOJu0YyN+m+J/AXwrxwUUn+BR7orhBwtirlX0aC8TTzJoyZL37bykg4p 0h6hTr9rUyTyrCHGjHTGtfo= X-Google-Smtp-Source: AGHT+IGbGNrDUERojvuHE3aYx8OauARPUDOpinBUn7+ySnAYjNNZQ1ONkGITStggwcvefZ2dq5hrcw== X-Received: by 2002:a05:6402:2047:b0:523:95e:c2c0 with SMTP id 
From: Breno Leitao
To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org
Subject: [PATCH v4 02/10] bpf: Leverage sockptr_t in BPF setsockopt hook
Date: Mon, 4 Sep 2023 09:24:55 -0700
Message-Id: <20230904162504.1356068-3-leitao@debian.org>
In-Reply-To: <20230904162504.1356068-1-leitao@debian.org>
References: <20230904162504.1356068-1-leitao@debian.org>

Change the BPF setsockopt hook (__cgroup_bpf_run_filter_setsockopt()) to take a sockptr_t instead of a user pointer. This makes the function more flexible, since it can be called with either userspace or kernel pointers.

This change allows the creation of a core setsockopt helper, do_sock_setsockopt(), which will be called from both the system call path and the io_uring command path. It also aligns with the getsockopt() counterpart, which now uses the sockptr_t universal pointer.
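To make the sockptr_t idea concrete, here is a minimal userspace sketch of the pattern this series relies on: a tagged pointer plus one copy helper that dispatches on the tag. It is only an illustration, not the kernel's <linux/sockptr.h> implementation; in the kernel the user branch goes through copy_from_user().

/*
 * Illustration only: simplified userspace model of sockptr_t.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct {
	union {
		void *kernel;
		void *user;	/* a __user pointer in the kernel */
	};
	bool is_kernel;
} sockptr_t;

static sockptr_t KERNEL_SOCKPTR(void *p)
{
	return (sockptr_t){ .kernel = p, .is_kernel = true };
}

static sockptr_t USER_SOCKPTR(void *p)
{
	return (sockptr_t){ .user = p, .is_kernel = false };
}

static int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
{
	/* Both branches are plain memcpy() here, just to show the dispatch. */
	memcpy(dst, src.is_kernel ? src.kernel : src.user, size);
	return 0;
}

int main(void)
{
	int optlen = 128, out = 0;

	copy_from_sockptr(&out, KERNEL_SOCKPTR(&optlen), sizeof(out));
	printf("optlen via kernel sockptr: %d\n", out);

	copy_from_sockptr(&out, USER_SOCKPTR(&optlen), sizeof(out));
	printf("optlen via user sockptr: %d\n", out);
	return 0;
}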
Signed-off-by: Breno Leitao --- include/linux/bpf-cgroup.h | 2 +- kernel/bpf/cgroup.c | 5 +++-- net/socket.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index f5b4fb6ed8c6..cecfe8c99f28 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -137,7 +137,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, enum cgroup_bpf_attach_type atype); int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval); int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index ebc8c58f7e46..f0dedd4f7f2e 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1785,7 +1785,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx, } int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1808,7 +1808,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, ctx.optlen = *optlen; - if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(*optlen, max_optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 6fda5d011521..9ec9a8a07c0e 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2287,7 +2287,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, - user_optval, &optlen, + optval, &optlen, &kernel_optval); if (err < 0) goto out_put; From patchwork Mon Sep 4 16:24:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374184 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E5164C8EB; Mon, 4 Sep 2023 16:25:26 +0000 (UTC) Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF2809D; Mon, 4 Sep 2023 09:25:25 -0700 (PDT) Received: by mail-ed1-f53.google.com with SMTP id 4fb4d7f45d1cf-52713d2c606so2198624a12.2; Mon, 04 Sep 2023 09:25:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844724; x=1694449524; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PBda6rT0pL/B59nG5AgliJCLsy8+kPWSe+s29wcarWY=; b=BDajSyoHg1pnz7M7ag0XKpV0LkJPVrcUIIhZndCm2em5LDAQDiRJcB2jKZaBfpE7QK F1RVMy4ifhzk6MIbQ2W6SaVpJEFXjobkcZqLSmJ8fv7CXBsvWSo2tPILzbBZttKTNGvA 1wjN95OZpMr72384lnisUvU+/USaM4fZa5RlqatcgLXrgm0j/wZzbvYmJzvR9niWy9tJ 8xPyinC5Esx/BdR9p8k/sAurdjn3ISAsltHN7G7AP/REU9k9DbYcU3Y8WSoFIzebGSBS ViDraF3jTL/Y5A2gpHtLV5ZfmH2MnpGV6stDPB9H/EwUFkNiYV0huLaxpnuo38TBDGvz Qj4Q== X-Gm-Message-State: AOJu0YwjB/92yKlZLI3K9TLz93uqG3TqQRHud/uI7JgKA2cVx5mrXmit RLTWU4rJlcopBe1ksb54/ac= X-Google-Smtp-Source: 
AGHT+IHZKR+9DvETpBcmV6dj1OZh+YayfA1YpM9KTsLZ6mqWKVA1d3MF8/OaJ9OgfHCIheZcXhOCEQ== X-Received: by 2002:a17:907:78c1:b0:9a1:c991:a51c with SMTP id kv1-20020a17090778c100b009a1c991a51cmr7528046ejc.2.1693844724180; Mon, 04 Sep 2023 09:25:24 -0700 (PDT) Received: from localhost (fwdproxy-cln-118.fbsv.net. [2a03:2880:31ff:76::face:b00c]) by smtp.gmail.com with ESMTPSA id m26-20020a170906259a00b0099ca4f61a8bsm6406935ejb.92.2023.09.04.09.25.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:23 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, Willem de Bruijn Subject: [PATCH v4 03/10] net/socket: Break down __sys_setsockopt Date: Mon, 4 Sep 2023 09:24:56 -0700 Message-Id: <20230904162504.1356068-4-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org Split __sys_setsockopt() into two functions by removing the core logic into a sub-function (do_sock_setsockopt()). This will avoid code duplication when doing the same operation in other callers, for instance. do_sock_setsockopt() will be called by io_uring setsockopt() command operation in the following patch. Signed-off-by: Breno Leitao Reviewed-by: Willem de Bruijn --- include/net/sock.h | 2 ++ net/socket.c | 39 +++++++++++++++++++++++++-------------- 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 11d503417591..b059f9272303 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1861,6 +1861,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); diff --git a/net/socket.c b/net/socket.c index 9ec9a8a07c0e..3bf29a27653f 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2261,31 +2261,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock) return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags); } -/* - * Set a socket option. Because we don't know the option lengths we have - * to pass the user mode parameter for the protocols to sort out. 
- */ -int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, - int optlen) +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen) { - sockptr_t optval = USER_SOCKPTR(user_optval); const struct proto_ops *ops; char *kernel_optval = NULL; - int err, fput_needed; - struct socket *sock; + int err; if (optlen < 0) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; - err = security_socket_setsockopt(sock, level, optname); if (err) goto out_put; - if (!in_compat_syscall()) + if (!compat) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, optval, &optlen, &kernel_optval); @@ -2308,6 +2298,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, optlen); kfree(kernel_optval); out_put: + return err; +} +EXPORT_SYMBOL(do_sock_setsockopt); + +/* Set a socket option. Because we don't know the option lengths we have + * to pass the user mode parameter for the protocols to sort out. + */ +int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, + int optlen) +{ + sockptr_t optval = USER_SOCKPTR(user_optval); + bool compat = in_compat_syscall(); + int err, fput_needed; + struct socket *sock; + + sock = sockfd_lookup_light(fd, &err, &fput_needed); + if (!sock) + return err; + + err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); + fput_light(sock->file, fput_needed); return err; } From patchwork Mon Sep 4 16:24:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374185 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF6D0CA53; Mon, 4 Sep 2023 16:25:30 +0000 (UTC) Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 79B659D; Mon, 4 Sep 2023 09:25:29 -0700 (PDT) Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-9a1de3417acso542957066b.0; Mon, 04 Sep 2023 09:25:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844728; x=1694449528; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=82OoBXgBAbznwkXu05U9ZI1aovNZn8WR0zg3FquGFJc=; b=k8Sopg0P1FsXkxle1F3S6sAVObIuQSuBOLU41iXnxjBOSL5LyIoErzP6pYN5NO83dm Hjs/sCIdAtC1sI3LpgYqMiD9mNIp31cWadl5S1MFOoLt+Us994bVsk+a+fnT1pdEIXef WH0nzaIwbfzhpUf3MQ7dq+YfgASgln/33VsrFDtBv6kOuzBBvh1nXCIGGBrgfP68AFC5 HoE5lmoRtuth4XL9vzmXhA1VW6+5pgm95gPR1dqyUnYYScks/insI2beu+G3XBvyKTRZ z3b93BZ1aGpGv079T+oszboV78W5Hc2PQTnPtv4aOaRMVIpThfkO54m7xxNUTu3xEUtQ fMhw== X-Gm-Message-State: AOJu0YzgMJmRo/VsTQoXP1Y+0FbW2L1aSfGKhCDCUu3dOVcaMP6Ax/8p qmGHIjsvREh0TIoqL3yvDoI= X-Google-Smtp-Source: AGHT+IF4peA8bbZRtqCcH5BZyiuX1dDl1ated1EUKd9FT2qyulTi0W5D90fIRaNHdoPDFpMjb44shg== X-Received: by 2002:a17:907:1c86:b0:9a5:b2d8:e925 with SMTP id nb6-20020a1709071c8600b009a5b2d8e925mr15418113ejc.33.1693844728005; Mon, 04 Sep 2023 09:25:28 -0700 (PDT) Received: from localhost (fwdproxy-cln-120.fbsv.net. 
From: Breno Leitao
To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, Kuniyuki Iwashima, Alexander Mikhalitsyn, Xin Long, David Howells, Jason Xing, Andy Shevchenko
Subject: [PATCH v4 04/10] net/socket: Break down __sys_getsockopt
Date: Mon, 4 Sep 2023 09:24:57 -0700
Message-Id: <20230904162504.1356068-5-leitao@debian.org>
In-Reply-To: <20230904162504.1356068-1-leitao@debian.org>
References: <20230904162504.1356068-1-leitao@debian.org>

Split __sys_getsockopt() into two functions by moving the core logic into a helper, do_sock_getsockopt(). This avoids code duplication when other callers need to perform the same operation. do_sock_getsockopt() will be called by the io_uring getsockopt() command operation in a following patch.

The same split was done for the setsockopt pair.
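For reference, this is the classic getsockopt(2) contract that do_sock_getsockopt() keeps serving on the syscall path: optlen is a pointer that the kernel rewrites in place. A minimal userspace example:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int rcvbuf = 0;
	socklen_t optlen = sizeof(rcvbuf);	/* in/out: kernel updates it */

	if (fd < 0 || getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen) < 0) {
		perror("getsockopt");
		return 1;
	}
	printf("SO_RCVBUF=%d (optlen=%u)\n", rcvbuf, (unsigned int)optlen);
	close(fd);
	return 0;
}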
Suggested-by: Martin KaFai Lau Signed-off-by: Breno Leitao --- include/linux/bpf-cgroup.h | 2 +- include/net/sock.h | 2 ++ net/core/sock.c | 8 ----- net/socket.c | 62 ++++++++++++++++++++++++-------------- 4 files changed, 42 insertions(+), 32 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index cecfe8c99f28..ffaca1ab5e8d 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -378,7 +378,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, ({ \ int __ret = 0; \ if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \ - get_user(__ret, optlen); \ + copy_from_sockptr(&__ret, optlen, sizeof(int)); \ __ret; \ }) diff --git a/include/net/sock.h b/include/net/sock.h index b059f9272303..c0185121efe4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1863,6 +1863,8 @@ int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); int do_sock_setsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, int optlen); +int do_sock_getsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, sockptr_t optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); diff --git a/net/core/sock.c b/net/core/sock.c index 666a17cab4f5..cf15394ed664 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2009,14 +2009,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname, return 0; } -int sock_getsockopt(struct socket *sock, int level, int optname, - char __user *optval, int __user *optlen) -{ - return sk_getsockopt(sock->sk, level, optname, - USER_SOCKPTR(optval), - USER_SOCKPTR(optlen)); -} - /* * Initialize an sk_lock. * diff --git a/net/socket.c b/net/socket.c index 3bf29a27653f..c79d2b2b902e 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2332,6 +2332,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname, INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int optname)); +int do_sock_getsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, sockptr_t optlen) +{ + int max_optlen __maybe_unused; + const struct proto_ops *ops; + int err; + + err = security_socket_getsockopt(sock, level, optname); + if (err) + return err; + + ops = READ_ONCE(sock->ops); + if (level == SOL_SOCKET) { + err = sk_getsockopt(sock->sk, level, optname, optval, optlen); + } else if (unlikely(!ops->getsockopt)) { + err = -EOPNOTSUPP; + } else { + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel, + "Invalid argument type")) + return -EOPNOTSUPP; + + err = ops->getsockopt(sock, level, optname, optval.user, + optlen.user); + } + + if (!compat) { + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, + optval, optlen, max_optlen, + err); + } + + return err; +} +EXPORT_SYMBOL(do_sock_getsockopt); + /* * Get a socket option. Because we don't know the option lengths we have * to pass a user mode parameter for the protocols to sort out. 
@@ -2339,37 +2375,17 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { - int max_optlen __maybe_unused; - const struct proto_ops *ops; int err, fput_needed; + bool compat = in_compat_syscall(); struct socket *sock; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; - err = security_socket_getsockopt(sock, level, optname); - if (err) - goto out_put; - - if (!in_compat_syscall()) - max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); + err = do_sock_getsockopt(sock, compat, level, optname, + USER_SOCKPTR(optval), USER_SOCKPTR(optlen)); - ops = READ_ONCE(sock->ops); - if (level == SOL_SOCKET) - err = sock_getsockopt(sock, level, optname, optval, optlen); - else if (unlikely(!ops->getsockopt)) - err = -EOPNOTSUPP; - else - err = ops->getsockopt(sock, level, optname, optval, - optlen); - - if (!in_compat_syscall()) - err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, - USER_SOCKPTR(optval), - USER_SOCKPTR(optlen), - max_optlen, err); -out_put: fput_light(sock->file, fput_needed); return err; } From patchwork Mon Sep 4 16:24:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374186 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 572DCCA53; Mon, 4 Sep 2023 16:25:34 +0000 (UTC) Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 132E0CCB; Mon, 4 Sep 2023 09:25:31 -0700 (PDT) Received: by mail-ej1-f41.google.com with SMTP id a640c23a62f3a-99c4923195dso251581466b.2; Mon, 04 Sep 2023 09:25:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844729; x=1694449529; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gptGKutv8UrS/zDN0Qmum7MxSe9eEjLIE1n6bMLfanE=; b=PPiJxb2qVfCCL7JplmxT7GpqZWbDxxtRVA6QNPI+6ywvM8txikQnwDWRJyP40ZS86d n20kb83lAiRu/OCx45n1nOEtaQL2cZUPSSPzyHguYVzWN35NNQnRwaQGvJc0MaD5AtK8 J89j60/uh6p93PPIq4nbp1J9Fk6koEU5MOCa0NjYwaHIEVCQMVQfxL93DSvipAIfkyYp K1Eyb3pASZDPnvoojD/O+cbAQq9arYFNpSQsPmxZd/VUtEJd15sKpvNzjfXHNvqDhNrH a5VezUoT36IVGNVgM4PfXp2lbEosv3zHlol9FgwWhvHuDYfh8GHqOduz5NGbKtiXrfQ5 UXwg== X-Gm-Message-State: AOJu0Yz/UouvmosxcTvpkd7W1i4Ozljt4hkTT2PpMJ50DDUalGrShAjr JLfFTQfLg2mje+lpt9b87ow= X-Google-Smtp-Source: AGHT+IFo6xxnSKt/kN/bp1HywJmCyJh2XEsoJurKM5LSZG5dZcn95+mnh9oQe7tD1zkAgGEtZ2MM1g== X-Received: by 2002:a17:907:d690:b0:9a2:2635:dab6 with SMTP id wf16-20020a170907d69000b009a22635dab6mr8211523ejc.47.1693844729572; Mon, 04 Sep 2023 09:25:29 -0700 (PDT) Received: from localhost (fwdproxy-cln-000.fbsv.net. 
[2a03:2880:31ff::face:b00c]) by smtp.gmail.com with ESMTPSA id se22-20020a170906ce5600b009a1dbf55665sm6365416ejb.161.2023.09.04.09.25.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:29 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH v4 05/10] io_uring/cmd: Pass compat mode in issue_flags Date: Mon, 4 Sep 2023 09:24:58 -0700 Message-Id: <20230904162504.1356068-6-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Create a new flag to track if the operation is running compat mode. This basically check the context->compat and pass it to the issue_flags, so, it could be queried later in the callbacks. Signed-off-by: Breno Leitao --- include/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 106cdc55ff3b..bc53b35966ed 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -20,6 +20,7 @@ enum io_uring_cmd_flags { IO_URING_F_SQE128 = (1 << 8), IO_URING_F_CQE32 = (1 << 9), IO_URING_F_IOPOLL = (1 << 10), + IO_URING_F_COMPAT = (1 << 11), }; struct io_uring_cmd { diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 537795fddc87..60f843a357e0 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -128,6 +128,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) issue_flags |= IO_URING_F_SQE128; if (ctx->flags & IORING_SETUP_CQE32) issue_flags |= IO_URING_F_CQE32; + if (ctx->compat) + issue_flags |= IO_URING_F_COMPAT; if (ctx->flags & IORING_SETUP_IOPOLL) { if (!file->f_op->uring_cmd_iopoll) return -EOPNOTSUPP; From patchwork Mon Sep 4 16:24:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374187 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EDB3CA53; Mon, 4 Sep 2023 16:25:46 +0000 (UTC) Received: from mail-lj1-f173.google.com (mail-lj1-f173.google.com [209.85.208.173]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA1EA172E; Mon, 4 Sep 2023 09:25:40 -0700 (PDT) Received: by mail-lj1-f173.google.com with SMTP id 38308e7fff4ca-2bcc14ea414so25299751fa.0; Mon, 04 Sep 2023 09:25:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844739; x=1694449539; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
From: Breno Leitao
To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Shuah Khan
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK)
Subject: [PATCH v4 06/10] selftests/net: Extract uring helpers to be reusable
Date: Mon, 4 Sep 2023 09:24:59 -0700
Message-Id: <20230904162504.1356068-7-leitao@debian.org>
In-Reply-To: <20230904162504.1356068-1-leitao@debian.org>
References: <20230904162504.1356068-1-leitao@debian.org>

Instead of defining basic io_uring functions in the test case itself, move them to a common directory so that other tests can use them. This simplifies the test code and reuses the common liburing-style infrastructure. It is basically a copy of what we have in io_uring_zerocopy_tx, with some minor improvements to make checkpatch happy.

A follow-up test will use the same helpers in a BPF sockopt test.
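As a rough sketch (not part of this patch) of how a follow-up test could consume the extracted helpers, assuming the extra include path added to the Makefile below and the kernel uapi io_uring headers:

#include <stdio.h>
#include <io_uring/mini_liburing.h>

int main(void)
{
	struct io_uring ring;
	int ret = io_uring_queue_init(4, &ring, 0);

	if (ret < 0) {
		fprintf(stderr, "io_uring_queue_init: %d\n", ret);
		return 1;
	}

	/*
	 * A real test would grab SQEs with io_uring_get_sqe(), fill them
	 * (e.g. with io_uring_prep_send()), call io_uring_submit(), and
	 * then reap completions with io_uring_wait_cqe()/io_uring_cqe_seen().
	 */
	io_uring_queue_exit(&ring);
	return 0;
}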
Signed-off-by: Breno Leitao --- tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++ tools/testing/selftests/net/Makefile | 1 + .../selftests/net/io_uring_zerocopy_tx.c | 268 +---------------- 3 files changed, 285 insertions(+), 266 deletions(-) create mode 100644 tools/include/io_uring/mini_liburing.h diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h new file mode 100644 index 000000000000..9ccb16074eb5 --- /dev/null +++ b/tools/include/io_uring/mini_liburing.h @@ -0,0 +1,282 @@ +/* SPDX-License-Identifier: MIT */ + +#include +#include +#include +#include +#include +#include + +struct io_sq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + unsigned int *flags; + unsigned int *array; +}; + +struct io_cq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + struct io_uring_cqe *cqes; +}; + +struct io_uring_sq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *kflags; + unsigned int *kdropped; + unsigned int *array; + struct io_uring_sqe *sqes; + + unsigned int sqe_head; + unsigned int sqe_tail; + + size_t ring_sz; +}; + +struct io_uring_cq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *koverflow; + struct io_uring_cqe *cqes; + + size_t ring_sz; +}; + +struct io_uring { + struct io_uring_sq sq; + struct io_uring_cq cq; + int ring_fd; +}; + +#if defined(__x86_64) || defined(__i386__) +#define read_barrier() __asm__ __volatile__("":::"memory") +#define write_barrier() __asm__ __volatile__("":::"memory") +#else +#define read_barrier() __sync_synchronize() +#define write_barrier() __sync_synchronize() +#endif + +static inline int io_uring_mmap(int fd, struct io_uring_params *p, + struct io_uring_sq *sq, struct io_uring_cq *cq) +{ + size_t size; + void *ptr; + int ret; + + sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int); + ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); + if (ptr == MAP_FAILED) + return -errno; + sq->khead = ptr + p->sq_off.head; + sq->ktail = ptr + p->sq_off.tail; + sq->kring_mask = ptr + p->sq_off.ring_mask; + sq->kring_entries = ptr + p->sq_off.ring_entries; + sq->kflags = ptr + p->sq_off.flags; + sq->kdropped = ptr + p->sq_off.dropped; + sq->array = ptr + p->sq_off.array; + + size = p->sq_entries * sizeof(struct io_uring_sqe); + sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); + if (sq->sqes == MAP_FAILED) { + ret = -errno; +err: + munmap(sq->khead, sq->ring_sz); + return ret; + } + + cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); + ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); + if (ptr == MAP_FAILED) { + ret = -errno; + munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); + goto err; + } + cq->khead = ptr + p->cq_off.head; + cq->ktail = ptr + p->cq_off.tail; + cq->kring_mask = ptr + p->cq_off.ring_mask; + cq->kring_entries = ptr + p->cq_off.ring_entries; + cq->koverflow = ptr + p->cq_off.overflow; + cq->cqes = ptr + p->cq_off.cqes; + return 0; +} + +static inline int io_uring_setup(unsigned int entries, + struct io_uring_params *p) +{ + return syscall(__NR_io_uring_setup, entries, p); +} + +static inline int io_uring_enter(int fd, unsigned 
int to_submit, + unsigned int min_complete, + unsigned int flags, sigset_t *sig) +{ + return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, + flags, sig, _NSIG / 8); +} + +static inline int io_uring_queue_init(unsigned int entries, + struct io_uring *ring, + unsigned int flags) +{ + struct io_uring_params p; + int fd, ret; + + memset(ring, 0, sizeof(*ring)); + memset(&p, 0, sizeof(p)); + p.flags = flags; + + fd = io_uring_setup(entries, &p); + if (fd < 0) + return fd; + ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); + if (!ret) + ring->ring_fd = fd; + else + close(fd); + return ret; +} + +/* Get a sqe */ +static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) + return NULL; + return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; +} + +static inline int io_uring_wait_cqe(struct io_uring *ring, + struct io_uring_cqe **cqe_ptr) +{ + struct io_uring_cq *cq = &ring->cq; + const unsigned int mask = *cq->kring_mask; + unsigned int head = *cq->khead; + int ret; + + *cqe_ptr = NULL; + do { + read_barrier(); + if (head != *cq->ktail) { + *cqe_ptr = &cq->cqes[head & mask]; + break; + } + ret = io_uring_enter(ring->ring_fd, 0, 1, + IORING_ENTER_GETEVENTS, NULL); + if (ret < 0) + return -errno; + } while (1); + + return 0; +} + +static inline int io_uring_submit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + const unsigned int mask = *sq->kring_mask; + unsigned int ktail, submitted, to_submit; + int ret; + + read_barrier(); + if (*sq->khead != *sq->ktail) { + submitted = *sq->kring_entries; + goto submit; + } + if (sq->sqe_head == sq->sqe_tail) + return 0; + + ktail = *sq->ktail; + to_submit = sq->sqe_tail - sq->sqe_head; + for (submitted = 0; submitted < to_submit; submitted++) { + read_barrier(); + sq->array[ktail++ & mask] = sq->sqe_head++ & mask; + } + if (!submitted) + return 0; + + if (*sq->ktail != ktail) { + write_barrier(); + *sq->ktail = ktail; + write_barrier(); + } +submit: + ret = io_uring_enter(ring->ring_fd, submitted, 0, + IORING_ENTER_GETEVENTS, NULL); + return ret < 0 ? -errno : ret; +} + +static inline void io_uring_queue_exit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe)); + munmap(sq->khead, sq->ring_sz); + close(ring->ring_fd); +} + +/* Prepare and send the SQE */ +static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op, + int sockfd, + int level, int optname, + const void *optval, + int optlen) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_URING_CMD; + sqe->fd = sockfd; + sqe->cmd_op = op; + + sqe->level = level; + sqe->optname = optname; + sqe->optval = (unsigned long long)optval; + sqe->optlen = optlen; +} + +static inline int io_uring_register_buffers(struct io_uring *ring, + const struct iovec *iovecs, + unsigned int nr_iovecs) +{ + int ret; + + ret = syscall(__NR_io_uring_register, ring->ring_fd, + IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); + return (ret < 0) ? 
-errno : ret; +} + +static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_SEND; + sqe->fd = sockfd; + sqe->addr = (unsigned long)buf; + sqe->len = len; + sqe->msg_flags = (__u32)flags; +} + +static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags, + unsigned int zc_flags) +{ + io_uring_prep_send(sqe, sockfd, buf, len, flags); + sqe->opcode = (__u8)IORING_OP_SEND_ZC; + sqe->ioprio = zc_flags; +} + +static inline void io_uring_cqe_seen(struct io_uring *ring) +{ + *(&ring->cq)->khead += 1; + write_barrier(); +} diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 8b017070960d..f8d99837b9dc 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -98,6 +98,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto $(OUTPUT)/tcp_inq: LDLIBS += -lpthread $(OUTPUT)/bind_bhash: LDLIBS += -lpthread +$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ # Rules to generate bpf obj nat6to4.o CLANG ?= clang diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c index 154287740172..76e604e4810e 100644 --- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c +++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c @@ -36,6 +36,8 @@ #include #include +#include + #define NOTIF_TAG 0xfffffffULL #define NONZC_TAG 0 #define ZC_TAG 1 @@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr; static char payload[IP_MAXPACKET] __attribute__((aligned(4096))); -struct io_sq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - unsigned *flags; - unsigned *array; -}; - -struct io_cq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - struct io_uring_cqe *cqes; -}; - -struct io_uring_sq { - unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *kflags; - unsigned *kdropped; - unsigned *array; - struct io_uring_sqe *sqes; - - unsigned sqe_head; - unsigned sqe_tail; - - size_t ring_sz; -}; - -struct io_uring_cq { - unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *koverflow; - struct io_uring_cqe *cqes; - - size_t ring_sz; -}; - -struct io_uring { - struct io_uring_sq sq; - struct io_uring_cq cq; - int ring_fd; -}; - -#ifdef __alpha__ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 535 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 536 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 537 -# endif -#else /* !__alpha__ */ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 425 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 426 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 427 -# endif -#endif - -#if defined(__x86_64) || defined(__i386__) -#define read_barrier() __asm__ __volatile__("":::"memory") -#define write_barrier() __asm__ __volatile__("":::"memory") -#else - -#define read_barrier() __sync_synchronize() -#define write_barrier() __sync_synchronize() -#endif - -static int io_uring_setup(unsigned int entries, struct io_uring_params *p) -{ - return syscall(__NR_io_uring_setup, entries, p); -} - -static int io_uring_enter(int fd, unsigned int 
to_submit, - unsigned int min_complete, - unsigned int flags, sigset_t *sig) -{ - return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, - flags, sig, _NSIG / 8); -} - -static int io_uring_register_buffers(struct io_uring *ring, - const struct iovec *iovecs, - unsigned nr_iovecs) -{ - int ret; - - ret = syscall(__NR_io_uring_register, ring->ring_fd, - IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); - return (ret < 0) ? -errno : ret; -} - -static int io_uring_mmap(int fd, struct io_uring_params *p, - struct io_uring_sq *sq, struct io_uring_cq *cq) -{ - size_t size; - void *ptr; - int ret; - - sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned); - ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); - if (ptr == MAP_FAILED) - return -errno; - sq->khead = ptr + p->sq_off.head; - sq->ktail = ptr + p->sq_off.tail; - sq->kring_mask = ptr + p->sq_off.ring_mask; - sq->kring_entries = ptr + p->sq_off.ring_entries; - sq->kflags = ptr + p->sq_off.flags; - sq->kdropped = ptr + p->sq_off.dropped; - sq->array = ptr + p->sq_off.array; - - size = p->sq_entries * sizeof(struct io_uring_sqe); - sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); - if (sq->sqes == MAP_FAILED) { - ret = -errno; -err: - munmap(sq->khead, sq->ring_sz); - return ret; - } - - cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); - ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); - if (ptr == MAP_FAILED) { - ret = -errno; - munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); - goto err; - } - cq->khead = ptr + p->cq_off.head; - cq->ktail = ptr + p->cq_off.tail; - cq->kring_mask = ptr + p->cq_off.ring_mask; - cq->kring_entries = ptr + p->cq_off.ring_entries; - cq->koverflow = ptr + p->cq_off.overflow; - cq->cqes = ptr + p->cq_off.cqes; - return 0; -} - -static int io_uring_queue_init(unsigned entries, struct io_uring *ring, - unsigned flags) -{ - struct io_uring_params p; - int fd, ret; - - memset(ring, 0, sizeof(*ring)); - memset(&p, 0, sizeof(p)); - p.flags = flags; - - fd = io_uring_setup(entries, &p); - if (fd < 0) - return fd; - ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); - if (!ret) - ring->ring_fd = fd; - else - close(fd); - return ret; -} - -static int io_uring_submit(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - const unsigned mask = *sq->kring_mask; - unsigned ktail, submitted, to_submit; - int ret; - - read_barrier(); - if (*sq->khead != *sq->ktail) { - submitted = *sq->kring_entries; - goto submit; - } - if (sq->sqe_head == sq->sqe_tail) - return 0; - - ktail = *sq->ktail; - to_submit = sq->sqe_tail - sq->sqe_head; - for (submitted = 0; submitted < to_submit; submitted++) { - read_barrier(); - sq->array[ktail++ & mask] = sq->sqe_head++ & mask; - } - if (!submitted) - return 0; - - if (*sq->ktail != ktail) { - write_barrier(); - *sq->ktail = ktail; - write_barrier(); - } -submit: - ret = io_uring_enter(ring->ring_fd, submitted, 0, - IORING_ENTER_GETEVENTS, NULL); - return ret < 0 ? 
-errno : ret; -} - -static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags) -{ - memset(sqe, 0, sizeof(*sqe)); - sqe->opcode = (__u8) IORING_OP_SEND; - sqe->fd = sockfd; - sqe->addr = (unsigned long) buf; - sqe->len = len; - sqe->msg_flags = (__u32) flags; -} - -static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags, - unsigned zc_flags) -{ - io_uring_prep_send(sqe, sockfd, buf, len, flags); - sqe->opcode = (__u8) IORING_OP_SEND_ZC; - sqe->ioprio = zc_flags; -} - -static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - - if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) - return NULL; - return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; -} - -static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr) -{ - struct io_uring_cq *cq = &ring->cq; - const unsigned mask = *cq->kring_mask; - unsigned head = *cq->khead; - int ret; - - *cqe_ptr = NULL; - do { - read_barrier(); - if (head != *cq->ktail) { - *cqe_ptr = &cq->cqes[head & mask]; - break; - } - ret = io_uring_enter(ring->ring_fd, 0, 1, - IORING_ENTER_GETEVENTS, NULL); - if (ret < 0) - return -errno; - } while (1); - - return 0; -} - -static inline void io_uring_cqe_seen(struct io_uring *ring) -{ - *(&ring->cq)->khead += 1; - write_barrier(); -} - static unsigned long gettimeofday_ms(void) { struct timeval tv; From patchwork Mon Sep 4 16:25:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374188 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D99F2CA53; Mon, 4 Sep 2023 16:25:47 +0000 (UTC) Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8C041980; Mon, 4 Sep 2023 09:25:41 -0700 (PDT) Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-99bf3f59905so249412666b.3; Mon, 04 Sep 2023 09:25:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844740; x=1694449540; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wsGj/By1L5B4hMy2OQnJ6fNGd/zzZdrlVFpYroV1qv4=; b=JE0neR/+7jwAH5Gggc9dCrRPn1rUCcBQ5T9Qnz+YeU86pxkO4ZIXSsHrD1R/Nb44Pp CCikSpqdXo63g9j2GDgVvvGfemQKTDOze9v+AV3WOCtAgDwMkHJFkJnDrUb8zldMON3Z gbgX/NfjyNO54JlYivxzOjTYpUFLlAC6GG77oTLnkpGzyjQoKuPDAgBA1XhpVDC62otw JW9nloloiEDRGbBg2Pa0CCfI2st9Na5YAh9gKersREfxLZVI0UOdORxDPAIAZNlFOGJU jvHFFLoomcvbYiQhnDzf/+H/JzQ4AQ8tq5z7Q2ksAs1VLczfUHiKJWpxy/XbzJWCTh0v yoSg== X-Gm-Message-State: AOJu0Yz2YF/d/M1nrzodvz68KWtOmEgqWeWSTl0vEWmOaQfMfRSEwDvo f63MJV+taiZrhysKWQy6vRgWCMj7+a3nIw== X-Google-Smtp-Source: AGHT+IFfo76sWVv8zOKd/DWiU9bcymnJjY4Xcn4K6B71AxjNfnpZ9zBV1YlUDOHKgrT0OzO6tsam7w== X-Received: by 2002:a17:906:76c8:b0:99b:4668:865f with SMTP id q8-20020a17090676c800b0099b4668865fmr7688934ejn.10.1693844739936; Mon, 04 Sep 2023 09:25:39 -0700 (PDT) Received: from localhost (fwdproxy-cln-116.fbsv.net. 
[2a03:2880:31ff:74::face:b00c]) by smtp.gmail.com with ESMTPSA id f25-20020a170906495900b0099d9dee8108sm6447486ejt.149.2023.09.04.09.25.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:39 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled Date: Mon, 4 Sep 2023 09:25:00 -0700 Message-Id: <20230904162504.1356068-8-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If network is not enabled, but io_uring is, then we want to return -EOPNOTSUPP for any possible socket operation. This is helpful because io_uring_cmd_sock() can now call functions that only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET inside the function itself. Signed-off-by: Breno Leitao --- io_uring/uring_cmd.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 60f843a357e0..6a91e1af7d05 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, } EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); +#if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct socket *sock = cmd->file->private_data; @@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) return -EOPNOTSUPP; } } +#else +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) +{ + return -EOPNOTSUPP; +} +#endif + EXPORT_SYMBOL_GPL(io_uring_cmd_sock); From patchwork Mon Sep 4 16:25:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374189 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CEE0AD2EC; Mon, 4 Sep 2023 16:25:55 +0000 (UTC) Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 308B31702; Mon, 4 Sep 2023 09:25:43 -0700 (PDT) Received: by mail-ej1-f42.google.com with SMTP id a640c23a62f3a-98377c5d53eso236833866b.0; Mon, 04 Sep 2023 09:25:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844741; x=1694449541; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc 
From: Breno Leitao
To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com
Subject: [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
Date: Mon, 4 Sep 2023 09:25:01 -0700
Message-Id: <20230904162504.1356068-9-leitao@debian.org>
In-Reply-To: <20230904162504.1356068-1-leitao@debian.org>
References: <20230904162504.1356068-1-leitao@debian.org>

Add support for a getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where level is SOL_SOCKET. This leverages the sockptr_t infrastructure, where a sockptr_t is either a userspace or a kernel pointer and is handled as such.

Unlike getsockopt(2), the optlen field is not a userspace pointer. In getsockopt(2), userspace provides an optlen pointer, which is overwritten by the kernel. In this implementation, userspace passes a u32, and the updated value is returned in cqe->res; i.e., optlen is not a pointer.

Note that userspace needs to keep the optval pointer alive until the CQE is completed.
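A hedged userspace sketch of how this command could be driven, reusing the mini_liburing.h helpers from patch 06. It assumes a kernel and uapi headers that already carry this series (the new sqe level/optname/optval/optlen fields and the SOCKET_URING_OP_GETSOCKOPT value), with error handling trimmed for brevity:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <io_uring/mini_liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int optval = 0;		/* filled in by the kernel */
	int sockfd = socket(AF_INET, SOCK_STREAM, 0);

	if (sockfd < 0 || io_uring_queue_init(4, &ring, 0) < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_cmd(sqe, SOCKET_URING_OP_GETSOCKOPT, sockfd,
			  SOL_SOCKET, SO_RCVBUF, &optval, sizeof(optval));
	io_uring_submit(&ring);
	io_uring_wait_cqe(&ring, &cqe);

	/*
	 * On success cqe->res holds the updated optlen and optval holds the
	 * option value; optval must stay alive until this CQE is reaped.
	 */
	printf("res=%d SO_RCVBUF=%d\n", cqe->res, optval);
	io_uring_cqe_seen(&ring);
	io_uring_queue_exit(&ring);
	close(sockfd);
	return 0;
}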
Signed-off-by: Breno Leitao Reviewed-by: Gabriel Krisman Bertazi --- include/uapi/linux/io_uring.h | 7 +++++++ io_uring/uring_cmd.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 8e61f8b7c2ce..29efa02a4dcb 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -43,6 +43,10 @@ struct io_uring_sqe { union { __u64 addr; /* pointer to buffer or iovecs */ __u64 splice_off_in; + struct { + __u32 level; + __u32 optname; + }; }; __u32 len; /* buffer size or number of iovecs */ union { @@ -79,6 +83,7 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + __u32 optlen; struct { __u16 addr_len; __u16 __pad3[1]; @@ -89,6 +94,7 @@ struct io_uring_sqe { __u64 addr3; __u64 __pad2[1]; }; + __u64 optval; /* * If the ring is initialized with IORING_SETUP_SQE128, then * this field is used for 80 bytes of arbitrary command data @@ -734,6 +740,7 @@ struct io_uring_recvmsg_out { enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, + SOCKET_URING_OP_GETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 6a91e1af7d05..c373e05ba9ce 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -167,6 +167,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, } EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); +INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, + int optname)); +static inline int io_uring_cmd_getsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + bool compat = !!(issue_flags & IO_URING_F_COMPAT); + int optlen = READ_ONCE(cmd->sqe->optlen); + int optname = READ_ONCE(cmd->sqe->optname); + int level = READ_ONCE(cmd->sqe->level); + int err; + + if (level != SOL_SOCKET) + return -EOPNOTSUPP; + + err = do_sock_getsockopt(sock, compat, level, optname, + USER_SOCKPTR(optval), + KERNEL_SOCKPTR(&optlen)); + if (err) + return err; + + /* On success, return optlen */ + return optlen; +} + #if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -189,6 +215,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) if (ret) return ret; return arg; + case SOCKET_URING_OP_GETSOCKOPT: + return io_uring_cmd_getsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } From patchwork Mon Sep 4 16:25:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374190 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C2FA9D2EC; Mon, 4 Sep 2023 16:26:05 +0000 (UTC) Received: from mail-lj1-f178.google.com (mail-lj1-f178.google.com [209.85.208.178]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B69B91717; Mon, 4 Sep 2023 09:25:44 -0700 (PDT) Received: by mail-lj1-f178.google.com with SMTP id 38308e7fff4ca-2bcc4347d2dso23589711fa.0; Mon, 04 Sep 2023 09:25:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844743; x=1694449543; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AGn7P2kniOoJAPFpEjpeFP2KmuJaT92ea2flbYwh/Ag=; b=VU/Z4xW4SDgj4/dTSEAYDzGbYwuQW0RC45f+5O7ePjeEJiLxWYIEolMgJe+gEHc6My XIpRuwmPwurS9v3aBLp29eJw1iD1Y1h3uqyzor+bEzxXdvSWR08CuJdfZ6fSYPTtflZ1 wpYrfQWlHZDh6GXgk7CEepeL5y2wBVgI0+O5a7wx3fKz9J6bkp791W/MKFkqgza6E0eP OXUQjbW0WA0Tc3LoMoWOUuypBhxPhmX0KA8GfQiBVuYAx7/ux2Xadc6uWXrsML1TZCaY FND3IURr9XObGE2ARzp+Zj6SMIbyc3GT1hfRDdky97yoEkdQFnB4DZgmL08OBe/lJL2H s3Zg== X-Gm-Message-State: AOJu0YxkxVyDKxK0SzJD/LLIdskcn8IN1gsuEOsE5qhd2PmGgfXy13+A KXn26k0o9oaitYKrHhKqSRk= X-Google-Smtp-Source: AGHT+IEOWdzK7pcauKxr2iSsZpnNIHeyYsYsBZ6mgLjFdwd6CFHkNerSAdwxp2WKFlDa7owY8oEGpg== X-Received: by 2002:a2e:9c97:0:b0:2bc:b75e:b8b with SMTP id x23-20020a2e9c97000000b002bcb75e0b8bmr7617249lji.38.1693844742846; Mon, 04 Sep 2023 09:25:42 -0700 (PDT) Received: from localhost (fwdproxy-cln-022.fbsv.net. [2a03:2880:31ff:16::face:b00c]) by smtp.gmail.com with ESMTPSA id gj17-20020a170906e11100b0098921e1b064sm6383648ejb.181.2023.09.04.09.25.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:42 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Date: Mon, 4 Sep 2023 09:25:02 -0700 Message-Id: <20230904162504.1356068-10-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is similar to the setsockopt(2) system call. The implementation leverages do_sock_setsockopt(), which is shared with the setsockopt() system call path. Note that userspace needs to keep the pointer's memory alive until the operation is completed; i.e., the memory cannot be deallocated before the CQE is returned to userspace.
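For illustration, the setsockopt counterpart of the raw-SQE sketch shown for the getsockopt command; SO_REUSEADDR and the set_reuseaddr() helper name are illustrative assumptions, not part of this patch:

/* Sketch: set SO_REUSEADDR via SOCKET_URING_OP_SETSOCKOPT.
 * The optval buffer must remain valid until the CQE arrives;
 * cqe->res is 0 on success or a negative errno value.
 */
#include <liburing.h>
#include <sys/socket.h>
#include <string.h>
#include <errno.h>

static int set_reuseaddr(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
	struct io_uring_cqe *cqe;
	static const int one = 1;	/* must outlive the request */
	int ret;

	if (!sqe)
		return -EAGAIN;

	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = sockfd;
	sqe->cmd_op = SOCKET_URING_OP_SETSOCKOPT;
	sqe->level = SOL_SOCKET;
	sqe->optname = SO_REUSEADDR;
	sqe->optval = (unsigned long)&one;
	sqe->optlen = sizeof(one);	/* input length, as in setsockopt(2) */

	ret = io_uring_submit(ring);
	if (ret < 0)
		return ret;
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;
	ret = cqe->res;			/* 0 on success */
	io_uring_cqe_seen(ring, cqe);
	return ret;
}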
Signed-off-by: Breno Leitao Reviewed-by: Gabriel Krisman Bertazi --- include/uapi/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 29efa02a4dcb..3b443da353ba 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -741,6 +741,7 @@ enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, SOCKET_URING_OP_GETSOCKOPT, + SOCKET_URING_OP_SETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index c373e05ba9ce..bec4730fb208 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -193,6 +193,21 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock, return optlen; } +static inline int io_uring_cmd_setsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + bool compat = !!(issue_flags & IO_URING_F_COMPAT); + int optname = READ_ONCE(cmd->sqe->optname); + sockptr_t optval_s = USER_SOCKPTR(optval); + int optlen = READ_ONCE(cmd->sqe->optlen); + int level = READ_ONCE(cmd->sqe->level); + + return do_sock_setsockopt(sock, compat, level, optname, optval_s, + optlen); +} + #if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -217,6 +232,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) return arg; case SOCKET_URING_OP_GETSOCKOPT: return io_uring_cmd_getsockopt(sock, cmd, issue_flags); + case SOCKET_URING_OP_SETSOCKOPT: + return io_uring_cmd_setsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } From patchwork Mon Sep 4 16:25:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13374191 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8EAC7D2EC; Mon, 4 Sep 2023 16:26:25 +0000 (UTC) Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F2EDCC5; Mon, 4 Sep 2023 09:25:56 -0700 (PDT) Received: by mail-ej1-f41.google.com with SMTP id a640c23a62f3a-99bed101b70so251809466b.3; Mon, 04 Sep 2023 09:25:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693844746; x=1694449546; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/injwFwPCEHXUwspfhEaGXaHWVZoMocD4z29NwXIPbw=; b=P05PDJzAjiJ1cWWc6lqRJ40L33pvvs7hR6H4OGrK63zBZQdiBcQ1fraI+yi2SKPOyM 579gMA6+7lcS4aMfQTU5G+LzBezTU+ZdqH44WIf9C8EEG2VTijDxpWKvLi1Px+gk6cmJ uBc3lYRCdSajPH+BhaORDJqDL+ap9WNRVSRBZkg0Xk/eaKUrJxQQvbuvj7mojMc8KCgC 9MjIXa6lHXnvGEkYWXwazgQwkQRhxYb2NXLECmKRMJSjU3MFEa7PDM6jM1czs0CsB4C0 LvqlcC/oa3sJ7y9G5n8rI5avUlyqns/4On0qehm5wRAW3dxF917buDQaDF+EJVjROf/3 53ww== X-Gm-Message-State: AOJu0YwA4pTJrBeZJ7X2u8REc5ZpLRICSZbHFzse1q+i/VyptFfHUqHp 2a6XnWYTrIVZ1dsNo3nEw88Mq+m5Hf3tyg== X-Google-Smtp-Source: AGHT+IE0M/etACOA3TJqXPFfT/nGxjhKhbwgCaaa0LEpaA3e4QJIoBD/G/DbeQfgGkp/N8WZFLjhZA== X-Received: by 2002:a17:906:18c:b0:9a1:cdf1:ba3 with SMTP id 
12-20020a170906018c00b009a1cdf10ba3mr9618375ejb.27.1693844745972; Mon, 04 Sep 2023 09:25:45 -0700 (PDT) Received: from localhost (fwdproxy-cln-005.fbsv.net. [2a03:2880:31ff:5::face:b00c]) by smtp.gmail.com with ESMTPSA id l11-20020a170906938b00b0098de7d28c34sm6382983ejx.193.2023.09.04.09.25.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Sep 2023 09:25:45 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, krisman@suse.de, Andrii Nakryiko , Mykola Lysenko , Alexei Starovoitov , Daniel Borkmann , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , Shuah Khan Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, Daniel Müller , Wang Yufen , linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Subject: [PATCH v4 10/10] selftests/bpf/sockopt: Add io_uring support Date: Mon, 4 Sep 2023 09:25:03 -0700 Message-Id: <20230904162504.1356068-11-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230904162504.1356068-1-leitao@debian.org> References: <20230904162504.1356068-1-leitao@debian.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Expand the sockopt test to also check the io_uring {g,s}etsockopt command operations. This patch starts by marking whether each test supports io_uring or not. Right now, the io_uring cmd getsockopt() has the limitation of only accepting level == SOL_SOCKET; otherwise it returns -EOPNOTSUPP. Since there isn't any test exercising getsockopt(level == SOL_SOCKET), this patch changes two tests to use level == SOL_SOCKET. There is no such limitation for the setsockopt() part. Each test then runs using the regular {g,s}etsockopt() system calls and, if liburing is available, runs again using the io_uring {g,s}etsockopt commands. The level of the following two tests is changed to SOL_SOCKET.
This is going to help to exercise the io_uring subsystem: * getsockopt: read ctx->optlen * getsockopt: support smaller ctx->optlen Signed-off-by: Breno Leitao --- .../selftests/bpf/prog_tests/sockopt.c | 113 +++++++++++++++++- 1 file changed, 107 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c index 9e6a5e3ed4de..9b1f7f0b1f7e 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockopt.c +++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include "cgroup_helpers.h" static char bpf_log_buf[4096]; @@ -38,6 +39,7 @@ static struct sockopt_test { socklen_t get_optlen_ret; enum sockopt_test_error error; + bool io_uring_support; } tests[] = { /* ==================== getsockopt ==================== */ @@ -251,7 +253,9 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, + .get_level = SOL_SOCKET, .get_optlen = 64, + .io_uring_support = true, }, { .descr = "getsockopt: deny bigger ctx->optlen", @@ -276,6 +280,7 @@ static struct sockopt_test { .get_optlen = 64, .error = EFAULT_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: ignore >PAGE_SIZE optlen", @@ -318,6 +323,7 @@ static struct sockopt_test { .get_optval = {}, /* the changes are ignored */ .get_optlen = PAGE_SIZE + 1, .error = EOPNOTSUPP_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: support smaller ctx->optlen", @@ -337,8 +343,10 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, + .get_level = SOL_SOCKET, .get_optlen = 64, .get_optlen_ret = 32, + .io_uring_support = true, }, { .descr = "getsockopt: deny writing to ctx->optval", @@ -518,6 +526,7 @@ static struct sockopt_test { .set_level = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->level", @@ -572,6 +581,7 @@ static struct sockopt_test { .set_optname = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->optname", @@ -624,6 +634,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: ctx->optlen == -1 is ok", @@ -640,6 +651,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen < 0 (except -1)", @@ -658,6 +670,7 @@ static struct sockopt_test { .set_optlen = 4, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen > input optlen", @@ -675,6 +688,7 @@ static struct sockopt_test { .set_optlen = 64, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: ignore >PAGE_SIZE optlen", @@ -940,7 +954,89 @@ static int load_prog(const struct bpf_insn *insns, return fd; } -static int run_test(int cgroup_fd, struct sockopt_test *test) +/* Core function that handles io_uring ring initialization, + * sending SQE with sockopt command and waiting for the CQE. 
+ */ +static int uring_sockopt(int op, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + struct io_uring_cqe *cqe; + struct io_uring_sqe *sqe; + struct io_uring ring; + int err; + + err = io_uring_queue_init(1, &ring, 0); + if (!ASSERT_OK(err, "Failed to initialize io_uring ring")) + return err; + + sqe = io_uring_get_sqe(&ring); + if (!ASSERT_NEQ(sqe, NULL, "Failed to get an SQE")) { + err = -1; + goto fail; + } + + io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen); + + err = io_uring_submit(&ring); + if (!ASSERT_EQ(err, 1, "Failed to submit SQE")) + goto fail; + + err = io_uring_wait_cqe(&ring, &cqe); + if (!ASSERT_OK(err, "Failed to wait for CQE")) + goto fail; + + err = cqe->res; + +fail: + io_uring_queue_exit(&ring); + + return err; +} + +static int uring_setsockopt(int fd, int level, int optname, const void *optval, + socklen_t optlen) +{ + return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname, + optval, optlen); +} + +static int uring_getsockopt(int fd, int level, int optname, void *optval, + socklen_t *optlen) +{ + int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname, + optval, *optlen); + if (ret < 0) + return ret; + + /* Populate optlen back to be compatible with the system call interface, + * and simplify the test. + */ + *optlen = ret; + + return 0; +} + +/* Execute the setsockopt operation */ +static int call_setsockopt(bool use_io_uring, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + if (use_io_uring) + return uring_setsockopt(fd, level, optname, optval, optlen); + + return setsockopt(fd, level, optname, optval, optlen); +} + +/* Execute the getsockopt operation */ +static int call_getsockopt(bool use_io_uring, int fd, int level, int optname, + void *optval, socklen_t *optlen) +{ + if (use_io_uring) + return uring_getsockopt(fd, level, optname, optval, optlen); + + return getsockopt(fd, level, optname, optval, optlen); +} + +static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring) { int sock_fd, err, prog_fd; void *optval = NULL; @@ -980,8 +1076,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder; } - err = setsockopt(sock_fd, test->set_level, test->set_optname, - test->set_optval, test->set_optlen); + err = call_setsockopt(use_io_uring, sock_fd, test->set_level, + test->set_optname, test->set_optval, + test->set_optlen); if (err) { if (errno == EPERM && test->error == EPERM_SETSOCKOPT) goto close_sock_fd; @@ -1008,8 +1105,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) socklen_t expected_get_optlen = test->get_optlen_ret ?: test->get_optlen; - err = getsockopt(sock_fd, test->get_level, test->get_optname, - optval, &optlen); + err = call_getsockopt(use_io_uring, sock_fd, test->get_level, + test->get_optname, optval, &optlen); if (err) { if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT) goto free_optval; @@ -1063,7 +1160,11 @@ void test_sockopt(void) if (!test__start_subtest(tests[i].descr)) continue; - ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr); + ASSERT_OK(run_test(cgroup_fd, &tests[i], false), + tests[i].descr); + if (tests[i].io_uring_support) + ASSERT_OK(run_test(cgroup_fd, &tests[i], true), + tests[i].descr); } close(cgroup_fd);
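A note on the liburing dependency: the io_uring_prep_cmd() helper called by uring_sockopt() above takes the socket command opcode plus the sockopt arguments and stores them in the SQE fields introduced earlier in this series. If the installed liburing does not provide such a helper, an equivalent can be open-coded; the sketch below is an illustrative assumption of what that could look like, and is not part of the patch:

/* Illustrative fallback for io_uring_prep_cmd() as called by
 * uring_sockopt(), assuming a liburing without the helper. It only
 * fills the SQE fields consumed by io_uring_cmd_sock().
 */
static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int cmd_op,
				     int fd, int level, int optname,
				     const void *optval, int optlen)
{
	io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fd, NULL, 0, 0);
	sqe->cmd_op = cmd_op;
	sqe->level = level;
	sqe->optname = optname;
	sqe->optval = (unsigned long)optval;
	sqe->optlen = optlen;
}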