From patchwork Mon Oct 16 13:47:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423454 From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Date: Mon, 16 Oct 2023 06:47:39 -0700 Message-Id: <20231016134750.1381153-2-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org The whole network stack uses sockptr, and until it moves to something more modern, let's use sockptr in the getsockopt BPF hooks as well, so that other callers can use them.
The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a kernel value for optlen. Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/ Signed-off-by: Breno Leitao Acked-by: Martin KaFai Lau --- include/linux/bpf-cgroup.h | 5 +++-- kernel/bpf/cgroup.c | 20 +++++++++++--------- net/socket.c | 5 +++-- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 98b8cea904fe..7b55844f6ba7 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -145,9 +145,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, int *optname, char __user *optval, int *optlen, char **kernel_optval); + int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval); int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 74ad2215e1ba..97745f67ac15 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1890,8 +1890,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, } int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1918,8 +1918,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, * one that kernel returned as well to let * BPF programs inspect the value. 
*/ - - if (get_user(ctx.optlen, optlen)) { + if (copy_from_sockptr(&ctx.optlen, optlen, + sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } @@ -1930,8 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } orig_optlen = ctx.optlen; - if (copy_from_user(ctx.optval, optval, - min(ctx.optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(ctx.optlen, max_optlen))) { ret = -EFAULT; goto out; } @@ -1945,7 +1945,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, if (ret < 0) goto out; - if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { + if (!sockptr_is_null(optval) && + (ctx.optlen > max_optlen || ctx.optlen < 0)) { if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) { pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n", ctx.optlen, max_optlen); @@ -1957,11 +1958,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } if (ctx.optlen != 0) { - if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) { + if (!sockptr_is_null(optval) && + copy_to_sockptr(optval, ctx.optval, ctx.optlen)) { ret = -EFAULT; goto out; } - if (put_user(ctx.optlen, optlen)) { + if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 5740475e084c..6b47dd499218 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2373,8 +2373,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, - optval, optlen, max_optlen, - err); + USER_SOCKPTR(optval), + USER_SOCKPTR(optlen), + max_optlen, err); out_put: fput_light(sock->file, fput_needed); return err; From patchwork Mon Oct 16 13:47:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423456 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4A0D0CDB474 for ; Mon, 16 Oct 2023 14:01:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233560AbjJPOBp (ORCPT ); Mon, 16 Oct 2023 10:01:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44502 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233542AbjJPOBm (ORCPT ); Mon, 16 Oct 2023 10:01:42 -0400 Received: from mail-ej1-f45.google.com (mail-ej1-f45.google.com [209.85.218.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52316EE; Mon, 16 Oct 2023 07:01:38 -0700 (PDT) Received: by mail-ej1-f45.google.com with SMTP id a640c23a62f3a-9be02fcf268so411182366b.3; Mon, 16 Oct 2023 07:01:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464897; x=1698069697; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WWPtWPFuY/M4SoBkCUSlFAjHNbqqRZMGeiXE9WRKkYQ=; b=P/DwvOQEQve6ZpxmuubzwgxbR245NuITfohSUZQePM7oqRKD+SMiIXg9/rix7rZZBE PkFkTEA2SoZWnj1R5Q66k2UB/to77Fpa4AjQdbx1kFJO+F+PExKwfcbBWcS2AQ6t/gan pPJDUCyViIDodjXCi7ELFTmlPr75NiP7lGYyzIA/yj9L96f+UCvt5FKhWEn3TejEJtM5 MBV9lu19RFtcKMUWxt5Cn/FBW0dUC6iEnm9BAvs4PvmBxjnXsK6eDym9dzjnFCfaKTPw 
From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Date: Mon, 16 Oct 2023 06:47:40 -0700 Message-Id: <20231016134750.1381153-3-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org The whole network stack uses sockptr, and until it moves to something more modern, let's use sockptr in the setsockopt BPF hooks as well, so that other callers can use them. The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval but a kernel value for optlen.
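[Editorial note, not part of the patch: sockptr_t wraps either a kernel or a user pointer behind one type, so a single hook body can serve both cases. The sketch below is illustrative only; the wrapper function is hypothetical, while the sockptr helpers (USER_SOCKPTR(), KERNEL_SOCKPTR()) and the hook signature are the ones used in this series. It shows exactly the mixed case io_uring needs: a userspace optval combined with a kernel-resident optlen.]

static int example_run_getsockopt_filter(struct sock *sk, int level, int optname,
					 void __user *uoptval, int max_optlen,
					 int retval)
{
	int koptlen = max_optlen;
	/* optval still points at a userspace buffer... */
	sockptr_t optval = USER_SOCKPTR(uoptval);
	/* ...while optlen is a plain kernel integer, wrapped the same way */
	sockptr_t optlen = KERNEL_SOCKPTR(&koptlen);

	return __cgroup_bpf_run_filter_getsockopt(sk, level, optname,
						  optval, optlen,
						  max_optlen, retval);
}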
Link: https://lore.kernel.org/all/ZSArfLaaGcfd8LH8@gmail.com/ Signed-off-by: Breno Leitao Acked-by: Martin KaFai Lau --- include/linux/bpf-cgroup.h | 2 +- kernel/bpf/cgroup.c | 5 +++-- net/socket.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 7b55844f6ba7..2912dce9144e 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -143,7 +143,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, enum cgroup_bpf_attach_type atype); int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval); int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 97745f67ac15..491d20038cbe 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1800,7 +1800,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx, } int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1823,7 +1823,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, ctx.optlen = *optlen; - if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(*optlen, max_optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 6b47dd499218..28d3eb339514 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2305,7 +2305,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, - user_optval, &optlen, + optval, &optlen, &kernel_optval); if (err < 0) goto out_put; From patchwork Mon Oct 16 13:47:41 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423455 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11789CDB482 for ; Mon, 16 Oct 2023 14:01:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229784AbjJPOBo (ORCPT ); Mon, 16 Oct 2023 10:01:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233496AbjJPOBl (ORCPT ); Mon, 16 Oct 2023 10:01:41 -0400 Received: from mail-ej1-f50.google.com (mail-ej1-f50.google.com [209.85.218.50]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7BFD1F2; Mon, 16 Oct 2023 07:01:39 -0700 (PDT) Received: by mail-ej1-f50.google.com with SMTP id a640c23a62f3a-9adca291f99so709066666b.2; Mon, 16 Oct 2023 07:01:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464898; x=1698069698; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SQzK26J7czJDylxtxHZHIwCv9ElLol5pJu5kvBMM+Nc=; b=OooOgbKnmj5/iXvkRaewpp3iUs/c7waHJn66tHBcwJsGMDh9GtkyFImWGLgJyGlPLY BUIfJLEcyt9m3obuZ6En3dyf1ebjmFj7+BPST0sU47Qkt21hndYbIL2KHWwzuFlksg1l 
From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, "David S. Miller" , Eric Dumazet Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, Willem de Bruijn Subject: [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Date: Mon, 16 Oct 2023 06:47:41 -0700 Message-Id: <20231016134750.1381153-4-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Split __sys_setsockopt() into two functions by moving the core logic into a sub-function (do_sock_setsockopt()). This avoids duplicating the code when other callers need the same operation. do_sock_setsockopt() will be called by the io_uring setsockopt() command operation in a following patch. Signed-off-by: Breno Leitao Reviewed-by: Willem de Bruijn Acked-by: Jakub Kicinski Acked-by: Martin KaFai Lau --- include/net/sock.h | 2 ++ net/socket.c | 39 +++++++++++++++++++++++++-------------- 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 242590308d64..00103e3143c4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1864,6 +1864,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); diff --git a/net/socket.c b/net/socket.c index 28d3eb339514..0087f8c071e7 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2279,31 +2279,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock) return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags); } -/* - * Set a socket option. Because we don't know the option lengths we have - * to pass the user mode parameter for the protocols to sort out.
- */ -int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, - int optlen) +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen) { - sockptr_t optval = USER_SOCKPTR(user_optval); const struct proto_ops *ops; char *kernel_optval = NULL; - int err, fput_needed; - struct socket *sock; + int err; if (optlen < 0) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; - err = security_socket_setsockopt(sock, level, optname); if (err) goto out_put; - if (!in_compat_syscall()) + if (!compat) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, optval, &optlen, &kernel_optval); @@ -2326,6 +2316,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, optlen); kfree(kernel_optval); out_put: + return err; +} +EXPORT_SYMBOL(do_sock_setsockopt); + +/* Set a socket option. Because we don't know the option lengths we have + * to pass the user mode parameter for the protocols to sort out. + */ +int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, + int optlen) +{ + sockptr_t optval = USER_SOCKPTR(user_optval); + bool compat = in_compat_syscall(); + int err, fput_needed; + struct socket *sock; + + sock = sockfd_lookup_light(fd, &err, &fput_needed); + if (!sock) + return err; + + err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); + fput_light(sock->file, fput_needed); return err; } From patchwork Mon Oct 16 13:47:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423458 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6ADFFCDB483 for ; Mon, 16 Oct 2023 14:01:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233440AbjJPOBr (ORCPT ); Mon, 16 Oct 2023 10:01:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49206 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233530AbjJPOBp (ORCPT ); Mon, 16 Oct 2023 10:01:45 -0400 Received: from mail-ej1-f48.google.com (mail-ej1-f48.google.com [209.85.218.48]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8577F9C; Mon, 16 Oct 2023 07:01:43 -0700 (PDT) Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-9adb9fa7200so939372966b.0; Mon, 16 Oct 2023 07:01:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464902; x=1698069702; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=T6jABNP5fatbdRkQ6crb5lOkIMwJ1jpThfXk6besz34=; b=svSPev20qqFYc6cxgBdXbsM146GrIMKnDFLGTIIXScZ378s8NiVXMknTvoxi3qMzv7 gzBrpBq6pdwALNBc1sLUhzY7WZfiTissGFOKnvYiv6+U/4F+PVdz5Wae4Q85rg4ebPCP snvt16e5tfyRO07ccug2rTnq0jtoZ6ezcQGhYQwu1Rv/fhqTdQROEffE4x2ds/lDDa99 NtHx58FR2D+rJN/sW8pGMJIGnXbvWw1vw9a9j4epT9Q0eoEiU2Ree4v50lRrfFz5JDVh 2CnxyNxjGqI5Yq1JJSN+2ePBESMIk1aZplLxwrbqcmGh4GIdSSc5/tSNRRaL+cAN8g7m Q90w== X-Gm-Message-State: AOJu0YzBKYU1+TEBeknCXZURN94IGLbiwXAvnokUw8V+GJ8i44wUrPLK v6orHw2QQcwKfdw8oRQ6RnM= X-Google-Smtp-Source: AGHT+IHUu/g9ym5TxgIGSDFjs7IALdnvoQvEUE8XDlq1Xf3v/iRN5OBKz0QrdD/SLdTvKjZJbWnARQ== X-Received: by 
From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Song Liu , Yonghong Song , KP Singh , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, Kuniyuki Iwashima , Alexander Mikhalitsyn , David Howells Subject: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Date: Mon, 16 Oct 2023 06:47:42 -0700 Message-Id: <20231016134750.1381153-5-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Split __sys_getsockopt() into two functions by moving the core logic into a sub-function (do_sock_getsockopt()). This avoids duplicating the code when other callers need the same operation. do_sock_getsockopt() will be called by the io_uring getsockopt() command operation in a following patch. The same was done for the setsockopt pair. Suggested-by: Martin KaFai Lau Signed-off-by: Breno Leitao Acked-by: Jakub Kicinski Acked-by: Martin KaFai Lau --- include/linux/bpf-cgroup.h | 2 +- include/net/sock.h | 4 +-- net/core/sock.c | 8 ----- net/socket.c | 63 ++++++++++++++++++++++++-------------- 4 files changed, 43 insertions(+), 34 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 2912dce9144e..a789266feac3 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -393,7 +393,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, ({ \ int __ret = 0; \ if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \ - get_user(__ret, optlen); \ + copy_from_sockptr(&__ret, optlen, sizeof(int)); \ __ret; \ }) diff --git a/include/net/sock.h b/include/net/sock.h index 00103e3143c4..1d6931caf0c3 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1866,11 +1866,11 @@ int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); int do_sock_setsockopt(struct socket *sock, bool compat, int level, int optname, sockptr_t optval, int optlen); +int do_sock_getsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, sockptr_t optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); -int sock_getsockopt(struct socket *sock, int level, int op, - char __user *optval, int __user *optlen); int sock_gettstamp(struct socket *sock, void __user *userstamp, bool timeval, bool time32); struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, diff --git a/net/core/sock.c b/net/core/sock.c index 290165954379..d4cb8d6e75b7 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2003,14 +2003,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname, return 0; } -int
sock_getsockopt(struct socket *sock, int level, int optname, - char __user *optval, int __user *optlen) -{ - return sk_getsockopt(sock->sk, level, optname, - USER_SOCKPTR(optval), - USER_SOCKPTR(optlen)); -} - /* * Initialize an sk_lock. * diff --git a/net/socket.c b/net/socket.c index 0087f8c071e7..f4c156a1987e 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname, INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int optname)); +int do_sock_getsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, sockptr_t optlen) +{ + int max_optlen __maybe_unused; + const struct proto_ops *ops; + int err; + + err = security_socket_getsockopt(sock, level, optname); + if (err) + return err; + + ops = READ_ONCE(sock->ops); + if (level == SOL_SOCKET) { + err = sk_getsockopt(sock->sk, level, optname, optval, optlen); + } else if (unlikely(!ops->getsockopt)) { + err = -EOPNOTSUPP; + } else { + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel, + "Invalid argument type")) + return -EOPNOTSUPP; + + err = ops->getsockopt(sock, level, optname, optval.user, + optlen.user); + } + + if (!compat) { + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, + optval, optlen, max_optlen, + err); + } + + return err; +} +EXPORT_SYMBOL(do_sock_getsockopt); + /* * Get a socket option. Because we don't know the option lengths we have * to pass a user mode parameter for the protocols to sort out. @@ -2357,37 +2393,18 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, int __sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { - int max_optlen __maybe_unused; - const struct proto_ops *ops; int err, fput_needed; struct socket *sock; + bool compat; sock = sockfd_lookup_light(fd, &err, &fput_needed); if (!sock) return err; - err = security_socket_getsockopt(sock, level, optname); - if (err) - goto out_put; - - if (!in_compat_syscall()) - max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen); + compat = in_compat_syscall(); + err = do_sock_getsockopt(sock, compat, level, optname, + USER_SOCKPTR(optval), USER_SOCKPTR(optlen)); - ops = READ_ONCE(sock->ops); - if (level == SOL_SOCKET) - err = sock_getsockopt(sock, level, optname, optval, optlen); - else if (unlikely(!ops->getsockopt)) - err = -EOPNOTSUPP; - else - err = ops->getsockopt(sock, level, optname, optval, - optlen); - - if (!in_compat_syscall()) - err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, - USER_SOCKPTR(optval), - USER_SOCKPTR(optlen), - max_optlen, err); -out_put: fput_light(sock->file, fput_needed); return err; } From patchwork Mon Oct 16 13:47:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423459 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09593CDB474 for ; Mon, 16 Oct 2023 14:01:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233562AbjJPOBt (ORCPT ); Mon, 16 Oct 2023 10:01:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44562 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233445AbjJPOBq (ORCPT ); Mon, 16 Oct 2023 
From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Date: Mon, 16 Oct 2023 06:47:43 -0700 Message-Id: <20231016134750.1381153-6-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Create a new flag to track whether the operation is running in compat mode. This basically checks ctx->compat and propagates it through issue_flags, so it can be queried later in the callbacks.
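[Editorial note, not part of the patch: a rough sketch of how a ->uring_cmd() callback can consume the new flag; the function name is hypothetical, and it only approximates what the io_uring getsockopt command added later in this series does. It combines IO_URING_F_COMPAT from this patch with do_sock_getsockopt() and the SQE fields defined elsewhere in the series.]

static int example_uring_cmd_getsockopt(struct socket *sock,
					struct io_uring_cmd *cmd,
					unsigned int issue_flags)
{
	const struct io_uring_sqe *sqe = cmd->sqe;
	bool compat = !!(issue_flags & IO_URING_F_COMPAT);
	void __user *optval = u64_to_user_ptr(READ_ONCE(sqe->optval));
	int optlen = READ_ONCE(sqe->optlen);
	int err;

	/* optval stays a user pointer, while optlen lives in the kernel */
	err = do_sock_getsockopt(sock, compat, READ_ONCE(sqe->level),
				 READ_ONCE(sqe->optname),
				 USER_SOCKPTR(optval),
				 KERNEL_SOCKPTR(&optlen));
	if (err)
		return err;

	/* a real handler would report the updated optlen via the CQE result */
	return optlen;
}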
Signed-off-by: Breno Leitao Reviewed-by: Gabriel Krisman Bertazi --- include/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index b4391e0a9bc8..aefb73eeeebf 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -23,6 +23,7 @@ enum io_uring_cmd_flags { /* set when uring wants to cancel a previously issued command */ IO_URING_F_CANCEL = (1 << 11), + IO_URING_F_COMPAT = (1 << 12), }; /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */ diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 00a5e5621a28..4bedd633c08c 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -175,6 +175,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) issue_flags |= IO_URING_F_SQE128; if (ctx->flags & IORING_SETUP_CQE32) issue_flags |= IO_URING_F_CQE32; + if (ctx->compat) + issue_flags |= IO_URING_F_COMPAT; if (ctx->flags & IORING_SETUP_IOPOLL) { if (!file->f_op->uring_cmd_iopoll) return -EOPNOTSUPP; From patchwork Mon Oct 16 13:47:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423460 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD8B7CDB474 for ; Mon, 16 Oct 2023 14:02:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233782AbjJPOCN (ORCPT ); Mon, 16 Oct 2023 10:02:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233690AbjJPOCG (ORCPT ); Mon, 16 Oct 2023 10:02:06 -0400 Received: from mail-ej1-f46.google.com (mail-ej1-f46.google.com [209.85.218.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA97CF2; Mon, 16 Oct 2023 07:01:55 -0700 (PDT) Received: by mail-ej1-f46.google.com with SMTP id a640c23a62f3a-9c3aec5f326so239915366b.1; Mon, 16 Oct 2023 07:01:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464914; x=1698069714; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TOnBBB7HL34j5OvmN1qxPA18trQywrMm4qSU/BBaRxA=; b=jzbOK75fvwrXY5fsczs8MrdRnqR42UrCXc7IG4xvYbbfKl9A9rWxwH05Geu4ZHvXmS /pXmiZEgqCDnr0QhayDTgNyPI367La1w8x4NvjRtS5gpLXzAsL5zT1aE4GFo9eMjvwA6 V+jSz5JVtlF4BFGu4eS84mMUKYkPj2uvlcajaFw7Th3Kr8CifSg3U2I4O8ISkJBcs+BG Uo7pWhZoVmla8eN8O4/EJrXMog7XRdDhSmWkRi1IphmGffsDiM8jiaPnAYL6KEAMIHnJ 73aB5q1VuP9yMuSnw5SWVYq5reFOsT79oLuAROAJ7BGDdsYqd7dXjiVmEjsmvqKzVH1j tfOg== X-Gm-Message-State: AOJu0YzHRoqtIax52sgNuG4xPtylON7Yx3lnHSMmI3WITTBAzHqWNwU0 v/lZWBisBA7WfQWWyWoif5g= X-Google-Smtp-Source: AGHT+IGoCdCTWg6RfJ7UnVbu5OU9vyL08+8QQ0AZYvyASDG4VV1gq5PBeqrNNjTy5fuelOFWyDhbgA== X-Received: by 2002:a17:907:1c94:b0:9bd:d405:4e8a with SMTP id nb20-20020a1709071c9400b009bdd4054e8amr6667809ejc.17.1697464913880; Mon, 16 Oct 2023 07:01:53 -0700 (PDT) Received: from localhost (fwdproxy-cln-004.fbsv.net. 
[2a03:2880:31ff:4::face:b00c]) by smtp.gmail.com with ESMTPSA id ga19-20020a170906b85300b0099b76c3041csm4130942ejb.7.2023.10.16.07.01.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:01:53 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, "Peter Zijlstra (Intel)" , Stefan Metzmacher , Josh Triplett Subject: [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Date: Mon, 16 Oct 2023 06:47:44 -0700 Message-Id: <20231016134750.1381153-7-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This file will be used by mini_uring.h and allow tests to run without the need of installing liburing to run the tests. This is needed to run io_uring tests in BPF, such as (tools/testing/selftests/bpf/prog_tests/sockopt.c). Signed-off-by: Breno Leitao --- tools/include/uapi/linux/io_uring.h | 757 ++++++++++++++++++++++++++++ 1 file changed, 757 insertions(+) create mode 100644 tools/include/uapi/linux/io_uring.h diff --git a/tools/include/uapi/linux/io_uring.h b/tools/include/uapi/linux/io_uring.h new file mode 100644 index 000000000000..f1c16f817742 --- /dev/null +++ b/tools/include/uapi/linux/io_uring.h @@ -0,0 +1,757 @@ +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */ +/* + * Header file for the io_uring interface. + * + * Copyright (C) 2019 Jens Axboe + * Copyright (C) 2019 Christoph Hellwig + */ +#ifndef LINUX_IO_URING_H +#define LINUX_IO_URING_H + +#include +#include +/* + * this file is shared with liburing and that has to autodetect + * if linux/time_types.h is available or not, it can + * define UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H + * if linux/time_types.h is not available + */ +#ifndef UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H +#include +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * IO submission data structure (Submission Queue Entry) + */ +struct io_uring_sqe { + __u8 opcode; /* type of operation for this sqe */ + __u8 flags; /* IOSQE_ flags */ + __u16 ioprio; /* ioprio for the request */ + __s32 fd; /* file descriptor to do IO on */ + union { + __u64 off; /* offset into file */ + __u64 addr2; + struct { + __u32 cmd_op; + __u32 __pad1; + }; + }; + union { + __u64 addr; /* pointer to buffer or iovecs */ + __u64 splice_off_in; + struct { + __u32 level; + __u32 optname; + }; + }; + __u32 len; /* buffer size or number of iovecs */ + union { + __kernel_rwf_t rw_flags; + __u32 fsync_flags; + __u16 poll_events; /* compatibility */ + __u32 poll32_events; /* word-reversed for BE */ + __u32 sync_range_flags; + __u32 msg_flags; + __u32 timeout_flags; + __u32 accept_flags; + __u32 cancel_flags; + __u32 open_flags; + __u32 statx_flags; + __u32 fadvise_advice; + __u32 splice_flags; + __u32 rename_flags; + __u32 unlink_flags; + __u32 hardlink_flags; + __u32 xattr_flags; + __u32 msg_ring_flags; + __u32 uring_cmd_flags; + __u32 waitid_flags; + __u32 futex_flags; + }; + __u64 user_data; /* data to be passed back at completion time */ + /* pack this to avoid bogus arm OABI complaints */ + union { + /* index into fixed buffers, if used */ + __u16 buf_index; + /* for grouped buffer 
selection */ + __u16 buf_group; + } __attribute__((packed)); + /* personality to use, if used */ + __u16 personality; + union { + __s32 splice_fd_in; + __u32 file_index; + __u32 optlen; + struct { + __u16 addr_len; + __u16 __pad3[1]; + }; + }; + union { + struct { + __u64 addr3; + __u64 __pad2[1]; + }; + __u64 optval; + /* + * If the ring is initialized with IORING_SETUP_SQE128, then + * this field is used for 80 bytes of arbitrary command data + */ + __u8 cmd[0]; + }; +}; + +/* + * If sqe->file_index is set to this for opcodes that instantiate a new + * direct descriptor (like openat/openat2/accept), then io_uring will allocate + * an available direct descriptor instead of having the application pass one + * in. The picked direct descriptor will be returned in cqe->res, or -ENFILE + * if the space is full. + */ +#define IORING_FILE_INDEX_ALLOC (~0U) + +enum { + IOSQE_FIXED_FILE_BIT, + IOSQE_IO_DRAIN_BIT, + IOSQE_IO_LINK_BIT, + IOSQE_IO_HARDLINK_BIT, + IOSQE_ASYNC_BIT, + IOSQE_BUFFER_SELECT_BIT, + IOSQE_CQE_SKIP_SUCCESS_BIT, +}; + +/* + * sqe->flags + */ +/* use fixed fileset */ +#define IOSQE_FIXED_FILE (1U << IOSQE_FIXED_FILE_BIT) +/* issue after inflight IO */ +#define IOSQE_IO_DRAIN (1U << IOSQE_IO_DRAIN_BIT) +/* links next sqe */ +#define IOSQE_IO_LINK (1U << IOSQE_IO_LINK_BIT) +/* like LINK, but stronger */ +#define IOSQE_IO_HARDLINK (1U << IOSQE_IO_HARDLINK_BIT) +/* always go async */ +#define IOSQE_ASYNC (1U << IOSQE_ASYNC_BIT) +/* select buffer from sqe->buf_group */ +#define IOSQE_BUFFER_SELECT (1U << IOSQE_BUFFER_SELECT_BIT) +/* don't post CQE if request succeeded */ +#define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT) + +/* + * io_uring_setup() flags + */ +#define IORING_SETUP_IOPOLL (1U << 0) /* io_context is polled */ +#define IORING_SETUP_SQPOLL (1U << 1) /* SQ poll thread */ +#define IORING_SETUP_SQ_AFF (1U << 2) /* sq_thread_cpu is valid */ +#define IORING_SETUP_CQSIZE (1U << 3) /* app defines CQ size */ +#define IORING_SETUP_CLAMP (1U << 4) /* clamp SQ/CQ ring sizes */ +#define IORING_SETUP_ATTACH_WQ (1U << 5) /* attach to existing wq */ +#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */ +#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */ +/* + * Cooperative task running. When requests complete, they often require + * forcing the submitter to transition to the kernel to complete. If this + * flag is set, work will be done when the task transitions anyway, rather + * than force an inter-processor interrupt reschedule. This avoids interrupting + * a task running in userspace, and saves an IPI. + */ +#define IORING_SETUP_COOP_TASKRUN (1U << 8) +/* + * If COOP_TASKRUN is set, get notified if task work is available for + * running and a kernel transition would be needed to run it. This sets + * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN. + */ +#define IORING_SETUP_TASKRUN_FLAG (1U << 9) +#define IORING_SETUP_SQE128 (1U << 10) /* SQEs are 128 byte */ +#define IORING_SETUP_CQE32 (1U << 11) /* CQEs are 32 byte */ +/* + * Only one task is allowed to submit requests + */ +#define IORING_SETUP_SINGLE_ISSUER (1U << 12) + +/* + * Defer running task work to get events. + * Rather than running bits of task work whenever the task transitions + * try to do it just before it is needed. 
+ */ +#define IORING_SETUP_DEFER_TASKRUN (1U << 13) + +/* + * Application provides the memory for the rings + */ +#define IORING_SETUP_NO_MMAP (1U << 14) + +/* + * Register the ring fd in itself for use with + * IORING_REGISTER_USE_REGISTERED_RING; return a registered fd index rather + * than an fd. + */ +#define IORING_SETUP_REGISTERED_FD_ONLY (1U << 15) + +/* + * Removes indirection through the SQ index array. + */ +#define IORING_SETUP_NO_SQARRAY (1U << 16) + +enum io_uring_op { + IORING_OP_NOP, + IORING_OP_READV, + IORING_OP_WRITEV, + IORING_OP_FSYNC, + IORING_OP_READ_FIXED, + IORING_OP_WRITE_FIXED, + IORING_OP_POLL_ADD, + IORING_OP_POLL_REMOVE, + IORING_OP_SYNC_FILE_RANGE, + IORING_OP_SENDMSG, + IORING_OP_RECVMSG, + IORING_OP_TIMEOUT, + IORING_OP_TIMEOUT_REMOVE, + IORING_OP_ACCEPT, + IORING_OP_ASYNC_CANCEL, + IORING_OP_LINK_TIMEOUT, + IORING_OP_CONNECT, + IORING_OP_FALLOCATE, + IORING_OP_OPENAT, + IORING_OP_CLOSE, + IORING_OP_FILES_UPDATE, + IORING_OP_STATX, + IORING_OP_READ, + IORING_OP_WRITE, + IORING_OP_FADVISE, + IORING_OP_MADVISE, + IORING_OP_SEND, + IORING_OP_RECV, + IORING_OP_OPENAT2, + IORING_OP_EPOLL_CTL, + IORING_OP_SPLICE, + IORING_OP_PROVIDE_BUFFERS, + IORING_OP_REMOVE_BUFFERS, + IORING_OP_TEE, + IORING_OP_SHUTDOWN, + IORING_OP_RENAMEAT, + IORING_OP_UNLINKAT, + IORING_OP_MKDIRAT, + IORING_OP_SYMLINKAT, + IORING_OP_LINKAT, + IORING_OP_MSG_RING, + IORING_OP_FSETXATTR, + IORING_OP_SETXATTR, + IORING_OP_FGETXATTR, + IORING_OP_GETXATTR, + IORING_OP_SOCKET, + IORING_OP_URING_CMD, + IORING_OP_SEND_ZC, + IORING_OP_SENDMSG_ZC, + IORING_OP_READ_MULTISHOT, + IORING_OP_WAITID, + IORING_OP_FUTEX_WAIT, + IORING_OP_FUTEX_WAKE, + IORING_OP_FUTEX_WAITV, + + /* this goes last, obviously */ + IORING_OP_LAST, +}; + +/* + * sqe->uring_cmd_flags top 8bits aren't available for userspace + * IORING_URING_CMD_FIXED use registered buffer; pass this flag + * along with setting sqe->buf_index. + */ +#define IORING_URING_CMD_FIXED (1U << 0) +#define IORING_URING_CMD_MASK IORING_URING_CMD_FIXED + + +/* + * sqe->fsync_flags + */ +#define IORING_FSYNC_DATASYNC (1U << 0) + +/* + * sqe->timeout_flags + */ +#define IORING_TIMEOUT_ABS (1U << 0) +#define IORING_TIMEOUT_UPDATE (1U << 1) +#define IORING_TIMEOUT_BOOTTIME (1U << 2) +#define IORING_TIMEOUT_REALTIME (1U << 3) +#define IORING_LINK_TIMEOUT_UPDATE (1U << 4) +#define IORING_TIMEOUT_ETIME_SUCCESS (1U << 5) +#define IORING_TIMEOUT_MULTISHOT (1U << 6) +#define IORING_TIMEOUT_CLOCK_MASK (IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME) +#define IORING_TIMEOUT_UPDATE_MASK (IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE) +/* + * sqe->splice_flags + * extends splice(2) flags + */ +#define SPLICE_F_FD_IN_FIXED (1U << 31) /* the last bit of __u32 */ + +/* + * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the + * command flags for POLL_ADD are stored in sqe->len. + * + * IORING_POLL_ADD_MULTI Multishot poll. Sets IORING_CQE_F_MORE if + * the poll handler will continue to report + * CQEs on behalf of the same SQE. + * + * IORING_POLL_UPDATE Update existing poll request, matching + * sqe->addr as the old user_data field. + * + * IORING_POLL_LEVEL Level triggered poll. + */ +#define IORING_POLL_ADD_MULTI (1U << 0) +#define IORING_POLL_UPDATE_EVENTS (1U << 1) +#define IORING_POLL_UPDATE_USER_DATA (1U << 2) +#define IORING_POLL_ADD_LEVEL (1U << 3) + +/* + * ASYNC_CANCEL flags. 
+ * + * IORING_ASYNC_CANCEL_ALL Cancel all requests that match the given key + * IORING_ASYNC_CANCEL_FD Key off 'fd' for cancelation rather than the + * request 'user_data' + * IORING_ASYNC_CANCEL_ANY Match any request + * IORING_ASYNC_CANCEL_FD_FIXED 'fd' passed in is a fixed descriptor + * IORING_ASYNC_CANCEL_USERDATA Match on user_data, default for no other key + * IORING_ASYNC_CANCEL_OP Match request based on opcode + */ +#define IORING_ASYNC_CANCEL_ALL (1U << 0) +#define IORING_ASYNC_CANCEL_FD (1U << 1) +#define IORING_ASYNC_CANCEL_ANY (1U << 2) +#define IORING_ASYNC_CANCEL_FD_FIXED (1U << 3) +#define IORING_ASYNC_CANCEL_USERDATA (1U << 4) +#define IORING_ASYNC_CANCEL_OP (1U << 5) + +/* + * send/sendmsg and recv/recvmsg flags (sqe->ioprio) + * + * IORING_RECVSEND_POLL_FIRST If set, instead of first attempting to send + * or receive and arm poll if that yields an + * -EAGAIN result, arm poll upfront and skip + * the initial transfer attempt. + * + * IORING_RECV_MULTISHOT Multishot recv. Sets IORING_CQE_F_MORE if + * the handler will continue to report + * CQEs on behalf of the same SQE. + * + * IORING_RECVSEND_FIXED_BUF Use registered buffers, the index is stored in + * the buf_index field. + * + * IORING_SEND_ZC_REPORT_USAGE + * If set, SEND[MSG]_ZC should report + * the zerocopy usage in cqe.res + * for the IORING_CQE_F_NOTIF cqe. + * 0 is reported if zerocopy was actually possible. + * IORING_NOTIF_USAGE_ZC_COPIED if data was copied + * (at least partially). + */ +#define IORING_RECVSEND_POLL_FIRST (1U << 0) +#define IORING_RECV_MULTISHOT (1U << 1) +#define IORING_RECVSEND_FIXED_BUF (1U << 2) +#define IORING_SEND_ZC_REPORT_USAGE (1U << 3) + +/* + * cqe.res for IORING_CQE_F_NOTIF if + * IORING_SEND_ZC_REPORT_USAGE was requested + * + * It should be treated as a flag, all other + * bits of cqe.res should be treated as reserved! + */ +#define IORING_NOTIF_USAGE_ZC_COPIED (1U << 31) + +/* + * accept flags stored in sqe->ioprio + */ +#define IORING_ACCEPT_MULTISHOT (1U << 0) + +/* + * IORING_OP_MSG_RING command types, stored in sqe->addr + */ +enum { + IORING_MSG_DATA, /* pass sqe->len as 'res' and off as user_data */ + IORING_MSG_SEND_FD, /* send a registered fd to another ring */ +}; + +/* + * IORING_OP_MSG_RING flags (sqe->msg_ring_flags) + * + * IORING_MSG_RING_CQE_SKIP Don't post a CQE to the target ring. Not + * applicable for IORING_MSG_DATA, obviously. + */ +#define IORING_MSG_RING_CQE_SKIP (1U << 0) +/* Pass through the flags from sqe->file_index to cqe->flags */ +#define IORING_MSG_RING_FLAGS_PASS (1U << 1) + +/* + * IO completion data structure (Completion Queue Entry) + */ +struct io_uring_cqe { + __u64 user_data; /* sqe->data submission passed back */ + __s32 res; /* result code for this event */ + __u32 flags; + + /* + * If the ring is initialized with IORING_SETUP_CQE32, then this field + * contains 16-bytes of padding, doubling the size of the CQE. + */ + __u64 big_cqe[]; +}; + +/* + * cqe->flags + * + * IORING_CQE_F_BUFFER If set, the upper 16 bits are the buffer ID + * IORING_CQE_F_MORE If set, parent SQE will generate more CQE entries + * IORING_CQE_F_SOCK_NONEMPTY If set, more data to read after socket recv + * IORING_CQE_F_NOTIF Set for notification CQEs. Can be used to distinct + * them from sends. 
+ */ +#define IORING_CQE_F_BUFFER (1U << 0) +#define IORING_CQE_F_MORE (1U << 1) +#define IORING_CQE_F_SOCK_NONEMPTY (1U << 2) +#define IORING_CQE_F_NOTIF (1U << 3) + +enum { + IORING_CQE_BUFFER_SHIFT = 16, +}; + +/* + * Magic offsets for the application to mmap the data it needs + */ +#define IORING_OFF_SQ_RING 0ULL +#define IORING_OFF_CQ_RING 0x8000000ULL +#define IORING_OFF_SQES 0x10000000ULL +#define IORING_OFF_PBUF_RING 0x80000000ULL +#define IORING_OFF_PBUF_SHIFT 16 +#define IORING_OFF_MMAP_MASK 0xf8000000ULL + +/* + * Filled with the offset for mmap(2) + */ +struct io_sqring_offsets { + __u32 head; + __u32 tail; + __u32 ring_mask; + __u32 ring_entries; + __u32 flags; + __u32 dropped; + __u32 array; + __u32 resv1; + __u64 user_addr; +}; + +/* + * sq_ring->flags + */ +#define IORING_SQ_NEED_WAKEUP (1U << 0) /* needs io_uring_enter wakeup */ +#define IORING_SQ_CQ_OVERFLOW (1U << 1) /* CQ ring is overflown */ +#define IORING_SQ_TASKRUN (1U << 2) /* task should enter the kernel */ + +struct io_cqring_offsets { + __u32 head; + __u32 tail; + __u32 ring_mask; + __u32 ring_entries; + __u32 overflow; + __u32 cqes; + __u32 flags; + __u32 resv1; + __u64 user_addr; +}; + +/* + * cq_ring->flags + */ + +/* disable eventfd notifications */ +#define IORING_CQ_EVENTFD_DISABLED (1U << 0) + +/* + * io_uring_enter(2) flags + */ +#define IORING_ENTER_GETEVENTS (1U << 0) +#define IORING_ENTER_SQ_WAKEUP (1U << 1) +#define IORING_ENTER_SQ_WAIT (1U << 2) +#define IORING_ENTER_EXT_ARG (1U << 3) +#define IORING_ENTER_REGISTERED_RING (1U << 4) + +/* + * Passed in for io_uring_setup(2). Copied back with updated info on success + */ +struct io_uring_params { + __u32 sq_entries; + __u32 cq_entries; + __u32 flags; + __u32 sq_thread_cpu; + __u32 sq_thread_idle; + __u32 features; + __u32 wq_fd; + __u32 resv[3]; + struct io_sqring_offsets sq_off; + struct io_cqring_offsets cq_off; +}; + +/* + * io_uring_params->features flags + */ +#define IORING_FEAT_SINGLE_MMAP (1U << 0) +#define IORING_FEAT_NODROP (1U << 1) +#define IORING_FEAT_SUBMIT_STABLE (1U << 2) +#define IORING_FEAT_RW_CUR_POS (1U << 3) +#define IORING_FEAT_CUR_PERSONALITY (1U << 4) +#define IORING_FEAT_FAST_POLL (1U << 5) +#define IORING_FEAT_POLL_32BITS (1U << 6) +#define IORING_FEAT_SQPOLL_NONFIXED (1U << 7) +#define IORING_FEAT_EXT_ARG (1U << 8) +#define IORING_FEAT_NATIVE_WORKERS (1U << 9) +#define IORING_FEAT_RSRC_TAGS (1U << 10) +#define IORING_FEAT_CQE_SKIP (1U << 11) +#define IORING_FEAT_LINKED_FILE (1U << 12) +#define IORING_FEAT_REG_REG_RING (1U << 13) + +/* + * io_uring_register(2) opcodes and arguments + */ +enum { + IORING_REGISTER_BUFFERS = 0, + IORING_UNREGISTER_BUFFERS = 1, + IORING_REGISTER_FILES = 2, + IORING_UNREGISTER_FILES = 3, + IORING_REGISTER_EVENTFD = 4, + IORING_UNREGISTER_EVENTFD = 5, + IORING_REGISTER_FILES_UPDATE = 6, + IORING_REGISTER_EVENTFD_ASYNC = 7, + IORING_REGISTER_PROBE = 8, + IORING_REGISTER_PERSONALITY = 9, + IORING_UNREGISTER_PERSONALITY = 10, + IORING_REGISTER_RESTRICTIONS = 11, + IORING_REGISTER_ENABLE_RINGS = 12, + + /* extended with tagging */ + IORING_REGISTER_FILES2 = 13, + IORING_REGISTER_FILES_UPDATE2 = 14, + IORING_REGISTER_BUFFERS2 = 15, + IORING_REGISTER_BUFFERS_UPDATE = 16, + + /* set/clear io-wq thread affinities */ + IORING_REGISTER_IOWQ_AFF = 17, + IORING_UNREGISTER_IOWQ_AFF = 18, + + /* set/get max number of io-wq workers */ + IORING_REGISTER_IOWQ_MAX_WORKERS = 19, + + /* register/unregister io_uring fd with the ring */ + IORING_REGISTER_RING_FDS = 20, + IORING_UNREGISTER_RING_FDS = 21, + + /* register 
ring based provide buffer group */ + IORING_REGISTER_PBUF_RING = 22, + IORING_UNREGISTER_PBUF_RING = 23, + + /* sync cancelation API */ + IORING_REGISTER_SYNC_CANCEL = 24, + + /* register a range of fixed file slots for automatic slot allocation */ + IORING_REGISTER_FILE_ALLOC_RANGE = 25, + + /* this goes last */ + IORING_REGISTER_LAST, + + /* flag added to the opcode to use a registered ring fd */ + IORING_REGISTER_USE_REGISTERED_RING = 1U << 31 +}; + +/* io-wq worker categories */ +enum { + IO_WQ_BOUND, + IO_WQ_UNBOUND, +}; + +/* deprecated, see struct io_uring_rsrc_update */ +struct io_uring_files_update { + __u32 offset; + __u32 resv; + __aligned_u64 /* __s32 * */ fds; +}; + +/* + * Register a fully sparse file space, rather than pass in an array of all + * -1 file descriptors. + */ +#define IORING_RSRC_REGISTER_SPARSE (1U << 0) + +struct io_uring_rsrc_register { + __u32 nr; + __u32 flags; + __u64 resv2; + __aligned_u64 data; + __aligned_u64 tags; +}; + +struct io_uring_rsrc_update { + __u32 offset; + __u32 resv; + __aligned_u64 data; +}; + +struct io_uring_rsrc_update2 { + __u32 offset; + __u32 resv; + __aligned_u64 data; + __aligned_u64 tags; + __u32 nr; + __u32 resv2; +}; + +/* Skip updating fd indexes set to this value in the fd table */ +#define IORING_REGISTER_FILES_SKIP (-2) + +#define IO_URING_OP_SUPPORTED (1U << 0) + +struct io_uring_probe_op { + __u8 op; + __u8 resv; + __u16 flags; /* IO_URING_OP_* flags */ + __u32 resv2; +}; + +struct io_uring_probe { + __u8 last_op; /* last opcode supported */ + __u8 ops_len; /* length of ops[] array below */ + __u16 resv; + __u32 resv2[3]; + struct io_uring_probe_op ops[]; +}; + +struct io_uring_restriction { + __u16 opcode; + union { + __u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */ + __u8 sqe_op; /* IORING_RESTRICTION_SQE_OP */ + __u8 sqe_flags; /* IORING_RESTRICTION_SQE_FLAGS_* */ + }; + __u8 resv; + __u32 resv2[3]; +}; + +struct io_uring_buf { + __u64 addr; + __u32 len; + __u16 bid; + __u16 resv; +}; + +struct io_uring_buf_ring { + union { + /* + * To avoid spilling into more pages than we need to, the + * ring tail is overlaid with the io_uring_buf->resv field. + */ + struct { + __u64 resv1; + __u32 resv2; + __u16 resv3; + __u16 tail; + }; + __DECLARE_FLEX_ARRAY(struct io_uring_buf, bufs); + }; +}; + +/* + * Flags for IORING_REGISTER_PBUF_RING. + * + * IOU_PBUF_RING_MMAP: If set, kernel will allocate the memory for the ring. + * The application must not set a ring_addr in struct + * io_uring_buf_reg, instead it must subsequently call + * mmap(2) with the offset set as: + * IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT) + * to get a virtual mapping for the ring. 
+ */ +enum { + IOU_PBUF_RING_MMAP = 1, +}; + +/* argument for IORING_(UN)REGISTER_PBUF_RING */ +struct io_uring_buf_reg { + __u64 ring_addr; + __u32 ring_entries; + __u16 bgid; + __u16 flags; + __u64 resv[3]; +}; + +/* + * io_uring_restriction->opcode values + */ +enum { + /* Allow an io_uring_register(2) opcode */ + IORING_RESTRICTION_REGISTER_OP = 0, + + /* Allow an sqe opcode */ + IORING_RESTRICTION_SQE_OP = 1, + + /* Allow sqe flags */ + IORING_RESTRICTION_SQE_FLAGS_ALLOWED = 2, + + /* Require sqe flags (these flags must be set on each submission) */ + IORING_RESTRICTION_SQE_FLAGS_REQUIRED = 3, + + IORING_RESTRICTION_LAST +}; + +struct io_uring_getevents_arg { + __u64 sigmask; + __u32 sigmask_sz; + __u32 pad; + __u64 ts; +}; + +/* + * Argument for IORING_REGISTER_SYNC_CANCEL + */ +struct io_uring_sync_cancel_reg { + __u64 addr; + __s32 fd; + __u32 flags; + struct __kernel_timespec timeout; + __u8 opcode; + __u8 pad[7]; + __u64 pad2[3]; +}; + +/* + * Argument for IORING_REGISTER_FILE_ALLOC_RANGE + * The range is specified as [off, off + len) + */ +struct io_uring_file_index_range { + __u32 off; + __u32 len; + __u64 resv; +}; + +struct io_uring_recvmsg_out { + __u32 namelen; + __u32 controllen; + __u32 payloadlen; + __u32 flags; +}; + +/* + * Argument for IORING_OP_URING_CMD when file is a socket + */ +enum { + SOCKET_URING_OP_SIOCINQ = 0, + SOCKET_URING_OP_SIOCOUTQ, + SOCKET_URING_OP_GETSOCKOPT, + SOCKET_URING_OP_SETSOCKOPT, +}; + +#ifdef __cplusplus +} +#endif + +#endif From patchwork Mon Oct 16 13:47:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423461 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CD0DCDB482 for ; Mon, 16 Oct 2023 14:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233941AbjJPOCl (ORCPT ); Mon, 16 Oct 2023 10:02:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233794AbjJPOCN (ORCPT ); Mon, 16 Oct 2023 10:02:13 -0400 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CAA7135; Mon, 16 Oct 2023 07:02:04 -0700 (PDT) Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-99c3d3c3db9so729111266b.3; Mon, 16 Oct 2023 07:02:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464923; x=1698069723; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SfV5ews/VmW9YW7ToLjXL1Db+Ad3infUQzYV78GPNNA=; b=kNQLkFny7tsAK3msTPqW1OPHN5kHNMTqH4m7UxuHbKMra7y9LSCo9dpXmqcxbE1xMH bkITyFkRGF7R+/B39LA1qT4GXXbkCQE/woLw7i+Chs2yXk3dboqXKz3H+LnvAxut/wGl +L+uVSTMo63Bpi94XyINXKiNLOUSbJ8ek808lq2/Nu42uMEFGKrhCCmH8UvBvNVRQ4uK V80HfTyNmPqjCPky5QJRRK2FUHHO/7lvAuNV78mMmz9xU1NqoQ67ft1DM/C7BuxGvfEn kMReD6OAKGbVksVvEi8vk0jq1Ftks7WVcex+nT2fGkQwt4XQI7X4DQs0yjHU1nznJ26n 4uIA== X-Gm-Message-State: AOJu0YxS28i3SA82m1ypdPkszktYaQdrn/IQR4VwrOZNF7Un27utSwzl X4VlQfRJAT0TBUYwa1xLhrw= X-Google-Smtp-Source: AGHT+IGefvqUkZIpMi5smsTwU4Djg5nyVrll30fDZwbs2RC1t6/HEFBaJsCZ1WPheSfBh03yut07uQ== X-Received: by 
2002:a17:906:7389:b0:9ae:406c:3420 with SMTP id f9-20020a170906738900b009ae406c3420mr28494156ejl.30.1697464922870; Mon, 16 Oct 2023 07:02:02 -0700 (PDT) Received: from localhost (fwdproxy-cln-119.fbsv.net. [2a03:2880:31ff:77::face:b00c]) by smtp.gmail.com with ESMTPSA id y23-20020a170906519700b009b947aacb4bsm4182512ejk.191.2023.10.16.07.02.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:02:02 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, "David S. Miller" , Eric Dumazet , Shuah Khan Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Subject: [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Date: Mon, 16 Oct 2023 06:47:45 -0700 Message-Id: <20231016134750.1381153-8-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Instead of defining basic io_uring functions in the test case, move them to a common directory, so, other tests can use them. This simplify the test code and reuse the common liburing infrastructure. This is basically a copy of what we have in io_uring_zerocopy_tx with some minor improvements to make checkpatch happy. A follow-up test will use the same helpers in a BPF sockopt test. Signed-off-by: Breno Leitao --- tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++ tools/testing/selftests/net/Makefile | 1 + .../selftests/net/io_uring_zerocopy_tx.c | 268 +---------------- 3 files changed, 285 insertions(+), 266 deletions(-) create mode 100644 tools/include/io_uring/mini_liburing.h diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h new file mode 100644 index 000000000000..9ccb16074eb5 --- /dev/null +++ b/tools/include/io_uring/mini_liburing.h @@ -0,0 +1,282 @@ +/* SPDX-License-Identifier: MIT */ + +#include +#include +#include +#include +#include +#include + +struct io_sq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + unsigned int *flags; + unsigned int *array; +}; + +struct io_cq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + struct io_uring_cqe *cqes; +}; + +struct io_uring_sq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *kflags; + unsigned int *kdropped; + unsigned int *array; + struct io_uring_sqe *sqes; + + unsigned int sqe_head; + unsigned int sqe_tail; + + size_t ring_sz; +}; + +struct io_uring_cq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *koverflow; + struct io_uring_cqe *cqes; + + size_t ring_sz; +}; + +struct io_uring { + struct io_uring_sq sq; + struct io_uring_cq cq; + int ring_fd; +}; + +#if defined(__x86_64) || defined(__i386__) +#define read_barrier() __asm__ __volatile__("":::"memory") +#define write_barrier() __asm__ __volatile__("":::"memory") +#else +#define read_barrier() __sync_synchronize() +#define write_barrier() __sync_synchronize() +#endif + +static inline int io_uring_mmap(int fd, struct 
io_uring_params *p, + struct io_uring_sq *sq, struct io_uring_cq *cq) +{ + size_t size; + void *ptr; + int ret; + + sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int); + ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); + if (ptr == MAP_FAILED) + return -errno; + sq->khead = ptr + p->sq_off.head; + sq->ktail = ptr + p->sq_off.tail; + sq->kring_mask = ptr + p->sq_off.ring_mask; + sq->kring_entries = ptr + p->sq_off.ring_entries; + sq->kflags = ptr + p->sq_off.flags; + sq->kdropped = ptr + p->sq_off.dropped; + sq->array = ptr + p->sq_off.array; + + size = p->sq_entries * sizeof(struct io_uring_sqe); + sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); + if (sq->sqes == MAP_FAILED) { + ret = -errno; +err: + munmap(sq->khead, sq->ring_sz); + return ret; + } + + cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); + ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); + if (ptr == MAP_FAILED) { + ret = -errno; + munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); + goto err; + } + cq->khead = ptr + p->cq_off.head; + cq->ktail = ptr + p->cq_off.tail; + cq->kring_mask = ptr + p->cq_off.ring_mask; + cq->kring_entries = ptr + p->cq_off.ring_entries; + cq->koverflow = ptr + p->cq_off.overflow; + cq->cqes = ptr + p->cq_off.cqes; + return 0; +} + +static inline int io_uring_setup(unsigned int entries, + struct io_uring_params *p) +{ + return syscall(__NR_io_uring_setup, entries, p); +} + +static inline int io_uring_enter(int fd, unsigned int to_submit, + unsigned int min_complete, + unsigned int flags, sigset_t *sig) +{ + return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, + flags, sig, _NSIG / 8); +} + +static inline int io_uring_queue_init(unsigned int entries, + struct io_uring *ring, + unsigned int flags) +{ + struct io_uring_params p; + int fd, ret; + + memset(ring, 0, sizeof(*ring)); + memset(&p, 0, sizeof(p)); + p.flags = flags; + + fd = io_uring_setup(entries, &p); + if (fd < 0) + return fd; + ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); + if (!ret) + ring->ring_fd = fd; + else + close(fd); + return ret; +} + +/* Get a sqe */ +static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) + return NULL; + return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; +} + +static inline int io_uring_wait_cqe(struct io_uring *ring, + struct io_uring_cqe **cqe_ptr) +{ + struct io_uring_cq *cq = &ring->cq; + const unsigned int mask = *cq->kring_mask; + unsigned int head = *cq->khead; + int ret; + + *cqe_ptr = NULL; + do { + read_barrier(); + if (head != *cq->ktail) { + *cqe_ptr = &cq->cqes[head & mask]; + break; + } + ret = io_uring_enter(ring->ring_fd, 0, 1, + IORING_ENTER_GETEVENTS, NULL); + if (ret < 0) + return -errno; + } while (1); + + return 0; +} + +static inline int io_uring_submit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + const unsigned int mask = *sq->kring_mask; + unsigned int ktail, submitted, to_submit; + int ret; + + read_barrier(); + if (*sq->khead != *sq->ktail) { + submitted = *sq->kring_entries; + goto submit; + } + if (sq->sqe_head == sq->sqe_tail) + return 0; + + ktail = *sq->ktail; + to_submit = sq->sqe_tail - sq->sqe_head; + for (submitted = 0; submitted < to_submit; submitted++) { + read_barrier(); + sq->array[ktail++ & 
mask] = sq->sqe_head++ & mask; + } + if (!submitted) + return 0; + + if (*sq->ktail != ktail) { + write_barrier(); + *sq->ktail = ktail; + write_barrier(); + } +submit: + ret = io_uring_enter(ring->ring_fd, submitted, 0, + IORING_ENTER_GETEVENTS, NULL); + return ret < 0 ? -errno : ret; +} + +static inline void io_uring_queue_exit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe)); + munmap(sq->khead, sq->ring_sz); + close(ring->ring_fd); +} + +/* Prepare and send the SQE */ +static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op, + int sockfd, + int level, int optname, + const void *optval, + int optlen) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_URING_CMD; + sqe->fd = sockfd; + sqe->cmd_op = op; + + sqe->level = level; + sqe->optname = optname; + sqe->optval = (unsigned long long)optval; + sqe->optlen = optlen; +} + +static inline int io_uring_register_buffers(struct io_uring *ring, + const struct iovec *iovecs, + unsigned int nr_iovecs) +{ + int ret; + + ret = syscall(__NR_io_uring_register, ring->ring_fd, + IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); + return (ret < 0) ? -errno : ret; +} + +static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_SEND; + sqe->fd = sockfd; + sqe->addr = (unsigned long)buf; + sqe->len = len; + sqe->msg_flags = (__u32)flags; +} + +static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags, + unsigned int zc_flags) +{ + io_uring_prep_send(sqe, sockfd, buf, len, flags); + sqe->opcode = (__u8)IORING_OP_SEND_ZC; + sqe->ioprio = zc_flags; +} + +static inline void io_uring_cqe_seen(struct io_uring *ring) +{ + *(&ring->cq)->khead += 1; + write_barrier(); +} diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 61939a695f95..4adfe2186f39 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -99,6 +99,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto $(OUTPUT)/tcp_inq: LDLIBS += -lpthread $(OUTPUT)/bind_bhash: LDLIBS += -lpthread +$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ # Rules to generate bpf obj nat6to4.o CLANG ?= clang diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c index 154287740172..76e604e4810e 100644 --- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c +++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c @@ -36,6 +36,8 @@ #include #include +#include + #define NOTIF_TAG 0xfffffffULL #define NONZC_TAG 0 #define ZC_TAG 1 @@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr; static char payload[IP_MAXPACKET] __attribute__((aligned(4096))); -struct io_sq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - unsigned *flags; - unsigned *array; -}; - -struct io_cq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - struct io_uring_cqe *cqes; -}; - -struct io_uring_sq { - unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *kflags; - unsigned *kdropped; - unsigned *array; - struct io_uring_sqe *sqes; - - unsigned sqe_head; - unsigned sqe_tail; - - size_t ring_sz; -}; - -struct io_uring_cq { - 
unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *koverflow; - struct io_uring_cqe *cqes; - - size_t ring_sz; -}; - -struct io_uring { - struct io_uring_sq sq; - struct io_uring_cq cq; - int ring_fd; -}; - -#ifdef __alpha__ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 535 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 536 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 537 -# endif -#else /* !__alpha__ */ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 425 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 426 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 427 -# endif -#endif - -#if defined(__x86_64) || defined(__i386__) -#define read_barrier() __asm__ __volatile__("":::"memory") -#define write_barrier() __asm__ __volatile__("":::"memory") -#else - -#define read_barrier() __sync_synchronize() -#define write_barrier() __sync_synchronize() -#endif - -static int io_uring_setup(unsigned int entries, struct io_uring_params *p) -{ - return syscall(__NR_io_uring_setup, entries, p); -} - -static int io_uring_enter(int fd, unsigned int to_submit, - unsigned int min_complete, - unsigned int flags, sigset_t *sig) -{ - return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, - flags, sig, _NSIG / 8); -} - -static int io_uring_register_buffers(struct io_uring *ring, - const struct iovec *iovecs, - unsigned nr_iovecs) -{ - int ret; - - ret = syscall(__NR_io_uring_register, ring->ring_fd, - IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); - return (ret < 0) ? -errno : ret; -} - -static int io_uring_mmap(int fd, struct io_uring_params *p, - struct io_uring_sq *sq, struct io_uring_cq *cq) -{ - size_t size; - void *ptr; - int ret; - - sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned); - ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); - if (ptr == MAP_FAILED) - return -errno; - sq->khead = ptr + p->sq_off.head; - sq->ktail = ptr + p->sq_off.tail; - sq->kring_mask = ptr + p->sq_off.ring_mask; - sq->kring_entries = ptr + p->sq_off.ring_entries; - sq->kflags = ptr + p->sq_off.flags; - sq->kdropped = ptr + p->sq_off.dropped; - sq->array = ptr + p->sq_off.array; - - size = p->sq_entries * sizeof(struct io_uring_sqe); - sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); - if (sq->sqes == MAP_FAILED) { - ret = -errno; -err: - munmap(sq->khead, sq->ring_sz); - return ret; - } - - cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); - ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); - if (ptr == MAP_FAILED) { - ret = -errno; - munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); - goto err; - } - cq->khead = ptr + p->cq_off.head; - cq->ktail = ptr + p->cq_off.tail; - cq->kring_mask = ptr + p->cq_off.ring_mask; - cq->kring_entries = ptr + p->cq_off.ring_entries; - cq->koverflow = ptr + p->cq_off.overflow; - cq->cqes = ptr + p->cq_off.cqes; - return 0; -} - -static int io_uring_queue_init(unsigned entries, struct io_uring *ring, - unsigned flags) -{ - struct io_uring_params p; - int fd, ret; - - memset(ring, 0, sizeof(*ring)); - memset(&p, 0, sizeof(p)); - p.flags = flags; - - fd = io_uring_setup(entries, &p); - if (fd < 0) - return fd; - ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); - if (!ret) - ring->ring_fd = fd; - else - 
close(fd); - return ret; -} - -static int io_uring_submit(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - const unsigned mask = *sq->kring_mask; - unsigned ktail, submitted, to_submit; - int ret; - - read_barrier(); - if (*sq->khead != *sq->ktail) { - submitted = *sq->kring_entries; - goto submit; - } - if (sq->sqe_head == sq->sqe_tail) - return 0; - - ktail = *sq->ktail; - to_submit = sq->sqe_tail - sq->sqe_head; - for (submitted = 0; submitted < to_submit; submitted++) { - read_barrier(); - sq->array[ktail++ & mask] = sq->sqe_head++ & mask; - } - if (!submitted) - return 0; - - if (*sq->ktail != ktail) { - write_barrier(); - *sq->ktail = ktail; - write_barrier(); - } -submit: - ret = io_uring_enter(ring->ring_fd, submitted, 0, - IORING_ENTER_GETEVENTS, NULL); - return ret < 0 ? -errno : ret; -} - -static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags) -{ - memset(sqe, 0, sizeof(*sqe)); - sqe->opcode = (__u8) IORING_OP_SEND; - sqe->fd = sockfd; - sqe->addr = (unsigned long) buf; - sqe->len = len; - sqe->msg_flags = (__u32) flags; -} - -static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags, - unsigned zc_flags) -{ - io_uring_prep_send(sqe, sockfd, buf, len, flags); - sqe->opcode = (__u8) IORING_OP_SEND_ZC; - sqe->ioprio = zc_flags; -} - -static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - - if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) - return NULL; - return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; -} - -static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr) -{ - struct io_uring_cq *cq = &ring->cq; - const unsigned mask = *cq->kring_mask; - unsigned head = *cq->khead; - int ret; - - *cqe_ptr = NULL; - do { - read_barrier(); - if (head != *cq->ktail) { - *cqe_ptr = &cq->cqes[head & mask]; - break; - } - ret = io_uring_enter(ring->ring_fd, 0, 1, - IORING_ENTER_GETEVENTS, NULL); - if (ret < 0) - return -errno; - } while (1); - - return 0; -} - -static inline void io_uring_cqe_seen(struct io_uring *ring) -{ - *(&ring->cq)->khead += 1; - write_barrier(); -} - static unsigned long gettimeofday_ms(void) { struct timeval tv; From patchwork Mon Oct 16 13:47:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423462 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6CBBCDB474 for ; Mon, 16 Oct 2023 14:02:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233949AbjJPOCl (ORCPT ); Mon, 16 Oct 2023 10:02:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233805AbjJPOCN (ORCPT ); Mon, 16 Oct 2023 10:02:13 -0400 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E2A68FA; Mon, 16 Oct 2023 07:02:07 -0700 (PDT) Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-9a58dbd5daeso738058566b.2; Mon, 16 Oct 2023 07:02:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464926; x=1698069726; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=J+1IratzCwRElqGveWEFhyqeH/oYCw38flsead1Kukk=; b=odDokHQE8P3E56FUu42jksy1Av+ieE2a5YsXEPWdEfrvsUuECO6AEFHt/MYSpIGS/8 WhAx7yPgdZG2M7OgvKC4d3GBXgHl1ydlDnNKsJIvpN9nTgJ3C2L1aPUZYxpbGBH+pPQH 2ORS9iRAW5hIC0rHMiXvhylhDlh2pFfORu4/Q8eTpjHGncwrHUZQlQcpZG2KuYA8mU5s pF5GmTD43sEhy946ihSEmjauGL2a9r10rNUAN14JXLoQx/lxLh8UHt7XF1corsBKmerz 8i1Bm2Y7MGWAvOc80zZOt+5niJVXCPTexTTs0W3Uj+7EyTNbNsXH1cep53gsWFXhuT6H oI4Q== X-Gm-Message-State: AOJu0Yzotkr0i+58a9kykLK9Ea5Vl+Wu0Uf5J7sZ/7AhwQi21Ry8+BRL /69tHX/Ym1Zm18LArKrkgTc= X-Google-Smtp-Source: AGHT+IHIsRRPxvGmUNafNTdOb4Jjg58zUoVbjH0qpqprTYNhKgJKU32t1NuqnNL7u1YlCPcJp4oWBQ== X-Received: by 2002:a17:907:d24:b0:9bf:697b:8f44 with SMTP id gn36-20020a1709070d2400b009bf697b8f44mr5817823ejc.6.1697464924186; Mon, 16 Oct 2023 07:02:04 -0700 (PDT) Received: from localhost (fwdproxy-cln-017.fbsv.net. [2a03:2880:31ff:11::face:b00c]) by smtp.gmail.com with ESMTPSA id jz28-20020a17090775fc00b009ae57888718sm3997303ejc.207.2023.10.16.07.02.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:02:03 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Date: Mon, 16 Oct 2023 06:47:46 -0700 Message-Id: <20231016134750.1381153-9-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Protect io_uring_cmd_sock() from being called if CONFIG_NET is not set. If networking is not enabled, but io_uring is, then we want to return -EOPNOTSUPP for any possible socket operation. This is helpful because io_uring_cmd_sock() can now call functions that only exist if CONFIG_NET is enabled, without needing #ifdef CONFIG_NET inside the function itself. 
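For context only, and not part of the hunk below: the usual kernel idiom for keeping callers free of #ifdef CONFIG_NET is to pair a guarded definition with a static inline stub at the declaration site. A minimal sketch of that pattern follows; the declaration location and exact guard are assumptions for illustration, not something this patch adds.

#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
#else
/* Networking compiled out: report any socket command as unsupported. */
static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
				    unsigned int issue_flags)
{
	return -EOPNOTSUPP;
}
#endif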
Signed-off-by: Breno Leitao --- io_uring/uring_cmd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 4bedd633c08c..42694c07d8fd 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -214,6 +214,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, } EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); +#if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct socket *sock = cmd->file->private_data; @@ -240,3 +241,4 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) } } EXPORT_SYMBOL_GPL(io_uring_cmd_sock); +#endif From patchwork Mon Oct 16 13:47:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF88BCDB482 for ; Mon, 16 Oct 2023 14:02:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233656AbjJPOCt (ORCPT ); Mon, 16 Oct 2023 10:02:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233848AbjJPOCh (ORCPT ); Mon, 16 Oct 2023 10:02:37 -0400 Received: from mail-ej1-f47.google.com (mail-ej1-f47.google.com [209.85.218.47]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 328891A2; Mon, 16 Oct 2023 07:02:10 -0700 (PDT) Received: by mail-ej1-f47.google.com with SMTP id a640c23a62f3a-9be02fcf268so411309466b.3; Mon, 16 Oct 2023 07:02:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464928; x=1698069728; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GoPtknS6OIlEb2FwqHC7u4ghwC0fruWw/wSEYWNl/e0=; b=qbnHvy/K/XgTwfjLl6LWrZvq/MtRRmxUfgHP0hgjCXfNTAHJfhjGq9Oz+T1dVSEY4w 4geTeq1Yca3Qmt46qalBij4pnLiwcrhjdSRnWSqt4cV76LHgBtXsDYV5uAjWi/dxoOVR RCeZOtGGw4bfrU/Rr0lPFotmesFItUuT3XhTAsrsGEFZaYaI+8JsO2nTtbajxe7hIAFk WN3RSJ8LtDL5oT7sjeTqMr2z0/S1jlJC46dfRdhWFXQff2YApa0++RlDNfNVzsypw2K+ hiYaZ6p1FagU6BYVQg7OA5Nhu/AVLgHy1yZPCcXuKt3MJmvf2EfwF2Ny8GPrIpZz7L9z vJCw== X-Gm-Message-State: AOJu0YwOUYLZ2XwittgQ+22g6dH/DRPyyiqp0a8hXbf+oPm8BRaoQGQ7 KIZpPaTUVQRNgWD/FYCsNqI= X-Google-Smtp-Source: AGHT+IF5vzM1NqZ0WpxDbsBqtuU9dNFJF+Awb8uzQopHZd+JsoW4kOcSpZ+J6oiV/juHhfAfdsoyOA== X-Received: by 2002:a17:906:32db:b0:9ba:2d67:a450 with SMTP id k27-20020a17090632db00b009ba2d67a450mr18844084ejk.40.1697464925891; Mon, 16 Oct 2023 07:02:05 -0700 (PDT) Received: from localhost (fwdproxy-cln-008.fbsv.net. 
[2a03:2880:31ff:8::face:b00c]) by smtp.gmail.com with ESMTPSA id jl24-20020a17090775d800b009b94c545678sm4119594ejc.153.2023.10.16.07.02.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:02:05 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Date: Mon, 16 Oct 2023 06:47:47 -0700 Message-Id: <20231016134750.1381153-10-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure, where a sockptr_t is either userspace or kernel space, and handled as such. Differently from the getsockopt(2), the optlen field is not a userspace pointers. In getsockopt(2), userspace provides optlen pointer, which is overwritten by the kernel. In this implementation, userspace passes a u32, and the new value is returned in cqe->res. I.e., optlen is not a pointer. Important to say that userspace needs to keep the pointer alive until the CQE is completed. Signed-off-by: Breno Leitao --- include/uapi/linux/io_uring.h | 7 +++++++ io_uring/uring_cmd.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 92be89a871fc..9628d4f5daba 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -43,6 +43,10 @@ struct io_uring_sqe { union { __u64 addr; /* pointer to buffer or iovecs */ __u64 splice_off_in; + struct { + __u32 level; + __u32 optname; + }; }; __u32 len; /* buffer size or number of iovecs */ union { @@ -81,6 +85,7 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + __u32 optlen; struct { __u16 addr_len; __u16 __pad3[1]; @@ -91,6 +96,7 @@ struct io_uring_sqe { __u64 addr3; __u64 __pad2[1]; }; + __u64 optval; /* * If the ring is initialized with IORING_SETUP_SQE128, then * this field is used for 80 bytes of arbitrary command data @@ -740,6 +746,7 @@ struct io_uring_recvmsg_out { enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, + SOCKET_URING_OP_GETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 42694c07d8fd..8b045830b0d9 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -214,6 +214,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, } EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); +static inline int io_uring_cmd_getsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + bool compat = !!(issue_flags & IO_URING_F_COMPAT); + int optlen, optname, level, err; + void __user *optval; + + level = READ_ONCE(cmd->sqe->level); + if (level != SOL_SOCKET) + return -EOPNOTSUPP; + + optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + optname = READ_ONCE(cmd->sqe->optname); + optlen = READ_ONCE(cmd->sqe->optlen); + + err = do_sock_getsockopt(sock, compat, level, optname, + USER_SOCKPTR(optval), + KERNEL_SOCKPTR(&optlen)); + if (err) + return err; + + /* On success, return optlen */ + 
return optlen; +} + #if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -236,6 +262,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) if (ret) return ret; return arg; + case SOCKET_URING_OP_GETSOCKOPT: + return io_uring_cmd_getsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } From patchwork Mon Oct 16 13:47:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423464 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFB21CDB465 for ; Mon, 16 Oct 2023 14:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233750AbjJPOC5 (ORCPT ); Mon, 16 Oct 2023 10:02:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58064 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233741AbjJPOCh (ORCPT ); Mon, 16 Oct 2023 10:02:37 -0400 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE380107; Mon, 16 Oct 2023 07:02:11 -0700 (PDT) Received: by mail-ej1-f42.google.com with SMTP id a640c23a62f3a-9b2cee40de8so933528266b.1; Mon, 16 Oct 2023 07:02:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464930; x=1698069730; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=w0wd+OTWCVvK4oFazQDiqozW0gDG2Ec9q983374w//U=; b=K/mdLlO8j/3nyPlFgYDaUFfimGpKxD3k/zbnsBYrSnE5ZiwvJnypHBTP1kbVN9T9Lt zlJCAxIyA4kAVZJIM0B0D0/d9cH+0XrzEv9E1M+Nhq+0vdaEP7uYVnkEXGfir0qiYDHh +do4pqeWD9gZ8uh82n7RP6JT7d5J71nGLOAl5GVg+gBoh8rYcXaMOz1OqdZmjX3zejyM /vk4huSazxFpTXiRHQjVApKMyLmHXTm4X++b4TFPdZLQuR0IkbevjaTAUN0kaDoH6+wI IYfr9SCJrjyqHd67JDEmkghVcA9TuY4SnlVIIaACdoS3rTJyuPM4lS8kxBeDMB2tSdlE T3iw== X-Gm-Message-State: AOJu0Yy6nz5S9HYPY0UUmXNYRc1u3T1hqnhR9edY28fcKfR2hTRLnfA0 NJV0o1OdktviwYOgCe3oamHyy4DPy2U= X-Google-Smtp-Source: AGHT+IE/t+V78wH2E5WXRhfhOr+WjqaQXREnO586qgNGleN9dM85Q3gVRiWW1xLWltrg1UooQqY//Q== X-Received: by 2002:a17:907:318e:b0:9a5:7d34:e68a with SMTP id xe14-20020a170907318e00b009a57d34e68amr5784529ejb.28.1697464929941; Mon, 16 Oct 2023 07:02:09 -0700 (PDT) Received: from localhost (fwdproxy-cln-008.fbsv.net. 
[2a03:2880:31ff:8::face:b00c]) by smtp.gmail.com with ESMTPSA id c16-20020a170906529000b009a1a653770bsm4101720ejm.87.2023.10.16.07.02.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:02:09 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org Subject: [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Date: Mon, 16 Oct 2023 06:47:48 -0700 Message-Id: <20231016134750.1381153-11-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is similar to setsockopt. This implementation leverages the function do_sock_setsockopt(), which is shared with the setsockopt() system call path. Important to say that userspace needs to keep the pointer's memory alive until the operation is completed. I.e, the memory could not be deallocated before the CQE is returned to userspace. Signed-off-by: Breno Leitao --- include/uapi/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 21 +++++++++++++++++++++ 2 files changed, 22 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 9628d4f5daba..f1c16f817742 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -747,6 +747,7 @@ enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, SOCKET_URING_OP_GETSOCKOPT, + SOCKET_URING_OP_SETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 8b045830b0d9..acbc2924ecd2 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -240,6 +240,25 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock, return optlen; } +static inline int io_uring_cmd_setsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + bool compat = !!(issue_flags & IO_URING_F_COMPAT); + int optname, optlen, level; + void __user *optval; + sockptr_t optval_s; + + optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + optname = READ_ONCE(cmd->sqe->optname); + optlen = READ_ONCE(cmd->sqe->optlen); + level = READ_ONCE(cmd->sqe->level); + optval_s = USER_SOCKPTR(optval); + + return do_sock_setsockopt(sock, compat, level, optname, optval_s, + optlen); +} + #if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -264,6 +283,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) return arg; case SOCKET_URING_OP_GETSOCKOPT: return io_uring_cmd_getsockopt(sock, cmd, issue_flags); + case SOCKET_URING_OP_SETSOCKOPT: + return io_uring_cmd_setsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } From patchwork Mon Oct 16 13:47:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13423465 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5858CDB474 for ; Mon, 16 Oct 2023 14:03:06 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233781AbjJPODF (ORCPT ); Mon, 16 Oct 2023 10:03:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36390 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233897AbjJPOCk (ORCPT ); Mon, 16 Oct 2023 10:02:40 -0400 Received: from mail-ed1-f42.google.com (mail-ed1-f42.google.com [209.85.208.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 204E011D; Mon, 16 Oct 2023 07:02:15 -0700 (PDT) Received: by mail-ed1-f42.google.com with SMTP id 4fb4d7f45d1cf-53d9b94731aso8141893a12.1; Mon, 16 Oct 2023 07:02:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697464933; x=1698069733; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3ygDojQGLotNc+hKSW16iP2jFFy8A16V/NuLMLtN4+4=; b=KC2cVU1m8Si9d7UT6PbhBe+NFl7JbeJhzsu0K7jM0sfOXmuRfH1HBIgK1LYi/fr+G5 INC7uEgZi35p+x9v38YltEngFI/APOzFSyz09hI4Vnel1D/igWrlC8BV7AJgTRskwNPl 7FVY/j8XaQchhWBSf+taTFc4WdzEu2MPHTHaLxtaXWsdeOEx3P7NYGdnL7pqF836Z31/ u8u7gp0ZLsI2k7SFZAP+b8T/HAXZ7EijPQjEUr9OWez/J3KO+1dCXikF5plyMKVzzJgh kiEljif9YAMByjGXXQJdmfZteIyGsRUMq9PwJLdRL5dY/dxCBbHnIAb4pPtfGBk7kz8J 0Zvw== X-Gm-Message-State: AOJu0YyX0Tj1Ci6cbACy8B8QIPngD6TDZSCaqbW0bLFaXIWExuJBGO0Q cDBrPHYIeiMSd8cVzPDfyP4= X-Google-Smtp-Source: AGHT+IGrMyp6sQpD3D1xspvVk1aGA875YUeRNnpdlYvQMt6SIfSu6fxlRIZmFkgfG3t8TSTL3iSDMg== X-Received: by 2002:a05:6402:3985:b0:53d:bc68:633a with SMTP id fk5-20020a056402398500b0053dbc68633amr16373880edb.5.1697464933493; Mon, 16 Oct 2023 07:02:13 -0700 (PDT) Received: from localhost (fwdproxy-cln-010.fbsv.net. [2a03:2880:31ff:a::face:b00c]) by smtp.gmail.com with ESMTPSA id n30-20020a50935e000000b0053e775e428csm3730388eda.83.2023.10.16.07.02.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Oct 2023 07:02:13 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, kuba@kernel.org, pabeni@redhat.com, martin.lau@linux.dev, krisman@suse.de, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , Mykola Lysenko , Shuah Khan Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, =?utf-8?q?Daniel_M=C3=BCller?= , linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Subject: [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Date: Mon, 16 Oct 2023 06:47:49 -0700 Message-Id: <20231016134750.1381153-12-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231016134750.1381153-1-leitao@debian.org> References: <20231016134750.1381153-1-leitao@debian.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Expand the sockopt test to also check the io_uring {g,s}etsockopt command operations. This patch starts by marking each test with whether it supports io_uring or not. Right now, io_uring cmd getsockopt() has a limitation of only accepting level == SOL_SOCKET; otherwise it returns -EOPNOTSUPP. Since there aren't any tests exercising getsockopt(level == SOL_SOCKET), this patch changes two tests to use level == SOL_SOCKET: "getsockopt: support smaller ctx->optlen" and "getsockopt: read ctx->optlen". There is no limitation for the setsockopt() part. 
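For illustration only (not part of this patch): a minimal sketch of the io_uring path such a test exercises, written against the mini_liburing helpers added in patch 07 of this series and the SOCKET_URING_OP_GETSOCKOPT command from patch 09. The SO_RCVBUF query, the function name uring_get_rcvbuf(), and the error handling are assumptions made for the example.

/* Sketch: fetch SO_RCVBUF through SOCKET_URING_OP_GETSOCKOPT (SOL_SOCKET only).
 * Assumes <sys/socket.h>, the updated <linux/io_uring.h> from this series and
 * tools/include/io_uring/mini_liburing.h are included. The optval buffer must
 * stay alive until the CQE arrives; we wait synchronously, so a caller-owned
 * integer is fine.
 */
static int uring_get_rcvbuf(int sockfd, int *rcvbuf)
{
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe;
	struct io_uring ring;
	int ret;

	ret = io_uring_queue_init(1, &ring, 0);
	if (ret)
		return ret;

	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		ret = -EAGAIN;
		goto out;
	}
	io_uring_prep_cmd(sqe, SOCKET_URING_OP_GETSOCKOPT, sockfd,
			  SOL_SOCKET, SO_RCVBUF, rcvbuf, sizeof(*rcvbuf));

	ret = io_uring_submit(&ring);
	if (ret != 1) {
		ret = ret < 0 ? ret : -EIO;
		goto out;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret)
		goto out;

	/* cqe->res carries the updated optlen on success, -errno on failure. */
	ret = cqe->res < 0 ? cqe->res : 0;
	io_uring_cqe_seen(&ring);
out:
	io_uring_queue_exit(&ring);
	return ret;
}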
Later, each test runs using regular {g,s}etsockopt systemcalls, and, if liburing is supported, execute the same test (again), but calling liburing {g,s}setsockopt commands. This patch also changes the level of two tests to use SOL_SOCKET for the following two tests. This is going to help to exercise the io_uring subsystem: * getsockopt: read ctx->optlen * getsockopt: support smaller ctx->optlen Signed-off-by: Breno Leitao Acked-by: Martin KaFai Lau --- .../selftests/bpf/prog_tests/sockopt.c | 113 +++++++++++++++++- 1 file changed, 107 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c index 9e6a5e3ed4de..5a4491d4edfe 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockopt.c +++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include "cgroup_helpers.h" static char bpf_log_buf[4096]; @@ -38,6 +39,7 @@ static struct sockopt_test { socklen_t get_optlen_ret; enum sockopt_test_error error; + bool io_uring_support; } tests[] = { /* ==================== getsockopt ==================== */ @@ -251,7 +253,9 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, + .get_level = SOL_SOCKET, .get_optlen = 64, + .io_uring_support = true, }, { .descr = "getsockopt: deny bigger ctx->optlen", @@ -276,6 +280,7 @@ static struct sockopt_test { .get_optlen = 64, .error = EFAULT_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: ignore >PAGE_SIZE optlen", @@ -318,6 +323,7 @@ static struct sockopt_test { .get_optval = {}, /* the changes are ignored */ .get_optlen = PAGE_SIZE + 1, .error = EOPNOTSUPP_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: support smaller ctx->optlen", @@ -337,8 +343,10 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, + .get_level = SOL_SOCKET, .get_optlen = 64, .get_optlen_ret = 32, + .io_uring_support = true, }, { .descr = "getsockopt: deny writing to ctx->optval", @@ -518,6 +526,7 @@ static struct sockopt_test { .set_level = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->level", @@ -572,6 +581,7 @@ static struct sockopt_test { .set_optname = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->optname", @@ -624,6 +634,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: ctx->optlen == -1 is ok", @@ -640,6 +651,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen < 0 (except -1)", @@ -658,6 +670,7 @@ static struct sockopt_test { .set_optlen = 4, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen > input optlen", @@ -675,6 +688,7 @@ static struct sockopt_test { .set_optlen = 64, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: ignore >PAGE_SIZE optlen", @@ -940,7 +954,89 @@ static int load_prog(const struct bpf_insn *insns, return fd; } -static int run_test(int cgroup_fd, struct sockopt_test *test) +/* Core function that handles io_uring ring initialization, + * sending SQE with sockopt command and waiting for the CQE. 
+ */ +static int uring_sockopt(int op, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + struct io_uring_cqe *cqe; + struct io_uring_sqe *sqe; + struct io_uring ring; + int err; + + err = io_uring_queue_init(1, &ring, 0); + if (!ASSERT_OK(err, "io_uring initialization")) + return err; + + sqe = io_uring_get_sqe(&ring); + if (!ASSERT_NEQ(sqe, NULL, "Get an SQE")) { + err = -1; + goto fail; + } + + io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen); + + err = io_uring_submit(&ring); + if (!ASSERT_EQ(err, 1, "Submit SQE")) + goto fail; + + err = io_uring_wait_cqe(&ring, &cqe); + if (!ASSERT_OK(err, "Wait for CQE")) + goto fail; + + err = cqe->res; + +fail: + io_uring_queue_exit(&ring); + + return err; +} + +static int uring_setsockopt(int fd, int level, int optname, const void *optval, + socklen_t optlen) +{ + return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname, + optval, optlen); +} + +static int uring_getsockopt(int fd, int level, int optname, void *optval, + socklen_t *optlen) +{ + int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname, + optval, *optlen); + if (ret < 0) + return ret; + + /* Populate optlen back to be compatible with systemcall interface, + * and simplify the test. + */ + *optlen = ret; + + return 0; +} + +/* Execute the setsocktopt operation */ +static int call_setsockopt(bool use_io_uring, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + if (use_io_uring) + return uring_setsockopt(fd, level, optname, optval, optlen); + + return setsockopt(fd, level, optname, optval, optlen); +} + +/* Execute the getsocktopt operation */ +static int call_getsockopt(bool use_io_uring, int fd, int level, int optname, + void *optval, socklen_t *optlen) +{ + if (use_io_uring) + return uring_getsockopt(fd, level, optname, optval, optlen); + + return getsockopt(fd, level, optname, optval, optlen); +} + +static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring) { int sock_fd, err, prog_fd; void *optval = NULL; @@ -980,8 +1076,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder; } - err = setsockopt(sock_fd, test->set_level, test->set_optname, - test->set_optval, test->set_optlen); + err = call_setsockopt(use_io_uring, sock_fd, test->set_level, + test->set_optname, test->set_optval, + test->set_optlen); if (err) { if (errno == EPERM && test->error == EPERM_SETSOCKOPT) goto close_sock_fd; @@ -1008,8 +1105,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) socklen_t expected_get_optlen = test->get_optlen_ret ?: test->get_optlen; - err = getsockopt(sock_fd, test->get_level, test->get_optname, - optval, &optlen); + err = call_getsockopt(use_io_uring, sock_fd, test->get_level, + test->get_optname, optval, &optlen); if (err) { if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT) goto free_optval; @@ -1063,7 +1160,11 @@ void test_sockopt(void) if (!test__start_subtest(tests[i].descr)) continue; - ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr); + ASSERT_OK(run_test(cgroup_fd, &tests[i], false), + tests[i].descr); + if (tests[i].io_uring_support) + ASSERT_OK(run_test(cgroup_fd, &tests[i], true), + tests[i].descr); } close(cgroup_fd);