From patchwork Thu Aug 17 14:55:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356647 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5AACA14F7A; Thu, 17 Aug 2023 14:56:28 +0000 (UTC) Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F75F358E; Thu, 17 Aug 2023 07:56:10 -0700 (PDT) Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-522bd411679so10206592a12.0; Thu, 17 Aug 2023 07:56:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284169; x=1692888969; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WwuNwSQO4JjwcfqL+k1N+kwu+e1Kuo7mf/uUCHCBGDE=; b=CXvhM0Vhbi6roe+5kIgCd6luWv9/ZadHYeG/ZJMBj8+9SrWkSBZYiVH/bDrmrMrUBW Z9j6Y8AZIPtJ61SFdx4Je9sn+P9ZN5fbpgFxQU8QToa7DtXPlxcGjHmMLPHKXEvY7dHE 7BZ9wwJEgFbaDvwVcJcpvXm9zhGoTbXkK7XDGCHZiAoatcOx1F7cYHjbP3a4ukXFqrZM 2MYkGxhAWXSHAdkLmjBopQvYZ5cpBhgJy1wUHXGOmwdeEq4OnesXkl9nKJj6uURvfsiJ rVPqX3bdkpVl0QkEggZ3yXNwijzyoSlYcLtrp1HSj++2GmNavEnsPGq2MY+G5sPpcoub iEgg== X-Gm-Message-State: AOJu0YxnDqgjyYD89mALM4g0RvV7OMLkeOkHDeLwPHnvF7oMGMyqUBcb InrLD5Wm7XXrh4X7/CfZXI0= X-Google-Smtp-Source: AGHT+IH69eq5q3hg7Pjx6qNXWbwLOY3Lw+KGLPZEbAz/4Y9szF9pwR5XXir5f760qH5Z8I8liHK8Xw== X-Received: by 2002:aa7:d6c2:0:b0:528:89b0:ec50 with SMTP id x2-20020aa7d6c2000000b0052889b0ec50mr465165edr.21.1692284168871; Thu, 17 Aug 2023 07:56:08 -0700 (PDT) Received: from localhost (fwdproxy-cln-017.fbsv.net. [2a03:2880:31ff:11::face:b00c]) by smtp.gmail.com with ESMTPSA id w4-20020a50fa84000000b00523463540e0sm9825049edr.85.2023.08.17.07.56.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:08 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Song Liu , Yonghong Song , KP Singh , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, krisman@suse.de Subject: [PATCH v3 1/9] bpf: Leverage sockptr_t in BPF getsockopt hook Date: Thu, 17 Aug 2023 07:55:46 -0700 Message-Id: <20230817145554.892543-2-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Leverage sockptr_t structure to have an argument that is either an userspace pointer, or, a kernel pointer. This makes this function flexible, so, we can mix and match user and kernel space pointers. The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a kernel value for optlen. Signed-off-by: Breno Leitao --- include/linux/bpf-cgroup.h | 5 +++-- kernel/bpf/cgroup.c | 20 +++++++++++--------- net/socket.c | 5 +++-- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 57e9e109257e..d16cb99fd4f1 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -139,9 +139,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, int *optname, char __user *optval, int *optlen, char **kernel_optval); + int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval); int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 5b2741aa0d9b..ebc8c58f7e46 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1875,8 +1875,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, } int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, - int optname, char __user *optval, - int __user *optlen, int max_optlen, + int optname, sockptr_t optval, + sockptr_t optlen, int max_optlen, int retval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1903,8 +1903,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, * one that kernel returned as well to let * BPF programs inspect the value. */ - - if (get_user(ctx.optlen, optlen)) { + if (copy_from_sockptr(&ctx.optlen, optlen, + sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } @@ -1915,8 +1915,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } orig_optlen = ctx.optlen; - if (copy_from_user(ctx.optval, optval, - min(ctx.optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(ctx.optlen, max_optlen))) { ret = -EFAULT; goto out; } @@ -1930,7 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, if (ret < 0) goto out; - if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) { + if (!sockptr_is_null(optval) && + (ctx.optlen > max_optlen || ctx.optlen < 0)) { if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) { pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n", ctx.optlen, max_optlen); @@ -1942,11 +1943,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, } if (ctx.optlen != 0) { - if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) { + if (!sockptr_is_null(optval) && + copy_to_sockptr(optval, ctx.optval, ctx.optlen)) { ret = -EFAULT; goto out; } - if (put_user(ctx.optlen, optlen)) { + if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 1dc23f5298ba..33ea5eb91ade 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2311,8 +2311,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname, - optval, optlen, max_optlen, - err); + USER_SOCKPTR(optval), + USER_SOCKPTR(optlen), + max_optlen, err); out_put: fput_light(sock->file, fput_needed); return err; From patchwork Thu Aug 17 14:55:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356648 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1B0B14F7A; Thu, 17 Aug 2023 14:56:33 +0000 (UTC) Received: from mail-lj1-f174.google.com (mail-lj1-f174.google.com [209.85.208.174]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FD252D7E; Thu, 17 Aug 2023 07:56:12 -0700 (PDT) Received: by mail-lj1-f174.google.com with SMTP id 38308e7fff4ca-2b9d07a8d84so124572461fa.3; Thu, 17 Aug 2023 07:56:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284170; x=1692888970; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=sFl0K4oWgy7f8a4WXCtEXv199RlbPY7g5Pt61qAlA0w=; b=clnhMBMHs4F1Q/7DyAg+V97bO7bjVayc2QUyCve9zkuryEbXNssfan6aerpkiuFrjM n6/jao9OhExrIJUQI6Nh6epANWgmQoo0vHjCbs8bzSI3v0MOSsC+SAZ7Vy5hBbzz5sU1 3TE3i31AoC5xx54YbX6SYRz1m6lrGZpwJnpmoRJilrU5aNKX6dx8LxVVG4thj/a1bXN9 nDvuAB5Zy0uQ1bn1Zm1UiwrjvnM73n+VAfiut2C/64LnC0KfCMISGEib2pqdVDa3z2SN jBK1/842KAmx+E3jAH1k2lj3y5rJvIETS9+3nCHN0xmbZ/srLyNZ5LgwCJqszc9mfwBF 29lA== X-Gm-Message-State: AOJu0Yzu9qkcNguUom+mQWdjUlnPAVga5CFLB+xvI4hzFsd5kpFtOmly iP1HY3pex2yLquWuAn71Kwc= X-Google-Smtp-Source: AGHT+IEStrqnxtqy79a9E7YYqT+ceFg5PifcaufTfDZmHhEIH8Vsz9l4pM67f5gocI4p4ngLu1Wa7g== X-Received: by 2002:a2e:7a07:0:b0:2b6:cad9:75ba with SMTP id v7-20020a2e7a07000000b002b6cad975bamr4402075ljc.29.1692284170410; Thu, 17 Aug 2023 07:56:10 -0700 (PDT) Received: from localhost (fwdproxy-cln-017.fbsv.net. [2a03:2880:31ff:11::face:b00c]) by smtp.gmail.com with ESMTPSA id n13-20020a170906b30d00b0099b4d86fbccsm10370998ejz.141.2023.08.17.07.56.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:10 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, Alexei Starovoitov , Daniel Borkmann , John Fastabend , Andrii Nakryiko , Song Liu , Yonghong Song , KP Singh , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, krisman@suse.de Subject: [PATCH v3 2/9] bpf: Leverage sockptr_t in BPF setsockopt hook Date: Thu, 17 Aug 2023 07:55:47 -0700 Message-Id: <20230817145554.892543-3-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Change BPF setsockopt hook (__cgroup_bpf_run_filter_setsockopt()) to use sockptr instead of user pointers. This brings flexibility to the function, since it could be called with userspace or kernel pointers. This change will allow the creation of a core sock_setsockopt, called do_sock_setsockopt(), which will be called from both the system call path and by io_uring command. This also aligns with the getsockopt() counterpart, which is now using sockptr_t types. Signed-off-by: Breno Leitao --- include/linux/bpf-cgroup.h | 2 +- kernel/bpf/cgroup.c | 5 +++-- net/socket.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index d16cb99fd4f1..5e3419eb267a 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -137,7 +137,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head, enum cgroup_bpf_attach_type atype); int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval); int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level, diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index ebc8c58f7e46..f0dedd4f7f2e 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1785,7 +1785,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx, } int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, - int *optname, char __user *optval, + int *optname, sockptr_t optval, int *optlen, char **kernel_optval) { struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); @@ -1808,7 +1808,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level, ctx.optlen = *optlen; - if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) { + if (copy_from_sockptr(ctx.optval, optval, + min(*optlen, max_optlen))) { ret = -EFAULT; goto out; } diff --git a/net/socket.c b/net/socket.c index 33ea5eb91ade..2c3ef8862b4f 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2246,7 +2246,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, if (!in_compat_syscall()) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, - user_optval, &optlen, + optval, &optlen, &kernel_optval); if (err < 0) goto out_put; From patchwork Thu Aug 17 14:55:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356649 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E49C915496; Thu, 17 Aug 2023 14:56:35 +0000 (UTC) Received: from mail-ej1-f51.google.com (mail-ej1-f51.google.com [209.85.218.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C332F30DE; Thu, 17 Aug 2023 07:56:13 -0700 (PDT) Received: by mail-ej1-f51.google.com with SMTP id a640c23a62f3a-9923833737eso1035599966b.3; Thu, 17 Aug 2023 07:56:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284172; x=1692888972; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=07ENq1IVrS75f2tHkaWZQfDYx4uyK/eennegrV/Q9BY=; b=eUElgFImSn2iZBFv9ALM7bP5yoUrNa6bB88hZ0GlXZjbuq51DRTXL2D1m+yhpd1vzI 8NDjwECRXcnxDQtYnUKd9BLQtzNPjWsQ2kI6W7OG7hwB+P75vwICN3s4bAKqFp3VHxM4 cVg4T/cibCGHCzdiq9x6nSO6aMEfpiE4ZiZlys/BtQkb9pYolebI5FSvqA9HL5C1UFeZ 9BENy2pAqp/rfP3JQCuJJF64nhrO+tsu7xUFbcxFxPJA69M7irvm0JoIrTBWnLE7KDZJ QIaTsDSkfkvgC9mSSc1vnI7J2z5GYo8YtEk76+rpkwsPP+iA2qUgY5dqyoj4oy3XOo45 oGoQ== X-Gm-Message-State: AOJu0YzPnqecP8IqrS+HeiSPsXAD4E8Gx+kWxv5xCuY+WCJBX1IGhDrf DsOcs0vZNBqG9nqDAFQYd4o= X-Google-Smtp-Source: AGHT+IGHMFukyZW7NjVpxCzf5qHvR+CXI0/jTXaxVfGokF53UyD0G+B7x3n49Y4x/Di3eesbN33oYA== X-Received: by 2002:a17:906:d18c:b0:98d:4000:1bf9 with SMTP id c12-20020a170906d18c00b0098d40001bf9mr4253849ejz.65.1692284171876; Thu, 17 Aug 2023 07:56:11 -0700 (PDT) Received: from localhost (fwdproxy-cln-000.fbsv.net. [2a03:2880:31ff::face:b00c]) by smtp.gmail.com with ESMTPSA id qx27-20020a170906fcdb00b0098e0a937a6asm10222970ejb.69.2023.08.17.07.56.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:11 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, krisman@suse.de Subject: [PATCH v3 3/9] net/socket: Break down __sys_setsockopt Date: Thu, 17 Aug 2023 07:55:48 -0700 Message-Id: <20230817145554.892543-4-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org Split __sys_setsockopt() into two functions by removing the core logic into a sub-function (do_sock_setsockopt()). This will avoid code duplication when doing the same operation in other callers, for instance. do_sock_setsockopt() will be called by io_uring setsockopt() command operation in the following patch. Signed-off-by: Breno Leitao Reviewed-by: Willem de Bruijn --- include/net/sock.h | 2 ++ net/socket.c | 39 +++++++++++++++++++++++++-------------- 2 files changed, 27 insertions(+), 14 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 2eb916d1ff64..2a0324275347 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1853,6 +1853,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval, unsigned int optlen); int sock_setsockopt(struct socket *sock, int level, int op, sockptr_t optval, unsigned int optlen); +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen); int sk_getsockopt(struct sock *sk, int level, int optname, sockptr_t optval, sockptr_t optlen); diff --git a/net/socket.c b/net/socket.c index 2c3ef8862b4f..b5e4398a6b4d 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2221,30 +2221,20 @@ static bool sock_use_custom_sol_socket(const struct socket *sock) return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags); } -/* - * Set a socket option. Because we don't know the option lengths we have - * to pass the user mode parameter for the protocols to sort out. - */ -int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, - int optlen) +int do_sock_setsockopt(struct socket *sock, bool compat, int level, + int optname, sockptr_t optval, int optlen) { - sockptr_t optval = USER_SOCKPTR(user_optval); char *kernel_optval = NULL; - int err, fput_needed; - struct socket *sock; + int err; if (optlen < 0) return -EINVAL; - sock = sockfd_lookup_light(fd, &err, &fput_needed); - if (!sock) - return err; - err = security_socket_setsockopt(sock, level, optname); if (err) goto out_put; - if (!in_compat_syscall()) + if (!compat) err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname, optval, &optlen, &kernel_optval); @@ -2266,6 +2256,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, optlen); kfree(kernel_optval); out_put: + return err; +} +EXPORT_SYMBOL(do_sock_setsockopt); + +/* Set a socket option. Because we don't know the option lengths we have + * to pass the user mode parameter for the protocols to sort out. + */ +int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval, + int optlen) +{ + sockptr_t optval = USER_SOCKPTR(user_optval); + bool compat = in_compat_syscall(); + int err, fput_needed; + struct socket *sock; + + sock = sockfd_lookup_light(fd, &err, &fput_needed); + if (!sock) + return err; + + err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen); + fput_light(sock->file, fput_needed); return err; } From patchwork Thu Aug 17 14:55:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356650 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E99115496; Thu, 17 Aug 2023 14:56:39 +0000 (UTC) Received: from mail-lf1-f41.google.com (mail-lf1-f41.google.com [209.85.167.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8735D3A97; Thu, 17 Aug 2023 07:56:15 -0700 (PDT) Received: by mail-lf1-f41.google.com with SMTP id 2adb3069b0e04-4fe28f92d8eso12187734e87.1; Thu, 17 Aug 2023 07:56:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284173; x=1692888973; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BZB7zodArDhWbuVqBQO5jZlDc1cl4dj72Y3+CgGSOBM=; b=B3TCKNGJ/H8/9j1l0709nsRCR57AsXAfsqQvDVHfmHrIq7Gl04KZnOl0XHveK/q3x9 XNZzWUNHnkWCC4ndHBMW8J5hU3Vi0+U2G24OjFdUPj/VfGIJJwIW33q8nNBTHX43SlJS DnACohA2NKT/TeJNavr4XyoAOWU/loMps/1EZbbkaddeVyBve1qKjpb3oibq0W7dfS6b cFLDIt4Ilqxmrew0LEO4BWINN7kZfoVXN6u6DSn7DrK/+3Oq1bdj4sEg1QT9X+eXt9OK 7yagysII9bUReSqzIFReAh+rJWKSwP3pCISvSQ176+rhQarGUCppPMgKEfHik26g3Z3O cMWQ== X-Gm-Message-State: AOJu0Yw1DCwFIb2nO4eGf5KXMg0ECWrhWC5ouu4eucCIoHAuMrhHTWo+ q2YuBNTQhSIluNqpyQZKWN4= X-Google-Smtp-Source: AGHT+IHkFtWdt13rVSZyph83D2DvyT4aJpO+ylqoJpkWBio3b0LJQDIihgMYn0enN+UH1BA82JADqg== X-Received: by 2002:a05:6512:3b8d:b0:4ff:8c9e:eb0d with SMTP id g13-20020a0565123b8d00b004ff8c9eeb0dmr5941971lfv.0.1692284173282; Thu, 17 Aug 2023 07:56:13 -0700 (PDT) Received: from localhost (fwdproxy-cln-119.fbsv.net. [2a03:2880:31ff:77::face:b00c]) by smtp.gmail.com with ESMTPSA id t8-20020a056402020800b005236b47116asm9911547edv.70.2023.08.17.07.56.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:12 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, krisman@suse.de Subject: [PATCH v3 4/9] io_uring/cmd: Pass compat mode in issue_flags Date: Thu, 17 Aug 2023 07:55:49 -0700 Message-Id: <20230817145554.892543-5-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Create a new flag to track if the operation is running compat mode. This basically check the context->compat and pass it to the issue_flags, so, it could be queried later in the callbacks. Signed-off-by: Breno Leitao --- include/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 106cdc55ff3b..bc53b35966ed 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -20,6 +20,7 @@ enum io_uring_cmd_flags { IO_URING_F_SQE128 = (1 << 8), IO_URING_F_CQE32 = (1 << 9), IO_URING_F_IOPOLL = (1 << 10), + IO_URING_F_COMPAT = (1 << 11), }; struct io_uring_cmd { diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 8e7a03c1b20e..5f32083bd0a5 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -129,6 +129,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) issue_flags |= IO_URING_F_SQE128; if (ctx->flags & IORING_SETUP_CQE32) issue_flags |= IO_URING_F_CQE32; + if (ctx->compat) + issue_flags |= IO_URING_F_COMPAT; if (ctx->flags & IORING_SETUP_IOPOLL) { if (!file->f_op->uring_cmd_iopoll) return -EOPNOTSUPP; From patchwork Thu Aug 17 14:55:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356653 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19A2014F68; Thu, 17 Aug 2023 14:56:45 +0000 (UTC) Received: from mail-ej1-f52.google.com (mail-ej1-f52.google.com [209.85.218.52]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC93410E; Thu, 17 Aug 2023 07:56:23 -0700 (PDT) Received: by mail-ej1-f52.google.com with SMTP id a640c23a62f3a-99cdb0fd093so1052701666b.1; Thu, 17 Aug 2023 07:56:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284182; x=1692888982; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RGNwDE5n1Pywpt5f1ql5HbOxSxrsPq6WY4GFBT0GDAY=; b=VxGcvaO2Z6yeKOyj0+QwE8uqkeuWrB7xXE2hag6JGxOCmXhRYEcr3Y4nRf9tBhIbzB EI0se3M39h9hrdqYryVGoGfiWNH66BigutvNKUwzkn79FYMM5WXg6VdLGIsmSYLDrCBH 9/mVEcr5645gT9gNx7PLvoc/w3WNCHaUKdg8cCvlm5mDuYs32qsb7AVh3wo2kfUa9DRx a06CyWK2K350cgD0Q+XtuiwxCoLVHZjjxRwneBgZBn8I6XXkDPW3ReRelkgmlhMxSpjg /tcMdcF9b0W3hvL9Y0OE4x4SWwxIhQ0sJEyqgUHwV7TJSvWc2fyZI8gJMvpBcr942JlV v4vg== X-Gm-Message-State: AOJu0YyBbuiYqnpCvDYctwNN3kQRr4VpDvRFaV57dfapL+QQOnSoU4TI 6g1eT7cP0kMj8jqcfOZ6u4Y= X-Google-Smtp-Source: AGHT+IHgnxNCmJjExpKazyCtK8CJurvrlMn8eTc1TGVTX048rGDlw4O5A2oPGZYBo+GPvbkNmHpwcg== X-Received: by 2002:a17:906:5dd9:b0:99c:e38d:e47f with SMTP id p25-20020a1709065dd900b0099ce38de47fmr3752671ejv.33.1692284182147; Thu, 17 Aug 2023 07:56:22 -0700 (PDT) Received: from localhost (fwdproxy-cln-013.fbsv.net. [2a03:2880:31ff:d::face:b00c]) by smtp.gmail.com with ESMTPSA id j19-20020a170906255300b0099b42c90830sm10193510ejb.36.2023.08.17.07.56.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:21 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Shuah Khan Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, krisman@suse.de, linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Subject: [PATCH v3 5/9] selftests/net: Extract uring helpers to be reusable Date: Thu, 17 Aug 2023 07:55:50 -0700 Message-Id: <20230817145554.892543-6-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org Instead of defining basic io_uring functions in the test case, move them to a common directory, so, other tests can use them. This simplify the test code and reuse the common liburing infrastructure. This is basically a copy of what we have in io_uring_zerocopy_tx with some minor improvements to make checkpatch happy. A follow-up test will use the same helpers in a BPF sockopt test. Signed-off-by: Breno Leitao --- tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++ tools/testing/selftests/net/Makefile | 1 + .../selftests/net/io_uring_zerocopy_tx.c | 268 +---------------- 3 files changed, 285 insertions(+), 266 deletions(-) create mode 100644 tools/include/io_uring/mini_liburing.h diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h new file mode 100644 index 000000000000..9ccb16074eb5 --- /dev/null +++ b/tools/include/io_uring/mini_liburing.h @@ -0,0 +1,282 @@ +/* SPDX-License-Identifier: MIT */ + +#include +#include +#include +#include +#include +#include + +struct io_sq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + unsigned int *flags; + unsigned int *array; +}; + +struct io_cq_ring { + unsigned int *head; + unsigned int *tail; + unsigned int *ring_mask; + unsigned int *ring_entries; + struct io_uring_cqe *cqes; +}; + +struct io_uring_sq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *kflags; + unsigned int *kdropped; + unsigned int *array; + struct io_uring_sqe *sqes; + + unsigned int sqe_head; + unsigned int sqe_tail; + + size_t ring_sz; +}; + +struct io_uring_cq { + unsigned int *khead; + unsigned int *ktail; + unsigned int *kring_mask; + unsigned int *kring_entries; + unsigned int *koverflow; + struct io_uring_cqe *cqes; + + size_t ring_sz; +}; + +struct io_uring { + struct io_uring_sq sq; + struct io_uring_cq cq; + int ring_fd; +}; + +#if defined(__x86_64) || defined(__i386__) +#define read_barrier() __asm__ __volatile__("":::"memory") +#define write_barrier() __asm__ __volatile__("":::"memory") +#else +#define read_barrier() __sync_synchronize() +#define write_barrier() __sync_synchronize() +#endif + +static inline int io_uring_mmap(int fd, struct io_uring_params *p, + struct io_uring_sq *sq, struct io_uring_cq *cq) +{ + size_t size; + void *ptr; + int ret; + + sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int); + ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); + if (ptr == MAP_FAILED) + return -errno; + sq->khead = ptr + p->sq_off.head; + sq->ktail = ptr + p->sq_off.tail; + sq->kring_mask = ptr + p->sq_off.ring_mask; + sq->kring_entries = ptr + p->sq_off.ring_entries; + sq->kflags = ptr + p->sq_off.flags; + sq->kdropped = ptr + p->sq_off.dropped; + sq->array = ptr + p->sq_off.array; + + size = p->sq_entries * sizeof(struct io_uring_sqe); + sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); + if (sq->sqes == MAP_FAILED) { + ret = -errno; +err: + munmap(sq->khead, sq->ring_sz); + return ret; + } + + cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); + ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); + if (ptr == MAP_FAILED) { + ret = -errno; + munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); + goto err; + } + cq->khead = ptr + p->cq_off.head; + cq->ktail = ptr + p->cq_off.tail; + cq->kring_mask = ptr + p->cq_off.ring_mask; + cq->kring_entries = ptr + p->cq_off.ring_entries; + cq->koverflow = ptr + p->cq_off.overflow; + cq->cqes = ptr + p->cq_off.cqes; + return 0; +} + +static inline int io_uring_setup(unsigned int entries, + struct io_uring_params *p) +{ + return syscall(__NR_io_uring_setup, entries, p); +} + +static inline int io_uring_enter(int fd, unsigned int to_submit, + unsigned int min_complete, + unsigned int flags, sigset_t *sig) +{ + return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, + flags, sig, _NSIG / 8); +} + +static inline int io_uring_queue_init(unsigned int entries, + struct io_uring *ring, + unsigned int flags) +{ + struct io_uring_params p; + int fd, ret; + + memset(ring, 0, sizeof(*ring)); + memset(&p, 0, sizeof(p)); + p.flags = flags; + + fd = io_uring_setup(entries, &p); + if (fd < 0) + return fd; + ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); + if (!ret) + ring->ring_fd = fd; + else + close(fd); + return ret; +} + +/* Get a sqe */ +static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) + return NULL; + return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; +} + +static inline int io_uring_wait_cqe(struct io_uring *ring, + struct io_uring_cqe **cqe_ptr) +{ + struct io_uring_cq *cq = &ring->cq; + const unsigned int mask = *cq->kring_mask; + unsigned int head = *cq->khead; + int ret; + + *cqe_ptr = NULL; + do { + read_barrier(); + if (head != *cq->ktail) { + *cqe_ptr = &cq->cqes[head & mask]; + break; + } + ret = io_uring_enter(ring->ring_fd, 0, 1, + IORING_ENTER_GETEVENTS, NULL); + if (ret < 0) + return -errno; + } while (1); + + return 0; +} + +static inline int io_uring_submit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + const unsigned int mask = *sq->kring_mask; + unsigned int ktail, submitted, to_submit; + int ret; + + read_barrier(); + if (*sq->khead != *sq->ktail) { + submitted = *sq->kring_entries; + goto submit; + } + if (sq->sqe_head == sq->sqe_tail) + return 0; + + ktail = *sq->ktail; + to_submit = sq->sqe_tail - sq->sqe_head; + for (submitted = 0; submitted < to_submit; submitted++) { + read_barrier(); + sq->array[ktail++ & mask] = sq->sqe_head++ & mask; + } + if (!submitted) + return 0; + + if (*sq->ktail != ktail) { + write_barrier(); + *sq->ktail = ktail; + write_barrier(); + } +submit: + ret = io_uring_enter(ring->ring_fd, submitted, 0, + IORING_ENTER_GETEVENTS, NULL); + return ret < 0 ? -errno : ret; +} + +static inline void io_uring_queue_exit(struct io_uring *ring) +{ + struct io_uring_sq *sq = &ring->sq; + + munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe)); + munmap(sq->khead, sq->ring_sz); + close(ring->ring_fd); +} + +/* Prepare and send the SQE */ +static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op, + int sockfd, + int level, int optname, + const void *optval, + int optlen) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_URING_CMD; + sqe->fd = sockfd; + sqe->cmd_op = op; + + sqe->level = level; + sqe->optname = optname; + sqe->optval = (unsigned long long)optval; + sqe->optlen = optlen; +} + +static inline int io_uring_register_buffers(struct io_uring *ring, + const struct iovec *iovecs, + unsigned int nr_iovecs) +{ + int ret; + + ret = syscall(__NR_io_uring_register, ring->ring_fd, + IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); + return (ret < 0) ? -errno : ret; +} + +static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags) +{ + memset(sqe, 0, sizeof(*sqe)); + sqe->opcode = (__u8)IORING_OP_SEND; + sqe->fd = sockfd; + sqe->addr = (unsigned long)buf; + sqe->len = len; + sqe->msg_flags = (__u32)flags; +} + +static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, + const void *buf, size_t len, int flags, + unsigned int zc_flags) +{ + io_uring_prep_send(sqe, sockfd, buf, len, flags); + sqe->opcode = (__u8)IORING_OP_SEND_ZC; + sqe->ioprio = zc_flags; +} + +static inline void io_uring_cqe_seen(struct io_uring *ring) +{ + *(&ring->cq)->khead += 1; + write_barrier(); +} diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 7f3ab2a93ed6..70f4a5982b01 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -94,6 +94,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto $(OUTPUT)/tcp_inq: LDLIBS += -lpthread $(OUTPUT)/bind_bhash: LDLIBS += -lpthread +$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/ # Rules to generate bpf obj nat6to4.o CLANG ?= clang diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c index 154287740172..76e604e4810e 100644 --- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c +++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c @@ -36,6 +36,8 @@ #include #include +#include + #define NOTIF_TAG 0xfffffffULL #define NONZC_TAG 0 #define ZC_TAG 1 @@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr; static char payload[IP_MAXPACKET] __attribute__((aligned(4096))); -struct io_sq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - unsigned *flags; - unsigned *array; -}; - -struct io_cq_ring { - unsigned *head; - unsigned *tail; - unsigned *ring_mask; - unsigned *ring_entries; - struct io_uring_cqe *cqes; -}; - -struct io_uring_sq { - unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *kflags; - unsigned *kdropped; - unsigned *array; - struct io_uring_sqe *sqes; - - unsigned sqe_head; - unsigned sqe_tail; - - size_t ring_sz; -}; - -struct io_uring_cq { - unsigned *khead; - unsigned *ktail; - unsigned *kring_mask; - unsigned *kring_entries; - unsigned *koverflow; - struct io_uring_cqe *cqes; - - size_t ring_sz; -}; - -struct io_uring { - struct io_uring_sq sq; - struct io_uring_cq cq; - int ring_fd; -}; - -#ifdef __alpha__ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 535 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 536 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 537 -# endif -#else /* !__alpha__ */ -# ifndef __NR_io_uring_setup -# define __NR_io_uring_setup 425 -# endif -# ifndef __NR_io_uring_enter -# define __NR_io_uring_enter 426 -# endif -# ifndef __NR_io_uring_register -# define __NR_io_uring_register 427 -# endif -#endif - -#if defined(__x86_64) || defined(__i386__) -#define read_barrier() __asm__ __volatile__("":::"memory") -#define write_barrier() __asm__ __volatile__("":::"memory") -#else - -#define read_barrier() __sync_synchronize() -#define write_barrier() __sync_synchronize() -#endif - -static int io_uring_setup(unsigned int entries, struct io_uring_params *p) -{ - return syscall(__NR_io_uring_setup, entries, p); -} - -static int io_uring_enter(int fd, unsigned int to_submit, - unsigned int min_complete, - unsigned int flags, sigset_t *sig) -{ - return syscall(__NR_io_uring_enter, fd, to_submit, min_complete, - flags, sig, _NSIG / 8); -} - -static int io_uring_register_buffers(struct io_uring *ring, - const struct iovec *iovecs, - unsigned nr_iovecs) -{ - int ret; - - ret = syscall(__NR_io_uring_register, ring->ring_fd, - IORING_REGISTER_BUFFERS, iovecs, nr_iovecs); - return (ret < 0) ? -errno : ret; -} - -static int io_uring_mmap(int fd, struct io_uring_params *p, - struct io_uring_sq *sq, struct io_uring_cq *cq) -{ - size_t size; - void *ptr; - int ret; - - sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned); - ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING); - if (ptr == MAP_FAILED) - return -errno; - sq->khead = ptr + p->sq_off.head; - sq->ktail = ptr + p->sq_off.tail; - sq->kring_mask = ptr + p->sq_off.ring_mask; - sq->kring_entries = ptr + p->sq_off.ring_entries; - sq->kflags = ptr + p->sq_off.flags; - sq->kdropped = ptr + p->sq_off.dropped; - sq->array = ptr + p->sq_off.array; - - size = p->sq_entries * sizeof(struct io_uring_sqe); - sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES); - if (sq->sqes == MAP_FAILED) { - ret = -errno; -err: - munmap(sq->khead, sq->ring_sz); - return ret; - } - - cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe); - ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING); - if (ptr == MAP_FAILED) { - ret = -errno; - munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe)); - goto err; - } - cq->khead = ptr + p->cq_off.head; - cq->ktail = ptr + p->cq_off.tail; - cq->kring_mask = ptr + p->cq_off.ring_mask; - cq->kring_entries = ptr + p->cq_off.ring_entries; - cq->koverflow = ptr + p->cq_off.overflow; - cq->cqes = ptr + p->cq_off.cqes; - return 0; -} - -static int io_uring_queue_init(unsigned entries, struct io_uring *ring, - unsigned flags) -{ - struct io_uring_params p; - int fd, ret; - - memset(ring, 0, sizeof(*ring)); - memset(&p, 0, sizeof(p)); - p.flags = flags; - - fd = io_uring_setup(entries, &p); - if (fd < 0) - return fd; - ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq); - if (!ret) - ring->ring_fd = fd; - else - close(fd); - return ret; -} - -static int io_uring_submit(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - const unsigned mask = *sq->kring_mask; - unsigned ktail, submitted, to_submit; - int ret; - - read_barrier(); - if (*sq->khead != *sq->ktail) { - submitted = *sq->kring_entries; - goto submit; - } - if (sq->sqe_head == sq->sqe_tail) - return 0; - - ktail = *sq->ktail; - to_submit = sq->sqe_tail - sq->sqe_head; - for (submitted = 0; submitted < to_submit; submitted++) { - read_barrier(); - sq->array[ktail++ & mask] = sq->sqe_head++ & mask; - } - if (!submitted) - return 0; - - if (*sq->ktail != ktail) { - write_barrier(); - *sq->ktail = ktail; - write_barrier(); - } -submit: - ret = io_uring_enter(ring->ring_fd, submitted, 0, - IORING_ENTER_GETEVENTS, NULL); - return ret < 0 ? -errno : ret; -} - -static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags) -{ - memset(sqe, 0, sizeof(*sqe)); - sqe->opcode = (__u8) IORING_OP_SEND; - sqe->fd = sockfd; - sqe->addr = (unsigned long) buf; - sqe->len = len; - sqe->msg_flags = (__u32) flags; -} - -static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd, - const void *buf, size_t len, int flags, - unsigned zc_flags) -{ - io_uring_prep_send(sqe, sockfd, buf, len, flags); - sqe->opcode = (__u8) IORING_OP_SEND_ZC; - sqe->ioprio = zc_flags; -} - -static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring) -{ - struct io_uring_sq *sq = &ring->sq; - - if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries) - return NULL; - return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask]; -} - -static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr) -{ - struct io_uring_cq *cq = &ring->cq; - const unsigned mask = *cq->kring_mask; - unsigned head = *cq->khead; - int ret; - - *cqe_ptr = NULL; - do { - read_barrier(); - if (head != *cq->ktail) { - *cqe_ptr = &cq->cqes[head & mask]; - break; - } - ret = io_uring_enter(ring->ring_fd, 0, 1, - IORING_ENTER_GETEVENTS, NULL); - if (ret < 0) - return -errno; - } while (1); - - return 0; -} - -static inline void io_uring_cqe_seen(struct io_uring *ring) -{ - *(&ring->cq)->khead += 1; - write_barrier(); -} - static unsigned long gettimeofday_ms(void) { struct timeval tv; From patchwork Thu Aug 17 14:55:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356652 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5723915ADC; Thu, 17 Aug 2023 14:56:45 +0000 (UTC) Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55E3830F5; Thu, 17 Aug 2023 07:56:25 -0700 (PDT) Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-52256241c50so10545496a12.3; Thu, 17 Aug 2023 07:56:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284184; x=1692888984; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=K+xKbzOAOfyHzPl3gLRZBxDpNmJnPRkN8+wjSWU3fB0=; b=GWH6ompBRUMMUjjVOxlPlV9ggAD2jXnWokXzklV1v1yVrWeGWo5OOS7j2KFYhcqA0g gp4fxG1/dLVqXGy/R3hnWVBuFN08DUGSEG44RRsN8BOgjGhh/qErGxhd/9LLy1VHkzZv c7zICFiRckBHZid4tQBlNIsq2vgf6iYkoFOxQCzdKGMBIePsxmJdD6mBdB/P+r1VHCwR CWLIwcHqSrHHg3u5RrkUqIWn3BAvCa2w+KV7WEyuuXeVJHtTkfUivWefiMpeyZvsHVN6 2hZHrDAHNrstW5hZzRaeWnS1K4nsh//U80vlUZH34T8a4HaLNclP79jr+v3ZzO6duCwy P/Zw== X-Gm-Message-State: AOJu0Yx7nCCvNbjqR2OUL2jr0GXHxORxyQmjy8oOGtl+B8IFtu/Ru8d6 xJyeygI31CHEcL/GLszR/QPUdH52Et8= X-Google-Smtp-Source: AGHT+IFbdnCec5/GG4elrASYIqG3L05Gl6U1bRerKBPRZirmGIhRzKnsh/EyfWRmkmvtSfoDA9LleA== X-Received: by 2002:aa7:ca55:0:b0:523:b1b0:f69f with SMTP id j21-20020aa7ca55000000b00523b1b0f69fmr5285145edt.32.1692284183717; Thu, 17 Aug 2023 07:56:23 -0700 (PDT) Received: from localhost (fwdproxy-cln-014.fbsv.net. [2a03:2880:31ff:e::face:b00c]) by smtp.gmail.com with ESMTPSA id c5-20020aa7df05000000b005222b471dc4sm9750582edy.95.2023.08.17.07.56.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:23 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, krisman@suse.de Subject: [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Date: Thu, 17 Aug 2023 07:55:51 -0700 Message-Id: <20230817145554.892543-7-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure, where a sockptr_t is either userspace or kernel space, and handled as such. Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt(). Differently from the getsockopt(2), the optlen field is not a userspace pointers. In getsockopt(2), userspace provides optlen pointer, which is overwritten by the kernel. In this implementation, userspace passes a u32, and the new value is returned in cqe->res. I.e., optlen is not a pointer. Important to say that userspace needs to keep the pointer alive until the CQE is completed. Signed-off-by: Breno Leitao --- include/uapi/linux/io_uring.h | 7 +++++++ io_uring/uring_cmd.c | 38 +++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 9fc7195f25df..8152151080db 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -43,6 +43,10 @@ struct io_uring_sqe { union { __u64 addr; /* pointer to buffer or iovecs */ __u64 splice_off_in; + struct { + __u32 level; + __u32 optname; + }; }; __u32 len; /* buffer size or number of iovecs */ union { @@ -79,6 +83,7 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + __u32 optlen; struct { __u16 addr_len; __u16 __pad3[1]; @@ -89,6 +94,7 @@ struct io_uring_sqe { __u64 addr3; __u64 __pad2[1]; }; + __u64 optval; /* * If the ring is initialized with IORING_SETUP_SQE128, then * this field is used for 80 bytes of arbitrary command data @@ -729,6 +735,7 @@ struct io_uring_recvmsg_out { enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, + SOCKET_URING_OP_GETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 5f32083bd0a5..7b768f1bf4df 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -168,6 +168,36 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, } EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed); +INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level, + int optname)); +static inline int io_uring_cmd_getsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + int optname = READ_ONCE(cmd->sqe->optname); + int optlen = READ_ONCE(cmd->sqe->optlen); + int level = READ_ONCE(cmd->sqe->level); + int err; + + err = security_socket_getsockopt(sock, level, optname); + if (err) + return err; + + if (level == SOL_SOCKET) { + err = sk_getsockopt(sock->sk, level, optname, + USER_SOCKPTR(optval), + KERNEL_SOCKPTR(&optlen)); + if (err) + return err; + + return optlen; + } + + return -EOPNOTSUPP; +} + +#if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct socket *sock = cmd->file->private_data; @@ -189,8 +219,16 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) if (ret) return ret; return arg; + case SOCKET_URING_OP_GETSOCKOPT: + return io_uring_cmd_getsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } } +#else +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) +{ + return -EOPNOTSUPP; +} +#endif EXPORT_SYMBOL_GPL(io_uring_cmd_sock); From patchwork Thu Aug 17 14:55:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356651 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 42D69156FE; Thu, 17 Aug 2023 14:56:45 +0000 (UTC) Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com [209.85.167.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D89E2711; Thu, 17 Aug 2023 07:56:27 -0700 (PDT) Received: by mail-lf1-f46.google.com with SMTP id 2adb3069b0e04-4ff9bbc83a5so1172221e87.1; Thu, 17 Aug 2023 07:56:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284185; x=1692888985; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1zsiVk5fOq/xRYSBgy8waxMhVPUrHepcvkTYmhSMbvQ=; b=HxHlsb/k1TJ1T2giGWOMMKVLGCrvhkBFVagfZOq5mMo3XYgpzma5On/iDIqgO9VoaJ kHTcFPP1Lyjz6x6HGqZTo8FBXqlHu43pKClyOmmxSrNEMqWUn7wCbGbbAjIE2ms35+sy KCwlkemrgPvyz3AVqunVKhWr3LFdTLcQdwLIwykA6MzlSP1tvFvtrN3cT9MEHkcaDutb 91TKxzW4ZboeptQ+Y1lhjiaGggUUioEnsdqa+vFyA3HVZ0Uo7W6qe5QJy8H+NcoPQlo2 Y+4UgcmTx3ETbWF+GZAkFE5C3O6vsFDdzZ7AWAAKASadwrrCIaVTeKBrqO2uDoMdJXC0 DeEQ== X-Gm-Message-State: AOJu0YwHi5vug7dvfQjjJpVeyrYJBSaEJVAsMCLS/MjHzAXsRsB90OAN +2Qr/6+/VgFrKYqED6xuHx2QWuZJuzc= X-Google-Smtp-Source: AGHT+IHXQDkzLphQZrs8p1+383d4Zsu9MU09SBJmIQm0eNZXILoDP/CuV7WCev740FE2T6uKjVLeoA== X-Received: by 2002:ac2:5938:0:b0:4fe:c4e:709f with SMTP id v24-20020ac25938000000b004fe0c4e709fmr3647259lfi.20.1692284185184; Thu, 17 Aug 2023 07:56:25 -0700 (PDT) Received: from localhost (fwdproxy-cln-006.fbsv.net. [2a03:2880:31ff:6::face:b00c]) by smtp.gmail.com with ESMTPSA id ck9-20020a170906c44900b00993664a9987sm10251897ejb.103.2023.08.17.07.56.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:24 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, krisman@suse.de Subject: [PATCH v3 7/9] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Date: Thu, 17 Aug 2023 07:55:52 -0700 Message-Id: <20230817145554.892543-8-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is similar to setsockopt. This implementation leverages the function do_sock_setsockopt(), which is shared with the setsockopt() system call path. Important to say that userspace needs to keep the pointer's memory alive until the operation is completed. I.e, the memory could not be deallocated before the CQE is returned to userspace. Signed-off-by: Breno Leitao --- include/uapi/linux/io_uring.h | 1 + io_uring/uring_cmd.c | 16 ++++++++++++++++ 2 files changed, 17 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 8152151080db..3fe82df06abf 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -736,6 +736,7 @@ enum { SOCKET_URING_OP_SIOCINQ = 0, SOCKET_URING_OP_SIOCOUTQ, SOCKET_URING_OP_GETSOCKOPT, + SOCKET_URING_OP_SETSOCKOPT, }; #ifdef __cplusplus diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 7b768f1bf4df..a567dd32df00 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -197,6 +197,20 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock, return -EOPNOTSUPP; } +static inline int io_uring_cmd_setsockopt(struct socket *sock, + struct io_uring_cmd *cmd, + unsigned int issue_flags) +{ + void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval)); + bool compat = !!(issue_flags & IO_URING_F_COMPAT); + int optname = READ_ONCE(cmd->sqe->optname); + sockptr_t optval_s = USER_SOCKPTR(optval); + int optlen = READ_ONCE(cmd->sqe->optlen); + int level = READ_ONCE(cmd->sqe->level); + + return do_sock_setsockopt(sock, compat, level, optname, optval_s, optlen); +} + #if defined(CONFIG_NET) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -221,6 +235,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags) return arg; case SOCKET_URING_OP_GETSOCKOPT: return io_uring_cmd_getsockopt(sock, cmd, issue_flags); + case SOCKET_URING_OP_SETSOCKOPT: + return io_uring_cmd_setsockopt(sock, cmd, issue_flags); default: return -EOPNOTSUPP; } From patchwork Thu Aug 17 14:55:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356654 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BC08F168BF; Thu, 17 Aug 2023 14:56:45 +0000 (UTC) Received: from mail-ej1-f47.google.com (mail-ej1-f47.google.com [209.85.218.47]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 55AA2358D; Thu, 17 Aug 2023 07:56:28 -0700 (PDT) Received: by mail-ej1-f47.google.com with SMTP id a640c23a62f3a-99c3c8adb27so1058627066b.1; Thu, 17 Aug 2023 07:56:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284187; x=1692888987; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nQ0mVRxVAzaTLmEd/HemwV6+ljHNBK9BmdpK1e4n1Fk=; b=HhakUhuWD8IuqO5cUkbKW+BqG/HC7D/HtZSJNr27Ku0wGEnd3vu2KNLKr/ywy3oKCH eR8RJD/Nq9XCGNVEIMJcz1gNlkPjv8STlLavmrKS/Vl4cUlth7fKImD/uVTtd43QARK1 Oxm+r/aeAZ6u8wESyJMtdRJ1jd8hVQmxsp3dWodb+m8mthgCQBRRN2L4R0yBXXofy3Ad wIFb2roG7pE7mrsrRZtqmMPtDJrG79WYf4UqTY3YCNkCuRUqxhQBXMzyVS64XXlF65VL 37rvGg7odpOXQulsJjksBgFNQt6iiqr4m3LwbHNJ++pZGs1g2W0FR5ylRgeuvd0cZ2qZ vpcw== X-Gm-Message-State: AOJu0YxryfRWS1k02PbgLaJCUnSUSkcRcGeZtr+gae6bPi05Bsuo4dGC yzTS972Xw70hhyT+CTeYdNM= X-Google-Smtp-Source: AGHT+IEsohlieh21iQeLkKanrDXE3z5P+/3xxLzQkFGsKChB+60ow3ngaXkrzdOg3/s7riUp6DU8+g== X-Received: by 2002:a17:906:9be4:b0:99e:46b:6a58 with SMTP id de36-20020a1709069be400b0099e046b6a58mr1629764ejc.37.1692284186466; Thu, 17 Aug 2023 07:56:26 -0700 (PDT) Received: from localhost (fwdproxy-cln-003.fbsv.net. [2a03:2880:31ff:3::face:b00c]) by smtp.gmail.com with ESMTPSA id a17-20020a1709062b1100b0099bd5b72d93sm10207857ejg.43.2023.08.17.07.56.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:26 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, krisman@suse.de Subject: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd Date: Thu, 17 Aug 2023 07:55:53 -0700 Message-Id: <20230817145554.892543-9-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Add BPF hook support for getsockopts io_uring command. So, BPF cgroups programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed through io_uring. This implementation follows a similar approach to what __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of kernel pointer. Signed-off-by: Breno Leitao --- io_uring/uring_cmd.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index a567dd32df00..9e08a14760c3 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock, if (err) return err; - if (level == SOL_SOCKET) { + err = -EOPNOTSUPP; + if (level == SOL_SOCKET) err = sk_getsockopt(sock->sk, level, optname, USER_SOCKPTR(optval), KERNEL_SOCKPTR(&optlen)); - if (err) - return err; + if (!(issue_flags & IO_URING_F_COMPAT)) + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, + optname, + USER_SOCKPTR(optval), + KERNEL_SOCKPTR(&optlen), + optlen, err); + + if (!err) return optlen; - } - return -EOPNOTSUPP; + return err; } static inline int io_uring_cmd_setsockopt(struct socket *sock, From patchwork Thu Aug 17 14:55:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Breno Leitao X-Patchwork-Id: 13356655 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF390171A2; Thu, 17 Aug 2023 14:56:47 +0000 (UTC) Received: from mail-ed1-f49.google.com (mail-ed1-f49.google.com [209.85.208.49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5743A35AA; Thu, 17 Aug 2023 07:56:31 -0700 (PDT) Received: by mail-ed1-f49.google.com with SMTP id 4fb4d7f45d1cf-5255da974c4so6380970a12.3; Thu, 17 Aug 2023 07:56:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692284190; x=1692888990; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ke4IWzmqO1N+bGCPCfqmfSAH5jeG6iX85TTRlsQcMDs=; b=I5LxGv+5Y4qlBcuPQRMVz+pQ/yxKydRfY2Ou/ZGP/OWlOfdCcMb6j5aea2RzkmlWZT 7SeWnIhAYr5+3MKzoAlLrXcomiN9FNmlW++dKmt1XqEJFI1r980W9m7t9KCFlhP9jfcL SbordW4ll0zPKfk9om+Bdc0S4n0OV+GrxTg9dF9OZ8s2OzhizviBeyXbvVndCwVgeFcx aJ2ORoknB/domvDdCwN/Q20DzYpX/KnD87JurqSq/TkhFzXjciJQxoBs15GJe39hnhY1 4hLYccZb6VnKbJdCjCmeStMCM2J6nEWmYq5NQMGuw10feDuCghZmjlp1k6SIq0CdNhSd lkRA== X-Gm-Message-State: AOJu0Yx/xFHqf169G+po0Cl0UymHxQYrpw5RPWWd9yOqK9D+ZtYYXxuO y/E4LCMCbJei8rkjhA4Ze8Q= X-Google-Smtp-Source: AGHT+IGG++fNvmWQJgN7uptZFJRzrGCuJT2vPHS0WF5JyTy6uNOX81AEB0/UY7tK5PTgf6IM32MEpw== X-Received: by 2002:a17:907:98e4:b0:994:542c:8718 with SMTP id ke4-20020a17090798e400b00994542c8718mr4187880ejc.76.1692284189770; Thu, 17 Aug 2023 07:56:29 -0700 (PDT) Received: from localhost (fwdproxy-cln-012.fbsv.net. [2a03:2880:31ff:c::face:b00c]) by smtp.gmail.com with ESMTPSA id sd17-20020a170906ce3100b0099cc3c7ace2sm10327123ejb.140.2023.08.17.07.56.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Aug 2023 07:56:29 -0700 (PDT) From: Breno Leitao To: sdf@google.com, axboe@kernel.dk, asml.silence@gmail.com, willemdebruijn.kernel@gmail.com, martin.lau@linux.dev, Andrii Nakryiko , Mykola Lysenko , Alexei Starovoitov , Daniel Borkmann , Song Liu , Yonghong Song , John Fastabend , KP Singh , Hao Luo , Jiri Olsa , Shuah Khan Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org, pabeni@redhat.com, krisman@suse.de, Wang Yufen , =?utf-8?q?Daniel_M?= =?utf-8?q?=C3=BCller?= , linux-kselftest@vger.kernel.org (open list:KERNEL SELFTEST FRAMEWORK) Subject: [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support Date: Thu, 17 Aug 2023 07:55:54 -0700 Message-Id: <20230817145554.892543-10-leitao@debian.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230817145554.892543-1-leitao@debian.org> References: <20230817145554.892543-1-leitao@debian.org> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: bpf@iogearbox.net Expand the sockopt test to use also check for io_uring {g,s}etsockopt commands operations. This patch starts by marking marks each test if they support io_uring support or not. Right now, io_uring cmd getsockopt() has a limitation of only accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Later, each test runs using regular {g,s}etsockopt systemcalls, and, if liburing is supported, execute the same test (again), but calling liburing {g,s}setsockopt commands. For that, leverage the mini liburing to do basic io_uring operations. Signed-off-by: Breno Leitao --- tools/testing/selftests/bpf/Makefile | 1 + .../selftests/bpf/prog_tests/sockopt.c | 129 +++++++++++++++++- 2 files changed, 124 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 538df8fb8c42..4da04242b848 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -362,6 +362,7 @@ CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \ $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline +$(OUTPUT)/test_progs.o: CFLAGS += -I../../../include/ $(OUTPUT)/flow_dissector_load.o: flow_dissector_load.h $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c index 9e6a5e3ed4de..4693ad8bfe8f 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockopt.c +++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include "cgroup_helpers.h" static char bpf_log_buf[4096]; @@ -38,6 +39,7 @@ static struct sockopt_test { socklen_t get_optlen_ret; enum sockopt_test_error error; + bool io_uring_support; } tests[] = { /* ==================== getsockopt ==================== */ @@ -53,6 +55,7 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = 0, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: wrong expected_attach_type", @@ -65,6 +68,7 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .error = DENY_ATTACH, + .io_uring_support = true, }, { .descr = "getsockopt: bypass bpf hook", @@ -121,6 +125,7 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_GETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: read ctx->level", @@ -164,6 +169,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: read ctx->optname", @@ -225,6 +231,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: read ctx->optlen", @@ -252,6 +259,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .get_optlen = 64, + .io_uring_support = true, }, { .descr = "getsockopt: deny bigger ctx->optlen", @@ -276,6 +284,7 @@ static struct sockopt_test { .get_optlen = 64, .error = EFAULT_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: ignore >PAGE_SIZE optlen", @@ -318,6 +327,7 @@ static struct sockopt_test { .get_optval = {}, /* the changes are ignored */ .get_optlen = PAGE_SIZE + 1, .error = EOPNOTSUPP_GETSOCKOPT, + .io_uring_support = true, }, { .descr = "getsockopt: support smaller ctx->optlen", @@ -339,6 +349,7 @@ static struct sockopt_test { .get_optlen = 64, .get_optlen_ret = 32, + .io_uring_support = true, }, { .descr = "getsockopt: deny writing to ctx->optval", @@ -353,6 +364,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: deny writing to ctx->optval_end", @@ -367,6 +379,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "getsockopt: rewrite value", @@ -421,6 +434,7 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_SETSOCKOPT, .expected_attach_type = 0, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "setsockopt: wrong expected_attach_type", @@ -433,6 +447,7 @@ static struct sockopt_test { .attach_type = BPF_CGROUP_SETSOCKOPT, .expected_attach_type = BPF_CGROUP_GETSOCKOPT, .error = DENY_ATTACH, + .io_uring_support = true, }, { .descr = "setsockopt: bypass bpf hook", @@ -518,6 +533,7 @@ static struct sockopt_test { .set_level = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->level", @@ -572,6 +588,7 @@ static struct sockopt_test { .set_optname = 123, .set_optlen = 1, + .io_uring_support = true, }, { .descr = "setsockopt: allow changing ctx->optname", @@ -624,6 +641,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: ctx->optlen == -1 is ok", @@ -640,6 +658,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .set_optlen = 64, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen < 0 (except -1)", @@ -658,6 +677,7 @@ static struct sockopt_test { .set_optlen = 4, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: deny ctx->optlen > input optlen", @@ -675,6 +695,7 @@ static struct sockopt_test { .set_optlen = 64, .error = EFAULT_SETSOCKOPT, + .io_uring_support = true, }, { .descr = "setsockopt: ignore >PAGE_SIZE optlen", @@ -775,6 +796,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "setsockopt: deny read ctx->retval", @@ -791,6 +813,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "setsockopt: deny writing to ctx->optval", @@ -805,6 +828,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "setsockopt: deny writing to ctx->optval_end", @@ -819,6 +843,7 @@ static struct sockopt_test { .expected_attach_type = BPF_CGROUP_SETSOCKOPT, .error = DENY_LOAD, + .io_uring_support = true, }, { .descr = "setsockopt: allow IP_TOS <= 128", @@ -940,7 +965,94 @@ static int load_prog(const struct bpf_insn *insns, return fd; } -static int run_test(int cgroup_fd, struct sockopt_test *test) +/* Core function that handles io_uring ring initialization, + * sending SQE with sockopt command and waiting for the CQE. + */ +static int uring_sockopt(int op, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + struct io_uring_cqe *cqe; + struct io_uring_sqe *sqe; + struct io_uring ring; + int err; + + err = io_uring_queue_init(1, &ring, 0); + if (err) { + log_err("Failed to initialize io_uring ring"); + return err; + } + + sqe = io_uring_get_sqe(&ring); + if (!sqe) { + log_err("Failed to get an SQE"); + return -1; + } + + io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen); + + err = io_uring_submit(&ring); + if (err != 1) { + log_err("Failed to submit SQE"); + return err; + } + + err = io_uring_wait_cqe(&ring, &cqe); + if (err) { + log_err("Failed to wait for CQE"); + return err; + } + + err = cqe->res; + + io_uring_queue_exit(&ring); + + return err; +} + +static int uring_setsockopt(int fd, int level, int optname, const void *optval, + socklen_t optlen) +{ + return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname, + optval, optlen); +} + +static int uring_getsockopt(int fd, int level, int optname, void *optval, + socklen_t *optlen) +{ + int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname, + optval, *optlen); + if (ret < 0) + return ret; + + /* Populate optlen back to be compatible with systemcall interface, + * and simplify the test. + */ + *optlen = ret; + + return 0; +} + +/* Execute the setsocktopt operation */ +static int call_setsockopt(bool use_io_uring, int fd, int level, int optname, + const void *optval, socklen_t optlen) +{ + if (use_io_uring) + return uring_setsockopt(fd, level, optname, optval, optlen); + + return setsockopt(fd, level, optname, optval, optlen); +} + +/* Execute the getsocktopt operation */ +static int call_getsockopt(bool use_io_uring, int fd, int level, int optname, + void *optval, socklen_t *optlen) +{ + if (use_io_uring) + return uring_getsockopt(fd, level, optname, optval, optlen); + + return getsockopt(fd, level, optname, optval, optlen); +} + +static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring) { int sock_fd, err, prog_fd; void *optval = NULL; @@ -980,8 +1092,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder; } - err = setsockopt(sock_fd, test->set_level, test->set_optname, - test->set_optval, test->set_optlen); + err = call_setsockopt(use_io_uring, sock_fd, test->set_level, + test->set_optname, test->set_optval, + test->set_optlen); if (err) { if (errno == EPERM && test->error == EPERM_SETSOCKOPT) goto close_sock_fd; @@ -1008,8 +1121,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test) socklen_t expected_get_optlen = test->get_optlen_ret ?: test->get_optlen; - err = getsockopt(sock_fd, test->get_level, test->get_optname, - optval, &optlen); + err = call_getsockopt(use_io_uring, sock_fd, test->get_level, + test->get_optname, optval, &optlen); if (err) { if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT) goto free_optval; @@ -1063,7 +1176,11 @@ void test_sockopt(void) if (!test__start_subtest(tests[i].descr)) continue; - ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr); + ASSERT_OK(run_test(cgroup_fd, &tests[i], false), + tests[i].descr); + if (tests[i].io_uring_support) + ASSERT_OK(run_test(cgroup_fd, &tests[i], true), + tests[i].descr); } close(cgroup_fd);