From patchwork Mon Nov 11 01:50:44 2024
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13870157
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com
Subject: [RFC 1/3] bpf/io_uring: add io_uring program type
Date: Mon, 11 Nov 2024 01:50:44 +0000
X-Mailer: git-send-email 2.46.0
X-Mailing-List: io-uring@vger.kernel.org

Add a new BPF program type and a bare minimum implementation that will
be responsible for orchestrating in-kernel request handling in the
io_uring waiting loop. The program is supposed to replace the logic
that terminates the traditional waiting loop based on parameters such
as the number of completion events to wait for. It returns one of the
IOU_BPF_RET_* codes, telling the kernel whether to return to user space
or continue waiting.

At the moment there is no way to attach it anywhere, so the program is
of little use and doesn't yet know how to interact with io_uring.

Signed-off-by: Pavel Begunkov
---
 include/linux/bpf.h               |  1 +
 include/linux/bpf_types.h         |  4 ++++
 include/linux/io_uring/bpf.h      | 10 ++++++++++
 include/uapi/linux/bpf.h          |  1 +
 include/uapi/linux/io_uring/bpf.h | 22 ++++++++++++++++++++++
 io_uring/Makefile                 |  1 +
 io_uring/bpf.c                    | 24 ++++++++++++++++++++++++
 kernel/bpf/btf.c                  |  3 +++
 kernel/bpf/syscall.c              |  1 +
 kernel/bpf/verifier.c             | 10 +++++++++-
 10 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/io_uring/bpf.h
 create mode 100644 include/uapi/linux/io_uring/bpf.h
 create mode 100644 io_uring/bpf.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 19d8ca8ac960..bccd99dd58c4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9f2a6b83b49e..24293e1ee0b1 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -83,6 +83,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 BPF_PROG_TYPE(BPF_PROG_TYPE_NETFILTER, netfilter,
 	      struct bpf_nf_ctx, struct bpf_nf_ctx)
 #endif
+#ifdef CONFIG_IO_URING
+BPF_PROG_TYPE(BPF_PROG_TYPE_IOURING, bpf_io_uring,
+	      struct io_uring_bpf_ctx, struct io_bpf_ctx_kern)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/io_uring/bpf.h b/include/linux/io_uring/bpf.h
new file mode 100644
index 000000000000..b700a4b65111
--- /dev/null
+++ b/include/linux/io_uring/bpf.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_BPF_H
+#define _LINUX_IO_URING_BPF_H
+
+#include
+
+struct io_bpf_ctx_kern {
+};
+
+#endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e8241b320c6d..1945430d31a6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1055,6 +1055,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	BPF_PROG_TYPE_IOURING,
 	__MAX_BPF_PROG_TYPE
 };
 
diff --git a/include/uapi/linux/io_uring/bpf.h b/include/uapi/linux/io_uring/bpf.h
new file mode 100644
index 000000000000..da749fe7251c
--- /dev/null
+++ b/include/uapi/linux/io_uring/bpf.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
+/*
+ * Header file for the io_uring bpf interface.
+ *
+ * Copyright (C) 2024 Pavel Begunkov
+ */
+#ifndef LINUX_IO_URING_BPF_H
+#define LINUX_IO_URING_BPF_H
+
+#include
+
+enum {
+	IOU_BPF_RET_OK,
+	IOU_BPF_RET_STOP,
+
+	__IOU_BPF_RET_MAX,
+};
+
+struct io_uring_bpf_ctx {
+};
+
+#endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 53167bef37d7..5da66ecc98e5 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \
 obj-$(CONFIG_IO_WQ) += io-wq.o
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
+obj-$(CONFIG_BPF) += bpf.o
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
new file mode 100644
index 000000000000..6eb0c47b4aa9
--- /dev/null
+++ b/io_uring/bpf.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+
+static const struct bpf_func_proto *
+io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	return bpf_base_func_proto(func_id, prog);
+}
+
+static bool io_bpf_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	return false;
+}
+
+const struct bpf_prog_ops bpf_io_uring_prog_ops = {};
+
+const struct bpf_verifier_ops bpf_io_uring_verifier_ops = {
+	.get_func_proto = io_bpf_func_proto,
+	.is_valid_access = io_bpf_is_valid_access,
+};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5cd1c7a23848..e102ee7c530a 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -219,6 +219,7 @@ enum btf_kfunc_hook {
 	BTF_KFUNC_HOOK_LWT,
 	BTF_KFUNC_HOOK_NETFILTER,
 	BTF_KFUNC_HOOK_KPROBE,
+	BTF_KFUNC_HOOK_IOURING,
 	BTF_KFUNC_HOOK_MAX,
 };
 
@@ -8393,6 +8394,8 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type)
 		return BTF_KFUNC_HOOK_NETFILTER;
 	case BPF_PROG_TYPE_KPROBE:
 		return BTF_KFUNC_HOOK_KPROBE;
+	case BPF_PROG_TYPE_IOURING:
+		return BTF_KFUNC_HOOK_IOURING;
 	default:
 		return BTF_KFUNC_HOOK_MAX;
 	}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8cfa7183d2ef..5587ede39ae2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2571,6 +2571,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		return -EINVAL;
 	case BPF_PROG_TYPE_SYSCALL:
 	case BPF_PROG_TYPE_EXT:
+	case BPF_PROG_TYPE_IOURING:
 		if (expected_attach_type)
 			return -EINVAL;
 		fallthrough;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 411ab1b57af4..14de335ba66b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -15946,6 +15946,9 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
 	case BPF_PROG_TYPE_NETFILTER:
 		range = retval_range(NF_DROP, NF_ACCEPT);
 		break;
+	case BPF_PROG_TYPE_IOURING:
+		range = retval_range(IOU_BPF_RET_OK, __IOU_BPF_RET_MAX - 1);
+		break;
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
@@ -22209,7 +22212,8 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_IOURING;
 }
 
 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -22229,6 +22233,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		verbose(env, "Syscall programs can only be sleepable\n");
 		return -EINVAL;
 	}
+	if (prog->type == BPF_PROG_TYPE_IOURING && !prog->sleepable) {
+		verbose(env, "io_uring programs can only be sleepable\n");
+		return -EINVAL;
+	}
 
 	if (prog->sleepable && !can_be_sleepable(prog)) {
 		verbose(env, "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");

From patchwork Mon Nov 11 01:50:45 2024
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13870158
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com
Subject: [RFC 2/3] io_uring/bpf: allow to register and run BPF programs
Date: Mon, 11 Nov 2024 01:50:45 +0000
X-Mailer: git-send-email 2.46.0
X-Mailing-List: io-uring@vger.kernel.org

Let the user register a BPF_PROG_TYPE_IOURING BPF program with a ring.
The program will be run in the waiting loop every time something
happens, i.e. the task was woken up by task_work, a signal, etc.

Signed-off-by: Pavel Begunkov
---
 include/linux/io_uring_types.h |  4 +++
 include/uapi/linux/io_uring.h  |  9 +++++
 io_uring/bpf.c                 | 63 ++++++++++++++++++++++++++++++++++
 io_uring/bpf.h                 | 41 ++++++++++++++++++++++
 io_uring/io_uring.c            | 15 ++++++++
 io_uring/register.c            |  7 ++++
 6 files changed, 139 insertions(+)
 create mode 100644 io_uring/bpf.h

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index ad5001102c86..50cee0d3622e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -8,6 +8,8 @@
 #include
 #include
 
+struct io_bpf_ctx;
+
 enum {
 	/*
 	 * A hint to not wake right away but delay until there are enough of
@@ -246,6 +248,8 @@ struct io_ring_ctx {
 
 		enum task_work_notify_mode notify_method;
 		unsigned sq_thread_idle;
+
+		struct io_bpf_ctx *bpf_ctx;
 	} ____cacheline_aligned_in_smp;
 
 	/* submission data */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ba373deb8406..f2c2fefc8514 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -634,6 +634,8 @@ enum io_uring_register_op {
 	/* register fixed io_uring_reg_wait arguments */
 	IORING_REGISTER_CQWAIT_REG = 34,
 
+	IORING_REGISTER_BPF = 35,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -905,6 +907,13 @@ enum io_uring_socket_op {
 	SOCKET_URING_OP_SETSOCKOPT,
 };
 
+struct io_uring_bpf_reg {
+	__u64 prog_fd;
+	__u32 flags;
+	__u32 resv1;
+	__u64 resv2[2];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index 6eb0c47b4aa9..8b7c74761c63 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include
+#include
+
+#include "bpf.h"
 
 static const struct bpf_func_proto *
 io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
@@ -22,3 +25,63 @@ const struct bpf_verifier_ops bpf_io_uring_verifier_ops = {
 	.get_func_proto = io_bpf_func_proto,
 	.is_valid_access = io_bpf_is_valid_access,
 };
+
+int io_run_bpf(struct io_ring_ctx *ctx)
+{
+	struct io_bpf_ctx *bc = ctx->bpf_ctx;
+	int ret;
+
+	mutex_lock(&ctx->uring_lock);
+	ret = bpf_prog_run_pin_on_cpu(bc->prog, bc);
+	mutex_unlock(&ctx->uring_lock);
+	return ret;
+}
+
+int io_unregister_bpf(struct io_ring_ctx *ctx)
+{
+	struct io_bpf_ctx *bc = ctx->bpf_ctx;
+
+	if (!bc)
+		return -ENXIO;
+	bpf_prog_put(bc->prog);
+	kfree(bc);
+	ctx->bpf_ctx = NULL;
+	return 0;
+}
+
+int io_register_bpf(struct io_ring_ctx *ctx, void __user *arg,
+		    unsigned int nr_args)
+{
+	struct __user io_uring_bpf_reg *bpf_reg_usr = arg;
+	struct io_uring_bpf_reg bpf_reg;
+	struct io_bpf_ctx *bc;
+	struct bpf_prog *prog;
+
+	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+		return -EOPNOTSUPP;
+
+	if (nr_args != 1)
+		return -EINVAL;
+	if (copy_from_user(&bpf_reg, bpf_reg_usr, sizeof(bpf_reg)))
+		return -EFAULT;
+	if (bpf_reg.flags || bpf_reg.resv1 ||
+	    bpf_reg.resv2[0] || bpf_reg.resv2[1])
+		return -EINVAL;
+
+	if (ctx->bpf_ctx)
+		return -ENXIO;
+
+	bc = kzalloc(sizeof(*bc), GFP_KERNEL);
+	if (!bc)
+		return -ENOMEM;
+
+	prog = bpf_prog_get_type(bpf_reg.prog_fd, BPF_PROG_TYPE_IOURING);
+	if (IS_ERR(prog)) {
+		kfree(bc);
+		return PTR_ERR(prog);
+	}
+
+	bc->prog = prog;
+	ctx->bpf_ctx = bc;
+	return 0;
+}
diff --git a/io_uring/bpf.h b/io_uring/bpf.h
new file mode 100644
index 000000000000..2b4e555ff07a
--- /dev/null
+++ b/io_uring/bpf.h
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_BPF_H
+#define IOU_BPF_H
+
+#include
+#include
+
+struct bpf_prog;
+
+struct io_bpf_ctx {
+	struct io_bpf_ctx_kern kern;
+	struct bpf_prog *prog;
+};
+
+static inline bool io_bpf_enabled(struct io_ring_ctx *ctx)
+{
+	return IS_ENABLED(CONFIG_BPF) && ctx->bpf_ctx != NULL;
+}
+
+#ifdef CONFIG_BPF
+int io_register_bpf(struct io_ring_ctx *ctx, void __user *arg,
+		    unsigned int nr_args);
+int io_unregister_bpf(struct io_ring_ctx *ctx);
+int io_run_bpf(struct io_ring_ctx *ctx);
+
+#else
+static inline int io_register_bpf(struct io_ring_ctx *ctx, void __user *arg,
+				  unsigned int nr_args)
+{
+	return -EOPNOTSUPP;
+}
+static inline int io_unregister_bpf(struct io_ring_ctx *ctx)
+{
+	return -EOPNOTSUPP;
+}
+static inline int io_run_bpf(struct io_ring_ctx *ctx)
+{
+}
+#endif
+
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f34fa1ead2cf..82599e2a888a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -104,6 +104,7 @@
 #include "rw.h"
 #include "alloc_cache.h"
 #include "eventfd.h"
+#include "bpf.h"
 
 #define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
 			  IOSQE_IO_HARDLINK | IOSQE_ASYNC)
@@ -2834,6 +2835,12 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 
 	io_napi_busy_loop(ctx, &iowq);
 
+	if (io_bpf_enabled(ctx)) {
+		ret = io_run_bpf(ctx);
+		if (ret == IOU_BPF_RET_STOP)
+			return 0;
+	}
+
 	trace_io_uring_cqring_wait(ctx, min_events);
 	do {
 		unsigned long check_cq;
@@ -2879,6 +2886,13 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 		if (ret < 0)
 			break;
 
+		if (io_bpf_enabled(ctx)) {
+			ret = io_run_bpf(ctx);
+			if (ret == IOU_BPF_RET_STOP)
+				break;
+			continue;
+		}
+
 		check_cq = READ_ONCE(ctx->check_cq);
 		if (unlikely(check_cq)) {
 			/* let the caller flush overflows, retry */
@@ -3009,6 +3023,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_futex_cache_free(ctx);
 	io_destroy_buffers(ctx);
 	io_unregister_cqwait_reg(ctx);
+	io_unregister_bpf(ctx);
 	mutex_unlock(&ctx->uring_lock);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
diff --git a/io_uring/register.c b/io_uring/register.c
index 45edfc57963a..2a8efeacf2db 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -30,6 +30,7 @@
 #include "eventfd.h"
 #include "msg_ring.h"
 #include "memmap.h"
+#include "bpf.h"
 
 #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
 				 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -846,6 +847,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_cqwait_reg(ctx, arg);
 		break;
+	case IORING_REGISTER_BPF:
+		ret = -EINVAL;
+		if (!arg)
+			break;
+		ret = io_register_bpf(ctx, arg, nr_args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;

From patchwork Mon Nov 11 01:50:46 2024
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 13870159
From: Pavel Begunkov
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com
Subject: [RFC 3/3] io_uring/bpf: add kfuncs for BPF programs
Date: Mon, 11 Nov 2024 01:50:46 +0000
X-Mailer: git-send-email 2.46.0
X-Mailing-List: io-uring@vger.kernel.org

Add a way for io_uring BPF programs to look at CQEs and submit new
requests.

Signed-off-by: Pavel Begunkov
---
 io_uring/bpf.c      | 118 ++++++++++++++++++++++++++++++++++++++++++++
 io_uring/bpf.h      |   2 +
 io_uring/io_uring.c |   1 +
 3 files changed, 121 insertions(+)

diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index 8b7c74761c63..d413c3712612 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -4,6 +4,123 @@
 #include
 
 #include "bpf.h"
+#include "io_uring.h"
+
+static inline struct io_bpf_ctx *io_user_to_bpf_ctx(struct io_uring_bpf_ctx *ctx)
+{
+	struct io_bpf_ctx_kern *bc = (struct io_bpf_ctx_kern *)ctx;
+
+	return container_of(bc, struct io_bpf_ctx, kern);
+}
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_io_uring_queue_sqe(struct io_uring_bpf_ctx *user_ctx,
+				       void *bpf_sqe, int mem__sz)
+{
+	struct io_bpf_ctx *bc = io_user_to_bpf_ctx(user_ctx);
+	struct io_ring_ctx *ctx = bc->ctx;
+	unsigned tail = ctx->rings->sq.tail;
+	struct io_uring_sqe *sqe;
+
+	if (mem__sz != sizeof(*sqe))
+		return -EINVAL;
+
+	ctx->rings->sq.tail++;
+	tail &= (ctx->sq_entries - 1);
+	/* double index for 128-byte SQEs, twice as long */
+	if (ctx->flags & IORING_SETUP_SQE128)
+		tail <<= 1;
+	sqe = &ctx->sq_sqes[tail];
+	memcpy(sqe, bpf_sqe, sizeof(*sqe));
+	return 0;
+}
+
+__bpf_kfunc int bpf_io_uring_submit_sqes(struct io_uring_bpf_ctx *user_ctx,
+					 unsigned nr)
+{
+	struct io_bpf_ctx *bc = io_user_to_bpf_ctx(user_ctx);
+	struct io_ring_ctx *ctx = bc->ctx;
+
+	return io_submit_sqes(ctx, nr);
+}
+
+__bpf_kfunc int bpf_io_uring_get_cqe(struct io_uring_bpf_ctx *user_ctx,
+				     struct io_uring_cqe *res__uninit)
+{
+	struct io_bpf_ctx *bc = io_user_to_bpf_ctx(user_ctx);
+	struct io_ring_ctx *ctx = bc->ctx;
+	struct io_rings *rings = ctx->rings;
+	unsigned int mask = ctx->cq_entries - 1;
+	unsigned head = rings->cq.head;
+	struct io_uring_cqe *cqe;
+
+	/* TODO CQE32 */
+	if (head == rings->cq.tail)
+		goto fail;
+
+	cqe = &rings->cqes[head & mask];
+	memcpy(res__uninit, cqe, sizeof(*cqe));
+	rings->cq.head++;
+	return 0;
+fail:
+	memset(res__uninit, 0, sizeof(*res__uninit));
+	return -EINVAL;
+}
+
+__bpf_kfunc
+struct io_uring_cqe *bpf_io_uring_get_cqe2(struct io_uring_bpf_ctx *user_ctx)
+{
+	struct io_bpf_ctx *bc = io_user_to_bpf_ctx(user_ctx);
+	struct io_ring_ctx *ctx = bc->ctx;
+	struct io_rings *rings = ctx->rings;
+	unsigned int mask = ctx->cq_entries - 1;
+	unsigned head = rings->cq.head;
+	struct io_uring_cqe *cqe;
+
+	/* TODO CQE32 */
+	if (head == rings->cq.tail)
+		return NULL;
+
+	cqe = &rings->cqes[head & mask];
+	rings->cq.head++;
+	return cqe;
+}
+
+__bpf_kfunc
+void bpf_io_uring_set_wait_params(struct io_uring_bpf_ctx *user_ctx,
+				  unsigned wait_nr)
+{
+	struct io_bpf_ctx *bc = io_user_to_bpf_ctx(user_ctx);
+	struct io_ring_ctx *ctx = bc->ctx;
+	struct io_wait_queue *wq = bc->waitq;
+
+	wait_nr = min_t(unsigned, wait_nr, ctx->cq_entries);
+	wq->cq_tail = READ_ONCE(ctx->rings->cq.head) + wait_nr;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(io_uring_kfunc_set)
+BTF_ID_FLAGS(func, bpf_io_uring_queue_sqe, KF_SLEEPABLE);
+BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE);
+BTF_ID_FLAGS(func, bpf_io_uring_get_cqe, 0);
+BTF_ID_FLAGS(func, bpf_io_uring_get_cqe2, KF_RET_NULL);
+BTF_ID_FLAGS(func, bpf_io_uring_set_wait_params, 0);
+BTF_KFUNCS_END(io_uring_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_io_uring_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &io_uring_kfunc_set,
+};
+
+static int init_io_uring_bpf(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_IOURING,
+					 &bpf_io_uring_kfunc_set);
+}
+late_initcall(init_io_uring_bpf);
+
 
 static const struct bpf_func_proto *
 io_bpf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
@@ -82,6 +199,7 @@ int io_register_bpf(struct io_ring_ctx *ctx, void __user *arg,
 	}
 
 	bc->prog = prog;
+	bc->ctx = ctx;
 	ctx->bpf_ctx = bc;
 	return 0;
 }
diff --git a/io_uring/bpf.h b/io_uring/bpf.h
index 2b4e555ff07a..9f578a48ce2e 100644
--- a/io_uring/bpf.h
+++ b/io_uring/bpf.h
@@ -9,6 +9,8 @@ struct bpf_prog;
 
 struct io_bpf_ctx {
 	struct io_bpf_ctx_kern kern;
+	struct io_ring_ctx *ctx;
+	struct io_wait_queue *waitq;
 	struct bpf_prog *prog;
 };
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 82599e2a888a..98206e68ce70 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2836,6 +2836,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	io_napi_busy_loop(ctx, &iowq);
 
 	if (io_bpf_enabled(ctx)) {
+		ctx->bpf_ctx->waitq = &iowq;
 		ret = io_run_bpf(ctx);
 		if (ret == IOU_BPF_RET_STOP)
 			return 0;
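
The waiting-loop behaviour described in patches 1/3 and 2/3 can be modelled in plain user-space C. The sketch below is not kernel code and is not part of the series: struct model_ring, model_prog_fn, model_cqring_wait() and model_prog_count() are hypothetical names made up for illustration, and only the IOU_BPF_RET_* codes are taken from the patches. It assumes, purely for the sake of the model, that exactly one completion is posted per wake-up.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes mirroring the enum added in include/uapi/linux/io_uring/bpf.h. */
enum {
	IOU_BPF_RET_OK,
	IOU_BPF_RET_STOP,
};

/* Hypothetical stand-in for the CQ state a program would look at. */
struct model_ring {
	unsigned cq_head;	/* completions consumed so far */
	unsigned cq_tail;	/* completions posted so far */
};

/* Hypothetical stand-in for a loaded BPF_PROG_TYPE_IOURING program. */
typedef int (*model_prog_fn)(struct model_ring *ring, void *priv);

/*
 * Model of the wait loop: the program runs once before sleeping and then
 * once per wake-up. IOU_BPF_RET_STOP ends the wait, IOU_BPF_RET_OK keeps
 * waiting. Returns the number of wake-ups that were processed.
 */
static unsigned model_cqring_wait(struct model_ring *ring, model_prog_fn prog,
				  void *priv, unsigned max_wakeups)
{
	unsigned wakeups = 0;

	assert(prog != NULL);

	/* Initial run, before the task would go to sleep. */
	if (prog(ring, priv) == IOU_BPF_RET_STOP)
		return wakeups;

	while (wakeups < max_wakeups) {
		/* Assumption of this model: one completion per wake-up. */
		ring->cq_tail++;
		wakeups++;

		if (prog(ring, priv) == IOU_BPF_RET_STOP)
			break;
	}
	return wakeups;
}

/*
 * Example program: drain every pending completion, then ask to stop once
 * the number of consumed completions reaches the target stored in priv.
 */
static int model_prog_count(struct model_ring *ring, void *priv)
{
	unsigned *want = priv;

	while (ring->cq_head != ring->cq_tail)
		ring->cq_head++;

	return ring->cq_head >= *want ? IOU_BPF_RET_STOP : IOU_BPF_RET_OK;
}
```

Calling model_cqring_wait() with model_prog_count() and a target of 3 ends the wait after three wake-ups. With a target of 0 the initial run already asks to stop, so no wake-up is processed, which corresponds to the early "return 0" added to io_cqring_wait() in patch 2/3.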