From patchwork Tue Feb 4 19:46:35 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959694
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 01/11] eventpoll: abstract out main epoll reaper into a function
Date: Tue, 4 Feb 2025 12:46:35 -0700
Message-ID: <20250204194814.393112-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Add epoll_wait(), which takes a struct file and the number of events etc
to reap. This can then be called by do_epoll_wait(), and used by io_uring
as well. No intended functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 31 ++++++++++++++++++-------------
 include/linux/eventpoll.h |  4 ++++
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7c0980db77b3..73b639caed3d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2445,12 +2445,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	return do_epoll_ctl(epfd, op, fd, &epds, false);
 }
 
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_wait(2).
- */
-static int do_epoll_wait(int epfd, struct epoll_event __user *events,
-			 int maxevents, struct timespec64 *to)
+int epoll_wait(struct file *file, struct epoll_event __user *events,
+	       int maxevents, struct timespec64 *to)
 {
 	struct eventpoll *ep;
 
@@ -2462,28 +2458,37 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	if (!access_ok(events, maxevents * sizeof(struct epoll_event)))
 		return -EFAULT;
 
-	/* Get the "struct file *" for the eventpoll file */
-	CLASS(fd, f)(epfd);
-	if (fd_empty(f))
-		return -EBADF;
-
 	/*
 	 * We have to check that the file structure underneath the fd
 	 * the user passed to us _is_ an eventpoll file.
 	 */
-	if (!is_file_epoll(fd_file(f)))
+	if (!is_file_epoll(file))
 		return -EINVAL;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
 	 */
-	ep = fd_file(f)->private_data;
+	ep = file->private_data;
 
 	/* Time to fish for events ... */
 	return ep_poll(ep, events, maxevents, to);
 }
 
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_wait(2).
+ */
+static int do_epoll_wait(int epfd, struct epoll_event __user *events,
+			 int maxevents, struct timespec64 *to)
+{
+	/* Get the "struct file *" for the eventpoll file */
+	CLASS(fd, f)(epfd);
+	if (!fd_empty(f))
+		return epoll_wait(fd_file(f), events, maxevents, to);
+	return -EBADF;
+}
+
 SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 		int, maxevents, int, timeout)
 {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 0c0d00fcd131..f37fea931c44 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,10 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long t
 /* Used to release the epoll bits inside the "struct file" */
 void eventpoll_release_file(struct file *file);
 
+/* Use to reap events */
+int epoll_wait(struct file *file, struct epoll_event __user *events,
+	       int maxevents, struct timespec64 *to);
+
 /*
  * This is called from inside fs/file_table.c:__fput() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup

From patchwork Tue Feb 4 19:46:36 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959695
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 02/11] eventpoll: add helper to remove wait entry from wait queue head
Date: Tue, 4 Feb 2025 12:46:36 -0700
Message-ID: <20250204194814.393112-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

__epoll_wait_remove() is the core helper, it kills a given
wait_queue_entry from the eventpoll wait_queue_head. Use it internally,
and provide an overall helper, epoll_wait_remove(), which takes a struct
file and provides the same functionality.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 58 +++++++++++++++++++++++++--------------
 include/linux/eventpoll.h |  3 ++
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 73b639caed3d..01edbee5c766 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1980,6 +1980,42 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 	return ret;
 }
 
+static int __epoll_wait_remove(struct eventpoll *ep,
+			       struct wait_queue_entry *wait, int timed_out)
+{
+	int eavail;
+
+	/*
+	 * We were woken up, thus go and try to harvest some events. If timed
+	 * out and still on the wait queue, recheck eavail carefully under
+	 * lock, below.
+	 */
+	eavail = 1;
+
+	if (!list_empty_careful(&wait->entry)) {
+		write_lock_irq(&ep->lock);
+		/*
+		 * If the thread timed out and is not on the wait queue, it
+		 * means that the thread was woken up after its timeout expired
+		 * before it could reacquire the lock. Thus, when wait.entry is
+		 * empty, it needs to harvest events.
+		 */
+		if (timed_out)
+			eavail = list_empty(&wait->entry);
+		__remove_wait_queue(&ep->wq, wait);
+		write_unlock_irq(&ep->lock);
+	}
+
+	return eavail;
+}
+
+int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait)
+{
+	if (is_file_epoll(file))
+		return __epoll_wait_remove(file->private_data, wait, false);
+	return -EINVAL;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2100,27 +2136,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 							      HRTIMER_MODE_ABS);
 		__set_current_state(TASK_RUNNING);
 
-		/*
-		 * We were woken up, thus go and try to harvest some events.
-		 * If timed out and still on the wait queue, recheck eavail
-		 * carefully under lock, below.
-		 */
-		eavail = 1;
-
-		if (!list_empty_careful(&wait.entry)) {
-			write_lock_irq(&ep->lock);
-			/*
-			 * If the thread timed out and is not on the wait queue,
-			 * it means that the thread was woken up after its
-			 * timeout expired before it could reacquire the lock.
-			 * Thus, when wait.entry is empty, it needs to harvest
-			 * events.
-			 */
-			if (timed_out)
-				eavail = list_empty(&wait.entry);
-			__remove_wait_queue(&ep->wq, &wait);
-			write_unlock_irq(&ep->lock);
-		}
+		eavail = __epoll_wait_remove(ep, &wait, timed_out);
 	}
 }
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f37fea931c44..1301fc74aca0 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -29,6 +29,9 @@ void eventpoll_release_file(struct file *file);
 int epoll_wait(struct file *file, struct epoll_event __user *events,
 	       int maxevents, struct timespec64 *to);
 
+/* Remove wait entry */
+int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait);
+
 /*
  * This is called from inside fs/file_table.c:__fput() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup

From patchwork Tue Feb 4 19:46:37 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959696
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 03/11] eventpoll: abstract out ep_try_send_events() helper
Date: Tue, 4 Feb 2025 12:46:37 -0700
Message-ID: <20250204194814.393112-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

In preparation for reusing this helper in another epoll setup helper,
abstract it out.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 01edbee5c766..3cbd290503c7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2016,6 +2016,22 @@ int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait)
 	return -EINVAL;
 }
 
+static int ep_try_send_events(struct eventpoll *ep,
+			      struct epoll_event __user *events, int maxevents)
+{
+	int res;
+
+	/*
+	 * Try to transfer events to user space. In case we get 0 events and
+	 * there's still timeout left over, we go trying again in search of
+	 * more luck.
+	 */
+	res = ep_send_events(ep, events, maxevents);
+	if (res > 0)
+		ep_suspend_napi_irqs(ep);
+	return res;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2067,17 +2083,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	while (1) {
 		if (eavail) {
-			/*
-			 * Try to transfer events to user space. In case we get
-			 * 0 events and there's still timeout left over, we go
-			 * trying again in search of more luck.
-			 */
-			res = ep_send_events(ep, events, maxevents);
-			if (res) {
-				if (res > 0)
-					ep_suspend_napi_irqs(ep);
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
 				return res;
-			}
 		}
 
 		if (timed_out)

From patchwork Tue Feb 4 19:46:38 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959697
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 04/11] eventpoll: add struct wait_queue_entry argument to epoll_wait()
Date: Tue, 4 Feb 2025 12:46:38 -0700
Message-ID: <20250204194814.393112-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

In preparation for allowing an outside caller to add itself to the epoll
waitqueue, pass in a struct wait_queue_entry. Unused in its current form,
but will be utilized shortly. No intended functional changes in this
patch.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 5 +++--
 include/linux/eventpoll.h | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3cbd290503c7..ecaa5591f4be 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2470,7 +2470,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 }
 
 int epoll_wait(struct file *file, struct epoll_event __user *events,
-	       int maxevents, struct timespec64 *to)
+	       int maxevents, struct timespec64 *to,
+	       struct wait_queue_entry *wait)
 {
 	struct eventpoll *ep;
 
@@ -2509,7 +2510,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	/* Get the "struct file *" for the eventpoll file */
 	CLASS(fd, f)(epfd);
 	if (!fd_empty(f))
-		return epoll_wait(fd_file(f), events, maxevents, to);
+		return epoll_wait(fd_file(f), events, maxevents, to, NULL);
 	return -EBADF;
 }
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 1301fc74aca0..24f9344df5a3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -27,7 +27,8 @@ void eventpoll_release_file(struct file *file);
 
 /* Use to reap events */
 int epoll_wait(struct file *file, struct epoll_event __user *events,
-	       int maxevents, struct timespec64 *to);
+	       int maxevents, struct timespec64 *to,
+	       struct wait_queue_entry *wait);
 
 /* Remove wait entry */
 int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait);

From patchwork Tue Feb 4 19:46:39 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959698
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 05/11] eventpoll: add ep_poll_queue() loop
Date: Tue, 4 Feb 2025 12:46:39 -0700
Message-ID: <20250204194814.393112-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

If a wait_queue_entry is passed in to epoll_wait(), then utilize this
new helper for reaping events and/or adding to the epoll waitqueue
rather than calling the potentially sleeping ep_poll(). It works like
ep_poll(), except it doesn't block - it either returns the events that
are already available, or it adds the specified entry to the struct
eventpoll waitqueue to get a callback when events are triggered. It
returns -EIOCBQUEUED for that case.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ecaa5591f4be..a8be0c7110e4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2032,6 +2032,39 @@ static int ep_try_send_events(struct eventpoll *ep,
 	return res;
 }
 
+static int ep_poll_queue(struct eventpoll *ep,
+			 struct epoll_event __user *events, int maxevents,
+			 struct wait_queue_entry *wait)
+{
+	int res, eavail;
+
+	/* See ep_poll() for commentary */
+	eavail = ep_events_available(ep);
+	while (1) {
+		if (eavail) {
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
+				return res;
+		}
+
+		eavail = ep_busy_loop(ep, true);
+		if (eavail)
+			continue;
+
+		if (!list_empty_careful(&wait->entry))
+			return -EIOCBQUEUED;
+
+		write_lock_irq(&ep->lock);
+		eavail = ep_events_available(ep);
+		if (!eavail)
+			__add_wait_queue_exclusive(&ep->wq, wait);
+		write_unlock_irq(&ep->lock);
+
+		if (!eavail)
+			return -EIOCBQUEUED;
+	}
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2497,7 +2530,9 @@ int epoll_wait(struct file *file, struct epoll_event __user *events,
 	ep = file->private_data;
 
 	/* Time to fish for events ... */
-	return ep_poll(ep, events, maxevents, to);
+	if (!wait)
+		return ep_poll(ep, events, maxevents, to);
+	return ep_poll_queue(ep, events, maxevents, wait);
 }
 
 /*

From patchwork Tue Feb 4 19:46:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959699
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="JfN2gut8" Received: by mail-il1-f177.google.com with SMTP id e9e14a558f8ab-3d04fc1ea14so364865ab.2 for ; Tue, 04 Feb 2025 11:48:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738698505; x=1739303305; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Yl4I89AW9N+9ZjPLUSwNXzZLneK6YdFoM+i0fKJ0Giw=; b=JfN2gut8Mxyx+Qm9lTtApKyUPE9Vk+jkt8VT758RePaXAhRuka29qcPnwrIGQbmros krHD88HtdrvAy7cvfMXtiSm8wweQQHxGYn3gUzI9ybt5QgY/7Y96xJzhHpsuc01X082K q1vSCkQdUCY9fKWJ2jEiTZ7BZKbWWat33liqs0jSJvaJ95+I6syqal1eZ3Iw1eeCxBXB vD9Fei9So0vXWrLR5M0XAULuPFoiKMjCqtQ1E8K9X2WajZzi70eST6m+POhNxZ7kn5J5 NbtEB2/L/MwVMVx8n+pgMz+gOCbR52DnnnrbOneufoeJFEPLwgz7uCOdKPLdRVpD6FVU W3iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738698505; x=1739303305; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Yl4I89AW9N+9ZjPLUSwNXzZLneK6YdFoM+i0fKJ0Giw=; b=FFun6dokaV4WoVPPGZXIDyU0S6w4TgQbFH+8GH8zGEE6dj7zdKGwSr4+Ns6ICOWVAZ ZHBOt2VlyHM84WSvAuzV2zepqc3jUyF2S2gDIYJ42cgVrslucefgRwkI0+sBuy4lPOb5 w6Pfeqa0jf7w0Q5lcFAeB5pBOurV+dxssY0HHKcrswg2dcfulxQT4UDwznAdObvToxkg +BIcQx8emDpiCuxq2ZpEDvsJMvnGWnJwS6XjXX0xKEP7s0h0to8lwuIK9rsVWnmYGRzE JK/QFfxh50HTQqj7YA62whLkRxU0G+cJXeFOpJjFsdHA3XZk+kexpGMnxuaj7d5jv8dS xCPQ== X-Gm-Message-State: AOJu0YwZ6d2M4Iig4bt7mFdl+r6G1o/23nudR8YqHvEgn7mMJRBUVJN+ kjDpp5z7RvkDu16qZ3acPSfZPqZn8MjazlwOHenJPraKp1L+BjNF1P0MlQXWjTUCyTSYcqnxS2v M X-Gm-Gg: 
ASbGncu1wAOjS2XVDufwFqnWVHr9ic+LD7Ckeh4XwEcw8A6fd/BAxbSAjiXFMc3fd+v S7GgNjnm60Aoaml9BJPTYZKluj+daTJ1rTTxcs7hDskjsiMMY2R610HlZBCoZrKXsBhLM7BJxGl +BAJIHGJBT/RV92BK85Ljwh2IWZbMgZcatkH32s+HmieSl1gk+Tmp9rvJzUxUXQsgDc6RGb28bp ZNXpCP29+tZskugt/VT57KAk8tQh29KMcZYPR7aTKJXjp3Lw6KJX3PXO4R/Zg9LEeW3ZvDznQns bdA2mk7tU8QzFmAF4Ek= X-Google-Smtp-Source: AGHT+IELVjxsL1tqCfebEN1r34RIVmp0hKbUIDlK1Ylpdn3ib9har3+McLtmJPMliq4v7hQ7wmzlMg== X-Received: by 2002:a05:6e02:f:b0:3cf:b2b0:5d35 with SMTP id e9e14a558f8ab-3d04f41ad8amr2550205ab.7.1738698504932; Tue, 04 Feb 2025 11:48:24 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ec746c95c4sm2841466173.127.2025.02.04.11.48.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Feb 2025 11:48:24 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 06/11] io_uring/epoll: remove CONFIG_EPOLL guards Date: Tue, 4 Feb 2025 12:46:40 -0700 Message-ID: <20250204194814.393112-7-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk> References: <20250204194814.393112-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Just have the Makefile add the object if epoll is enabled, then it's not necessary to guard the entire epoll.c file inside a CONFIG_EPOLL ifdef. 
Signed-off-by: Jens Axboe --- io_uring/Makefile | 9 +++++---- io_uring/epoll.c | 2 -- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/io_uring/Makefile b/io_uring/Makefile index d695b60dba4f..7114a6dbd439 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -11,9 +11,10 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \ eventfd.o uring_cmd.o openclose.o \ sqpoll.o xattr.o nop.o fs.o splice.o \ sync.o msg_ring.o advise.o openclose.o \ - epoll.o statx.o timeout.o fdinfo.o \ - cancel.o waitid.o register.o \ - truncate.o memmap.o alloc_cache.o + statx.o timeout.o fdinfo.o cancel.o \ + waitid.o register.o truncate.o \ + memmap.o alloc_cache.o obj-$(CONFIG_IO_WQ) += io-wq.o obj-$(CONFIG_FUTEX) += futex.o -obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o +obj-$(CONFIG_EPOLL) += epoll.o +obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o diff --git a/io_uring/epoll.c b/io_uring/epoll.c index 89bff2068a19..7848d9cc073d 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -12,7 +12,6 @@ #include "io_uring.h" #include "epoll.h" -#if defined(CONFIG_EPOLL) struct io_epoll { struct file *file; int epfd; @@ -58,4 +57,3 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags) io_req_set_res(req, ret, 0); return IOU_OK; } -#endif From patchwork Tue Feb 4 19:46:41 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13959700 Received: from mail-io1-f44.google.com (mail-io1-f44.google.com [209.85.166.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 017A621A422 for ; Tue, 4 Feb 2025 19:48:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698508; cv=none; 
b=oRCQ2+yv/vMWLIq5PLxaaJdbaOgFV+y/31dvujz7zn2+CTdjwt+j9CBrVIAlo9bEoFl4PIQYOhUlWr67ZINdio1gnDQ8OfzIL6LxLonaCCh8ldbHo11iftARkPbXwBHvzsmfcoo4zA0GFBoZ1s/mwGcMyni6OHXWNe+xsS8begM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698508; c=relaxed/simple; bh=im1ntOsYLHYjJiEvx6njeC6vx3wirWuYtKuqP171Gs4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UvDgDgd//4b3WSpDstxeyuqR5DWNhkw+OrkS5C2UbO4Gvy8wUoKapfLegBhSLyuWrBDtKhimegOd5BwozCYxJtOQDchlR00RJ6OmtKrCgeGEGF2Z8RTa8d3gTgcO0q0NyxoXIMwcvGWHr1yDodmXqC9fgEmIKKiW/8ywsCtPxCg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=SsvWfTrP; arc=none smtp.client-ip=209.85.166.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="SsvWfTrP" Received: by mail-io1-f44.google.com with SMTP id ca18e2360f4ac-84a1ce51187so163989539f.1 for ; Tue, 04 Feb 2025 11:48:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738698506; x=1739303306; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TQ8sKu9TA7JXcFTCTxuSEZOOG4qI+mYE1gZPeZ19VjI=; b=SsvWfTrPPFbY5eWm1kGdmTK4bIFiTSznfzgR9fXviYxF+7VaB4SmDSUx0rSPBeIMNn f13pUbio3mp2TBgMC8WZwbwK1zCyDob5A4EJTdHW9WqdeyiDvNAxao/ZQPZd3k1q2mLH gvErpQzOaNyQJVuLFAQjYgS9J9CoNJwlNmdglJwZua0gm1dyAsT13bn32QR9Q7MkC1J4 
4+itUYlkQkeILa4XSo8s0biwRTaykHDnYmcZurDxm471Zitk1kLsEH2ZzFOn+otjMiF4 a8G16c5/f/cobgJNwUyXWca8THYyA1mre63/j31MnKXssrODU695XAhza5iMtRaGsb6S NF6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738698506; x=1739303306; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TQ8sKu9TA7JXcFTCTxuSEZOOG4qI+mYE1gZPeZ19VjI=; b=HS31LadA+Wb5VLPWpupZ0lbGBqrrm3cVRNS16e7sGm1P2yH097q3t36r6xxryaZ6/n JZwcixeZA+2hwS/WDEEYNnMnxuD6lVfJygw7Gf1Hu6COwPAMnNicD9/cKxiMAq02P+TS vlQRLJG0J15u+gFu3dc2UUT99fT1ILAyPm/XrmV3BEz0q+U5R2QLkFZ2xOZtxYDDkfgI +KwQygbb2pSlI501n0Ib98CwlOg2QA8xv7GnVkaD0FTV3sGOoI7fwMBIHmqRJiJxBKRY pZYTuK+0bh2JSdSRmh+A8HZedMSF0M6AC+a8ZKjGKyvKLpw84ssTUBTK3vzXc6TslMWO vyzw== X-Gm-Message-State: AOJu0YzVcQEvbSy7QSRf3LpNCqg8wWDC7yQbgLd5PUZTVqSl8zGfLufU HSuCT1Tb2PCjNP7CTO16fiPfaFBu0U1M2mXKUci5DEzWDkEH8JpcC/RJhl9j3a8= X-Gm-Gg: ASbGncumMwVFDEzFZqVPgBPulGQq3SI1AO/A8EHLUnANT7XwLjWk062ww593nmAvySU AbE1ZPGhRqjz+6OJ9hkn68QmDUFUUdiCqdryvv3+7b5TBNa9hY7+EQhrWHhFhIOI5zjRKzLpkMn ZxXZnhY4KFEvMS+sAKnW7WVro0TQqQ6mFlpES3ti428hWytC9jWTgWxKrMfNZaq2bvTphvktJi+ +rhagwwrBKhIieDUVePg+PN6XW4csMBW2hD2omjrAQ0UEajmPAfoMNvkRDeAFmS6DbadNsDTwtj gJPIWufGjL7VDFJ8p+E= X-Google-Smtp-Source: AGHT+IGgKsTAIuMWIIUyjhHQg9pg2TRiKqIeaaiPRkZN0hdZykJCKB/DPuXnqQ84gaSqM31W+Yz5gA== X-Received: by 2002:a05:6602:4192:b0:844:cbd0:66ca with SMTP id ca18e2360f4ac-854ea411c82mr28620039f.1.1738698506071; Tue, 04 Feb 2025 11:48:26 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ec746c95c4sm2841466173.127.2025.02.04.11.48.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Feb 2025 11:48:25 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 07/11] io_uring/poll: pull ownership handling into poll.h Date: Tue, 
4 Feb 2025 12:46:41 -0700 Message-ID: <20250204194814.393112-8-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk> References: <20250204194814.393112-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In preparation for using it from somewhere else. Rather than try and duplicate the functionality, just make it generically available to io_uring opcodes. Note: would have to be used carefully, cannot be used by opcodes that can trigger poll logic. Signed-off-by: Jens Axboe --- io_uring/poll.c | 30 +----------------------------- io_uring/poll.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+), 29 deletions(-) diff --git a/io_uring/poll.c b/io_uring/poll.c index bb1c0cd4f809..5e44ac562491 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -41,16 +41,6 @@ struct io_poll_table { __poll_t result_mask; }; -#define IO_POLL_CANCEL_FLAG BIT(31) -#define IO_POLL_RETRY_FLAG BIT(30) -#define IO_POLL_REF_MASK GENMASK(29, 0) - -/* - * We usually have 1-2 refs taken, 128 is more than enough and we want to - * maximise the margin between this amount and the moment when it overflows. - */ -#define IO_POLL_REF_BIAS 128 - #define IO_WQE_F_DOUBLE 1 static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, @@ -70,7 +60,7 @@ static inline bool wqe_is_double(struct wait_queue_entry *wqe) return priv & IO_WQE_F_DOUBLE; } -static bool io_poll_get_ownership_slowpath(struct io_kiocb *req) +bool io_poll_get_ownership_slowpath(struct io_kiocb *req) { int v; @@ -85,24 +75,6 @@ static bool io_poll_get_ownership_slowpath(struct io_kiocb *req) return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); } -/* - * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can - * bump it and acquire ownership. 
It's disallowed to modify requests while not - * owning it, that prevents from races for enqueueing task_work's and b/w - * arming poll and wakeups. - */ -static inline bool io_poll_get_ownership(struct io_kiocb *req) -{ - if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS)) - return io_poll_get_ownership_slowpath(req); - return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); -} - -static void io_poll_mark_cancelled(struct io_kiocb *req) -{ - atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs); -} - static struct io_poll *io_poll_get_double(struct io_kiocb *req) { /* pure poll stashes this in ->async_data, poll driven retry elsewhere */ diff --git a/io_uring/poll.h b/io_uring/poll.h index 04ede93113dc..2f416cd3be13 100644 --- a/io_uring/poll.h +++ b/io_uring/poll.h @@ -21,6 +21,18 @@ struct async_poll { struct io_poll *double_poll; }; +#define IO_POLL_CANCEL_FLAG BIT(31) +#define IO_POLL_RETRY_FLAG BIT(30) +#define IO_POLL_REF_MASK GENMASK(29, 0) + +bool io_poll_get_ownership_slowpath(struct io_kiocb *req); + +/* + * We usually have 1-2 refs taken, 128 is more than enough and we want to + * maximise the margin between this amount and the moment when it overflows. + */ +#define IO_POLL_REF_BIAS 128 + /* * Must only be called inside issue_flags & IO_URING_F_MULTISHOT, or * potentially other cases where we already "own" this poll request. @@ -30,6 +42,25 @@ static inline void io_poll_multishot_retry(struct io_kiocb *req) atomic_inc(&req->poll_refs); } +/* + * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can + * bump it and acquire ownership. It's disallowed to modify requests while not + * owning it, that prevents from races for enqueueing task_work's and b/w + * arming poll and wakeups. 
+ */ +static inline bool io_poll_get_ownership(struct io_kiocb *req) +{ + if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS)) + return io_poll_get_ownership_slowpath(req); + return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); +} + +static inline void io_poll_mark_cancelled(struct io_kiocb *req) +{ + atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs); +} + + int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); int io_poll_add(struct io_kiocb *req, unsigned int issue_flags); From patchwork Tue Feb 4 19:46:42 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13959701 Received: from mail-io1-f48.google.com (mail-io1-f48.google.com [209.85.166.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A0E87204F7F for ; Tue, 4 Feb 2025 19:48:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698510; cv=none; b=ZX02UWlYd5EjBDIT/SIl5rsVYGY7Ga4E6IzVJr+E3dfX95NsijvwIQYcNKzJhopPCHVos4VgMsl1rL6YT/M7QuCLJB4I+sFywqJsOQKtBzTDiwJ8g6C2egeIuxq3Ng3P9qpWR+WvrsfNKAiTPwuit7gsFkJ4AT46Y70PuIrnA4Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698510; c=relaxed/simple; bh=iDg1pcafrTMf1M9THTPfS2st6QLt98jXRHhiBzRj5oQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lFaQJgUwRuuvwc+47xqFGhF5u1udNL5F5FN5WEHWWrxtsVKSIWXk1f7GX3P9CwWubw6LFkFCXSOMunqzymmXxqZed8l+7yQLypJ0CnO08fqgKs1jVlmM5SlDg5bp8m/GeHLKavtLH62fLN1YnIYgwgY0Io5pRTmvGQ1Awl9jcZY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com 
header.i=@kernel-dk.20230601.gappssmtp.com header.b=iQnT28YV; arc=none smtp.client-ip=209.85.166.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="iQnT28YV" Received: by mail-io1-f48.google.com with SMTP id ca18e2360f4ac-84a012f7232so4410239f.0 for ; Tue, 04 Feb 2025 11:48:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738698508; x=1739303308; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1DNE4ajdvoJ7lxr6HNYBzebwW7qzyN+uH2k2BONZWcI=; b=iQnT28YVwSV9Z2vBpZk4m77N/DpfBaU2BP70tZxHLHYdXmiEsj6Qw1CRan1uWQM926 PRiQbvOJTa/aFo9zNbRJdEYXepBM55tnzkB7VMrpBEgzwDjRSiPTmt7eEsOrNxMs7Vfk ZNDN2aCdEC9tms2i/PVWKHw6qQXLgy7yrO1l7EXa8Sp65njp//ie2UG0PW5yMVY0AZ6l tqDTcIAi2dZuxx+CAt817FQP7GPljxyMwt+/HAaNW0DXzcZFYuRkUSZ06lKDXkyNhb7K ikkqH6J3un3gv7sDJ4RgFnUWlr28p8RUSIxbQy3QZ3rYs1FWPhfCPa/mZ5ZYGvhNszgx j2SA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738698508; x=1739303308; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1DNE4ajdvoJ7lxr6HNYBzebwW7qzyN+uH2k2BONZWcI=; b=IYRlajSaG+F5XiAiKihqqCw9g2qDu9grYl5IZPCM51Slt0O0M9B3lciKlp/ebEWi3s vj3JDahCYAXiF/9ljA0qbtuYqLvJYQqX0Gty2XnSWFN1Difu0+JQyUIgzdmvOFSUZvmi Rf9JAaIbTf7De6dI227hKlGMQ6s996N6sc1z8VGOnTmWvGPxa6sFb0sIY+m4J1xR+t6s j7c8VaEB4rf04Dk8XF/9BNZp41Uif12hr3qGn7KrFmO/z8Yo6Gvzg8x1FzuC3C3IT2Lu ycjDFWSWQEkETfl5TQ1W7CEZ6Im3d+lNAFeB0OerI/K7qiKWzfaJOQO/T+CrROv4CXM8 85gA== X-Gm-Message-State: 
AOJu0YzkDcuz9SuA7WxSZVlcB/8N+euu+YWq49X1nvMCNp+w6QW4JaWh +SSQpGHP80qFhc993+5tb+WoshcdvkkQcAGBYLNmqlp2sMLJGEuOJxxhvrCsQsnCv3oAsFw20oj O X-Gm-Gg: ASbGncssckbEsLwaGSzXgtCcEINiZA6AAkTT2/AZYFZPy1JTVdXTPG3bwcYHu0xhroQ Fz8+REHYgNXiq0J9FrKD2lgWy8etHhFpQs9vAYHIZVLHwtNSU8a/yiXuBQy39dV4wUfpHDFrZEQ HEY6RDXSaQGleusZrfNSLsJ+IW0Hdq0wvmiwHtZ65ygvr/e3PagjDsWF3JG1YcykbVwsfAlEr/W LEL4wmSOB4EdUGlEJbl4C8XqIQ9tYZYDPkN2ayrvKaWA7dsezxM0qj90BHOYQ8fkJJ4KTUFrpNy +PoYschPI5KtAHh68is= X-Google-Smtp-Source: AGHT+IFNPD1IIdcxu8TcCpr/goDKctc2AwLqLRzlAImxHhejCrluUFCk1NTnS7+2N+VAFH1n4BQjJA== X-Received: by 2002:a5d:8d91:0:b0:841:9225:1f56 with SMTP id ca18e2360f4ac-854de076c3fmr394580239f.3.1738698507837; Tue, 04 Feb 2025 11:48:27 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ec746c95c4sm2841466173.127.2025.02.04.11.48.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Feb 2025 11:48:26 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 08/11] io_uring/poll: add IO_POLL_FINISH_FLAG Date: Tue, 4 Feb 2025 12:46:42 -0700 Message-ID: <20250204194814.393112-9-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk> References: <20250204194814.393112-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Use the same value as the retry, doesn't need to be unique, just available if the poll owning mechanism user wishes to use it for something other than a retry condition. 
Signed-off-by: Jens Axboe --- io_uring/poll.h | 1 + 1 file changed, 1 insertion(+) diff --git a/io_uring/poll.h b/io_uring/poll.h index 2f416cd3be13..97d14b2b2751 100644 --- a/io_uring/poll.h +++ b/io_uring/poll.h @@ -23,6 +23,7 @@ struct async_poll { #define IO_POLL_CANCEL_FLAG BIT(31) #define IO_POLL_RETRY_FLAG BIT(30) +#define IO_POLL_FINISH_FLAG IO_POLL_RETRY_FLAG #define IO_POLL_REF_MASK GENMASK(29, 0) bool io_poll_get_ownership_slowpath(struct io_kiocb *req); From patchwork Tue Feb 4 19:46:43 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13959702 Received: from mail-il1-f169.google.com (mail-il1-f169.google.com [209.85.166.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E03F2185BC for ; Tue, 4 Feb 2025 19:48:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698512; cv=none; b=mW0TUwqjO1pMu//8SrFuvsrFFV8Te/b5HIAba/QzJSOR2LqFc0iif+TMUPqEbprsuFhIDB9Lvt5UbA3luedr8W5D/ebkEY3Ds9rGm4+XH8vI4vIQVPUnxsY/6Sf418Z0tIsm1B+sUNNkvjZIPziKuJq4oEo3uckd0oiOtcoH05Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738698512; c=relaxed/simple; bh=Rzi4n7BnGBjUIYN0OxwHtFC3zQZ/7IVkoBCMx/gBgAw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rzP5nKl/BdQ1o/3+T1okJZ6qhnjTQVSEPaLiyh2qVtBNvQ8aCTidwbw6JZPxfQMlTqpkNQ4h/R+VBN21cs59NedGgHt/Xp726tcChAqII2/Sy0sIRYWnvhH/tlO2uAGdt0LF3qJcvDLUaa648cPAMYA3dejHmHEnC1YhLlZtT4g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=wUYGxhdq; 
arc=none smtp.client-ip=209.85.166.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="wUYGxhdq" Received: by mail-il1-f169.google.com with SMTP id e9e14a558f8ab-3ce886a2d5bso50610845ab.1 for ; Tue, 04 Feb 2025 11:48:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738698509; x=1739303309; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NI6IDh8GTM14O03ITyxFHnMkqCbc9YajSaStyhMKY54=; b=wUYGxhdqBlckqeYszEpke9XaBMzvqX6gTTWjLzNEnkFNWSNbanp2Nhai7Xr6DRT5Qq TX1WMXZR/5YXQ83cRrqa652nwYRg8FmgfGTZ0u2I52T6psehwO45cxyNYNGDW40Dwcwk GUaAZdSQkO+TfvqPPSfZgfo/Wnh2U9g3/bLoKYLQH6Qi+T2fHhgiRpwY8OaaPuj+hNJt L1OKB79sAzZnQiqUFRr8rm+3ZLPH7W08BnK9UFPqHYpmdEQVK4NdmBiQNh9hUt1Io9qb o6SVe/YtFHEH3pA+rjh3LeC9kdImxog4JZlPRz8Sf7ClZt0PXTLQxN8ves4U7EIaCzrM ifxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738698509; x=1739303309; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NI6IDh8GTM14O03ITyxFHnMkqCbc9YajSaStyhMKY54=; b=N4wenzzId9UM9xR+gh2X3ZMIV4vTTv6MXCaScHmQ9o1Q8yF35QAKiojSwSD08mVUH/ +1n7yWeL3up0gS7x2W9zRhJZ4AugXnj+6CCxYkTxPYBT1+GCvh84oyx3Q12KkOa0T2/M ysk9hNQyA3LkT+1t5dyRE0TyEYgEXNWxoe7KHh47cp6puM2YRHMcNnheYJfhnb/ou6Zd CSXxGvti9Y+KsDTo1Jr0PjTuGaeU1g2QHm3tXE3LnV7J26D+0WPXi+RKCT0y7v2Q/TAL 8bBK4p6JGcCyQrp45m/7MSzXKkvIWeYssfQxYdKiwdFqXf04GKyojJEeAh+zOp6XM0Oe xOwQ== X-Gm-Message-State: AOJu0Yx+2h0Y//7W5ZUYrzP3j8l9x86IEfPvimbsliNwe13c8y1wB5nD 
+ya+WEhWGn65hvq1De0tt9EsBOGVC3L1pfust5nTo3E/V/TuxGrTZr3xYlCiNes= X-Gm-Gg: ASbGncvz+EzAjapxdjSM3Q23zF5fz77q5+yI0/ULzES2BmRy2iaHBFUpFgejPxA1flk S/umRHtXuS+QL7GpG8UfrbgBr0XrzfXS1r9xnYWoiKgUjtrMoQzfF2yn+vbGV6Op/B9ovgLNh61 nV/cVHPi6yYi5EthD/Qhh3rCXLnflD/dXPrDiZvcxTNaQRd3ZQDN/JXBN/DqFFX+dJcaXzxSwXw kuGOazSAaxDAauOBYKZHgnfYJz/F2xapqPPpTSrmgeQ5zBAKLTqqSKgyqSKrJgS7P8jLKV9iM/+ 2RliwVhdVR7jWcsPsNA= X-Google-Smtp-Source: AGHT+IGk/L0GfIidAS69VEUawevznso8/J9/9a0NNkUKQpGh8FZKaQywp1szfXpo+wo39blZLaGRNw== X-Received: by 2002:a05:6e02:1989:b0:3cf:b6c9:5fc9 with SMTP id e9e14a558f8ab-3d04f417460mr2712465ab.8.1738698509070; Tue, 04 Feb 2025 11:48:29 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ec746c95c4sm2841466173.127.2025.02.04.11.48.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Feb 2025 11:48:28 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 09/11] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT Date: Tue, 4 Feb 2025 12:46:43 -0700 Message-ID: <20250204194814.393112-10-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk> References: <20250204194814.393112-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 For existing epoll event loops that can't fully convert to io_uring, the usual approach is to add the io_uring fd to the epoll instance and use epoll_wait() to wait on both "legacy" and io_uring events. While this works, it isn't optimal as: 1) epoll_wait() is pretty limited in what it can do. It does not support partial reaping of events, or waiting on a batch of events. 2) When an io_uring ring is added to an epoll instance, it activates the io_uring "I'm being polled" logic which slows things down.
Rather than use this approach, with EPOLL_WAIT support added to io_uring, event loops can use the normal io_uring wait logic for everything, as long as an epoll wait request has been armed with io_uring. Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this is an async request. Waiting on io_uring events in general has various timeout parameters, and those are the ones that should be used when waiting on any kind of request. If events are immediately available for reaping, then this opcode will return those immediately. If none are available, then it will post an async completion when they become available. cqe->res will contain either an error code (< 0 value) for a malformed request, invalid epoll instance, etc., or a positive result indicating how many events were reaped. IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring cancelation infrastructure. The poll logic for managing ownership is adopted to guard the epoll side too. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 4 + include/uapi/linux/io_uring.h | 1 + io_uring/cancel.c | 5 + io_uring/epoll.c | 169 +++++++++++++++++++++++++++++++++ io_uring/epoll.h | 22 +++++ io_uring/io_uring.c | 5 + io_uring/opdef.c | 14 +++ 7 files changed, 220 insertions(+) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 3def525a1da3..ee56992d31d5 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -370,6 +370,10 @@ struct io_ring_ctx { struct io_alloc_cache futex_cache; #endif +#ifdef CONFIG_EPOLL + struct hlist_head epoll_list; +#endif + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index e11c82638527..a559e1e1544a 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -278,6 +278,7 @@ enum io_uring_op { 
IORING_OP_FTRUNCATE, IORING_OP_BIND, IORING_OP_LISTEN, + IORING_OP_EPOLL_WAIT, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 484193567839..9cebd0145cb4 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -17,6 +17,7 @@ #include "timeout.h" #include "waitid.h" #include "futex.h" +#include "epoll.h" #include "cancel.h" struct io_cancel { @@ -128,6 +129,10 @@ int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd, if (ret != -ENOENT) return ret; + ret = io_epoll_wait_cancel(ctx, cd, issue_flags); + if (ret != -ENOENT) + return ret; + spin_lock(&ctx->completion_lock); if (!(cd->flags & IORING_ASYNC_CANCEL_FD)) ret = io_timeout_cancel(ctx, cd); diff --git a/io_uring/epoll.c b/io_uring/epoll.c index 7848d9cc073d..5a47f0cce647 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -11,6 +11,7 @@ #include "io_uring.h" #include "epoll.h" +#include "poll.h" struct io_epoll { struct file *file; @@ -20,6 +21,13 @@ struct io_epoll { struct epoll_event event; }; +struct io_epoll_wait { + struct file *file; + int maxevents; + struct epoll_event __user *events; + struct wait_queue_entry wait; +}; + int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll); @@ -57,3 +65,164 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags) io_req_set_res(req, ret, 0); return IOU_OK; } + +static void __io_epoll_finish(struct io_kiocb *req, int res) +{ + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + + lockdep_assert_held(&req->ctx->uring_lock); + + epoll_wait_remove(req->file, &iew->wait); + hlist_del_init(&req->hash_node); + io_req_set_res(req, res, 0); + req->io_task_work.func = io_req_task_complete; + io_req_task_work_add(req); +} + +static void __io_epoll_cancel(struct io_kiocb *req) +{ + __io_epoll_finish(req, -ECANCELED); +} + +static void __io_epoll_wait_cancel(struct io_kiocb *req) +{ 
+	io_poll_mark_cancelled(req);
+	if (io_poll_get_ownership(req))
+		__io_epoll_cancel(req);
+}
+
+bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
+			      bool cancel_all)
+{
+	struct hlist_node *tmp;
+	struct io_kiocb *req;
+	bool found = false;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	hlist_for_each_entry_safe(req, tmp, &ctx->epoll_list, hash_node) {
+		if (!io_match_task_safe(req, tctx, cancel_all))
+			continue;
+		__io_epoll_wait_cancel(req);
+		found = true;
+	}
+
+	return found;
+}
+
+int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			 unsigned int issue_flags)
+{
+	struct hlist_node *tmp;
+	struct io_kiocb *req;
+	int nr = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	hlist_for_each_entry_safe(req, tmp, &ctx->epoll_list, hash_node) {
+		if (!io_cancel_req_match(req, cd))
+			continue;
+		__io_epoll_wait_cancel(req);
+		nr++;
+	}
+	io_ring_submit_unlock(ctx, issue_flags);
+	return nr ?: -ENOENT;
+}
+
+static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts)
+{
+	int v;
+
+	do {
+		v = atomic_read(&req->poll_refs);
+		if (unlikely(v != 1)) {
+			if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
+				return;
+			if (v & IO_POLL_CANCEL_FLAG) {
+				__io_epoll_cancel(req);
+				return;
+			}
+			if (v & IO_POLL_FINISH_FLAG)
+				return;
+		}
+		v &= IO_POLL_REF_MASK;
+	} while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
+
+	io_req_task_submit(req, ts);
+}
+
+static int io_epoll_execute(struct io_kiocb *req)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	list_del_init_careful(&iew->wait.entry);
+	if (io_poll_get_ownership(req)) {
+		req->io_task_work.func = io_epoll_retry;
+		io_req_task_work_add(req);
+	}
+
+	return 1;
+}
+
+static __cold int io_epoll_pollfree_wake(struct io_kiocb *req)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	io_poll_mark_cancelled(req);
+	list_del_init_careful(&iew->wait.entry);
+	io_epoll_execute(req);
+	return 1;
+}
+
+static int io_epoll_wait_fn(struct wait_queue_entry *wait, unsigned mode,
+			    int sync, void *key)
+{
+	struct io_kiocb *req = wait->private;
+	__poll_t mask = key_to_poll(key);
+
+	if (unlikely(mask & POLLFREE))
+		return io_epoll_pollfree_wake(req);
+
+	return io_epoll_execute(req);
+}
+
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+		return -EINVAL;
+
+	iew->maxevents = READ_ONCE(sqe->len);
+	iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
+	iew->wait.flags = 0;
+	iew->wait.private = req;
+	iew->wait.func = io_epoll_wait_fn;
+	INIT_LIST_HEAD(&iew->wait.entry);
+	INIT_HLIST_NODE(&req->hash_node);
+	atomic_set(&req->poll_refs, 0);
+	return 0;
+}
+
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	ret = epoll_wait(req->file, iew->events, iew->maxevents, NULL, &iew->wait);
+	if (ret == -EIOCBQUEUED) {
+		if (hlist_unhashed(&req->hash_node))
+			hlist_add_head(&req->hash_node, &ctx->epoll_list);
+		io_ring_submit_unlock(ctx, issue_flags);
+		return IOU_ISSUE_SKIP_COMPLETE;
+	} else if (ret < 0) {
+		req_set_fail(req);
+	}
+	hlist_del_init(&req->hash_node);
+	io_ring_submit_unlock(ctx, issue_flags);
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
diff --git a/io_uring/epoll.h b/io_uring/epoll.h
index 870cce11ba98..296940d89063 100644
--- a/io_uring/epoll.h
+++ b/io_uring/epoll.h
@@ -1,6 +1,28 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include "cancel.h"
+
 #if defined(CONFIG_EPOLL)
+int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			 unsigned int issue_flags);
+bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
+			      bool cancel_all);
+
 int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags);
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags);
+#else
+static inline bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx,
+					    struct io_uring_task *tctx,
+					    bool cancel_all)
+{
+	return false;
+}
+static inline int io_epoll_wait_cancel(struct io_ring_ctx *ctx,
+				       struct io_cancel_data *cd,
+				       unsigned int issue_flags)
+{
+	return 0;
+}
 #endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e34a92c73a5d..78375981907d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -93,6 +93,7 @@
 #include "notif.h"
 #include "waitid.h"
 #include "futex.h"
+#include "epoll.h"
 #include "napi.h"
 #include "uring_cmd.h"
 #include "msg_ring.h"
@@ -358,6 +359,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_HLIST_HEAD(&ctx->waitid_list);
 #ifdef CONFIG_FUTEX
 	INIT_HLIST_HEAD(&ctx->futex_list);
+#endif
+#ifdef CONFIG_EPOLL
+	INIT_HLIST_HEAD(&ctx->epoll_list);
 #endif
 	INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
 	INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
@@ -3084,6 +3088,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 	ret |= io_poll_remove_all(ctx, tctx, cancel_all);
 	ret |= io_waitid_remove_all(ctx, tctx, cancel_all);
 	ret |= io_futex_remove_all(ctx, tctx, cancel_all);
+	ret |= io_epoll_wait_remove_all(ctx, tctx, cancel_all);
 	ret |= io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all);
 	mutex_unlock(&ctx->uring_lock);
 	ret |= io_kill_timeouts(ctx, tctx, cancel_all);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index e8baef4e5146..44553a657476 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -514,6 +514,17 @@ const struct io_issue_def io_issue_defs[] = {
 		.async_size		= sizeof(struct io_async_msghdr),
 #else
 		.prep			= io_eopnotsupp_prep,
+#endif
+	},
+	[IORING_OP_EPOLL_WAIT] = {
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.audit_skip		= 1,
+#if defined(CONFIG_EPOLL)
+		.prep			= io_epoll_wait_prep,
+		.issue			= io_epoll_wait,
+#else
+		.prep			= io_eopnotsupp_prep,
 #endif
 	},
 };
@@ -745,6 +756,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_LISTEN] = {
 		.name			= "LISTEN",
 	},
+	[IORING_OP_EPOLL_WAIT] = {
+		.name			= "EPOLL_WAIT",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)

From patchwork Tue Feb 4 19:46:44 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959703
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 10/11] io_uring/epoll: add support for provided buffers
Date: Tue, 4 Feb 2025 12:46:44 -0700
Message-ID: <20250204194814.393112-11-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org
MIME-Version: 1.0

This is a prerequisite for adding multishot support, but it can be used
with single-shot requests as well. It works like any other request that
supports provided buffers: set sqe->addr to NULL, set sqe->buf_group to
the buffer group to pick from, and set IOSQE_BUFFER_SELECT in
sqe->flags. Epoll wait will then pick a buffer from that group and store
the events there.
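
The two rules this patch adds can be modelled in user space. The sketch
below is not kernel code: model_prep_check(), model_maxevents() and
MODEL_BUFFER_SELECT are hypothetical names used only for illustration,
and it assumes that buffer selection caps the requested length at the
size of the buffer it picked.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical stand-in for REQ_F_BUFFER_SELECT being set on the request. */
#define MODEL_BUFFER_SELECT 0x1u

/*
 * Models the prep-time check: a request that asks for a provided buffer
 * must not also pass an events pointer in sqe->addr.
 */
static int model_prep_check(unsigned int req_flags, const void *events)
{
	if ((req_flags & MODEL_BUFFER_SELECT) && events)
		return -EINVAL;
	return 0;
}

/*
 * Models the issue-time sizing: ask for maxevents worth of space, then
 * derive the usable event count from the length actually granted.
 * ASSUMPTION: the granted length is min(requested length, buffer length).
 * 'requested' is expected to be positive.
 */
static int model_maxevents(int requested, size_t buf_len)
{
	size_t len = (size_t)requested * sizeof(struct epoll_event);

	if (len > buf_len)
		len = buf_len;
	/* A partial trailing event slot is unusable, so round down. */
	return (int)(len / sizeof(struct epoll_event));
}
```

Under that assumption, a buffer smaller than the request shrinks the
number of events a single completion can report rather than failing
the request.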

Signed-off-by: Jens Axboe
---
 io_uring/epoll.c | 31 +++++++++++++++++++++++++++----
 io_uring/opdef.c |  1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 5a47f0cce647..134112e7a505 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -10,6 +10,7 @@
 #include
 
 #include "io_uring.h"
+#include "kbuf.h"
 #include "epoll.h"
 #include "poll.h"
 
@@ -189,11 +190,13 @@ int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
 
-	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+	if (sqe->off || sqe->rw_flags || sqe->splice_fd_in)
 		return -EINVAL;
 
 	iew->maxevents = READ_ONCE(sqe->len);
 	iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	if (req->flags & REQ_F_BUFFER_SELECT && iew->events)
+		return -EINVAL;
 
 	iew->wait.flags = 0;
 	iew->wait.private = req;
@@ -207,22 +210,42 @@ int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+	struct epoll_event __user *evs = iew->events;
 	struct io_ring_ctx *ctx = req->ctx;
+	int maxevents = iew->maxevents;
+	unsigned int cflags = 0;
 	int ret;
 
 	io_ring_submit_lock(ctx, issue_flags);
 
-	ret = epoll_wait(req->file, iew->events, iew->maxevents, NULL, &iew->wait);
+	if (io_do_buffer_select(req)) {
+		size_t len = iew->maxevents * sizeof(*evs);
+
+		evs = io_buffer_select(req, &len, 0);
+		if (!evs) {
+			ret = -ENOBUFS;
+			goto err;
+		}
+		maxevents = len / sizeof(*evs);
+	}
+
+	ret = epoll_wait(req->file, evs, maxevents, NULL, &iew->wait);
 	if (ret == -EIOCBQUEUED) {
+		io_kbuf_recycle(req, 0);
 		if (hlist_unhashed(&req->hash_node))
 			hlist_add_head(&req->hash_node, &ctx->epoll_list);
 		io_ring_submit_unlock(ctx, issue_flags);
 		return IOU_ISSUE_SKIP_COMPLETE;
-	} else if (ret < 0) {
+	} else if (ret > 0) {
+		cflags = io_put_kbuf(req, ret * sizeof(*evs), 0);
+	} else if (!ret) {
+		io_kbuf_recycle(req, 0);
+	} else {
+err:
 		req_set_fail(req);
 	}
 	hlist_del_init(&req->hash_node);
 	io_ring_submit_unlock(ctx, issue_flags);
-	io_req_set_res(req, ret, 0);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 44553a657476..04ff2b438531 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -520,6 +520,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.needs_file		= 1,
 		.unbound_nonreg_file	= 1,
 		.audit_skip		= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_EPOLL)
 		.prep			= io_epoll_wait_prep,
 		.issue			= io_epoll_wait,

From patchwork Tue Feb 4 19:46:45 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13959704
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 11/11] io_uring/epoll: add multishot support for IORING_OP_EPOLL_WAIT
Date: Tue, 4 Feb 2025 12:46:45 -0700
Message-ID: <20250204194814.393112-12-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250204194814.393112-1-axboe@kernel.dk>
References: <20250204194814.393112-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org
MIME-Version: 1.0

As with other multishot requests, a multishot epoll wait request stays
armed after the initial trigger. This allows multiple epoll wait
completions per submitted request, one each time events are available.
If more completions are expected for this epoll wait request,
IORING_CQE_F_MORE will be set in the posted cqe->flags.

For multishot, the request remains on the epoll callback waitqueue head.
This means that epoll doesn't need to juggle the ep->lock writelock (and
disable/enable IRQs) for each invocation of the reaping loop, which
should translate into nice efficiency gains.

To use it, set IORING_EPOLL_WAIT_MULTISHOT in the sqe->epoll_flags
member. It must be used with provided buffers.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |  6 +++++
 io_uring/epoll.c              | 46 ++++++++++++++++++++++++++++-------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a559e1e1544a..93f504b6d4ec 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -73,6 +73,7 @@ struct io_uring_sqe {
 		__u32		futex_flags;
 		__u32		install_fd_flags;
 		__u32		nop_flags;
+		__u32		epoll_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -405,6 +406,11 @@ enum io_uring_op {
 #define IORING_ACCEPT_DONTWAIT	(1U << 1)
 #define IORING_ACCEPT_POLL_FIRST	(1U << 2)
 
+/*
+ * epoll_wait flags, stored in sqe->epoll_flags
+ */
+#define IORING_EPOLL_WAIT_MULTISHOT	(1U << 0)
+
 /*
  * IORING_OP_MSG_RING command types, stored in sqe->addr
  */
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 134112e7a505..2474f2e069ef 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -25,6 +25,7 @@ struct io_epoll {
 struct io_epoll_wait {
 	struct file *file;
 	int maxevents;
+	int flags;
 	struct epoll_event __user *events;
 	struct wait_queue_entry wait;
 };
@@ -151,11 +152,12 @@ static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts)
 	io_req_task_submit(req, ts);
 }
 
-static int io_epoll_execute(struct io_kiocb *req)
+static int io_epoll_execute(struct io_kiocb *req, __poll_t mask)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
 
-	list_del_init_careful(&iew->wait.entry);
+	if (mask & EPOLL_URING_WAKE || !(req->flags & REQ_F_APOLL_MULTISHOT))
+		list_del_init_careful(&iew->wait.entry);
 	if (io_poll_get_ownership(req)) {
 		req->io_task_work.func = io_epoll_retry;
 		io_req_task_work_add(req);
@@ -164,13 +166,13 @@ static int io_epoll_execute(struct io_kiocb *req)
 	return 1;
 }
 
-static __cold int io_epoll_pollfree_wake(struct io_kiocb *req)
+static __cold int io_epoll_pollfree_wake(struct io_kiocb *req, __poll_t mask)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
 
 	io_poll_mark_cancelled(req);
 	list_del_init_careful(&iew->wait.entry);
-	io_epoll_execute(req);
+	io_epoll_execute(req, mask);
 	return 1;
 }
 
@@ -181,20 +183,28 @@ static int io_epoll_wait_fn(struct wait_queue_entry *wait, unsigned mode,
 	__poll_t mask = key_to_poll(key);
 
 	if (unlikely(mask & POLLFREE))
-		return io_epoll_pollfree_wake(req);
+		return io_epoll_pollfree_wake(req, mask);
 
-	return io_epoll_execute(req);
+	return io_epoll_execute(req, mask);
 }
 
 int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
 
-	if (sqe->off || sqe->rw_flags || sqe->splice_fd_in)
+	if (sqe->off || sqe->splice_fd_in)
 		return -EINVAL;
 
 	iew->maxevents = READ_ONCE(sqe->len);
 	iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	iew->flags = READ_ONCE(sqe->epoll_flags);
+	if (iew->flags & ~IORING_EPOLL_WAIT_MULTISHOT) {
+		return -EINVAL;
+	} else if (iew->flags & IORING_EPOLL_WAIT_MULTISHOT) {
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+	}
 	if (req->flags & REQ_F_BUFFER_SELECT && iew->events)
 		return -EINVAL;
 
@@ -217,7 +227,7 @@ int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 
 	io_ring_submit_lock(ctx, issue_flags);
-
+retry:
 	if (io_do_buffer_select(req)) {
 		size_t len = iew->maxevents * sizeof(*evs);
 
@@ -238,14 +248,32 @@ int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
 		return IOU_ISSUE_SKIP_COMPLETE;
 	} else if (ret > 0) {
 		cflags = io_put_kbuf(req, ret * sizeof(*evs), 0);
+		if (req->flags & REQ_F_BL_EMPTY)
+			goto stop_multi;
+		if (req->flags & REQ_F_APOLL_MULTISHOT) {
+			if (io_req_post_cqe(req, ret, cflags | IORING_CQE_F_MORE))
+				goto retry;
+			goto stop_multi;
+		}
 	} else if (!ret) {
 		io_kbuf_recycle(req, 0);
 	} else {
 err:
 		req_set_fail(req);
+		if (req->flags & REQ_F_APOLL_MULTISHOT) {
+stop_multi:
+			atomic_or(IO_POLL_FINISH_FLAG, &req->poll_refs);
+			io_poll_multishot_retry(req);
+			if (!list_empty_careful(&iew->wait.entry))
+				epoll_wait_remove(req->file, &iew->wait);
+			req->flags &= ~REQ_F_APOLL_MULTISHOT;
+		}
 	}
-	hlist_del_init(&req->hash_node);
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+		hlist_del_init(&req->hash_node);
 	io_ring_submit_unlock(ctx, issue_flags);
 	io_req_set_res(req, ret, cflags);
+	if (issue_flags & IO_URING_F_MULTISHOT)
+		return IOU_STOP_MULTISHOT;
 	return IOU_OK;
 }
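
On the application side, each completion of a multishot epoll wait has
to be decoded from cqe->res and cqe->flags. The following is a rough
user-space sketch of that decoding, not part of the patch: the sk_*
names are hypothetical, and the three constants are local copies of the
existing IORING_CQE_F_BUFFER, IORING_CQE_F_MORE and
IORING_CQE_BUFFER_SHIFT UAPI values so the block builds without kernel
headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of existing io_uring UAPI values (see lead-in). */
#define SK_CQE_F_BUFFER     (1U << 0)	/* IORING_CQE_F_BUFFER */
#define SK_CQE_F_MORE       (1U << 1)	/* IORING_CQE_F_MORE */
#define SK_CQE_BUFFER_SHIFT 16		/* IORING_CQE_BUFFER_SHIFT */

/* Hypothetical decoded view of one epoll wait completion. */
struct sk_epoll_result {
	int nr_events;		/* events stored in the buffer, 0 on error */
	int error;		/* negative errno from cqe->res, else 0 */
	bool has_buffer;	/* a provided buffer was consumed */
	unsigned int buf_id;	/* valid only when has_buffer is true */
	bool rearm;		/* no more CQEs coming: resubmit to continue */
};

static struct sk_epoll_result sk_decode_epoll_cqe(int32_t res, uint32_t flags)
{
	struct sk_epoll_result r = {0};

	if (res < 0)
		r.error = res;
	else
		r.nr_events = res;

	r.has_buffer = (flags & SK_CQE_F_BUFFER) != 0;
	if (r.has_buffer)
		r.buf_id = flags >> SK_CQE_BUFFER_SHIFT;

	/* Without IORING_CQE_F_MORE the multishot request has terminated. */
	r.rearm = !(flags & SK_CQE_F_MORE);
	return r;
}
```

A caller would read nr_events entries from the buffer identified by
buf_id, hand the buffer back to its group, and submit a new request only
when rearm is true, for example after the group ran out of buffers.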