From patchwork Fri Feb 7 17:32:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13965526
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 1/7] eventpoll: abstract out ep_try_send_events() helper
Date: Fri, 7 Feb 2025 10:32:24 -0700
Message-ID: <20250207173639.884745-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk>
References: <20250207173639.884745-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

In preparation for reusing this helper in another epoll setup helper,
abstract it out.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7c0980db77b3..67d1808fda0e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1980,6 +1980,22 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 	return ret;
 }
 
+static int ep_try_send_events(struct eventpoll *ep,
+			      struct epoll_event __user *events, int maxevents)
+{
+	int res;
+
+	/*
+	 * Try to transfer events to user space. In case we get 0 events and
+	 * there's still timeout left over, we go trying again in search of
+	 * more luck.
+	 */
+	res = ep_send_events(ep, events, maxevents);
+	if (res > 0)
+		ep_suspend_napi_irqs(ep);
+	return res;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2031,17 +2047,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	while (1) {
 		if (eavail) {
-			/*
-			 * Try to transfer events to user space. In case we get
-			 * 0 events and there's still timeout left over, we go
-			 * trying again in search of more luck.
-			 */
-			res = ep_send_events(ep, events, maxevents);
-			if (res) {
-				if (res > 0)
-					ep_suspend_napi_irqs(ep);
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
 				return res;
-			}
 		}
 
 		if (timed_out)

From patchwork Fri Feb 7 17:32:25 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13965527
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 2/7] eventpoll: abstract out parameter sanity checking
Date: Fri, 7 Feb 2025 10:32:25 -0700
Message-ID: <20250207173639.884745-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk>
References: <20250207173639.884745-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Add a helper that checks the validity of the file descriptor and other
parameters passed in to epoll_wait().
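
For readers who want to poke at the check order outside the kernel, here is
a rough userspace sketch of what the helper does. It is not the kernel code:
struct model_event, struct model_file, MODEL_MAX_EVENTS and
model_check_params() are all made-up names, access_ok() is approximated by a
NULL check, and is_file_epoll() by a plain flag.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct model_event {
	uint32_t events;
	uint64_t data;
};

struct model_file {
	bool is_epoll;	/* stands in for is_file_epoll(file) */
};

/* Assumed to play the role of EP_MAX_EVENTS for this sketch. */
#define MODEL_MAX_EVENTS ((int)(INT_MAX / sizeof(struct model_event)))

/* Same order of checks as the helper: event count, buffer, file type. */
static int model_check_params(const struct model_file *file,
			      const struct model_event *evs, int maxevents)
{
	if (maxevents <= 0 || maxevents > MODEL_MAX_EVENTS)
		return -EINVAL;

	/* Userspace approximation of access_ok(): only NULL is rejected. */
	if (evs == NULL)
		return -EFAULT;

	if (!file->is_epoll)
		return -EINVAL;

	return 0;
}
```

One thing the sketch does not capture: in do_epoll_wait() the fd lookup now
runs before these checks, so a call with both a bad fd and a bad maxevents
would appear to return -EBADF rather than -EINVAL after this patch.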

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 67d1808fda0e..14466765b85d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2453,6 +2453,27 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	return do_epoll_ctl(epfd, op, fd, &epds, false);
 }
 
+static int ep_check_params(struct file *file, struct epoll_event __user *evs,
+			   int maxevents)
+{
+	/* The maximum number of event must be greater than zero */
+	if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
+		return -EINVAL;
+
+	/* Verify that the area passed by the user is writeable */
+	if (!access_ok(evs, maxevents * sizeof(struct epoll_event)))
+		return -EFAULT;
+
+	/*
+	 * We have to check that the file structure underneath the fd
+	 * the user passed to us _is_ an eventpoll file.
+	 */
+	if (!is_file_epoll(file))
+		return -EINVAL;
+
+	return 0;
+}
+
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_wait(2).
@@ -2461,26 +2482,16 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 			 int maxevents, struct timespec64 *to)
 {
 	struct eventpoll *ep;
-
-	/* The maximum number of event must be greater than zero */
-	if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
-		return -EINVAL;
-
-	/* Verify that the area passed by the user is writeable */
-	if (!access_ok(events, maxevents * sizeof(struct epoll_event)))
-		return -EFAULT;
+	int ret;
 
 	/* Get the "struct file *" for the eventpoll file */
 	CLASS(fd, f)(epfd);
 	if (fd_empty(f))
 		return -EBADF;
 
-	/*
-	 * We have to check that the file structure underneath the fd
-	 * the user passed to us _is_ an eventpoll file.
-	 */
-	if (!is_file_epoll(fd_file(f)))
-		return -EINVAL;
+	ret = ep_check_params(fd_file(f), events, maxevents);
+	if (unlikely(ret))
+		return ret;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains

From patchwork Fri Feb 7 17:32:26 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13965528
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 3/7] eventpoll: add epoll_queue() interface
Date: Fri, 7 Feb 2025 10:32:26 -0700
Message-ID: <20250207173639.884745-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk>
References: <20250207173639.884745-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

Basic interface that takes a wait_queue_entry rather than posting one on
the stack, so the entry can act as a persistent callback for when new
events arrive. Works like regular epoll_wait(), except it doesn't block.
If events are available, they are returned. If none are available, the
passed-in wait_queue_entry is added to the callback list and -EIOCBQUEUED
is returned.

The wait_queue_entry must be previously initialized, and the callback
provided will be called when events are added to the epoll context.
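
To make the "reap now, or queue a callback" contract concrete, here is a
small single-threaded userspace model. None of these names exist in the
kernel: struct model_ep, struct model_waiter, model_queue() and
model_post_event() are invented for illustration, there is no locking, and
only one waiter is supported. EIOCBQUEUED is kernel-internal, so the sketch
defines it itself if the headers do not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EIOCBQUEUED
#define EIOCBQUEUED 529	/* kernel-internal errno, not exported to userspace */
#endif

/* Hypothetical waiter: the callback stands in for the wait entry's func. */
struct model_waiter {
	void (*callback)(struct model_waiter *w);
	bool queued;
	int fired;
};

/* Hypothetical epoll context with a ready count and one waiter slot. */
struct model_ep {
	int ready;
	struct model_waiter *wq;
};

/* Example callback: just counts how often it was invoked. */
static void count_wakeup(struct model_waiter *w)
{
	w->fired++;
}

/* Return ready events if there are any, else queue the waiter once. */
static int model_queue(struct model_ep *ep, int maxevents,
		       struct model_waiter *w)
{
	if (ep->ready > 0) {
		int n = ep->ready < maxevents ? ep->ready : maxevents;

		ep->ready -= n;
		return n;
	}
	/* Already queued: do not add the waiter a second time. */
	if (!w->queued) {
		ep->wq = w;
		w->queued = true;
	}
	return -EIOCBQUEUED;
}

/* A new event arrives: bump the count and notify the queued waiter. */
static void model_post_event(struct model_ep *ep)
{
	ep->ready++;
	if (ep->wq)
		ep->wq->callback(ep->wq);
}
```

In this model the waiter stays queued after its callback runs, which is one
reading of "persistent callback"; whether a real entry stays on the wait
queue depends on the wake function the caller installs.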

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 39 +++++++++++++++++++++++++++++++++++++++
 include/linux/eventpoll.h |  4 ++++
 2 files changed, 43 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 14466765b85d..d3ac466ad415 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1996,6 +1996,33 @@ static int ep_try_send_events(struct eventpoll *ep,
 	return res;
 }
 
+static int ep_poll_queue(struct eventpoll *ep,
+			 struct epoll_event __user *events, int maxevents,
+			 struct wait_queue_entry *wait)
+{
+	int res = 0, eavail;
+
+	/* See ep_poll() for commentary */
+	eavail = ep_events_available(ep);
+	while (1) {
+		if (eavail) {
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
+				return res;
+		}
+		if (!list_empty_careful(&wait->entry))
+			break;
+		write_lock_irq(&ep->lock);
+		eavail = ep_events_available(ep);
+		if (!eavail)
+			__add_wait_queue_exclusive(&ep->wq, wait);
+		write_unlock_irq(&ep->lock);
+		if (!eavail)
+			break;
+	}
+	return -EIOCBQUEUED;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2474,6 +2501,18 @@ static int ep_check_params(struct file *file, struct epoll_event __user *evs,
 	return 0;
 }
 
+int epoll_queue(struct file *file, struct epoll_event __user *events,
+		int maxevents, struct wait_queue_entry *wait)
+{
+	int ret;
+
+	ret = ep_check_params(file, events, maxevents);
+	if (unlikely(ret))
+		return ret;
+
+	return ep_poll_queue(file->private_data, events, maxevents, wait);
+}
+
 /*
  * Implement the event wait interface for the eventpoll file. It is the kernel
  * part of the user space epoll_wait(2).
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 0c0d00fcd131..8de16374b8fe 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,10 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long t
 /* Used to release the epoll bits inside the "struct file" */
 void eventpoll_release_file(struct file *file);
 
+/* Use to reap events, and/or queue for a callback on new events */
+int epoll_queue(struct file *file, struct epoll_event __user *events,
+		int maxevents, struct wait_queue_entry *wait);
+
 /*
  * This is called from inside fs/file_table.c:__fput() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup

From patchwork Fri Feb 7 17:32:27 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13965529
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 4/7] eventpoll: add helper to remove wait entry from wait queue head
Date: Fri, 7 Feb 2025 10:32:27 -0700
Message-ID: <20250207173639.884745-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk>
References: <20250207173639.884745-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

__epoll_wait_remove() is the core helper: it removes a given
wait_queue_entry from the eventpoll wait_queue_head. Use it internally,
and provide an overall helper, epoll_wait_remove(), which takes a struct
file and provides the same functionality.
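
The return value of the core helper is easy to misread, so here is a
userspace sketch of its decision logic. The model_list type and the
model_* functions are hypothetical stand-ins for list_head and the wait
queue helpers, there is no locking, and the entry is re-initialised on
removal, which is a simplification of this sketch and not something the
patch itself promises.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list, standing in for struct list_head. */
struct model_list {
	struct model_list *next, *prev;
};

static void model_list_init(struct model_list *l)
{
	l->next = l->prev = l;
}

static bool model_list_empty(const struct model_list *l)
{
	return l->next == l;
}

static void model_list_add(struct model_list *entry, struct model_list *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Simplification: unlink and re-init, so the entry reads as empty again. */
static void model_list_del_init(struct model_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	model_list_init(entry);
}

/*
 * Returns 1 when the caller should go and look for events, 0 when it timed
 * out while still queued. Without concurrency the inner emptiness check is
 * always false; in the kernel it is repeated under the lock because a wakeup
 * may have unlinked the entry in the meantime.
 */
static int model_wait_remove(struct model_list *entry, bool timed_out)
{
	int eavail = 1;

	if (!model_list_empty(entry)) {
		if (timed_out)
			eavail = model_list_empty(entry);
		model_list_del_init(entry);
	}
	return eavail;
}
```

Note that epoll_wait_remove() always passes timed_out as false, so per this
logic it would return 1 for any epoll file, queued or not, and -EINVAL
otherwise.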
Signed-off-by: Jens Axboe --- fs/eventpoll.c | 58 +++++++++++++++++++++++++-------------- include/linux/eventpoll.h | 3 ++ 2 files changed, 40 insertions(+), 21 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index d3ac466ad415..b96cc9193517 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2023,6 +2023,42 @@ static int ep_poll_queue(struct eventpoll *ep, return -EIOCBQUEUED; } +static int __epoll_wait_remove(struct eventpoll *ep, + struct wait_queue_entry *wait, int timed_out) +{ + int eavail; + + /* + * We were woken up, thus go and try to harvest some events. If timed + * out and still on the wait queue, recheck eavail carefully under + * lock, below. + */ + eavail = 1; + + if (!list_empty_careful(&wait->entry)) { + write_lock_irq(&ep->lock); + /* + * If the thread timed out and is not on the wait queue, it + * means that the thread was woken up after its timeout expired + * before it could reacquire the lock. Thus, when wait.entry is + * empty, it needs to harvest events. + */ + if (timed_out) + eavail = list_empty(&wait->entry); + __remove_wait_queue(&ep->wq, wait); + write_unlock_irq(&ep->lock); + } + + return eavail; +} + +int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait) +{ + if (is_file_epoll(file)) + return __epoll_wait_remove(file->private_data, wait, false); + return -EINVAL; +} + /** * ep_poll - Retrieves ready events, and delivers them to the caller-supplied * event buffer. @@ -2135,27 +2171,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, HRTIMER_MODE_ABS); __set_current_state(TASK_RUNNING); - /* - * We were woken up, thus go and try to harvest some events. - * If timed out and still on the wait queue, recheck eavail - * carefully under lock, below. 
- */ - eavail = 1; - - if (!list_empty_careful(&wait.entry)) { - write_lock_irq(&ep->lock); - /* - * If the thread timed out and is not on the wait queue, - * it means that the thread was woken up after its - * timeout expired before it could reacquire the lock. - * Thus, when wait.entry is empty, it needs to harvest - * events. - */ - if (timed_out) - eavail = list_empty(&wait.entry); - __remove_wait_queue(&ep->wq, &wait); - write_unlock_irq(&ep->lock); - } + eavail = __epoll_wait_remove(ep, &wait, timed_out); } } diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 8de16374b8fe..6c088d5e945b 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -29,6 +29,9 @@ void eventpoll_release_file(struct file *file); int epoll_queue(struct file *file, struct epoll_event __user *events, int maxevents, struct wait_queue_entry *wait); +/* Remove wait entry */ +int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait); + /* * This is called from inside fs/file_table.c:__fput() to unlink files * from the eventpoll interface. 
We need to have this facility to cleanup From patchwork Fri Feb 7 17:32:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13965530 Received: from mail-il1-f179.google.com (mail-il1-f179.google.com [209.85.166.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0DBD319CC11 for ; Fri, 7 Feb 2025 17:36:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949811; cv=none; b=c1nakU5Q3ER4TK7x6GHRLK0cneBRL1xITkrNrB+8g03Tfha/DDkJlNDAxB2FTdP5FpAaCJrK66E6luk7tgANWMdwGcjlvm/PX181+OZnsp8+eFclezSWqfpmpki4CewsImXcWXbdFv0qtsHeY4ro7LWMsmVIoV6aawpA3uIqATg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949811; c=relaxed/simple; bh=AJ4EpG+a/HnjEaQKeFR8lZf7Ex4++mtSxvprrt1iy8A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Vll2U0N+f+3iDC1NBPLXGTrZt1xCcO8+Xq0359+RzQb5cL79HkHiM+v7M6s0S842rwwKq4mm5TTfdbIkYSzFdAkd65b+pEh234MLLmgt8k9sx6iq4HiyVP6M1+yrhjnz56iOxXd1CEGjzBdq383Rl395PnV1aj3tvM+Kdveo6GA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=HZ0PTeKW; arc=none smtp.client-ip=209.85.166.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="HZ0PTeKW" Received: by 
mail-il1-f179.google.com with SMTP id e9e14a558f8ab-3d146df0afeso3435485ab.3 for ; Fri, 07 Feb 2025 09:36:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738949809; x=1739554609; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Yl4I89AW9N+9ZjPLUSwNXzZLneK6YdFoM+i0fKJ0Giw=; b=HZ0PTeKWdgPzlRTKYZUtkF3n4dgc6kmTN0yRvmnKb+WWyaRLPx1ZT0Exx30cqPEJpQ NFvPNl8bXhCdkXGWvFORVCq7z0kjQ02nLwqY35ebken9jBUjLqZOR2/VjSDGvePo9yNH G1w6gdu7aF6uJIU1XEYJJ5yp4Dj4get3WNBfR0eqnNis1Hbj96NyXLxhYnosNDn+GSIe ePXzoVkYbp5N5XU/i716UqhwQq9HGQ2vTfBa11kyvL1tt+gll1bcol+4PY3t7w9DrGYz 10XtITW9mtN9FoiodpD1/VwiN+r3w53hfhznX79wk3qg84Rq1lSNzuZXjxvGzJTQi5zG HbDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738949809; x=1739554609; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Yl4I89AW9N+9ZjPLUSwNXzZLneK6YdFoM+i0fKJ0Giw=; b=xN+tNNDz+0gT0RrmGGHFo7vulyWNMpvDK8bv/R1YBv9lX5NvWTGQDNKFBqjf3s34Wz 3RHUfd/Qd6fqSVWnCQz8CQTzoISa8tw8LCITHD81zgvB7ZDsWcG5sbCxt/9WwmboAsja Gf/yiSQHFiBeDKHiSeBmREHrh5ANfCrDKaJxPoem/gEfteDHViL++rrD6KVsX83ZKs3i RG/b87yLC3DVkJCK35edH/J2bXYAWCG5I/47KsgE10SLk7OMsW+qkL2+x94YAY3YSnvb 2QhXnRxIkb4aEBNEEbhxIPexYmun+8FHBGAToI/9DMVMgO669xMJVIvIrE6py5uLNCvo PZTg== X-Gm-Message-State: AOJu0YxWscURS0a1FAXSaRlNcJud3RqJ9k+z0/sTzqnSwNKfgO7iLYDX ymClMOc1hFUo6juKig3S0eY/JQyyuH/cS59vx9sTnYJRzJKMjyLRg+Qu4pcER1arxJ8owGHvZRu d X-Gm-Gg: ASbGncsPc4n+Vh2l7BiWmaNCk3gb10HlUX1K3GgOD2QgwRdDZ47CmAehMUVeu+zI4Je JtvOXW2MkxyOz+Q2qm70cR6Boz7sycbjnV+BWjWDSUkpMR5ULgfu0IFTOwPpg23GYFLjjVerp/u hr64/DCPQlRHOOKQmqm+oavhdZOkDgD5ss858mvsRA6jhglGHVrRvTqjr+6GIsKiqXrtpfZOVmZ vIILmnzQQyHGR+IEpp/XSQkyyC1zLbNWOz7pUyytQbdsDbL9BgyUl6KaNk73N7oLZuL2l6Fc9cS x7v1UEGZlV/2QEX/5o8= 
X-Google-Smtp-Source: AGHT+IFf5yFHbi0FpkwI0St9Frw/m6TmD/MzSZjJpnF8PSyMprcyK2hlYpYREkjfvbItwT4qYeHWRA== X-Received: by 2002:a05:6e02:338f:b0:3cf:c9ad:46a1 with SMTP id e9e14a558f8ab-3d13de7ae05mr31235335ab.13.1738949809397; Fri, 07 Feb 2025 09:36:49 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ece0186151sm206241173.111.2025.02.07.09.36.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Feb 2025 09:36:48 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 5/7] io_uring/epoll: remove CONFIG_EPOLL guards Date: Fri, 7 Feb 2025 10:32:28 -0700 Message-ID: <20250207173639.884745-6-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk> References: <20250207173639.884745-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

Just have the Makefile add the object if epoll is enabled, then it's not necessary to guard the entire epoll.c file inside a CONFIG_EPOLL ifdef.

Signed-off-by: Jens Axboe --- io_uring/Makefile | 9 +++++---- io_uring/epoll.c | 2 -- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/io_uring/Makefile b/io_uring/Makefile index d695b60dba4f..7114a6dbd439 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -11,9 +11,10 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \ eventfd.o uring_cmd.o openclose.o \ sqpoll.o xattr.o nop.o fs.o splice.o \ sync.o msg_ring.o advise.o openclose.o \ - epoll.o statx.o timeout.o fdinfo.o \ - cancel.o waitid.o register.o \ - truncate.o memmap.o alloc_cache.o + statx.o timeout.o fdinfo.o cancel.o \ + waitid.o register.o truncate.o \ + memmap.o alloc_cache.o obj-$(CONFIG_IO_WQ) += io-wq.o obj-$(CONFIG_FUTEX) += futex.o -obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o +obj-$(CONFIG_EPOLL) += epoll.o +obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o diff --git a/io_uring/epoll.c b/io_uring/epoll.c index 89bff2068a19..7848d9cc073d 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -12,7 +12,6 @@ #include "io_uring.h" #include "epoll.h" -#if defined(CONFIG_EPOLL) struct io_epoll { struct file *file; int epfd; @@ -58,4 +57,3 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags) io_req_set_res(req, ret, 0); return IOU_OK; } -#endif From patchwork Fri Feb 7 17:32:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13965531 Received: from mail-io1-f47.google.com (mail-io1-f47.google.com [209.85.166.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB29A19DF75 for ; Fri, 7 Feb 2025 17:36:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949813; cv=none; 
b=UZDA26ZWRMU98CJ+gcER5MOiIEWTRvnR1rfs9T0GVIbYEU+Ful3yb8j3RyU1F88HhDLV187tIgR8cqTjugYqziOBLETSem1E2lYjZ3wD9qeQljC3AoV+Xxnu0jq3GEiDPdsCRuKr9/dEDgzmlyS7df/fACUvNwDKlWJmj8ougfI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949813; c=relaxed/simple; bh=im1ntOsYLHYjJiEvx6njeC6vx3wirWuYtKuqP171Gs4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tKQ3i9JzipkgP8u7Si1OAjuHkaa11IutX6tLpU27tRPumUq0HJrVtHOBYTPKWtRGwojqhBw/qsZVTT9h4/EmXiiZroya+vHc8yh5sBpn42ho5WCEOxE4RpqKT+8zGOmZTda1Bgzi/CTgH/7EQSKkQCqffMsxRHyxKwUGPRzjVsA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=w/U7/PWj; arc=none smtp.client-ip=209.85.166.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="w/U7/PWj" Received: by mail-io1-f47.google.com with SMTP id ca18e2360f4ac-844e9b8b0b9so176505639f.0 for ; Fri, 07 Feb 2025 09:36:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738949811; x=1739554611; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TQ8sKu9TA7JXcFTCTxuSEZOOG4qI+mYE1gZPeZ19VjI=; b=w/U7/PWjm0jUCptV/X671ti0JCD+43RtYOiJ+rnkve21x7fQVTIm4TlS8wnSl4Azra 86PZ3Zi8SVGwmqwN5wPgexPbMxsPsjTlcqa8KhRZiWjTDb65LOggFvMTJIiGCCRdXSJ1 EB3whJXVmgXWPZsz2D6CJfbRHrmKNuEfPsf0vspFK/VL3bXfhhAuLD8EicE89Et4wyDk 
5QiyxFpq3z241dkE1OEDugR1IQmXCyEhWWirCL8IjD1+tVZPZ0a5iF5wp0WY9eeY+cCg utfMxC/6AJLmMPai1YJ+5FbN5LT61IdQaCmGkOmRh8hOMs73ikDKkN8NrFJQNNNc23mp f5/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738949811; x=1739554611; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TQ8sKu9TA7JXcFTCTxuSEZOOG4qI+mYE1gZPeZ19VjI=; b=MJc8L3QPmlbR/4dVxxdpKSacXIHt9RIiuZ6M3gDjJmXmVBIgUkfoBzhYWXGiqhDuXv cFTPeRBYGHUbCNjP2ajutv7V4cWTPpwJRTHwJEC3huWfWHKyCnjZvtSHJC2mORh/r80q 7/bygcbvCK0+8YByVEbn89ZB73NgPoviBoCQvoFwhHmT7GGsIkPLHK8x+2NRO+vrgdZ+ ID754gJ/Gv8NrxqAy9ZgKR382TmGS4/VKU+CidOnCu+liYF+WiMXjB6fltAnjNR9ioAH 1SxOE1FgLcOyCQbr91gCi6yuEsOMAYY28VAylHEfPY9iAH72c7HIW/5Z1z6wLM5IZAdO sjVg== X-Gm-Message-State: AOJu0Yzed/GTULuh0FU1PKpmD5O6XmGM5fJsSL9vBcKbmTPwGuIchgT/ sBkkxH32zrW2mbwYyC1KGZ/scDn2dBz9LTtw6qmgiP94K+LowP6g69Xp+iutdm0= X-Gm-Gg: ASbGncsmlO4MeYpuoFvUP9B0mk1Kau6hoX5zjhknJcrqCvfiwXP7sX+qtVEuhW3nW17 Hy1DE+gw/h3XkDMuuwWCQwkxKZ3rJnjt6p5OY8IM2I74sI1Td/6uH6ySBxgv8sVUZ8SxbksbHJT 3xvyEZHxBywCNXmUnvHLqL7iv0OObKX90FBbh7HmDlMZERUiaGOImBM2wiREmlsU9VkWk5IL8yE NGuaADxRXcMK6fvM6+JMQ/BZmyHPW4J/FpWf7VAt0jGEATYrXEtaJWqRg+JDj6VEkCy64PfePW0 FEKBVuqswy4B434Xev0= X-Google-Smtp-Source: AGHT+IGTfHeUMPey2BvLVdDpKeNQRJC8LAYlSXA0VI1/YyMFocNIej7iZLHyFztOXS6Jz5PYpOKPgg== X-Received: by 2002:a05:6602:3991:b0:849:a2bb:ffde with SMTP id ca18e2360f4ac-854fd89a878mr510739039f.4.1738949810703; Fri, 07 Feb 2025 09:36:50 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ece0186151sm206241173.111.2025.02.07.09.36.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Feb 2025 09:36:49 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 6/7] io_uring/poll: pull ownership handling into poll.h Date: Fri, 7 
Feb 2025 10:32:29 -0700 Message-ID: <20250207173639.884745-7-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk> References: <20250207173639.884745-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In preparation for using it from somewhere else. Rather than try and duplicate the functionality, just make it generically available to io_uring opcodes. Note: would have to be used carefully, cannot be used by opcodes that can trigger poll logic. Signed-off-by: Jens Axboe --- io_uring/poll.c | 30 +----------------------------- io_uring/poll.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+), 29 deletions(-) diff --git a/io_uring/poll.c b/io_uring/poll.c index bb1c0cd4f809..5e44ac562491 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -41,16 +41,6 @@ struct io_poll_table { __poll_t result_mask; }; -#define IO_POLL_CANCEL_FLAG BIT(31) -#define IO_POLL_RETRY_FLAG BIT(30) -#define IO_POLL_REF_MASK GENMASK(29, 0) - -/* - * We usually have 1-2 refs taken, 128 is more than enough and we want to - * maximise the margin between this amount and the moment when it overflows. - */ -#define IO_POLL_REF_BIAS 128 - #define IO_WQE_F_DOUBLE 1 static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, @@ -70,7 +60,7 @@ static inline bool wqe_is_double(struct wait_queue_entry *wqe) return priv & IO_WQE_F_DOUBLE; } -static bool io_poll_get_ownership_slowpath(struct io_kiocb *req) +bool io_poll_get_ownership_slowpath(struct io_kiocb *req) { int v; @@ -85,24 +75,6 @@ static bool io_poll_get_ownership_slowpath(struct io_kiocb *req) return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); } -/* - * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can - * bump it and acquire ownership. 
It's disallowed to modify requests while not - * owning it, that prevents from races for enqueueing task_work's and b/w - * arming poll and wakeups. - */ -static inline bool io_poll_get_ownership(struct io_kiocb *req) -{ - if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS)) - return io_poll_get_ownership_slowpath(req); - return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); -} - -static void io_poll_mark_cancelled(struct io_kiocb *req) -{ - atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs); -} - static struct io_poll *io_poll_get_double(struct io_kiocb *req) { /* pure poll stashes this in ->async_data, poll driven retry elsewhere */ diff --git a/io_uring/poll.h b/io_uring/poll.h index 04ede93113dc..2f416cd3be13 100644 --- a/io_uring/poll.h +++ b/io_uring/poll.h @@ -21,6 +21,18 @@ struct async_poll { struct io_poll *double_poll; }; +#define IO_POLL_CANCEL_FLAG BIT(31) +#define IO_POLL_RETRY_FLAG BIT(30) +#define IO_POLL_REF_MASK GENMASK(29, 0) + +bool io_poll_get_ownership_slowpath(struct io_kiocb *req); + +/* + * We usually have 1-2 refs taken, 128 is more than enough and we want to + * maximise the margin between this amount and the moment when it overflows. + */ +#define IO_POLL_REF_BIAS 128 + /* * Must only be called inside issue_flags & IO_URING_F_MULTISHOT, or * potentially other cases where we already "own" this poll request. @@ -30,6 +42,25 @@ static inline void io_poll_multishot_retry(struct io_kiocb *req) atomic_inc(&req->poll_refs); } +/* + * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can + * bump it and acquire ownership. It's disallowed to modify requests while not + * owning it, that prevents from races for enqueueing task_work's and b/w + * arming poll and wakeups. 
+ */ +static inline bool io_poll_get_ownership(struct io_kiocb *req) +{ + if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS)) + return io_poll_get_ownership_slowpath(req); + return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK); +} + +static inline void io_poll_mark_cancelled(struct io_kiocb *req) +{ + atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs); +} + + int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); int io_poll_add(struct io_kiocb *req, unsigned int issue_flags); From patchwork Fri Feb 7 17:32:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13965532 Received: from mail-io1-f41.google.com (mail-io1-f41.google.com [209.85.166.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 244AB19CCFC for ; Fri, 7 Feb 2025 17:36:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949815; cv=none; b=WD0sf6E+CqyfLu25AIoav0zogYo9Jcl2iyUWTfb70MrVR8toYsx+S46Xmxlnm5Ct+g7dijAlJjut5DdawzGJ3tlyOOJZO3KzBjI0MR2/lPozAkd0qsdrrNdy1eqpyMLsfVodB1buD+88dKuxS3nQEZwAALV2Dv5maoY9s6cnpOU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738949815; c=relaxed/simple; bh=/wc5aGUcFs1HMlPcN4lPleOhlLs4Z6Y3krXMU90FQ5E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nra4P84rGkDSjYzw3LjB7ifHjCEN9+X/QSZgO4/Tjb5vpjdedArTHzmF4Kqp3FrZI/e286+M87pHlPwuaj7dkkGueL3HT4a/bjuHIVZ8gHXaDYOMpZ/KSLHrL1khm6ubOtpUjG8lMOvWiL7ePoNCgPCf8EPLP4TAsvdWLrZPx1s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com 
header.i=@kernel-dk.20230601.gappssmtp.com header.b=rX1ViygU; arc=none smtp.client-ip=209.85.166.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="rX1ViygU" Received: by mail-io1-f41.google.com with SMTP id ca18e2360f4ac-844ce213af6so71537239f.1 for ; Fri, 07 Feb 2025 09:36:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738949812; x=1739554612; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=7qbvtsfwyIZePBB7edcFoY3EkzIhOm6ji2xaEWPLWTg=; b=rX1ViygU+6mtbKC1K1dzq/cgWDMZ5vAXz8Kd54gsDIxnG+XCHq109PcMiUk17Eoa5A urLIAne3jgQlJsLjReGsS1XD2FObXCWAwYXJlZTAcL53y8mZ8rLwAICyT29LNJEbvMv8 22gyZ9ff3s5G7dGaB2V/mZSHCiANIGpHzfK0bw/rBSOcL2xz9pRfdZ/zXlhHKnD3rXq4 beIuvDYSpYuwv/T3gYVAF0fiW1IGA+hDLq295Q4qUw7A1I+qwDYBFfViB4dNJ8ML2t+v are2j05u95yoWdRu2XkYY8uQZ9miTeMyo4XKr4H8Abkqi9O2rQI4mJFleg2t/42Imckm lAMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738949812; x=1739554612; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7qbvtsfwyIZePBB7edcFoY3EkzIhOm6ji2xaEWPLWTg=; b=MKPhB+s6B08CmYoURtQo3ZpwdtbPiXNQXvtxuob5SIyOsxBHreKrxdPcCig/yP2Afg 1EJADuKpx/wMDqF8SVORNsmd3f1jqiCb9J/5bcqsyDtDgcxLlhkOIhKyFCSy4ndPtJvN aNgSqG7JnOrwudLIk8DwI5GPsxKcUnx01MGTgFZ21XnaCdXVGv4eOq/dIKEVvhW0dUnO +jg9BrKgRJQQcoCgjEoG22P74tUMnlagF/rGAfShSThF8SWXKO/vB0ZEbYNyUW5FWX+3 tyO1LITimlWaH6FCvXm5IC9lGOrH+iaqm2pOemfytvlG66lEYpo8KKOP+c8htD+AIs3v mX8w== X-Gm-Message-State: 
AOJu0YwkYSG/on7SPaoPA2Efac6TOb2ADHER+Wn7QagvFb4Qs6KYY8lv R0yfJpf3574TnINSErnrrl9yDjSXSjmFgruNRoYZo1q7p1mHKJE6IjpMNf+raGw= X-Gm-Gg: ASbGnct6enPgv3uNI2E6nZUVdPswrfoXGFq0+oLF8iwo6RpRaKu1vOFp98G6jkm+Otn 7qOVp5Yd9VB/3FxjqmfWI8RxqIPFB9oES0kyrS++r2/SBaU1JcoY6jlUS8Qx/ziT8SIYq99D6sr LrVGxDcdqP/ytR3r+KFsoVHKrtHmCO1cawrFOu2IEg/zeZ0i+Pf1jc+lgKTZiLKVsK+MXTkTpUS VcvGqzk8tD3cXesUaQrpUqJYmfTnHndlRSsi1oTurRN7maOnsA8dsb6angnxeDKuVT+Z2pEqxsw 4iq+nSWeiLhVRxNKeHY= X-Google-Smtp-Source: AGHT+IEhNjgBKE9AcXHRNWx2yEx4kJJe7LB6Frs7X1I9sitvExI/LF4qPxS9MDRcfcLNcaO/qNB7zQ== X-Received: by 2002:a05:6602:7509:b0:844:debf:24dc with SMTP id ca18e2360f4ac-854fd8a0a0cmr475959439f.5.1738949812085; Fri, 07 Feb 2025 09:36:52 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 8926c6da1cb9f-4ece0186151sm206241173.111.2025.02.07.09.36.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 07 Feb 2025 09:36:51 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 7/7] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT Date: Fri, 7 Feb 2025 10:32:30 -0700 Message-ID: <20250207173639.884745-8-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250207173639.884745-1-axboe@kernel.dk> References: <20250207173639.884745-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

For existing epoll event loops that can't fully convert to io_uring, the usual approach is to add the io_uring fd to the epoll instance and use epoll_wait() to wait on both "legacy" and io_uring events. While this works, it isn't optimal because:

1) epoll_wait() is pretty limited in what it can do. It does not support partial reaping of events, or waiting on a batch of events.
2) When an io_uring ring is added to an epoll instance, it activates the io_uring "I'm being polled" logic which slows things down.

Rather than use this approach, with EPOLL_WAIT support added to io_uring, event loops can use the normal io_uring wait logic for everything, as long as an epoll wait request has been armed with io_uring.

Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this is an async request. Waiting on io_uring events in general has various timeout parameters, and those are the ones that should be used when waiting on any kind of request. If events are immediately available for reaping, then this opcode will return those immediately. If none are available, then it will post an async completion when they become available.

cqe->res will contain either an error code (< 0 value) for a malformed request, invalid epoll instance, etc., or a positive result indicating how many events were reaped.

IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring cancelation infrastructure. The poll logic for managing ownership is adopted to guard the epoll side too.

Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 4 + include/uapi/linux/io_uring.h | 1 + io_uring/cancel.c | 5 ++ io_uring/epoll.c | 143 +++++++++++++++++++++++++++++++++ io_uring/epoll.h | 22 +++++ io_uring/io_uring.c | 5 ++ io_uring/opdef.c | 14 ++++ 7 files changed, 194 insertions(+) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index e2fef264ff8b..031ba708a81d 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -369,6 +369,10 @@ struct io_ring_ctx { struct io_alloc_cache futex_cache; #endif +#ifdef CONFIG_EPOLL + struct hlist_head epoll_list; +#endif + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index e11c82638527..a559e1e1544a 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -278,6 +278,7 @@ enum io_uring_op { IORING_OP_FTRUNCATE, IORING_OP_BIND, IORING_OP_LISTEN, + IORING_OP_EPOLL_WAIT, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 0870060bac7c..d1af9496d9b3 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -17,6 +17,7 @@ #include "timeout.h" #include "waitid.h" #include "futex.h" +#include "epoll.h" #include "cancel.h" struct io_cancel { @@ -128,6 +129,10 @@ int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd, if (ret != -ENOENT) return ret; + ret = io_epoll_wait_cancel(ctx, cd, issue_flags); + if (ret != -ENOENT) + return ret; + spin_lock(&ctx->completion_lock); if (!(cd->flags & IORING_ASYNC_CANCEL_FD)) ret = io_timeout_cancel(ctx, cd); diff --git a/io_uring/epoll.c b/io_uring/epoll.c index 7848d9cc073d..8f54bb1c39de 100644 --- a/io_uring/epoll.c +++ b/io_uring/epoll.c @@ -11,6 +11,7 @@ #include "io_uring.h" #include "epoll.h" +#include "poll.h" struct io_epoll { struct file *file; @@ -20,6 +21,13 @@ struct 
io_epoll { struct epoll_event event; }; +struct io_epoll_wait { + struct file *file; + int maxevents; + struct epoll_event __user *events; + struct wait_queue_entry wait; +}; + int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll); @@ -57,3 +65,138 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags) io_req_set_res(req, ret, 0); return IOU_OK; } + +static void __io_epoll_finish(struct io_kiocb *req, int res) +{ + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + + lockdep_assert_held(&req->ctx->uring_lock); + + epoll_wait_remove(req->file, &iew->wait); + hlist_del_init(&req->hash_node); + io_req_set_res(req, res, 0); + req->io_task_work.func = io_req_task_complete; + io_req_task_work_add(req); +} + +static void __io_epoll_cancel(struct io_kiocb *req) +{ + __io_epoll_finish(req, -ECANCELED); +} + +static bool __io_epoll_wait_cancel(struct io_kiocb *req) +{ + io_poll_mark_cancelled(req); + if (io_poll_get_ownership(req)) + __io_epoll_cancel(req); + return true; +} + +bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx, + bool cancel_all) +{ + return io_cancel_remove_all(ctx, tctx, &ctx->epoll_list, cancel_all, __io_epoll_wait_cancel); +} + +int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd, + unsigned int issue_flags) +{ + return io_cancel_remove(ctx, cd, issue_flags, &ctx->epoll_list, __io_epoll_wait_cancel); +} + +static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts) +{ + int v; + + do { + v = atomic_read(&req->poll_refs); + if (unlikely(v != 1)) { + if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK))) + return; + if (v & IO_POLL_CANCEL_FLAG) { + __io_epoll_cancel(req); + return; + } + } + v &= IO_POLL_REF_MASK; + } while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK); + + io_req_task_submit(req, ts); +} + +static int io_epoll_execute(struct io_kiocb *req) +{ + 
struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + + list_del_init_careful(&iew->wait.entry); + if (io_poll_get_ownership(req)) { + req->io_task_work.func = io_epoll_retry; + io_req_task_work_add(req); + } + + return 1; +} + +static __cold int io_epoll_pollfree_wake(struct io_kiocb *req) +{ + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + + io_poll_mark_cancelled(req); + list_del_init_careful(&iew->wait.entry); + io_epoll_execute(req); + return 1; +} + +static int io_epoll_wait_fn(struct wait_queue_entry *wait, unsigned mode, + int sync, void *key) +{ + struct io_kiocb *req = wait->private; + __poll_t mask = key_to_poll(key); + + if (unlikely(mask & POLLFREE)) + return io_epoll_pollfree_wake(req); + + return io_epoll_execute(req); +} + +int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + + if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in) + return -EINVAL; + + iew->maxevents = READ_ONCE(sqe->len); + iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr)); + + iew->wait.flags = 0; + iew->wait.private = req; + iew->wait.func = io_epoll_wait_fn; + INIT_LIST_HEAD(&iew->wait.entry); + INIT_HLIST_NODE(&req->hash_node); + atomic_set(&req->poll_refs, 0); + return 0; +} + +int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait); + struct io_ring_ctx *ctx = req->ctx; + int ret; + + io_ring_submit_lock(ctx, issue_flags); + + ret = epoll_queue(req->file, iew->events, iew->maxevents, &iew->wait); + if (ret == -EIOCBQUEUED) { + if (hlist_unhashed(&req->hash_node)) + hlist_add_head(&req->hash_node, &ctx->epoll_list); + io_ring_submit_unlock(ctx, issue_flags); + return IOU_ISSUE_SKIP_COMPLETE; + } else if (ret < 0) { + req_set_fail(req); + } + hlist_del_init(&req->hash_node); + io_ring_submit_unlock(ctx, issue_flags); + 
io_req_set_res(req, ret, 0); + return IOU_OK; +} diff --git a/io_uring/epoll.h b/io_uring/epoll.h index 870cce11ba98..296940d89063 100644 --- a/io_uring/epoll.h +++ b/io_uring/epoll.h @@ -1,6 +1,28 @@ // SPDX-License-Identifier: GPL-2.0 +#include "cancel.h" + #if defined(CONFIG_EPOLL) +int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd, + unsigned int issue_flags); +bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx, + bool cancel_all); + int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags); +int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags); +#else +static inline bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, + struct io_uring_task *tctx, + bool cancel_all) +{ + return false; +} +static inline int io_epoll_wait_cancel(struct io_ring_ctx *ctx, + struct io_cancel_data *cd, + unsigned int issue_flags) +{ + return 0; +} #endif diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index ec98a0ec6f34..73b9246eaa50 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -93,6 +93,7 @@ #include "notif.h" #include "waitid.h" #include "futex.h" +#include "epoll.h" #include "napi.h" #include "uring_cmd.h" #include "msg_ring.h" @@ -356,6 +357,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_HLIST_HEAD(&ctx->waitid_list); #ifdef CONFIG_FUTEX INIT_HLIST_HEAD(&ctx->futex_list); +#endif +#ifdef CONFIG_EPOLL + INIT_HLIST_HEAD(&ctx->epoll_list); #endif INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); @@ -3079,6 +3083,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx, ret |= io_poll_remove_all(ctx, tctx, cancel_all); ret |= io_waitid_remove_all(ctx, tctx, cancel_all); ret |= io_futex_remove_all(ctx, tctx, 
cancel_all); + ret |= io_epoll_wait_remove_all(ctx, tctx, cancel_all); ret |= io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all); mutex_unlock(&ctx->uring_lock); ret |= io_kill_timeouts(ctx, tctx, cancel_all); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index e8baef4e5146..44553a657476 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -514,6 +514,17 @@ const struct io_issue_def io_issue_defs[] = { .async_size = sizeof(struct io_async_msghdr), #else .prep = io_eopnotsupp_prep, +#endif + }, + [IORING_OP_EPOLL_WAIT] = { + .needs_file = 1, + .unbound_nonreg_file = 1, + .audit_skip = 1, +#if defined(CONFIG_EPOLL) + .prep = io_epoll_wait_prep, + .issue = io_epoll_wait, +#else + .prep = io_eopnotsupp_prep, #endif }, }; @@ -745,6 +756,9 @@ const struct io_cold_def io_cold_defs[] = { [IORING_OP_LISTEN] = { .name = "LISTEN", }, + [IORING_OP_EPOLL_WAIT] = { + .name = "EPOLL_WAIT", + }, }; const char *io_uring_get_opcode(u8 opcode)
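
The ownership scheme that patch 6/7 moves into poll.h, and that patch 7/7 reuses for epoll wait requests, can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: C11 atomics stand in for the kernel's atomic_t, all model_* names are hypothetical, and the slow path (IO_POLL_REF_BIAS and IO_POLL_RETRY_FLAG) is left out. It mirrors the fast path of io_poll_get_ownership(), io_poll_mark_cancelled(), and the reference-dropping loop in io_epoll_retry().

```c
/* Userspace model of the poll_refs ownership scheme; NOT kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_CANCEL_FLAG (1u << 31)        /* mirrors IO_POLL_CANCEL_FLAG, BIT(31) */
#define MODEL_REF_MASK    ((1u << 30) - 1)  /* mirrors IO_POLL_REF_MASK, GENMASK(29, 0) */

/* Hypothetical stand-in for the poll_refs field of struct io_kiocb. */
struct model_req {
	atomic_uint poll_refs;
};

enum model_result { MODEL_RETRY, MODEL_CANCELLED };

/* Fast path only: whoever bumps the ref part from 0 becomes the owner. */
static bool model_get_ownership(struct model_req *req)
{
	return !(atomic_fetch_add(&req->poll_refs, 1) & MODEL_REF_MASK);
}

/* Setting the flag does not touch the ref part, so ownership is unaffected. */
static void model_mark_cancelled(struct model_req *req)
{
	atomic_fetch_or(&req->poll_refs, MODEL_CANCEL_FLAG);
}

/*
 * Called by the owner. Drops every reference seen so far and loops if
 * more arrived in the meantime; reports a pending cancelation instead
 * of dropping the references.
 */
static enum model_result model_owner_release(struct model_req *req)
{
	unsigned int v;

	do {
		v = atomic_load(&req->poll_refs);
		if (v & MODEL_CANCEL_FLAG)
			return MODEL_CANCELLED;
		v &= MODEL_REF_MASK;
		/* atomic_fetch_sub() returns the old value; old - v is what remains. */
	} while ((atomic_fetch_sub(&req->poll_refs, v) - v) & MODEL_REF_MASK);

	return MODEL_RETRY;
}

/* Walks through one wakeup race and one cancelation; returns 0 on success. */
static int model_selfcheck(void)
{
	struct model_req req;

	atomic_init(&req.poll_refs, 0);
	assert(model_get_ownership(&req));   /* first caller becomes owner */
	assert(!model_get_ownership(&req));  /* second caller only leaves a ref */
	assert(model_owner_release(&req) == MODEL_RETRY);
	assert(atomic_load(&req.poll_refs) == 0);

	model_mark_cancelled(&req);
	assert(model_get_ownership(&req));   /* flag alone does not block ownership */
	assert(model_owner_release(&req) == MODEL_CANCELLED);
	return 0;
}
```

Calling model_selfcheck() from a main() exercises both paths. The point of the design is that a non-owner never modifies the request: it only leaves a reference behind, and the current owner notices it when dropping its own references.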