From patchwork Sun Oct 30 22:01:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025232 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9AD03FA3743 for ; Sun, 30 Oct 2022 22:02:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229933AbiJ3WCO (ORCPT ); Sun, 30 Oct 2022 18:02:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229938AbiJ3WCK (ORCPT ); Sun, 30 Oct 2022 18:02:10 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 630B0BC84 for ; Sun, 30 Oct 2022 15:02:09 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id k5so1334894pjo.5 for ; Sun, 30 Oct 2022 15:02:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8PHeEQvI03GO1u4arOs5eKGxrrWj+QcJXenB1+d+Ev4=; b=6l9nZSzxsOlgev3XI2BPsdlCYcxkwe86MfpEVfJczMSX1+0I/rH8AlHqSqBvvyUFD+ FLOo+9r+o+aWu5kEugJOUFPwjCw4I54idiEjMpMSPIltB8jPPCqhqN8P1bXH5sssnvQA qkfQNjn4W48osJfV2juo6XvIlw9Z/Hure5Ix8EsJEn5JkL+e3nOK6wqxUNn8vX9vqmPp rG+PyTTCQLqQ6k6pwkXoxeA+qz4Tgabrzzbj+3ooWXmaRO+62+Qi0xNSgA3F6qi5GTnj gMK/ksFvl1WR/UPEciiD58drbBPYSQwTv7nUZ+wLrJehGnY27OoUj1IHxXCBypqKpjKr qU9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8PHeEQvI03GO1u4arOs5eKGxrrWj+QcJXenB1+d+Ev4=; b=ETt/XdsHsie2lxwMcfZcwRkDrG1UGh5R77DCB6CqC+xEY+RAONvQu7ge7HkNnVGV6v e/sEDTB4WtGuk29+qfujnefZKvxLGk/SHHCIYpaZvVVY8XetbEojyf47cRR/Lii60q5L duUC3vYPzcnuM3Sf/6qSkSue/FR276nQjcmpSeP8rQ8iLaE3V1Rc3fZXwv96umGySFTD bXqt8d7B7kBRD5gU03omPg4Fu1zWtIkbFUr0WfK9ptBO2wsa5z+fkSpv1yk/WA3NJ1cv H7PK7yr5ImZ+1avgjHI5B3uZm9kkHuPU0ZUUycVAwPKEAHk7vE/wu2kLee5nud2r7y5A OIOg== X-Gm-Message-State: ACrzQf37BZzpKtA2jLb0Kz1m342v+Ja1TkPbGkgP2ozJGJa3HlWGmoGf 2q15LY63dW+IEo5vhr1KUXiePQ== X-Google-Smtp-Source: AMsMyM6FFdgylcHCZmtH8I4OKTzEQHVeq1NVugx8akj7qK+2y0An0oYMAXlbDOSj+D4hmYwUynLsSA== X-Received: by 2002:a17:903:2452:b0:186:99e0:672d with SMTP id l18-20020a170903245200b0018699e0672dmr11199251pls.95.1667167328768; Sun, 30 Oct 2022 15:02:08 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:08 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 1/6] eventpoll: cleanup branches around sleeping for events Date: Sun, 30 Oct 2022 16:01:58 -0600 Message-Id: <20221030220203.31210-2-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221030220203.31210-1-axboe@kernel.dk> References: <20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Rather than have two separate branches here, collapse them into a single one instead. No functional changes here, just a cleanup in preparation for changes in this area. 
Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..3061bdde6cba 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1869,14 +1869,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * important.
 		 */
 		eavail = ep_events_available(ep);
-		if (!eavail)
+		if (!eavail) {
 			__add_wait_queue_exclusive(&ep->wq, &wait);
-
-		write_unlock_irq(&ep->lock);
-
-		if (!eavail)
+			write_unlock_irq(&ep->lock);
 			timed_out = !schedule_hrtimeout_range(to, slack,
 							      HRTIMER_MODE_ABS);
+		} else {
+			write_unlock_irq(&ep->lock);
+		}
+
 		__set_current_state(TASK_RUNNING);
 
 		/*

From patchwork Sun Oct 30 22:01:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37A5DC38A02 for ; Sun, 30 Oct 2022 22:02:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229978AbiJ3WCR (ORCPT ); Sun, 30 Oct 2022 18:02:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229939AbiJ3WCM (ORCPT ); Sun, 30 Oct 2022 18:02:12 -0400 Received: from mail-pl1-x633.google.com (mail-pl1-x633.google.com [IPv6:2607:f8b0:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75FD0A452 for ; Sun, 30 Oct 2022 15:02:10 -0700 (PDT) Received: by mail-pl1-x633.google.com with SMTP id p21so5363226plr.7 for ; Sun, 30 Oct 2022 15:02:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112;
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=TCMSGu8wCAuxi+YEaeqt2Nm6ljarPNoUCebXSLfB1Fw=; b=H1OQbWeLIKTClEI0Y18BCREEaUPyB4u9aov6gnRCwP3FMlWoDhoOyrQj9dTUw8f7Rl Lnc+rX7GLGtNIG7+mOPq5mJ218zFKXipmPROig7C1ec18KZ542PUt0JccxGsjh5HO1a0 /z8kUmiSoWowwx6pl7CmnUz6U/WQmRT/NWSJGgn5Kjr93Rcz9iCWvN5ozoSLbcjCLLaW vQSLjW+N399VOHIFg9wUxuq3+NKkDgXzX84jiBotca1ufvXrHs73a7xFaaZLJbSrPgpF kI9+im4ilsx5ns6Kwr6KHXhyinzcJ9zfi1ofygZp48zSk618zFDCdZssNjQg3mnekR5P YICw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=TCMSGu8wCAuxi+YEaeqt2Nm6ljarPNoUCebXSLfB1Fw=; b=xDpqbhP5OniDH5iThTc2PzxgZ3KjveYUveXkr8vj/44CnXKPh5zOnRRY2QscXR8ig9 Jf8x0vi507Aeh+cl1XWeB5R3gergfq+GqK7YwJ/3w4LziQenOBsthiJzDY/kHni8Dyr2 qSs5neNVHfIzT4ZA52pqqnUUK3vX0NXzT/bCrZkopAm3EmoU3ymaZhhefiL28KxC0poq R9kEBg8cussNGSioSiPrAYlSdQzOaHncU1J1okxR+VDyR+W83HaA7HxWom6ZhRjap2lb rYd27uG8HxTAUtpC9F8PgSjCoCfbNtsQB+hNKwDN2+/44TO7oqmeHkn15jRlfQescg++ fdnQ== X-Gm-Message-State: ACrzQf0bwfjatuIx4knnJnFm0SphairjtYiq+QKOc42JiBRrB0wFrva9 TutJ1wcErKixur8GuBbQ6aeIVA== X-Google-Smtp-Source: AMsMyM4GdWVVzQsWPsWQHA0TiR71X3uUndA1mHuoEfLgekOPxznEf3CE25HyycOiKuLhhOXIat/5mw== X-Received: by 2002:a17:90a:ca87:b0:212:d2bd:82f5 with SMTP id y7-20020a17090aca8700b00212d2bd82f5mr11677337pjt.203.1667167329708; Sun, 30 Oct 2022 15:02:09 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:09 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 2/6] eventpoll: don't pass in 'timed_out' to 
ep_busy_loop() Date: Sun, 30 Oct 2022 16:01:59 -0600 Message-Id: <20221030220203.31210-3-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221030220203.31210-1-axboe@kernel.dk> References: <20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

It's known to be 'false' from the one call site we have, as we break
out of the loop if it's not.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3061bdde6cba..64d7331353dd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -396,12 +396,12 @@ static bool ep_busy_loop_end(void *p, unsigned long start_time)
  *
  * we must do our busy polling with irqs enabled
  */
-static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static bool ep_busy_loop(struct eventpoll *ep)
 {
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
 	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
+		napi_busy_loop(napi_id, ep_busy_loop_end, ep, false,
 			       BUSY_POLL_BUDGET);
 		if (ep_events_available(ep))
 			return true;
@@ -453,7 +453,7 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 
 #else
 
-static inline bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static inline bool ep_busy_loop(struct eventpoll *ep)
 {
 	return false;
 }
@@ -1826,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		if (timed_out)
 			return 0;
 
-		eavail = ep_busy_loop(ep, timed_out);
+		eavail = ep_busy_loop(ep);
 		if (eavail)
 			continue;
 

From patchwork Sun Oct 30 22:02:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025235 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56A10C38A02 for ; Sun, 30 Oct 2022 22:02:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230024AbiJ3WCg (ORCPT ); Sun, 30 Oct 2022 18:02:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229962AbiJ3WCQ (ORCPT ); Sun, 30 Oct 2022 18:02:16 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 53019BCB1 for ; Sun, 30 Oct 2022 15:02:11 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 20so9216386pgc.5 for ; Sun, 30 Oct 2022 15:02:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to;
bh=d8OpHlPRST08skeh7BHSZbFN09k6AB6QzxLidf3Pxy0=; b=FtnVuw2djVhLJ39B4Iu0YH3QmQxMkIGxXHKnGZA7K1Sw4fYMnyIG96gEfOp4ciSi3F cp2w+v7oMMq0Yi2VBZSN6Hrdemo+vuHjooi0IikAo7UQPpKQ88+6759GdWUK+E0kufEv JmuhNI176JjjlMg/MU4+jDvC333l9Ozzl1KVuDENgYG8GAaobnWkN8D4i3P+gSnLe3wp DwP0mgFKh8mD0/6QaVxqR1lyCrEHmDnbYsdnvwA22dLE7MQLkOCq8F03DuPRZPcY7+uP bMXRGPp77FPPOBjK+xLWlYkO6HIx11jiV+KgVJpNw+o0kooBQ3k8mQHdrQSzFG9blCoz kwRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=d8OpHlPRST08skeh7BHSZbFN09k6AB6QzxLidf3Pxy0=; b=oRAxgZuejq1XsSNbZVrqLg0HtIOhgnK1S2a1zWo5WOQ9OC8ydCE3G8jldgQfgCv+KY xI2ekY285UVRX/l0eNH/O7NXT+lSGd2hLTdB/LDautloq9wXaOenTHG3Nj5SNK7aN58w DJPHoFHKYNqWXzOBy1HOYvX9O6gSVKKFz65ipUZ7/vzSTlDGh7TcfNzn98xmdTbNfKrB HjsnH8MJWxPiH2IOh+WV35WK22PWJc9QSjUgvoG4SmB++vL/85PppnQKAZCPb7SIOHRm jJG34476+H416AOaZvR7szVOYDhG9tWbxFaP8pJWwFOhLQ2L+JOry5t5o1NYeR5LVxN1 wmhg== X-Gm-Message-State: ACrzQf23ocxQynieEHrOQWOd5jo+GXCTqKAIxzHNttpZaPHZrI/3169w /Qn3rektPhzSgDzl69Ch0rF9vGewUaW3yP37 X-Google-Smtp-Source: AMsMyM42vqv1YONW63nh8FnRHNJKmyUOG/hteMSNrrCZTsdp06c1hADrIyNernP/tWI+uBiwCUMByA== X-Received: by 2002:a05:6a00:1348:b0:56b:f5c0:1d9d with SMTP id k8-20020a056a00134800b0056bf5c01d9dmr10963195pfu.45.1667167330589; Sun, 30 Oct 2022 15:02:10 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:10 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 3/6] eventpoll: split out wait handling Date: Sun, 30 Oct 2022 16:02:00 -0600 Message-Id: <20221030220203.31210-4-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: 
<20221030220203.31210-1-axboe@kernel.dk> References: <20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

In preparation for making changes to how wakeups and sleeps are done,
move the timeout scheduling into a helper and manage it rather than
rely on schedule_hrtimeout_range().

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 68 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 64d7331353dd..888f565d0c5f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1762,6 +1762,47 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 	return ret;
 }
 
+struct epoll_wq {
+	wait_queue_entry_t wait;
+	struct hrtimer timer;
+	bool timed_out;
+};
+
+static enum hrtimer_restart ep_timer(struct hrtimer *timer)
+{
+	struct epoll_wq *ewq = container_of(timer, struct epoll_wq, timer);
+	struct task_struct *task = ewq->wait.private;
+
+	ewq->timed_out = true;
+	wake_up_process(task);
+	return HRTIMER_NORESTART;
+}
+
+static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_t *to,
+			u64 slack)
+{
+	if (ewq->timed_out)
+		return;
+	if (to && *to == 0) {
+		ewq->timed_out = true;
+		return;
+	}
+	if (!to) {
+		schedule();
+		return;
+	}
+
+	hrtimer_init_on_stack(&ewq->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	ewq->timer.function = ep_timer;
+	hrtimer_set_expires_range_ns(&ewq->timer, *to, slack);
+	hrtimer_start_expires(&ewq->timer, HRTIMER_MODE_ABS);
+
+	schedule();
+
+	hrtimer_cancel(&ewq->timer);
+	destroy_hrtimer_on_stack(&ewq->timer);
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -1782,13 +1823,15 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, struct timespec64 *timeout)
 {
-	int res, eavail, timed_out = 0;
+	int res, eavail;
 	u64 slack = 0;
-	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;
+	struct epoll_wq ewq;
 
 	lockdep_assert_irqs_enabled();
 
+	ewq.timed_out = false;
+
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
 		to = &expires;
@@ -1798,7 +1841,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * Avoid the unnecessary trip to the wait queue loop, if the
 		 * caller specified a non blocking operation.
 		 */
-		timed_out = 1;
+		ewq.timed_out = true;
 	}
 
 	/*
@@ -1823,7 +1866,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			return res;
 		}
 
-		if (timed_out)
+		if (ewq.timed_out)
 			return 0;
 
 		eavail = ep_busy_loop(ep);
@@ -1850,8 +1893,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * performance issue if a process is killed, causing all of its
 		 * threads to wake up without being removed normally.
 		 */
-		init_wait(&wait);
-		wait.func = ep_autoremove_wake_function;
+		init_wait(&ewq.wait);
+		ewq.wait.func = ep_autoremove_wake_function;
 
 		write_lock_irq(&ep->lock);
 		/*
@@ -1870,10 +1913,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 */
 		eavail = ep_events_available(ep);
 		if (!eavail) {
-			__add_wait_queue_exclusive(&ep->wq, &wait);
+			__add_wait_queue_exclusive(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
-			timed_out = !schedule_hrtimeout_range(to, slack,
-							      HRTIMER_MODE_ABS);
+			ep_schedule(ep, &ewq, to, slack);
 		} else {
 			write_unlock_irq(&ep->lock);
 		}
@@ -1887,7 +1929,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 */
 		eavail = 1;
 
-		if (!list_empty_careful(&wait.entry)) {
+		if (!list_empty_careful(&ewq.wait.entry)) {
 			write_lock_irq(&ep->lock);
 			/*
 			 * If the thread timed out and is not on the wait queue,
@@ -1896,9 +1938,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			 * Thus, when wait.entry is empty, it needs to harvest
 			 * events.
 			 */
-			if (timed_out)
-				eavail = list_empty(&wait.entry);
-			__remove_wait_queue(&ep->wq, &wait);
+			if (ewq.timed_out)
+				eavail = list_empty(&ewq.wait.entry);
+			__remove_wait_queue(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
 		}
 	}

From patchwork Sun Oct 30 22:02:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025234 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0108BECAAA1 for ; Sun, 30 Oct 2022 22:02:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229954AbiJ3WCf (ORCPT ); Sun, 30 Oct 2022 18:02:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229959AbiJ3WCQ (ORCPT ); Sun, 30 Oct 2022 18:02:16 -0400 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EABCBCBC for ; Sun, 30 Oct 2022 15:02:12 -0700 (PDT) Received: by mail-pl1-x62e.google.com with SMTP id 4so9270028pli.0 for ; Sun, 30 Oct 2022 15:02:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+7XieXpcJrzhapQHtWCNoW0rQy5yiKhkq34SDOyw0xw=; b=Z2rp9mxaTzm5Kh46RSwGk7G6rZY/KQhbeDA2IAjzJGp/KbgOvSg7hhvjOqyULqd566 //TWc8mDkHbQ+DYXM0UrXa1OFHB+kcFlfhSr5EpBPAZ4BenGOAV8UQytYF1B2Zu15GZW 5wSN+m6knpruLvqwjn/CbyS0LcSLGDOxBx2XJBPBYD36NcCOwgnf8PE0K83/qoHEWITB CoB+1QFBWq2K9ejFNlUmTFH4z4jR/GN1EKDtyv0QkZWRvmvLrhnWvunYux57608rO2K1
NyfP1zPcMuWKUItcpD6jYYjGRh9hLNQQpDbfHBw2vNSNUUlLYIpcgwApfo0fwBwOZ87a d44A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+7XieXpcJrzhapQHtWCNoW0rQy5yiKhkq34SDOyw0xw=; b=erveMNtisTm6ZiZo3JKvipcYY6GPK8oR2XdSoAneRsba8TBTDiuJQoPSks8ABHd2bq 5sKAqpIvdajldO6bsrMefIRfb6NG2M+rZSxamfAaJaSgstjv8rGzOYxFN7E8CQhxTLM4 whWXQ9X7mDyr+3kYIEZtN+s0FfgsjgTkOwyKKUrMxnNTYTH/leUjku/EVgSJQM8xPjYw fspx/nOEU0fnXpSBOxzPLSrKHq7ytkxIblb6HCM4EB15wXTVPVBumEP6ZsZ+IN71uvcH RoRoXIWXTwsi4FShMSGs/+8948kBpzeTbn0HocX9VGwIu5hbEr0q2aYzZPZdxw0ahvDC J2ig== X-Gm-Message-State: ACrzQf0ydcEeI4JCDbL5P/35Aovui4PTENf3F1ljQO3DQkrGLc8FJhQ4 03eLwVCzE55yCaNI9MRR5ngKIyfth0hy5Ixk X-Google-Smtp-Source: AMsMyM4tMCu7URYFgaAQZgnaRiMY0iyFuLab70utjXxpA0cL2CeaZK4UGziOCx2oR15XKHFOeUNPiQ== X-Received: by 2002:a17:903:258b:b0:186:8bb2:de32 with SMTP id jb11-20020a170903258b00b001868bb2de32mr11380804plb.63.1667167331690; Sun, 30 Oct 2022 15:02:11 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:11 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 4/6] eventpoll: move expires to epoll_wq Date: Sun, 30 Oct 2022 16:02:01 -0600 Message-Id: <20221030220203.31210-5-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221030220203.31210-1-axboe@kernel.dk> References: <20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This makes the expiration available to the wakeup handler. 
No functional changes expected in this patch, purely in preparation for
being able to use the timeout on the wakeup side.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 888f565d0c5f..0994f2eb6adc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1765,6 +1765,7 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 struct epoll_wq {
 	wait_queue_entry_t wait;
 	struct hrtimer timer;
+	ktime_t timeout_ts;
 	bool timed_out;
 };
 
@@ -1825,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 {
 	int res, eavail;
 	u64 slack = 0;
-	ktime_t expires, *to = NULL;
+	ktime_t *to = NULL;
 	struct epoll_wq ewq;
 
 	lockdep_assert_irqs_enabled();
@@ -1834,7 +1835,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
-		to = &expires;
+		to = &ewq.timeout_ts;
 		*to = timespec64_to_ktime(*timeout);
 	} else if (timeout) {
 		/*

From patchwork Sun Oct 30 22:02:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025236 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F5F9ECAAA1 for ; Sun, 30 Oct 2022 22:02:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230036AbiJ3WCi (ORCPT ); Sun, 30 Oct 2022 18:02:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229938AbiJ3WCU (ORCPT ); Sun, 30 Oct 2022 18:02:20 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com
[IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54646AE72 for ; Sun, 30 Oct 2022 15:02:13 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id d10so9167929pfh.6 for ; Sun, 30 Oct 2022 15:02:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=T2Hzb+xxyKnPzMsKtR+WvmF1ywGYO4tzf59V+gc/0/s=; b=4SbYyN02zU3+59sahg+YLvVvWkSd64+1XsaqdsPV7wLhPizJiL1ZOSfFWLEXqALNYa 6W6QeSVCcOzhNwakNJ4zPTTE1Kep4cZIA88KTO2RA+MLTQn/knPn/uclsfs5FWpp2r6X MBwpK2jSIc9AuGhnLLcYN95HkF+kptFd0w5wvA/YtE9N5+9YKHf7bcTuPECr48LAKx0C 7ea3GPXX5Pek4eIhJCfGxGLaMzcEtWvp/NBeUf5cfL4o2WKF375Mdt8o3+ozsisYNfV4 LlCBO/tvvMnw7IofiBWylGe7Pp676nxsruegSlfD1fNjwlCGunhC8UcSHHnfzrI6bgv5 Dicw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=T2Hzb+xxyKnPzMsKtR+WvmF1ywGYO4tzf59V+gc/0/s=; b=vmnDW3nKZQpuHRq1N9u75eBvsvX1Mt35gYODOVEelyWMe8FQezzTnToJz27mdCBfYB nsXONybPGSFhCN1B3g/RDfMtVPYWhSlDk74k/+ZULly3TKGBQoFZGn63u5UV6LUeXfPx 6DrUFIB8yiUYdOB4DGvEaD9SozLR5Sb8rBrMsC9LkFORFK2k5BmXTlGUglB45tD5mATR Is3mZhOTyhjro/KAXARiOOi1cuEtF2aJMJEkb4HHU9+/WLESxJmJcVtwR5C95lgnPRPd Isu2BDXv/spzsVfMaGpfUwnBNbZm1f09hNG3kHfLpZf7eFDZD9UFsfHfj4wlw8+IVAMD ntcg== X-Gm-Message-State: ACrzQf3duDfx00UMZikcAlsLj4QJSiY/GqdD1FOsp1LW+DrA01RcJi4y mD5Be3lPhIa9QSkDCYRsjgthNg== X-Google-Smtp-Source: AMsMyM6rC0HxreFF+0nLCbPFZujbettJ72v6ARDfIlBCp4K0TWosk65ubJlAUaD1LH+Z4thDNhlCxg== X-Received: by 2002:a63:f103:0:b0:439:398f:80f8 with SMTP id f3-20020a63f103000000b00439398f80f8mr9698925pgi.494.1667167332660; Sun, 30 Oct 2022 15:02:12 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by 
smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:12 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 5/6] eventpoll: move file checking earlier for epoll_ctl() Date: Sun, 30 Oct 2022 16:02:02 -0600 Message-Id: <20221030220203.31210-6-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221030220203.31210-1-axboe@kernel.dk> References: <20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

This just cleans up the checking a bit, in preparation for a change
that will need access to 'ep' earlier.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0994f2eb6adc..962d897bbfc6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,6 +2111,20 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	if (!f.file)
 		goto error_return;
 
+	/*
+	 * We have to check that the file structure underneath the file
+	 * descriptor the user passed to us _is_ an eventpoll file.
+	 */
+	error = -EINVAL;
+	if (!is_file_epoll(f.file))
+		goto error_fput;
+
+	/*
+	 * At this point it is safe to assume that the "private_data" contains
+	 * our own data structure.
+	 */
+	ep = f.file->private_data;
+
 	/* Get the "struct file *" for the target file */
 	tf = fdget(fd);
 	if (!tf.file)
@@ -2126,12 +2140,10 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 		ep_take_care_of_epollwakeup(epds);
 
 	/*
-	 * We have to check that the file structure underneath the file descriptor
-	 * the user passed to us _is_ an eventpoll file. And also we do not permit
-	 * adding an epoll file descriptor inside itself.
+	 * We do not permit adding an epoll file descriptor inside itself.
 	 */
 	error = -EINVAL;
-	if (f.file == tf.file || !is_file_epoll(f.file))
+	if (f.file == tf.file)
 		goto error_tgt_fput;
 
 	/*
@@ -2147,12 +2159,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 			goto error_tgt_fput;
 	}
 
-	/*
-	 * At this point it is safe to assume that the "private_data" contains
-	 * our own data structure.
-	 */
-	ep = f.file->private_data;
-
 	/*
 	 * When we insert an epoll file descriptor inside another epoll file
 	 * descriptor, there is the chance of creating closed loops, which are

From patchwork Sun Oct 30 22:02:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13025237 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25FEFC38A02 for ; Sun, 30 Oct 2022 22:02:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230048AbiJ3WCo (ORCPT ); Sun, 30 Oct 2022 18:02:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229635AbiJ3WCc (ORCPT ); Sun, 30 Oct 2022 18:02:32 -0400 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54857BC3E for ; Sun, 30 Oct 2022 15:02:14 -0700 (PDT) Received: by mail-pl1-x62e.google.com with SMTP id j12so9235043plj.5 for ; Sun, 30 Oct 2022 15:02:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=pwT9F3VGgPNDdfJa3yPd0dcN35etavVFYMg4RY5YL0o=;
b=VIkEe8kfBczuf+b12NGhnxMQuafEN/XF99q0aDh4J9MnhwgaMH7hDBHb9s3jvYnyTg ImBJ8HIU6TWGEhH9qVBxjZDvHd8JpOSmpvoktzoCxu2ynpY/0oPVA0ymJRJnbFRRw6KB aMxQWidSacHx7htlYYplGXwVmuHkU7evUOXkUvg0EBLxwtSEviMeCodR2MG/Z1Tkhvs+ 6tEMfVStRqMe3+TVpGewnXQoFSZH9DEzI8Mg3D5IbWx4GX0ZSxrF9W31BhGX3xP82rPK K1NDqcdmIGhhiCrYTSCEXi6SBeV8LkBSZXCMvpBVgddljCOA/ZuTi0g9v4Z+9HqwnvLB p0zA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=pwT9F3VGgPNDdfJa3yPd0dcN35etavVFYMg4RY5YL0o=; b=IH8qHNVEvIXEZJYtGS/kzMFgok+uJu4seWy4aBmA4ltjrjKsQQBQsXHMxVIp/YLmst RuIQE//FULrHBvzETt4MoCXCmuuFh4YD6jeOunHMkZtJt7Fo6Qr4UgxHJJBoYeYZIz9d cNLbrR/jR5IU+Ty/roon43vo48TB4p4v3/4xBqAEKrH9s0nGH5hujWOBC1KyZyKyka8m SD3CRfHem6hh0Jmf8EVll2gxQA1ZyNZp0rTyu2aqTOY174g2syOYfkW8lDbsJgGmk0+B 0qa8GG0xVoTbtOxWBBFTDtDnPT1uNdn/LGS49ZC/9cIzoe7XA3Wl7TCQRqMkG4U2IU8r kwZg== X-Gm-Message-State: ACrzQf2VglXo9kUE0pva+b9fhlwQDY167eij5WJqJfs+AssDCPnMBFCa ZuA+2B1Y6XjRUwxQQvjDKS6DA+X/OPMiJiDN X-Google-Smtp-Source: AMsMyM5oezGWS2uuaJrw0tYnaqdg8Gzy8DXtwyYycWSJySaDFqjPaO9aeThPTXH6sui2BPFZkTbG0w== X-Received: by 2002:a17:902:b614:b0:186:940d:7e98 with SMTP id b20-20020a170902b61400b00186940d7e98mr11208913pls.80.1667167333625; Sun, 30 Oct 2022 15:02:13 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id y3-20020aa79e03000000b0056d73ef41fdsm562852pfq.75.2022.10.30.15.02.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 30 Oct 2022 15:02:13 -0700 (PDT) From: Jens Axboe To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe Subject: [PATCH 6/6] eventpoll: add support for min-wait Date: Sun, 30 Oct 2022 16:02:03 -0600 Message-Id: <20221030220203.31210-7-axboe@kernel.dk> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20221030220203.31210-1-axboe@kernel.dk> References: 
<20221030220203.31210-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

Rather than just have a timeout value for waiting on events, add
EPOLL_CTL_MIN_WAIT to allow setting a minimum time that epoll_wait()
should always wait for events to arrive.

For medium workloads, some production setups inject artificial timers
or sleeps before calling epoll_wait() to get better batching and higher
efficiency. While this does help, it's not as efficient as it could be.
By supporting this directly in epoll_wait(), we can avoid extra context
switches and scheduler and timer overhead.

As an example, running an A/B test on an identical workload at about
370K reqs/second, without this change and with the sleep hack mentioned
above (using 200 usec as the timeout), we're doing 310K-340K
non-voluntary context switches per second. Idle CPU on the host is
27-34%. With the sleep hack removed and epoll set to the same 200 usec
value, we're handling the exact same load but at 292K-315K
non-voluntary context switches per second and idle CPU of 33-41%, a
substantial win.

Basic test case:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/epoll.h>

struct d {
	int p1, p2;
};

static void *fn(void *data)
{
	struct d *d = data;
	char b = 0x89;

	/* Generate 2 events 10 msec apart */
	usleep(10000);
	write(d->p1, &b, sizeof(b));
	usleep(10000);
	write(d->p2, &b, sizeof(b));

	return NULL;
}

int main(int argc, char *argv[])
{
	struct epoll_event ev, events[2];
	pthread_t thread;
	int p1[2], p2[2];
	struct d d;
	int efd, ret;

	efd = epoll_create1(0);
	if (efd < 0) {
		perror("epoll_create");
		return 1;
	}

	if (pipe(p1) < 0) {
		perror("pipe");
		return 1;
	}
	if (pipe(p2) < 0) {
		perror("pipe");
		return 1;
	}

	ev.events = EPOLLIN;
	ev.data.fd = p1[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p1[0], &ev) < 0) {
		perror("epoll add");
		return 1;
	}
	ev.events = EPOLLIN;
	ev.data.fd = p2[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p2[0], &ev) < 0) {
		perror("epoll add");
		return 1;
	}

	/* always wait 200 msec for events */
	ev.data.u64 = 200000;
	if (epoll_ctl(efd, EPOLL_CTL_MIN_WAIT, -1, &ev) < 0) {
		perror("epoll add set timeout");
		return 1;
	}

	d.p1 = p1[1];
	d.p2 = p2[1];
	pthread_create(&thread, NULL, fn, &d);

	/* expect to get 2 events here rather than just 1 */
	ret = epoll_wait(efd, events, 2, -1);
	printf("epoll_wait=%d\n", ret);

	return 0;
}

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c                 | 97 +++++++++++++++++++++++++++++-----
 include/linux/eventpoll.h      |  2 +-
 include/uapi/linux/eventpoll.h |  1 +
 3 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 962d897bbfc6..9e00f8780ec5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -117,6 +117,9 @@ struct eppoll_entry {
 	/* The "base" pointer is set to the container "struct epitem" */
 	struct epitem *base;
 
+	/* min wait time if (min_wait_ts) & 1 != 0 */
+	ktime_t min_wait_ts;
+
 	/*
 	 * Wait queue item that will be linked to the target file wait
 	 * queue head.
@@ -217,6 +220,9 @@ struct eventpoll {
 	u64 gen;
 	struct hlist_head refs;
 
+	/* min wait for epoll_wait() */
+	unsigned int min_wait_ts;
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	/* used to track busy poll napi_id */
 	unsigned int napi_id;
@@ -1747,6 +1753,32 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
 	return to;
 }
 
+struct epoll_wq {
+	wait_queue_entry_t wait;
+	struct hrtimer timer;
+	ktime_t timeout_ts;
+	ktime_t min_wait_ts;
+	struct eventpoll *ep;
+	bool timed_out;
+	int maxevents;
+	int wakeups;
+};
+
+static bool ep_should_min_wait(struct epoll_wq *ewq)
+{
+	if (ewq->min_wait_ts & 1) {
+		/* just an approximation */
+		if (++ewq->wakeups >= ewq->maxevents)
+			goto stop_wait;
+		if (ktime_before(ktime_get_ns(), ewq->min_wait_ts))
+			return true;
+	}
+
+stop_wait:
+	ewq->min_wait_ts &= ~(u64) 1;
+	return false;
+}
+
 /*
  * autoremove_wake_function, but remove even on failure to wake up, because we
  * know that default_wake_function/ttwu will only fail if the thread is already
@@ -1756,27 +1788,37 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
 static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 				       unsigned int mode, int sync, void *key)
 {
-	int ret = default_wake_function(wq_entry, mode, sync, key);
+	struct epoll_wq *ewq = container_of(wq_entry, struct epoll_wq, wait);
+	int ret;
+
+	/*
+	 * If min wait time hasn't been satisfied yet, keep waiting
+	 */
+	if (ep_should_min_wait(ewq))
+		return 0;
 
+	ret = default_wake_function(wq_entry, mode, sync, key);
 	list_del_init(&wq_entry->entry);
 	return ret;
 }
 
-struct epoll_wq {
-	wait_queue_entry_t wait;
-	struct hrtimer timer;
-	ktime_t timeout_ts;
-	bool timed_out;
-};
-
 static enum hrtimer_restart ep_timer(struct hrtimer *timer)
 {
 	struct epoll_wq *ewq = container_of(timer, struct epoll_wq, timer);
 	struct task_struct *task = ewq->wait.private;
+	const bool is_min_wait = ewq->min_wait_ts & 1;
+
+	if (!is_min_wait || ep_events_available(ewq->ep)) {
+		if (!is_min_wait)
+			ewq->timed_out = true;
+		ewq->min_wait_ts &= ~(u64) 1;
+		wake_up_process(task);
+		return HRTIMER_NORESTART;
+	}
 
-	ewq->timed_out = true;
-	wake_up_process(task);
-	return HRTIMER_NORESTART;
+	ewq->min_wait_ts &= ~(u64) 1;
+	hrtimer_set_expires_range_ns(&ewq->timer, ewq->timeout_ts, 0);
+	return HRTIMER_RESTART;
 }
 
 static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_t *to,
@@ -1831,12 +1873,16 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	lockdep_assert_irqs_enabled();
 
+	ewq.min_wait_ts = 0;
+	ewq.ep = ep;
+	ewq.maxevents = maxevents;
 	ewq.timed_out = false;
+	ewq.wakeups = 0;
 
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
+		ewq.timeout_ts = timespec64_to_ktime(*timeout);
 		to = &ewq.timeout_ts;
-		*to = timespec64_to_ktime(*timeout);
 	} else if (timeout) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
@@ -1845,6 +1891,18 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		ewq.timed_out = true;
 	}
 
+	/*
+	 * If min_wait is set for this epoll instance, note the min_wait
+	 * time. Ensure the lowest bit is set in ewq.min_wait_ts, that's
+	 * the state bit for whether or not min_wait is enabled.
+	 */
+	if (ep->min_wait_ts) {
+		ewq.min_wait_ts = ktime_add_us(ktime_get_ns(),
+						ep->min_wait_ts);
+		ewq.min_wait_ts |= (u64) 1;
+		to = &ewq.min_wait_ts;
+	}
+
 	/*
 	 * This call is racy: We may or may not see events that are being added
 	 * to the ready list under the lock (e.g., in IRQ callbacks). For cases
@@ -1913,7 +1971,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * important.
 		 */
 		eavail = ep_events_available(ep);
-		if (!eavail) {
+		if (!eavail || ewq.min_wait_ts & 1) {
 			__add_wait_queue_exclusive(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
 			ep_schedule(ep, &ewq, to, slack);
@@ -2125,6 +2183,17 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	 */
 	ep = f.file->private_data;
 
+	/*
+	 * Handle EPOLL_CTL_MIN_WAIT upfront as we don't need to care about
+	 * the fd being passed in.
+	 */
+	if (op == EPOLL_CTL_MIN_WAIT) {
+		/* return old value */
+		error = ep->min_wait_ts;
+		ep->min_wait_ts = epds->data;
+		goto error_fput;
+	}
+
 	/* Get the "struct file *" for the target file */
 	tf = fdget(fd);
 	if (!tf.file)
@@ -2257,7 +2326,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 {
 	struct epoll_event epds;
 
-	if (ep_op_has_event(op) &&
+	if ((ep_op_has_event(op) || op == EPOLL_CTL_MIN_WAIT) &&
 	    copy_from_user(&epds, event, sizeof(struct epoll_event)))
 		return -EFAULT;
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd..cbef635cb7e4 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -59,7 +59,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-	return op != EPOLL_CTL_DEL;
+	return op != EPOLL_CTL_DEL && op != EPOLL_CTL_MIN_WAIT;
 }
 
 #else
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..81ecb1ca36e0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_MIN_WAIT 4
 
 /* Epoll event masks */
 #define EPOLLIN (__force __poll_t)0x00000001