From patchwork Thu Dec 1 18:11:50 2022
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13061713
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 1/7] eventpoll: cleanup branches around sleeping for events
Date: Thu, 1 Dec 2022 11:11:50 -0700
Message-Id: <20221201181156.848373-2-axboe@kernel.dk>
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>

Rather than having two separate branches that both test !eavail, collapse
them into a single if/else. No functional change; this is just a cleanup
in preparation for later changes in this area.
Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..3061bdde6cba 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1869,14 +1869,15 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * important.
 		 */
 		eavail = ep_events_available(ep);
-		if (!eavail)
+		if (!eavail) {
 			__add_wait_queue_exclusive(&ep->wq, &wait);
-
-		write_unlock_irq(&ep->lock);
-
-		if (!eavail)
+			write_unlock_irq(&ep->lock);
 			timed_out = !schedule_hrtimeout_range(to, slack,
 							      HRTIMER_MODE_ABS);
+		} else {
+			write_unlock_irq(&ep->lock);
+		}
+
 		__set_current_state(TASK_RUNNING);
 
 		/*

From patchwork Thu Dec 1 18:11:51 2022
X-Patchwork-Id: 13061721
From: Jens Axboe
Subject: [PATCH 2/7] eventpoll: don't pass in 'timed_out' to ep_busy_loop()
Date: Thu, 1 Dec 2022 11:11:51 -0700
Message-Id: <20221201181156.848373-3-axboe@kernel.dk>

'timed_out' is known to be false at the only call site, since ep_poll()
returns before reaching ep_busy_loop() if it is true. Drop the argument,
and with it the NULL loop_end callback that was only used in the
nonblocking case.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3061bdde6cba..64d7331353dd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -396,12 +396,12 @@ static bool ep_busy_loop_end(void *p, unsigned long start_time)
  *
  * we must do our busy polling with irqs enabled
  */
-static bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static bool ep_busy_loop(struct eventpoll *ep)
 {
 	unsigned int napi_id = READ_ONCE(ep->napi_id);
 
 	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on()) {
-		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep, false,
+		napi_busy_loop(napi_id, ep_busy_loop_end, ep, false,
 			       BUSY_POLL_BUDGET);
 		if (ep_events_available(ep))
 			return true;
@@ -453,7 +453,7 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
 
 #else
 
-static inline bool ep_busy_loop(struct eventpoll *ep, int nonblock)
+static inline bool ep_busy_loop(struct eventpoll *ep)
 {
 	return false;
 }
@@ -1826,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		if (timed_out)
 			return 0;
 
-		eavail = ep_busy_loop(ep, timed_out);
+		eavail = ep_busy_loop(ep);
 		if (eavail)
 			continue;
 

From patchwork Thu Dec 1 18:11:52 2022
X-Patchwork-Id: 13061722
From: Jens Axboe
Subject: [PATCH 3/7] eventpoll: split out wait handling
Date: Thu, 1 Dec 2022 11:11:52 -0700
Message-Id: <20221201181156.848373-4-axboe@kernel.dk>

In preparation for changing how wakeups and sleeps are done, move the
timeout handling into a helper, ep_schedule(), which arms and cancels its
own on-stack hrtimer instead of relying on schedule_hrtimeout_range().

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 55 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 64d7331353dd..888f565d0c5f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1762,6 +1762,47 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 	return ret;
 }
 
+struct epoll_wq {
+	wait_queue_entry_t wait;
+	struct hrtimer timer;
+	bool timed_out;
+};
+
+static enum hrtimer_restart ep_timer(struct hrtimer *timer)
+{
+	struct epoll_wq *ewq = container_of(timer, struct epoll_wq, timer);
+	struct task_struct *task = ewq->wait.private;
+
+	ewq->timed_out = true;
+	wake_up_process(task);
+	return HRTIMER_NORESTART;
+}
+
+static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_t *to,
+			u64 slack)
+{
+	if (ewq->timed_out)
+		return;
+	if (to && *to == 0) {
+		ewq->timed_out = true;
+		return;
+	}
+	if (!to) {
+		schedule();
+		return;
+	}
+
+	hrtimer_init_on_stack(&ewq->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	ewq->timer.function = ep_timer;
+	hrtimer_set_expires_range_ns(&ewq->timer, *to, slack);
+	hrtimer_start_expires(&ewq->timer, HRTIMER_MODE_ABS);
+
+	schedule();
+
+	hrtimer_cancel(&ewq->timer);
+	destroy_hrtimer_on_stack(&ewq->timer);
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -1782,13 +1823,15 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		   int maxevents, struct timespec64 *timeout)
 {
-	int res, eavail, timed_out = 0;
+	int res, eavail;
 	u64 slack = 0;
-	wait_queue_entry_t wait;
 	ktime_t expires, *to = NULL;
+	struct epoll_wq ewq;
 
 	lockdep_assert_irqs_enabled();
 
+	ewq.timed_out = false;
+
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
 		to = &expires;
@@ -1798,7 +1841,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * Avoid the unnecessary trip to the wait queue loop, if the
 		 * caller specified a non blocking operation.
 		 */
-		timed_out = 1;
+		ewq.timed_out = true;
 	}
 
 	/*
@@ -1823,7 +1866,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 				return res;
 		}
 
-		if (timed_out)
+		if (ewq.timed_out)
 			return 0;
 
 		eavail = ep_busy_loop(ep);
@@ -1850,8 +1893,8 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * performance issue if a process is killed, causing all of its
 		 * threads to wake up without being removed normally.
 		 */
-		init_wait(&wait);
-		wait.func = ep_autoremove_wake_function;
+		init_wait(&ewq.wait);
+		ewq.wait.func = ep_autoremove_wake_function;
 
 		write_lock_irq(&ep->lock);
 		/*
@@ -1870,10 +1913,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 */
 		eavail = ep_events_available(ep);
 		if (!eavail) {
-			__add_wait_queue_exclusive(&ep->wq, &wait);
+			__add_wait_queue_exclusive(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
-			timed_out = !schedule_hrtimeout_range(to, slack,
-							      HRTIMER_MODE_ABS);
+			ep_schedule(ep, &ewq, to, slack);
 		} else {
 			write_unlock_irq(&ep->lock);
 		}
@@ -1887,7 +1929,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 */
 		eavail = 1;
 
-		if (!list_empty_careful(&wait.entry)) {
+		if (!list_empty_careful(&ewq.wait.entry)) {
 			write_lock_irq(&ep->lock);
 			/*
 			 * If the thread timed out and is not on the wait queue,
@@ -1896,9 +1938,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			 * Thus, when wait.entry is empty, it needs to harvest
 			 * events.
 			 */
-			if (timed_out)
-				eavail = list_empty(&wait.entry);
-			__remove_wait_queue(&ep->wq, &wait);
+			if (ewq.timed_out)
+				eavail = list_empty(&ewq.wait.entry);
+			__remove_wait_queue(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
 		}
 	}

From patchwork Thu Dec 1 18:11:53 2022
X-Patchwork-Id: 13061723
From: Jens Axboe
Subject: [PATCH 4/7] eventpoll: move expires to epoll_wq
Date: Thu, 1 Dec 2022 11:11:53 -0700
Message-Id: <20221201181156.848373-5-axboe@kernel.dk>

Store the expiration time in struct epoll_wq rather than in a local
variable in ep_poll(). This makes the expiration available to the wakeup
handler.
No functional changes are expected in this patch; it is purely in
preparation for using the timeout on the wakeup side.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 888f565d0c5f..0994f2eb6adc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1765,6 +1765,7 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 struct epoll_wq {
 	wait_queue_entry_t wait;
 	struct hrtimer timer;
+	ktime_t timeout_ts;
 	bool timed_out;
 };
 
@@ -1825,7 +1826,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 {
 	int res, eavail;
 	u64 slack = 0;
-	ktime_t expires, *to = NULL;
+	ktime_t *to = NULL;
 	struct epoll_wq ewq;
 
 	lockdep_assert_irqs_enabled();
@@ -1834,7 +1835,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
-		to = &expires;
+		to = &ewq.timeout_ts;
 		*to = timespec64_to_ktime(*timeout);
 	} else if (timeout) {
 		/*

From patchwork Thu Dec 1 18:11:54 2022
X-Patchwork-Id: 13061724
From: Jens Axboe
Subject: [PATCH 5/7] eventpoll: move file checking earlier for epoll_ctl()
Date: Thu, 1 Dec 2022 11:11:54 -0700
Message-Id: <20221201181156.848373-6-axboe@kernel.dk>

Check that the epoll fd really refers to an eventpoll file, and set up
'ep', right after looking the fd up rather than after the target file
checks. This is a small cleanup in preparation for a change that will need
access to 'ep' earlier.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 0994f2eb6adc..962d897bbfc6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,6 +2111,20 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	if (!f.file)
 		goto error_return;
 
+	/*
+	 * We have to check that the file structure underneath the file
+	 * descriptor the user passed to us _is_ an eventpoll file.
+	 */
+	error = -EINVAL;
+	if (!is_file_epoll(f.file))
+		goto error_fput;
+
+	/*
+	 * At this point it is safe to assume that the "private_data" contains
+	 * our own data structure.
+	 */
+	ep = f.file->private_data;
+
 	/* Get the "struct file *" for the target file */
 	tf = fdget(fd);
 	if (!tf.file)
@@ -2126,12 +2140,10 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 		ep_take_care_of_epollwakeup(epds);
 
 	/*
-	 * We have to check that the file structure underneath the file descriptor
-	 * the user passed to us _is_ an eventpoll file. And also we do not permit
-	 * adding an epoll file descriptor inside itself.
+	 * We do not permit adding an epoll file descriptor inside itself.
 	 */
 	error = -EINVAL;
-	if (f.file == tf.file || !is_file_epoll(f.file))
+	if (f.file == tf.file)
 		goto error_tgt_fput;
 
 	/*
@@ -2147,12 +2159,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 		goto error_tgt_fput;
 	}
 
-	/*
-	 * At this point it is safe to assume that the "private_data" contains
-	 * our own data structure.
-	 */
-	ep = f.file->private_data;
-
 	/*
 	 * When we insert an epoll file descriptor inside another epoll file
 	 * descriptor, there is the chance of creating closed loops, which are

From patchwork Thu Dec 1 18:11:55 2022
X-Patchwork-Id: 13061725
From: Jens Axboe
Subject: [PATCH 6/7] eventpoll: add support for min-wait
Date: Thu, 1 Dec 2022 11:11:55 -0700
Message-Id: <20221201181156.848373-7-axboe@kernel.dk>

This adds the infrastructure needed to support a minimum wait time for
reaping events. The API for setting or applying a minimum wait will come
in the following patches.

To improve efficiency at medium load, some production workloads inject
artificial timers or sleeps before calling epoll_wait() to get better
batching. While this does help, it is not as efficient as it could be. By
supporting this directly in epoll_wait(), we can avoid the extra context
switches and the scheduler and timer overhead.

As an example, in an A/B test on an identical workload running at ~370K
requests/second, without this change and with the sleep hack described
above (using 200 usec as the timeout), we see 310K-340K non-voluntary
context switches per second and 27-34% idle CPU on the host. With the
sleep hack removed and epoll set to the same 200 usec value, the exact
same load is handled with 292K-315K non-voluntary context switches per
second and 33-41% idle CPU, a substantial win.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 71 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 962d897bbfc6..daa9885d9c2b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -117,6 +117,9 @@ struct eppoll_entry {
 	/* The "base" pointer is set to the container "struct epitem" */
 	struct epitem *base;
 
+	/* min wait time if (min_wait_ts) & 1 != 0 */
+	ktime_t min_wait_ts;
+
 	/*
 	 * Wait queue item that will be linked to the target file wait
 	 * queue head.
@@ -217,6 +220,9 @@ struct eventpoll {
 	u64 gen;
 	struct hlist_head refs;
 
+	/* min wait for epoll_wait() */
+	unsigned int min_wait_ts;
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	/* used to track busy poll napi_id */
 	unsigned int napi_id;
@@ -1747,6 +1753,32 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
 	return to;
 }
 
+struct epoll_wq {
+	wait_queue_entry_t wait;
+	struct hrtimer timer;
+	ktime_t timeout_ts;
+	ktime_t min_wait_ts;
+	struct eventpoll *ep;
+	bool timed_out;
+	int maxevents;
+	int wakeups;
+};
+
+static bool ep_should_min_wait(struct epoll_wq *ewq)
+{
+	if (ewq->min_wait_ts & 1) {
+		/* just an approximation */
+		if (++ewq->wakeups >= ewq->maxevents)
+			goto stop_wait;
+		if (ktime_before(ktime_get_ns(), ewq->min_wait_ts))
+			return true;
+	}
+
+stop_wait:
+	ewq->min_wait_ts &= ~(u64) 1;
+	return false;
+}
+
 /*
  * autoremove_wake_function, but remove even on failure to wake up, because we
  * know that default_wake_function/ttwu will only fail if the thread is already
@@ -1756,27 +1788,37 @@ static struct timespec64 *ep_timeout_to_timespec(struct timespec64 *to, long ms)
 static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 				       unsigned int mode, int sync, void *key)
 {
-	int ret = default_wake_function(wq_entry, mode, sync, key);
+	struct epoll_wq *ewq = container_of(wq_entry, struct epoll_wq, wait);
+	int ret;
 
+	/*
+	 * If min wait time hasn't been satisfied yet, keep waiting
+	 */
+	if (ep_should_min_wait(ewq))
+		return 0;
+
+	ret = default_wake_function(wq_entry, mode, sync, key);
 	list_del_init(&wq_entry->entry);
 	return ret;
 }
 
-struct epoll_wq {
-	wait_queue_entry_t wait;
-	struct hrtimer timer;
-	ktime_t timeout_ts;
-	bool timed_out;
-};
-
 static enum hrtimer_restart ep_timer(struct hrtimer *timer)
 {
 	struct epoll_wq *ewq = container_of(timer, struct epoll_wq, timer);
 	struct task_struct *task = ewq->wait.private;
+	const bool is_min_wait = ewq->min_wait_ts & 1;
+
+	if (!is_min_wait || ep_events_available(ewq->ep)) {
+		if (!is_min_wait)
+			ewq->timed_out = true;
+		ewq->min_wait_ts &= ~(u64) 1;
+		wake_up_process(task);
+		return HRTIMER_NORESTART;
+	}
 
-	ewq->timed_out = true;
-	wake_up_process(task);
-	return HRTIMER_NORESTART;
+	ewq->min_wait_ts &= ~(u64) 1;
+	hrtimer_set_expires_range_ns(&ewq->timer, ewq->timeout_ts, 0);
+	return HRTIMER_RESTART;
 }
 
 static void ep_schedule(struct eventpoll *ep, struct epoll_wq *ewq, ktime_t *to,
@@ -1831,12 +1873,16 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	lockdep_assert_irqs_enabled();
 
+	ewq.min_wait_ts = 0;
+	ewq.ep = ep;
+	ewq.maxevents = maxevents;
 	ewq.timed_out = false;
+	ewq.wakeups = 0;
 
 	if (timeout && (timeout->tv_sec | timeout->tv_nsec)) {
 		slack = select_estimate_accuracy(timeout);
+		ewq.timeout_ts = timespec64_to_ktime(*timeout);
 		to = &ewq.timeout_ts;
-		*to = timespec64_to_ktime(*timeout);
 	} else if (timeout) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
@@ -1845,6 +1891,18 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		ewq.timed_out = true;
 	}
 
+	/*
+	 * If min_wait is set for this epoll instance, note the min_wait
+	 * time. Ensure the lowest bit is set in ewq.min_wait_ts, that's
+	 * the state bit for whether or not min_wait is enabled.
+	 */
+	if (!ewq.timed_out && ep->min_wait_ts) {
+		ewq.min_wait_ts = ktime_add_us(ktime_get_ns(),
+					       ep->min_wait_ts);
+		ewq.min_wait_ts |= (u64) 1;
+		to = &ewq.min_wait_ts;
+	}
+
 	/*
 	 * This call is racy: We may or may not see events that are being added
 	 * to the ready list under the lock (e.g., in IRQ callbacks). For cases
@@ -1913,7 +1971,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		 * important.
 		 */
 		eavail = ep_events_available(ep);
-		if (!eavail) {
+		if (!eavail || ewq.min_wait_ts & 1) {
 			__add_wait_queue_exclusive(&ep->wq, &ewq.wait);
 			write_unlock_irq(&ep->lock);
 			ep_schedule(ep, &ewq, to, slack);

From patchwork Thu Dec 1 18:11:56 2022
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13061726
From: Jens Axboe
To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: soheil@google.com, willemdebruijn.kernel@gmail.com, stefanha@redhat.com, Jens Axboe
Subject: [PATCH 7/7] eventpoll: add method for configuring minimum wait on epoll context
Date: Thu, 1 Dec 2022 11:11:56 -0700
Message-Id: <20221201181156.848373-8-axboe@kernel.dk>
In-Reply-To: <20221201181156.848373-1-axboe@kernel.dk>
References: <20221201181156.848373-1-axboe@kernel.dk>

Add support for EPOLL_CTL_MIN_WAIT, which can be used to define a
minimum reap time for an epoll context.
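The following is a minimal sketch of a userspace wrapper around the new operation. It relies only on the do_epoll_ctl() hunk of this patch, which shows three things: the target fd is ignored for this op, the delay in usec is read from the event's data field, and the previous value is returned. The helper name epoll_set_min_wait_usec() is hypothetical. The fallback EPOLL_CTL_MIN_WAIT value of 4 comes from the uapi hunk, and the sketch assumes it is not present in system headers. The UINT32_MAX check is also an assumption of this sketch; it exists because struct eventpoll stores the value in an unsigned int. On a kernel without this series, the call is expected to fail instead, for example with EBADF, because -1 is not a valid target fd there.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Assumed value, taken from the uapi hunk of this patch (not in libc headers). */
#ifndef EPOLL_CTL_MIN_WAIT
#define EPOLL_CTL_MIN_WAIT 4
#endif

/*
 * Hypothetical helper: set the minimum reap time (in usec) for an epoll
 * context. With this patch applied, the previous value is returned; on
 * failure, -1 is returned and errno is set.
 */
static int epoll_set_min_wait_usec(int epfd, uint64_t usec)
{
	struct epoll_event ev;

	/* struct eventpoll stores the value in an unsigned int: refuse truncation */
	if (usec > UINT32_MAX) {
		errno = EINVAL;
		return -1;
	}

	memset(&ev, 0, sizeof(ev));
	ev.data.u64 = usec;	/* the kernel copies epds->data into ep->min_wait_ts */

	/* The target fd is ignored for this op, so -1 is passed */
	return epoll_ctl(epfd, EPOLL_CTL_MIN_WAIT, -1, &ev);
}
```

For example, epoll_set_min_wait_usec(efd, 200000) requests the same 200 msec minimum wait used in the test case below, and passing 0 turns the minimum wait off again.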
Basic test case:

#include <pthread.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Not in libc headers; value taken from the uapi change in this patch */
#ifndef EPOLL_CTL_MIN_WAIT
#define EPOLL_CTL_MIN_WAIT	4
#endif

struct d {
	int p1, p2;
};

static void *fn(void *data)
{
	struct d *d = data;
	char b = 0x89;

	/* Generate 2 events 10 msec apart (at ~10 and ~20 msec) */
	usleep(10000);
	write(d->p1, &b, sizeof(b));
	usleep(10000);
	write(d->p2, &b, sizeof(b));
	return NULL;
}

int main(int argc, char *argv[])
{
	struct epoll_event ev, events[2];
	pthread_t thread;
	int p1[2], p2[2];
	struct d d;
	int efd, ret;

	efd = epoll_create1(0);
	if (efd < 0) {
		perror("epoll_create");
		return 1;
	}

	if (pipe(p1) < 0) {
		perror("pipe");
		return 1;
	}
	if (pipe(p2) < 0) {
		perror("pipe");
		return 1;
	}

	ev.events = EPOLLIN;
	ev.data.fd = p1[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p1[0], &ev) < 0) {
		perror("epoll add");
		return 1;
	}
	ev.events = EPOLLIN;
	ev.data.fd = p2[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, p2[0], &ev) < 0) {
		perror("epoll add");
		return 1;
	}

	/* always wait 200 msec (200000 usec) for events */
	ev.data.u64 = 200000;
	if (epoll_ctl(efd, EPOLL_CTL_MIN_WAIT, -1, &ev) < 0) {
		perror("epoll add set timeout");
		return 1;
	}

	d.p1 = p1[1];
	d.p2 = p2[1];
	pthread_create(&thread, NULL, fn, &d);

	/* expect to get 2 events here rather than just 1 */
	ret = epoll_wait(efd, events, 2, -1);
	printf("epoll_wait=%d\n", ret);
	pthread_join(thread, NULL);
	return 0;
}

If EPOLL_CTL_MIN_WAIT is used with a value of 0, it is a no-op, and the
context behaves as if it had never been called. Only a non-zero delay
(in usec) results in a minimum wait being applied when reaping events.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c                 | 13 ++++++++++++-
 include/linux/eventpoll.h      |  2 +-
 include/uapi/linux/eventpoll.h |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index daa9885d9c2b..ec7ffce8265a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2183,6 +2183,17 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	 */
 	ep = f.file->private_data;
 
+	/*
+	 * Handle EPOLL_CTL_MIN_WAIT upfront as we don't need to care about
+	 * the fd being passed in.
+	 */
+	if (op == EPOLL_CTL_MIN_WAIT) {
+		/* return old value */
+		error = ep->min_wait_ts;
+		ep->min_wait_ts = epds->data;
+		goto error_fput;
+	}
+
 	/* Get the "struct file *" for the target file */
 	tf = fdget(fd);
 	if (!tf.file)
@@ -2315,7 +2326,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 {
 	struct epoll_event epds;
 
-	if (ep_op_has_event(op) &&
+	if ((ep_op_has_event(op) || op == EPOLL_CTL_MIN_WAIT) &&
 	    copy_from_user(&epds, event, sizeof(struct epoll_event)))
 		return -EFAULT;
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd..cbef635cb7e4 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -59,7 +59,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-	return op != EPOLL_CTL_DEL;
+	return op != EPOLL_CTL_DEL && op != EPOLL_CTL_MIN_WAIT;
 }
 
 #else
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..81ecb1ca36e0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_MIN_WAIT 4
 
 /* Epoll event masks */
 #define EPOLLIN		(__force __poll_t)0x00000001
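The min-wait bookkeeping from patch 6/7 is compact but easy to misread. Below is a minimal userspace model of it. The model uses plain nanosecond integers in place of ktime_t and has no locking, wait queues or real timers. The names mw_state, mw_arm(), mw_should_wait() and mw_timer_fired() are hypothetical. They mirror, respectively, the new struct epoll_wq fields, the min-wait setup in ep_poll(), ep_should_min_wait() and ep_timer(). As in the patch, bit 0 of the deadline doubles as the "min-wait armed" flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the min-wait fields added to struct epoll_wq */
struct mw_state {
	int64_t min_wait_ts;	/* deadline in ns; bit 0 set while min-wait is armed */
	int maxevents;
	int wakeups;
};

enum mw_timer_action {
	MW_WAKE_TIMED_OUT,	/* regular timeout fired: wake and report a timeout */
	MW_WAKE_EVENTS,		/* min-wait fired with events ready: wake */
	MW_RESTART_TIMER,	/* min-wait fired, nothing ready: re-arm at timeout_ts */
};

/* Models the ep_poll() setup: deadline = now + usec, tagged with bit 0 */
static void mw_arm(struct mw_state *s, int64_t now_ns, uint32_t usec,
		   int maxevents)
{
	s->maxevents = maxevents;
	s->wakeups = 0;
	s->min_wait_ts = 0;
	if (usec) {
		s->min_wait_ts = now_ns + (int64_t)usec * 1000;
		s->min_wait_ts |= 1;
	}
}

/* Models ep_should_min_wait(): true means "ignore this wakeup, keep sleeping" */
static bool mw_should_wait(struct mw_state *s, int64_t now_ns)
{
	if (s->min_wait_ts & 1) {
		/* counts wakeups, not ready events: only an approximation */
		if (++s->wakeups >= s->maxevents)
			goto stop_wait;
		if (now_ns < s->min_wait_ts)
			return true;
	}
stop_wait:
	s->min_wait_ts &= ~(int64_t)1;	/* min-wait is over: disarm */
	return false;
}

/* Models ep_timer(): what happens when the hrtimer fires */
static enum mw_timer_action mw_timer_fired(struct mw_state *s,
					   bool events_available)
{
	bool is_min_wait = s->min_wait_ts & 1;

	s->min_wait_ts &= ~(int64_t)1;
	if (!is_min_wait)
		return MW_WAKE_TIMED_OUT;
	if (events_available)
		return MW_WAKE_EVENTS;
	return MW_RESTART_TIMER;
}
```

Consider mw_arm(&s, 1000, 200, 4). The tagged deadline is 201001. The first three wakeups before that deadline are ignored. The fourth ends the min-wait because wakeups reaches maxevents. Reusing bit 0 of the deadline avoids a separate flag field, at the cost of up to 1 ns of deadline precision.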