From patchwork Wed Oct 28 18:02:01 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Soheil Hassas Yeganeh
X-Patchwork-Id: 11863429
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ADDC361C
	for ; Wed, 28 Oct 2020 22:00:43 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8410924759
	for ; Wed, 28 Oct 2020 22:00:43 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BoCFMRz4"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1729399AbgJ1WAL (ORCPT );
	Wed, 28 Oct 2020 18:00:11 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50450 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1729396AbgJ1WAK (ORCPT );
	Wed, 28 Oct 2020 18:00:10 -0400
Received: from mail-pl1-x642.google.com (mail-pl1-x642.google.com [IPv6:2607:f8b0:4864:20::642])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF194C0613CF;
	Wed, 28 Oct 2020 15:00:10 -0700 (PDT)
Received: by mail-pl1-x642.google.com with SMTP id x23so301109plr.6;
	Wed, 28 Oct 2020 15:00:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=from:to:cc:subject:date:message-id:mime-version
	 :content-transfer-encoding;
	bh=CwYcjUQ9n0ZM+r827p0TWJkTogeEds8umhVXpQGywJI=;
	b=BoCFMRz4ffT0VFvro9CSY2MtQzfcrZXLSRRBIMQscg6M/ym8yVi7FIbqr6izdKht+N
	 weYRAzuZ+jpLU81fTDpdYa6atx55gTvqvL2cClz+uhHc2D0zcMxLuoUy6nnrJ9YrECXD
	 RQCOW8IeTVlrg7oTG3cdtIdNOj9G3kPEs3Okz9Kq3p++qdI+Vjsn6GUoIfjz3uQ2i/v9
	 sEFvZFyGXWG0De469unrtLvDUQuol4shhhSH62mAeK2UtrGFekEakmnwiPSeOQAoXXlr
	 FPYt92EtK/tQu2Pt2spAcZSeE2Lg3vuVR0G/CTCYYYRCu/erze61WD9pkMtGuEZxy9B3
	 uSAA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
	s=20161025;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version
	 :content-transfer-encoding;
	bh=CwYcjUQ9n0ZM+r827p0TWJkTogeEds8umhVXpQGywJI=;
	b=duVBU3ZzflRyn4N7VssUCSzIqFeFkcnF6K1BOlx8rQp8mE+cYOrts8vM3jfqAWVj/5
	 ZXceaYvsNXUAAImhG+npKEsCj2p2s7CPaHyOPYClDUpeLvxdWCCoIkiObiOJoeJhsijI
	 pLNvWm5+DvZcfXGQV3SvFc0UeIMA5YL6DY9mUiySW3zF2T3ONhHH8YKFf9SNQ4VjeuMU
	 psw5W48FNT8ZLSfT/k4M5HbJNqN8G3V3uQCv6NSRjII3GOsf6cdzkakFhxShlex9ecn6
	 SOS4lPTEyrxGuLE72bUxKe7IHHrKGV5NdgYl7JIojwl2Jw9LN5ayqHxmQovxCn2QXSvm
	 hghw==
X-Gm-Message-State: AOAM530CPbx84+xDFwwvnpn2QyD6JQpBG4wX2KNjOh66SLYtIym7tsVu
	tfED0iySssfW+IO6GzYWeUVxGHOEIaY=
X-Google-Smtp-Source: ABdhPJxH9u5TCOT2f00qm+z9/5Z98BTojmCESoV9x4IrB6x4AMCfgHtLspS3XeKRF+y+hixfUN7TXg==
X-Received: by 2002:a0c:b65b:: with SMTP id q27mr586836qvf.8.1603908132941;
	Wed, 28 Oct 2020 11:02:12 -0700 (PDT)
Received: from soheil4.nyc.corp.google.com ([2620:0:1003:312:a6ae:11ff:fe18:6946])
	by smtp.gmail.com with ESMTPSA id o2sm65054qkk.121.2020.10.28.11.02.11
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Wed, 28 Oct 2020 11:02:12 -0700 (PDT)
From: Soheil Hassas Yeganeh
To: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, dave@stgolabs.net,
	Soheil Hassas Yeganeh, Guantao Liu, Eric Dumazet, Willem de Bruijn,
	Khazhismel Kumykov
Subject: [PATCH 1/2] epoll: check ep_events_available() upon timeout
Date: Wed, 28 Oct 2020 14:02:01 -0400
Message-Id: <20201028180202.952079-1-soheil.kdev@gmail.com>
X-Mailer: git-send-email 2.29.0.rc2.309.g374f81d7ae-goog
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Soheil Hassas Yeganeh

After abc610e01c66, we break out of the ep_poll loop upon timeout,
without checking whether there are any new events available. Prior to
that patch series we always called ep_events_available() after exiting
the loop. This can cause races and missed wakeups.

For example, consider the following scenario reported by Guantao Liu:

Suppose we have an eventfd added using EPOLLET to an epollfd.

Thread 1: Sleeps for just below 5 ms and then writes to an eventfd.
Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
          event of the eventfd, it will write back on that fd.
Thread 3: Calls epoll_wait with a negative timeout.

Prior to abc610e01c66, it is guaranteed that Thread 3 will be woken up
by either Thread 1 or Thread 2. After abc610e01c66, Thread 3 can
be blocked indefinitely if Thread 2 sees a timeout right before
the write to the eventfd by Thread 1. Thread 2 will be woken up from
schedule_hrtimeout_range and, with eavail 0, it will not call
ep_send_events().

To fix this issue, while holding the lock, try to remove the thread that
timed out from the wait queue and check whether it was woken up or not.

Fixes: abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
Reported-by: Guantao Liu
Tested-by: Guantao Liu
Signed-off-by: Soheil Hassas Yeganeh
Reviewed-by: Eric Dumazet
Acked-by: Willem de Bruijn
Reviewed-by: Khazhismel Kumykov
Cc: Davidlohr Bueso
---
 fs/eventpoll.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..11388436b85a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1907,7 +1907,21 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,

 		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
 			timed_out = 1;
-			break;
+			__set_current_state(TASK_RUNNING);
+			/*
+			 * Acquire the lock and try to remove this thread from
+			 * the wait queue. If this thread is not on the wait
+			 * queue, it has woken up after its timeout ended
+			 * before it could re-acquire the lock. In that case,
+			 * try to harvest some events.
+			 */
+			write_lock_irq(&ep->lock);
+			if (!list_empty(&wait.entry))
+				__remove_wait_queue(&ep->wq, &wait);
+			else
+				eavail = 1;
+			write_unlock_irq(&ep->lock);
+			goto send_events;
 		}

 		/* We were woken up, thus go and try to harvest some events */

From patchwork Wed Oct 28 18:02:02 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Soheil Hassas Yeganeh
X-Patchwork-Id: 11864731
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7838C697
	for ; Thu, 29 Oct 2020 01:04:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 5305F20796
	for ; Thu, 29 Oct 2020 01:04:38 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="dTSxd0Al"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2404125AbgJ2BE3 (ORCPT );
	Wed, 28 Oct 2020 21:04:29 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51592 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1728662AbgJ1WGh (ORCPT );
	Wed, 28 Oct 2020 18:06:37 -0400
Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE4CAC0613CF;
	Wed, 28 Oct 2020 15:06:36 -0700 (PDT)
Received: by mail-qk1-x743.google.com with SMTP id r7so512215qkf.3;
	Wed, 28 Oct 2020 15:06:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=from:to:cc:subject:date:message-id:in-reply-to:references
	 :mime-version:content-transfer-encoding;
	bh=Je1s1kkyt4+Sk3cNzvubNo1nCKh/m8Vq6dPCfCCBkyo=;
	b=dTSxd0AlKUrqznvkilblmcag8TGcwXnQBOUYPH+YjyDnYpqa5l79kWv9xtGnlcfEg4
	 0h5+E7FcTSYl+PVNOwTzsxfEQK9z/MTfFaaAGoVb7SHz8W0AOfRu5XBwLWJnZS00oDuG
	 jS+fZzamMtUJESkP7okgED5M2/zgKkFH7lKRQt42ekyoBWWf/NVYpG4DYN6AQoGVEk7S
	 2iY1R/7VKt7a1ek/S4+4A35gFqGqi3Bi0Q76oVC9NxjlFmQ2t6NocI6yXOLzUFh/LAfx
	 bYOjASOli5G9ToeVw0qmW7EhMy9jJ9tNJ3taNIocb7GkYt1U28X8IH4znWmzH6zCySdb
	 XtLA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
	 :references:mime-version:content-transfer-encoding;
	bh=Je1s1kkyt4+Sk3cNzvubNo1nCKh/m8Vq6dPCfCCBkyo=;
	b=V6roh5IzEl/4KDMGeQi5h6hRWAGlhWiJ6FpUUB9qrrLtiHm0L1Ho4laXXNwFCUg/oP
	 bcv5+KPV+rUMfZakvlL2ZIy7GGlQ2Cu+jZxHM74CVydaGH1R4fduePrPnTPgSfKrdu+z
	 HYU/FvQvNRhBJrnacqIFbjEdZqF+l84WntULPlYxDmkfn//6PiQxv57ZHwNeolDUOoHd
	 j6mLRt4dThgcCDNTTXTmAxzH/b0DxHSwmwAspuHbSraynAvcoviV+iaE6Oa95M280km1
	 KTiYIhPy7+ibtVwNLECsTNEg8h/FwszRL6S6kILeqjCffJRaFytxhDo+R4yF37cBB2K/
	 6jfw==
X-Gm-Message-State: AOAM531INlC+mkbUG/ECva/kpgjr0gjhf0spup+MmJkazvcBHdiU3QHf
	VgJy29CIR5Rb1aXnoznnU2T7XturmuA=
X-Google-Smtp-Source: ABdhPJyuSBJ63bwvwJqI/Z4pLE6IUxC6rQn1z+b1IwljXAlCJSfx04tsV2xM9z1LqB4BaN4eT5eu7g==
X-Received: by 2002:aed:2982:: with SMTP id o2mr73310qtd.73.1603908137202;
	Wed, 28 Oct 2020 11:02:17 -0700 (PDT)
Received: from soheil4.nyc.corp.google.com ([2620:0:1003:312:a6ae:11ff:fe18:6946])
	by smtp.gmail.com with ESMTPSA id o2sm65054qkk.121.2020.10.28.11.02.16
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Wed, 28 Oct 2020 11:02:16 -0700 (PDT)
From: Soheil Hassas Yeganeh
To: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org, dave@stgolabs.net,
	Soheil Hassas Yeganeh, Guantao Liu, Eric Dumazet, Willem de Bruijn,
	Khazhismel Kumykov
Subject: [PATCH 2/2] epoll: add a selftest for epoll timeout race
Date: Wed, 28 Oct 2020 14:02:02 -0400
Message-Id: <20201028180202.952079-2-soheil.kdev@gmail.com>
X-Mailer: git-send-email 2.29.0.rc2.309.g374f81d7ae-goog
In-Reply-To: <20201028180202.952079-1-soheil.kdev@gmail.com>
References:
	<20201028180202.952079-1-soheil.kdev@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

From: Soheil Hassas Yeganeh

Add a test case to ensure an event is observed by at least one
poller when an epoll timeout is used.

Signed-off-by: Guantao Liu
Signed-off-by: Soheil Hassas Yeganeh
Reviewed-by: Eric Dumazet
Acked-by: Willem de Bruijn
Reviewed-by: Khazhismel Kumykov
---
 .../filesystems/epoll/epoll_wakeup_test.c     | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index d979ff14775a..8f82f99f7748 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -3282,4 +3282,99 @@ TEST(epoll60)
 	close(ctx.epfd);
 }

+struct epoll61_ctx {
+	int epfd;
+	int evfd;
+};
+
+static void *epoll61_write_eventfd(void *ctx_)
+{
+	struct epoll61_ctx *ctx = ctx_;
+	int64_t l = 1;
+
+	usleep(10950);
+	write(ctx->evfd, &l, sizeof(l));
+	return NULL;
+}
+
+static void *epoll61_epoll_with_timeout(void *ctx_)
+{
+	struct epoll61_ctx *ctx = ctx_;
+	struct epoll_event events[1];
+	int n;
+
+	n = epoll_wait(ctx->epfd, events, 1, 11);
+	/*
+	 * If epoll returned the eventfd, write on the eventfd to wake up the
+	 * blocking poller.
+	 */
+	if (n == 1) {
+		int64_t l = 1;
+
+		write(ctx->evfd, &l, sizeof(l));
+	}
+	return NULL;
+}
+
+static void *epoll61_blocking_epoll(void *ctx_)
+{
+	struct epoll61_ctx *ctx = ctx_;
+	struct epoll_event events[1];
+
+	epoll_wait(ctx->epfd, events, 1, -1);
+	return NULL;
+}
+
+TEST(epoll61)
+{
+	struct epoll61_ctx ctx;
+	struct epoll_event ev;
+	int i, r;
+
+	ctx.epfd = epoll_create1(0);
+	ASSERT_GE(ctx.epfd, 0);
+	ctx.evfd = eventfd(0, EFD_NONBLOCK);
+	ASSERT_GE(ctx.evfd, 0);
+
+	ev.events = EPOLLIN | EPOLLET | EPOLLERR | EPOLLHUP;
+	ev.data.ptr = NULL;
+	r = epoll_ctl(ctx.epfd, EPOLL_CTL_ADD, ctx.evfd, &ev);
+	ASSERT_EQ(r, 0);
+
+	/*
+	 * We are testing a race.  Repeat the test case 1000 times to make it
+	 * more likely to fail in case of a bug.
+	 */
+	for (i = 0; i < 1000; i++) {
+		pthread_t threads[3];
+		int n;
+
+		/*
+		 * Start 3 threads:
+		 * Thread 1 sleeps for 10.9ms and writes to the eventfd.
+		 * Thread 2 calls epoll with a timeout of 11ms.
+		 * Thread 3 calls epoll with a timeout of -1.
+		 *
+		 * The eventfd write by Thread 1 should wake up either Thread 2
+		 * or Thread 3.  If it wakes up Thread 2, Thread 2 writes on the
+		 * eventfd to wake up Thread 3.
+		 *
+		 * If no events are missed, all three threads should eventually
+		 * be joinable.
+		 */
+		ASSERT_EQ(pthread_create(&threads[0], NULL,
+					 epoll61_write_eventfd, &ctx), 0);
+		ASSERT_EQ(pthread_create(&threads[1], NULL,
+					 epoll61_epoll_with_timeout, &ctx), 0);
+		ASSERT_EQ(pthread_create(&threads[2], NULL,
+					 epoll61_blocking_epoll, &ctx), 0);
+
+		for (n = 0; n < ARRAY_SIZE(threads); ++n)
+			ASSERT_EQ(pthread_join(threads[n], NULL), 0);
+	}
+
+	close(ctx.epfd);
+	close(ctx.evfd);
+}
+
 TEST_HARNESS_MAIN