From patchwork Fri Feb 14 17:02:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Penyaev X-Patchwork-Id: 11382655 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7F3F4930 for ; Fri, 14 Feb 2020 17:03:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6801120656 for ; Fri, 14 Feb 2020 17:03:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2394430AbgBNRCs (ORCPT ); Fri, 14 Feb 2020 12:02:48 -0500 Received: from mx2.suse.de ([195.135.220.15]:34652 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2394411AbgBNRCs (ORCPT ); Fri, 14 Feb 2020 12:02:48 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 65BD3B133; Fri, 14 Feb 2020 17:02:46 +0000 (UTC) From: Roman Penyaev Cc: Roman Penyaev , Max Neunhoeffer , Jakub Kicinski , Christopher Kohlhoff , Davidlohr Bueso , Jason Baron , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v3 1/2] epoll: fix possible lost wakeup on epoll_ctl() path Date: Fri, 14 Feb 2020 18:02:10 +0100 Message-Id: <20200214170211.561524-1-rpenyaev@suse.de> X-Mailer: git-send-email 2.24.1 MIME-Version: 1.0 To: unlisted-recipients:; (no To-header on input) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This fixes possible lost wakeup introduced by the a218cc491420. Originally modifications to ep->wq were serialized by ep->wq.lock, but in the a218cc491420 new rw lock was introduced in order to relax fd event path, i.e. callers of ep_poll_callback() function. After the change ep_modify and ep_insert (both are called on epoll_ctl() path) were switched to ep->lock, but ep_poll (epoll_wait) was using ep->wq.lock on wqueue list modification. The bug doesn't lead to any wqueue list corruptions, because wake up path and list modifications were serialized by ep->wq.lock internally, but actual waitqueue_active() check prior wake_up() call can be reordered with modifications of ep ready list, thus wake up can be lost. And yes, can be healed by explicit smp_mb(): list_add_tail(&epi->rdlink, &ep->rdllist); smp_mb(); if (waitqueue_active(&ep->wq)) wake_up(&ep->wp); But let's make it simple, thus current patch replaces ep->wq.lock with the ep->lock for wqueue modifications, thus wake up path always observes activeness of the wqueue correcty. Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention") References: https://bugzilla.kernel.org/show_bug.cgi?id=205933 Signed-off-by: Roman Penyaev Reported-by: Max Neunhoeffer Bisected-by: Max Neunhoeffer Tested-by: Max Neunhoeffer Cc: Jakub Kicinski Cc: Christopher Kohlhoff Cc: Davidlohr Bueso Cc: Jason Baron Cc: Andrew Morton Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org #5.1+ --- Nothing was changed in v3 Nothing interesting in v2: changed the comment a bit and specified Reported-by and Bisected-by tags fs/eventpoll.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index b041b66002db..eee3c92a9ebf 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, waiter = true; init_waitqueue_entry(&wait, current); - spin_lock_irq(&ep->wq.lock); + write_lock_irq(&ep->lock); __add_wait_queue_exclusive(&ep->wq, &wait); - spin_unlock_irq(&ep->wq.lock); + write_unlock_irq(&ep->lock); } for (;;) { @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, goto fetch_events; if (waiter) { - spin_lock_irq(&ep->wq.lock); + write_lock_irq(&ep->lock); __remove_wait_queue(&ep->wq, &wait); - spin_unlock_irq(&ep->wq.lock); + write_unlock_irq(&ep->lock); } return res; From patchwork Fri Feb 14 17:02:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roman Penyaev X-Patchwork-Id: 11382653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 316D6109A for ; Fri, 14 Feb 2020 17:02:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 19F8B20656 for ; Fri, 14 Feb 2020 17:02:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392649AbgBNRCv (ORCPT ); Fri, 14 Feb 2020 12:02:51 -0500 Received: from mx2.suse.de ([195.135.220.15]:34684 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2394040AbgBNRCt (ORCPT ); Fri, 14 Feb 2020 12:02:49 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 18392B15A; Fri, 14 Feb 2020 17:02:48 +0000 (UTC) From: Roman Penyaev Cc: Roman Penyaev , Max Neunhoeffer , Jakub Kicinski , Christopher Kohlhoff , Davidlohr Bueso , Jason Baron , Andrew Morton , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v3 2/2] kselftest: introduce new epoll test case Date: Fri, 14 Feb 2020 18:02:11 +0100 Message-Id: <20200214170211.561524-2-rpenyaev@suse.de> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20200214170211.561524-1-rpenyaev@suse.de> References: <20200214170211.561524-1-rpenyaev@suse.de> MIME-Version: 1.0 To: unlisted-recipients:; (no To-header on input) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This testcase repeats epollbug.c from the bug: https://bugzilla.kernel.org/show_bug.cgi?id=205933 What it tests? It tests the race between epoll_ctl() and epoll_wait(). New event mask passed to epoll_ctl() triggers wake up, which can be missed because of the bug described in the link. Reproduction is 100%, so easy to fix. Kudos, Max, for wonderful test case. Signed-off-by: Roman Penyaev Cc: Max Neunhoeffer Cc: Jakub Kicinski Cc: Christopher Kohlhoff Cc: Davidlohr Bueso Cc: Jason Baron Cc: Andrew Morton Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- Nothing was changed in v3 Nothing interesting in v2: changed the comment a bit .../filesystems/epoll/epoll_wakeup_test.c | 67 ++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c index 37a04dab56f0..11eee0b60040 100644 --- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c +++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c @@ -7,13 +7,14 @@ #include #include #include +#include #include "../../kselftest_harness.h" struct epoll_mtcontext { int efd[3]; int sfd[4]; - int count; + volatile int count; pthread_t main; pthread_t waiter; @@ -3071,4 +3072,68 @@ TEST(epoll58) close(ctx.sfd[3]); } +static void *epoll59_thread(void *ctx_) +{ + struct epoll_mtcontext *ctx = ctx_; + struct epoll_event e; + int i; + + for (i = 0; i < 100000; i++) { + while (ctx->count == 0) + ; + + e.events = EPOLLIN | EPOLLERR | EPOLLET; + epoll_ctl(ctx->efd[0], EPOLL_CTL_MOD, ctx->sfd[0], &e); + ctx->count = 0; + } + + return NULL; +} + +/* + * t0 + * (p) \ + * e0 + * (et) / + * e0 + * + * Based on https://bugzilla.kernel.org/show_bug.cgi?id=205933 + */ +TEST(epoll59) +{ + pthread_t emitter; + struct pollfd pfd; + struct epoll_event e; + struct epoll_mtcontext ctx = { 0 }; + int i, ret; + + signal(SIGUSR1, signal_handler); + + ctx.efd[0] = epoll_create1(0); + ASSERT_GE(ctx.efd[0], 0); + + ctx.sfd[0] = eventfd(1, 0); + ASSERT_GE(ctx.sfd[0], 0); + + e.events = EPOLLIN | EPOLLERR | EPOLLET; + ASSERT_EQ(epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.sfd[0], &e), 0); + + ASSERT_EQ(pthread_create(&emitter, NULL, epoll59_thread, &ctx), 0); + + for (i = 0; i < 100000; i++) { + ret = epoll_wait(ctx.efd[0], &e, 1, 1000); + ASSERT_GT(ret, 0); + + while (ctx.count != 0) + ; + ctx.count = 1; + } + if (pthread_tryjoin_np(emitter, NULL) < 0) { + pthread_kill(emitter, SIGUSR1); + pthread_join(emitter, NULL); + } + close(ctx.efd[0]); + close(ctx.sfd[0]); +} + TEST_HARNESS_MAIN