Message ID | 20201106231635.3528496-2-soheil.kdev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | simplify ep_poll |
On Fri, 06 Nov 2020, Soheil Hassas Yeganeh wrote:

>From: Soheil Hassas Yeganeh <soheil@google.com>
>
>After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
>timeout"), we break out of the ep_poll loop upon timeout without checking
>whether there are any new events available. Prior to that patch series we
>always called ep_events_available() after exiting the loop.
>
>This can cause races and missed wakeups. For example, consider the
>following scenario reported by Guantao Liu:
>
>Suppose we have an eventfd added using EPOLLET to an epollfd.
>
>Thread 1: Sleeps for just below 5 ms and then writes to the eventfd.
>Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
>          event on the eventfd, it will write back on that fd.
>Thread 3: Calls epoll_wait with a negative (infinite) timeout.
>
>Prior to abc610e01c66, it is guaranteed that Thread 3 will be woken up
>either by Thread 1 or Thread 2. After abc610e01c66, Thread 3 can be blocked
>indefinitely if Thread 2 sees a timeout right before the write to the
>eventfd by Thread 1. Thread 2 will be woken up from
>schedule_hrtimeout_range and, with eavail 0, it will not call
>ep_send_events().
>
>To fix this issue:
>1) Simplify the timed_out case as suggested by Linus.
>2) While holding the lock, recheck whether the thread was woken up
>   after its timeout expired.
>
>Note that (2) is different from Linus' original suggestion: it does not
>set "eavail = ep_events_available(ep)", to avoid unnecessary contention
>(when there are too many timed-out threads and a small number of events),
>as well as races mentioned in the discussion thread.
>
>This is the first patch in the series so that the backport to stable
>releases is straightforward.
>
>Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
>Fixes: abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
>Tested-by: Guantao Liu <guantaol@google.com>
>Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
>Reported-by: Guantao Liu <guantaol@google.com>
>Reviewed-by: Eric Dumazet <edumazet@google.com>
>Reviewed-by: Willem de Bruijn <willemb@google.com>
>Reviewed-by: Khazhismel Kumykov <khazhy@google.com>

Thanks for providing the fix and a testcase.

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
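For reference, a minimal userspace sketch of the reported scenario (not part of the patch or this thread; the helper names, the 4.9 ms sleep, and the thread start order are illustrative assumptions, and whether Thread 3 actually hangs depends on hitting the timing window on an unpatched kernel):

```c
/*
 * Illustrative sketch only: three threads racing on one edge-triggered
 * eventfd registered in one epoll instance, mirroring the report above.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int efd, epfd;

/* Thread 1: sleeps just below 5 ms, then writes to the eventfd. */
static void *writer(void *arg)
{
	uint64_t one = 1;

	(void)arg;
	usleep(4900);
	write(efd, &one, sizeof(one));
	return NULL;
}

/* Thread 2: epoll_wait with a 5 ms timeout; writes back if it sees the event. */
static void *short_waiter(void *arg)
{
	struct epoll_event ev;
	uint64_t one = 1;

	(void)arg;
	if (epoll_wait(epfd, &ev, 1, 5) > 0)
		write(efd, &one, sizeof(one));
	return NULL;
}

/* Thread 3: epoll_wait with a negative (infinite) timeout. */
static void *long_waiter(void *arg)
{
	struct epoll_event ev;

	(void)arg;
	epoll_wait(epfd, &ev, 1, -1);
	puts("thread 3 woke up");
	return NULL;
}

int main(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
	pthread_t t1, t2, t3;

	efd = eventfd(0, 0);
	epfd = epoll_create1(0);
	ev.data.fd = efd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);

	pthread_create(&t3, NULL, long_waiter, NULL);
	pthread_create(&t2, NULL, short_waiter, NULL);
	pthread_create(&t1, NULL, writer, NULL);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);	/* a lost wakeup would leave this join stuck */
	return 0;
}
```

Built with something like `gcc -pthread`, the final join is where the missed wakeup would be expected to show up: on a kernel with abc610e01c66 but without this fix, Thread 2 can absorb the wakeup from Thread 1's write and then discard it because it already timed out, leaving Thread 3 waiting on an edge that will never fire again.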
```diff
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..117b1c395ae4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1902,23 +1902,30 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		}
 		write_unlock_irq(&ep->lock);
 
-		if (eavail || res)
-			break;
+		if (!eavail && !res)
+			timed_out = !schedule_hrtimeout_range(to, slack,
+							      HRTIMER_MODE_ABS);
 
-		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
-			timed_out = 1;
-			break;
-		}
-
-		/* We were woken up, thus go and try to harvest some events */
+		/*
+		 * We were woken up, thus go and try to harvest some events.
+		 * If timed out and still on the wait queue, recheck eavail
+		 * carefully under lock, below.
+		 */
 		eavail = 1;
-
 	} while (0);
 
 	__set_current_state(TASK_RUNNING);
 
 	if (!list_empty_careful(&wait.entry)) {
 		write_lock_irq(&ep->lock);
+		/*
+		 * If the thread timed out and is not on the wait queue, it
+		 * means that the thread was woken up after its timeout expired
+		 * before it could reacquire the lock. Thus, when wait.entry is
+		 * empty, it needs to harvest events.
+		 */
+		if (timed_out)
+			eavail = list_empty(&wait.entry);
 		__remove_wait_queue(&ep->wq, &wait);
 		write_unlock_irq(&ep->lock);
 	}
```