diff mbox series

eventpoll: fix missing wakeup for ovflist in ep_poll_callback

Message ID 20200424025057.118641-1-khazhy@google.com (mailing list archive)
State New, archived
Headers show
Series eventpoll: fix missing wakeup for ovflist in ep_poll_callback | expand

Commit Message

Khazhy Kumykov April 24, 2020, 2:50 a.m. UTC
In the event that we add to ovflist, before 339ddb53d373 we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
With that wakeup removed, if we add to ovflist here, we may never wake
up. Rather than adding back the ep_scan_ready_list wakeup - which was
resulting un uncessary wakeups, trigger a wake-up in ep_poll_callback.

We noticed that one of our workloads was missing wakeups starting with
339ddb53d373 and upon manual inspection, this wakeup seemed missing to
me. With this patch added, we no longer see missing wakeups. I haven't
yet tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.

Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 fs/eventpoll.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrew Morton April 24, 2020, 3:31 a.m. UTC | #1
On Thu, 23 Apr 2020 19:50:57 -0700 Khazhismel Kumykov <khazhy@google.com> wrote:

> In the event that we add to ovflist, before 339ddb53d373 we would be
> woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
> With that wakeup removed, if we add to ovflist here, we may never wake
> up. Rather than adding back the ep_scan_ready_list wakeup - which was
> resulting un uncessary wakeups, trigger a wake-up in ep_poll_callback.
> 
> We noticed that one of our workloads was missing wakeups starting with
> 339ddb53d373 and upon manual inspection, this wakeup seemed missing to
> me. With this patch added, we no longer see missing wakeups. I haven't
> yet tried to make a small reproducer, but the existing kselftests in
> filesystem/epoll passed for me with this patch.

I'm no longer familiar with this code, so I'll await input from others.

> Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

However it sounds like a cc:stable would be warranted here, so that
earlier affected kernels get the fix?
Roman Penyaev April 24, 2020, 10:11 a.m. UTC | #2
Hi Khazhismel,

That seems to be correct. The patch you refer 339ddb53d373
relies on callback path, which *should* wake up, not the path
which harvests events (thus unnecessary wakeups).  When we add
a new event to the ->ovflist nobody wakes up the waiters,
thus missing wakeup. You are right.

May I suggest a small change in order to avoid one new goto?
We can add a new event in either ->ovflist or ->rdllist and
then wakeup should happen. So simple 'else if' branch should
do things right, something like the following:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8c596641a72b..7d566667c6ad 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1171,6 +1171,10 @@ static inline bool chain_epi_lockless(struct 
epitem *epi)
  {
         struct eventpoll *ep = epi->ep;

+       /* Fast preliminary check */
+       if (epi->next != EP_UNACTIVE_PTR)
+               return false;
+
         /* Check that the same epi has not been just chained from 
another CPU */
         if (cmpxchg(&epi->next, EP_UNACTIVE_PTR, NULL) != 
EP_UNACTIVE_PTR)
                 return false;
@@ -1237,16 +1241,13 @@ static int ep_poll_callback(wait_queue_entry_t 
*wait, unsigned mode, int sync, v
          * chained in ep->ovflist and requeued later on.
          */
         if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
-               if (epi->next == EP_UNACTIVE_PTR &&
-                   chain_epi_lockless(epi))
+               if (chain_epi_lockless(epi))
                         ep_pm_stay_awake_rcu(epi);
-               goto out_unlock;
         }
-
-       /* If this file is already in the ready list we exit soon */
-       if (!ep_is_linked(epi) &&
-           list_add_tail_lockless(&epi->rdllink, &ep->rdllist)) {
-               ep_pm_stay_awake_rcu(epi);
+       /* Otherwise take usual path and add event to ready list */
+       else if (!ep_is_linked(epi)) {
+               if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
+                       ep_pm_stay_awake_rcu(epi);
         }


I also moved 'epi->next == EP_UNACTIVE_PTR' check directly
to the chain_epi_lockless, where it should be.

This is minor, of course, you are free to keep it as is.

Reviewed-by: Roman Penyaev <rpenyaev@suse.de>

--
Roman


On 2020-04-24 04:50, Khazhismel Kumykov wrote:
> In the event that we add to ovflist, before 339ddb53d373 we would be
> woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
> With that wakeup removed, if we add to ovflist here, we may never wake
> up. Rather than adding back the ep_scan_ready_list wakeup - which was
> resulting un uncessary wakeups, trigger a wake-up in ep_poll_callback.
> 
> We noticed that one of our workloads was missing wakeups starting with
> 339ddb53d373 and upon manual inspection, this wakeup seemed missing to
> me. With this patch added, we no longer see missing wakeups. I haven't
> yet tried to make a small reproducer, but the existing kselftests in
> filesystem/epoll passed for me with this patch.
> 
> Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested 
> epoll")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
> ---
>  fs/eventpoll.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 8c596641a72b..40cc89559cf6 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1240,7 +1240,7 @@ static int ep_poll_callback(wait_queue_entry_t
> *wait, unsigned mode, int sync, v
>  		if (epi->next == EP_UNACTIVE_PTR &&
>  		    chain_epi_lockless(epi))
>  			ep_pm_stay_awake_rcu(epi);
> -		goto out_unlock;
> +		goto out_wakeup_unlock;
>  	}
> 
>  	/* If this file is already in the ready list we exit soon */
> @@ -1249,6 +1249,7 @@ static int ep_poll_callback(wait_queue_entry_t
> *wait, unsigned mode, int sync, v
>  		ep_pm_stay_awake_rcu(epi);
>  	}
> 
> +out_wakeup_unlock:
>  	/*
>  	 * Wake up ( if active ) both the eventpoll wait list and the 
> ->poll()
>  	 * wait list.
Khazhy Kumykov April 24, 2020, 6:50 p.m. UTC | #3
On Fri, Apr 24, 2020 at 3:11 AM Roman Penyaev <rpenyaev@suse.de> wrote:
>
> Hi Khazhismel,
>
> That seems to be correct. The patch you refer 339ddb53d373
> relies on callback path, which *should* wake up, not the path
> which harvests events (thus unnecessary wakeups).  When we add
> a new event to the ->ovflist nobody wakes up the waiters,
> thus missing wakeup. You are right.
>
> May I suggest a small change in order to avoid one new goto?
> We can add a new event in either ->ovflist or ->rdllist and
> then wakeup should happen. So simple 'else if' branch should
> do things right, something like the following:
>

Thanks for the review! I agree, I'll send a v2 without new goto
diff mbox series

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8c596641a72b..40cc89559cf6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1240,7 +1240,7 @@  static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 		if (epi->next == EP_UNACTIVE_PTR &&
 		    chain_epi_lockless(epi))
 			ep_pm_stay_awake_rcu(epi);
-		goto out_unlock;
+		goto out_wakeup_unlock;
 	}
 
 	/* If this file is already in the ready list we exit soon */
@@ -1249,6 +1249,7 @@  static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 		ep_pm_stay_awake_rcu(epi);
 	}
 
+out_wakeup_unlock:
 	/*
 	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
 	 * wait list.