diff mbox series

[v2,2/3] epoll: ep->wq can be woken up unlocked in certain cases

Message ID 20200210094123.389854-2-rpenyaev@suse.de (mailing list archive)
State New, archived
Headers show
Series [v2,1/3] epoll: fix possible lost wakeup on epoll_ctl() path | expand

Commit Message

Roman Penyaev Feb. 10, 2020, 9:41 a.m. UTC
Now ep->lock is responsible for wqueue serialization, thus if ep->lock
is taken on write path, wake_up_locked() can be invoked.

Though, read path is different.  Since concurrent cpus can enter the
wake up function it needs to be internally serialized, thus wake_up()
variant is used which implies internal spin lock.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit

 fs/eventpoll.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Jason Baron Feb. 10, 2020, 6:16 p.m. UTC | #1
On 2/10/20 4:41 AM, Roman Penyaev wrote:
> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
> is taken on write path, wake_up_locked() can be invoked.
> 
> Though, read path is different.  Since concurrent cpus can enter the
> wake up function it needs to be internally serialized, thus wake_up()
> variant is used which implies internal spin lock.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Max Neunhoeffer <max@arangodb.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  Nothing interesting in v2:
>      changed the comment a bit
> 
>  fs/eventpoll.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index eee3c92a9ebf..6e218234bd4a 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
>   * Another thing worth to mention is that ep_poll_callback() can be called
>   * concurrently for the same @epi from different CPUs if poll table was inited
>   * with several wait queues entries.  Plural wakeup from different CPUs of a
> - * single wait queue is serialized by wq.lock, but the case when multiple wait
> + * single wait queue is serialized by ep->lock, but the case when multiple wait
>   * queues are used should be detected accordingly.  This is detected using
>   * cmpxchg() operation.
>   */
> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>  				break;
>  			}
>  		}
> +		/*
> +		 * Since here we have the read lock (ep->lock) taken, plural
> +		 * wakeup from different CPUs can occur, thus we call wake_up()
> +		 * variant which implies its own lock on wqueue. All other paths
> +		 * take write lock.
> +		 */
>  		wake_up(&ep->wq);
>  	}
>  	if (waitqueue_active(&ep->poll_wait))
> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
>  
>  		/* Notify waiting tasks that events are available */
>  		if (waitqueue_active(&ep->wq))
> -			wake_up(&ep->wq);
> +			wake_up_locked(&ep->wq);


So I think this will now hit the 'lockdep_assert_held()' in
__wake_up_common()? I agree that its correct, but I think it will
confuse lockdep here...

Thanks,

-Jason
Roman Penyaev Feb. 10, 2020, 7:31 p.m. UTC | #2
On 2020-02-10 19:16, Jason Baron wrote:
> On 2/10/20 4:41 AM, Roman Penyaev wrote:
>> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
>> is taken on write path, wake_up_locked() can be invoked.
>> 
>> Though, read path is different.  Since concurrent cpus can enter the
>> wake up function it needs to be internally serialized, thus wake_up()
>> variant is used which implies internal spin lock.
>> 
>> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
>> Cc: Max Neunhoeffer <max@arangodb.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
>> Cc: Davidlohr Bueso <dbueso@suse.de>
>> Cc: Jason Baron <jbaron@akamai.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-fsdevel@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  Nothing interesting in v2:
>>      changed the comment a bit
>> 
>>  fs/eventpoll.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index eee3c92a9ebf..6e218234bd4a 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct 
>> epitem *epi)
>>   * Another thing worth to mention is that ep_poll_callback() can be 
>> called
>>   * concurrently for the same @epi from different CPUs if poll table 
>> was inited
>>   * with several wait queues entries.  Plural wakeup from different 
>> CPUs of a
>> - * single wait queue is serialized by wq.lock, but the case when 
>> multiple wait
>> + * single wait queue is serialized by ep->lock, but the case when 
>> multiple wait
>>   * queues are used should be detected accordingly.  This is detected 
>> using
>>   * cmpxchg() operation.
>>   */
>> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>>  				break;
>>  			}
>>  		}
>> +		/*
>> +		 * Since here we have the read lock (ep->lock) taken, plural
>> +		 * wakeup from different CPUs can occur, thus we call wake_up()
>> +		 * variant which implies its own lock on wqueue. All other paths
>> +		 * take write lock.
>> +		 */
>>  		wake_up(&ep->wq);
>>  	}
>>  	if (waitqueue_active(&ep->poll_wait))
>> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const 
>> struct epoll_event *event,
>> 
>>  		/* Notify waiting tasks that events are available */
>>  		if (waitqueue_active(&ep->wq))
>> -			wake_up(&ep->wq);
>> +			wake_up_locked(&ep->wq);
> 
> 
> So I think this will now hit the 'lockdep_assert_held()' in
> __wake_up_common()? I agree that its correct, but I think it will
> confuse lockdep here...

Argh! True. And I do not see any neat way to shut up lockdep here
(Calling lock_acquire() manually seems not an option for such minor
thing).

Then this optimization is not needed, patch is cancelled.

Thanks for noting that.

--
Roman
diff mbox series

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eee3c92a9ebf..6e218234bd4a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1173,7 +1173,7 @@  static inline bool chain_epi_lockless(struct epitem *epi)
  * Another thing worth to mention is that ep_poll_callback() can be called
  * concurrently for the same @epi from different CPUs if poll table was inited
  * with several wait queues entries.  Plural wakeup from different CPUs of a
- * single wait queue is serialized by wq.lock, but the case when multiple wait
+ * single wait queue is serialized by ep->lock, but the case when multiple wait
  * queues are used should be detected accordingly.  This is detected using
  * cmpxchg() operation.
  */
@@ -1248,6 +1248,12 @@  static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 				break;
 			}
 		}
+		/*
+		 * Since here we have the read lock (ep->lock) taken, plural
+		 * wakeup from different CPUs can occur, thus we call wake_up()
+		 * variant which implies its own lock on wqueue. All other paths
+		 * take write lock.
+		 */
 		wake_up(&ep->wq);
 	}
 	if (waitqueue_active(&ep->poll_wait))
@@ -1551,7 +1557,7 @@  static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
+			wake_up_locked(&ep->wq);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
@@ -1657,7 +1663,7 @@  static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
-				wake_up(&ep->wq);
+				wake_up_locked(&ep->wq);
 			if (waitqueue_active(&ep->poll_wait))
 				pwake++;
 		}