
[1/1] epoll: remove wrong assert that ep_poll_callback is always called with irqs off

Message ID 20190108100121.20247-1-rpenyaev@suse.de (mailing list archive)
State New, archived
Series [1/1] epoll: remove wrong assert that ep_poll_callback is always called with irqs off

Commit Message

Roman Penyaev Jan. 8, 2019, 10:01 a.m. UTC
It was a wrong assumption that all drivers disable irqs before waking up
a wait queue.  Even with the assert line removed, the whole logic stays
correct: epoll always locks the rwlock with irqs disabled and never calls
it from interrupt context itself, thus it is up to the driver how to call
wake_up_locked(); if the driver does not handle any interrupts (like fuse
in the report), it is of course safe on its side to take a simple
spin_lock.
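
To make the two calling contexts concrete, a minimal sketch follows;
struct fake_dev and both functions are hypothetical, only the locking
pattern around the wakeup matters:

/* Hypothetical driver state, only to illustrate the two wakeup contexts. */
struct fake_dev {
        wait_queue_head_t wq;   /* epoll entries hang off this queue */
};

/*
 * Case 1: a driver that never runs in irq context (like fuse in the
 * report) wakes the queue from process context under a plain spin_lock(),
 * so ep_poll_callback() is entered with irqs enabled.
 */
static void fake_dev_notify(struct fake_dev *d)
{
        spin_lock(&d->wq.lock);         /* no _irqsave on purpose */
        wake_up_locked(&d->wq);         /* -> ep_poll_callback() */
        spin_unlock(&d->wq.lock);
}

/*
 * Case 2: another driver wakes its queue from a hard-irq handler, so
 * there ep_poll_callback() runs with irqs disabled.
 */
static irqreturn_t fake_dev_irq(int irq, void *data)
{
        struct fake_dev *d = data;

        spin_lock(&d->wq.lock);
        wake_up_locked(&d->wq);         /* -> ep_poll_callback() */
        spin_unlock(&d->wq.lock);
        return IRQ_HANDLED;
}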

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Reported-by: syzbot+aea82bf9ee6ffd9a79d9@syzkaller.appspotmail.com
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Roman Penyaev Jan. 8, 2019, 12:42 p.m. UTC | #1
On 2019-01-08 11:01, Roman Penyaev wrote:
> It was a wrong assumption that all drivers disable irqs before waking up
> a wait queue.  Even with the assert line removed, the whole logic stays
> correct: epoll always locks the rwlock with irqs disabled and never calls
> it from interrupt context itself, thus it is up to the driver how to call
> wake_up_locked(); if the driver does not handle any interrupts (like fuse
> in the report), it is of course safe on its side to take a simple
> spin_lock.

This is wrong and can lead to a deadlock: we always call read_lock(), but
the caller can call us with irqs enabled.  Another driver, which also calls
ep_poll_callback(), can be called from interrupt context (irqs disabled)
and thus can interrupt the one that does not disable irqs.  Even though we
take a read_lock() (which by itself should be fine to interrupt), a
write_lock(), which comes in the middle, can cause a deadlock:

#CPU0                      #CPU1

task:                      task:                       irq:

                           spin_lock(&wq1->lock);
                           ep_poll_callback():
                           read_lock(&ep->lock)
                           ....
write_lock_irq(&ep->lock)  ....
   # waits for readers     ....          >>>>>>>>>>>>  IRQ on CPU1
                                                       spin_lock_irqsave(&wq2->lock)
                                                       ep_poll_callback():
                                                       read_lock(&ep->lock);
                                                       # to avoid writer starvation
                                                       # must wait for the writer
                                                       # to finish, thus deadlock

What we can do:

a) disable irqs if we are not in interrupt.
b) revert the patch completely.

David, is it really crucial in terms of performance to avoid double
local_irq_save() on Xen on this ep_poll_callback() hot path?

For example, why not do the following:

   if (!in_interrupt())
        local_irq_save(flags);
   read_lock(&ep->lock);

with a huge comment explaining the performance numbers.
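
A rough sketch of what option (a) could look like inside ep_poll_callback()
follows; it only illustrates the idea (the callback body is elided and the
unlock side simply mirrors the check), it is not a tested patch:

static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode,
                            int sync, void *key)
{
        struct epitem *epi = ep_item_from_wait(wait);
        struct eventpoll *ep = epi->ep;
        unsigned long flags = 0;
        bool disabled_irqs = false;
        int ewake = 0;

        /*
         * Process-context wakers (e.g. fuse) reach us with irqs enabled;
         * disable them so a nested ep_poll_callback() from our own irq
         * handler cannot deadlock against a pending write_lock_irq().
         */
        if (!in_interrupt()) {
                local_irq_save(flags);
                disabled_irqs = true;
        }
        read_lock(&ep->lock);

        /* ... existing callback body, which may set ewake ... */

        read_unlock(&ep->lock);
        if (disabled_irqs)
                local_irq_restore(flags);

        return ewake;
}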

Or just give up and simply revert the original patch completely
and always call read_lock_irqsave().

--
Roman
Davidlohr Bueso Jan. 8, 2019, 3:16 p.m. UTC | #2
On 2019-01-08 04:42, Roman Penyaev wrote:
> What we can do:
> 
> a) disable irqs if we are not in interrupt.
> b) revert the patch completely.
> 
> David, is it really crucial in terms of performance to avoid double
> local_irq_save() on Xen on this ep_poll_callback() hot path?

Note that such optimizations are also relevant for bare metal, i.e. x86
PUSHF + POPF can be pretty expensive because of insn dependencies.

> 
> For example, why not do the following:
> 
>   if (!in_interrupt())
>        local_irq_save(flags);
>   read_lock(&ep->lock);
> 
> with a huge comment explaining the performance numbers.
> 
> Or just give up and simply revert the original patch completely
> and always call read_lock_irqsave().

Yeah, so the reason I did the other epoll lock irq optimizations was that
they were painfully obvious. ep_poll_callback(), however, is a different
beast, as you've encountered. I vote for not shooting ourselves in the
foot and just dropping this patch -- most of the large performance
benefits will come from microbenches anyway.

Thanks,
Davidlohr
Roman Penyaev Jan. 8, 2019, 4:07 p.m. UTC | #3
On 2019-01-08 16:16, Davidlohr Bueso wrote:
> On 2019-01-08 04:42, Roman Penyaev wrote:
>> What we can do:
>> 
>> a) disable irqs if we are not in interrupt.
>> b) revert the patch completely.
>> 
>> David, is it really crucial in terms of performance to avoid double
>> local_irq_save() on Xen on this ep_poll_callback() hot path?
> 
> Note that such optimizations are also relevant for baremetal, ie: x86
> PUSHF + POPF can be pretty expensive because of insn dependencies.
> 
>> 
>> For example, why not do the following:
>> 
>>   if (!in_interrupt())
>>        local_irq_save(flags);
>>   read_lock(&ep->lock);
>> 
>> with a huge comment explaining the performance numbers.
>> 
>> Or just give up and simply revert the original patch completely
>> and always call read_lock_irqsave().
> 
> Yeah, so the reason I did the other epoll lock irq optimizations was
> that they were painfully obvious. ep_poll_callback(), however, is a
> different beast, as you've encountered. I vote for not shooting
> ourselves in the foot and just dropping this patch -- most of the
> large performance benefits will come from microbenches anyway.

Then I will send another patch which changes read_lock() to
read_lock_irqsave(), since a simple revert of the original patch won't
work.
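
For reference, the locking in ep_poll_callback() after such a change would
look roughly like this (a sketch of the idea, not the patch that was
actually sent):

static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode,
                            int sync, void *key)
{
        struct epitem *epi = ep_item_from_wait(wait);
        struct eventpoll *ep = epi->ep;
        unsigned long flags;
        int ewake = 0;

        /* Always disable irqs: safe for both process- and irq-context wakers. */
        read_lock_irqsave(&ep->lock, flags);

        /* ... existing callback body, which may set ewake ... */

        read_unlock_irqrestore(&ep->lock, flags);

        return ewake;
}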

Thanks.

--
Roman

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f307c8679027..f5f88250cdf2 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1217,12 +1217,6 @@  static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	__poll_t pollflags = key_to_poll(key);
 	int ewake = 0;
 
-	/*
-	 * Called by irq context or interrupts are disabled by the wake_up_*poll
-	 * callers.
-	 */
-	lockdep_assert_irqs_disabled();
-
 	read_lock(&ep->lock);
 
 	ep_set_busy_poll_napi_id(epi);