Message ID | 20250304154410.GB5756@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pipes && EPOLLET, again | expand |
On Tue, 4 Mar 2025 at 05:45, Oleg Nesterov <oleg@redhat.com> wrote: > > Linus, > > On 03/04, Oleg Nesterov wrote: > > > > and we need to cleanup the poll_usage > > logic first. > > We have already discussed this before, I'll probably do this later, > but lets forget it for now. > > Don't we need the trivial one-liner below anyway? See this email of mine: https://lore.kernel.org/all/CAHk-=wiCRwRFi0kGwd_Uv+Xv4HOB-ivHyUp9it6CNSmrKT4gOA@mail.gmail.com/ and the last paragraph in particular. The whole "poll_usage" thing is a kernel *hack* to deal with broken user space that expected garbage semantics that aren't real, and were never real. They were a random implementation detail, and they were *wrong*. But because we have a strict "we don't break user space", we introduced that completely bogus hack to say "ok, we'll send these completely wrong extraneous events despite the fact that nothing has changed, because some broken user space program was written to expect them". > I am not saying this is a bug, but please consider That program is buggy, and we're not adding new hacks for new bugs. If you ask for an edge-triggered EPOLL event, you get an *edge* triggered EPOLL event. And there is no edge - the pipe was already readable, no edge anywhere in sight. So to clarify: we added the hack to work around *existing* user space bugs that happened to work before. If anything, we might consider removing the crazy "poll_usage" hack in the (probably futile) hope that user space has been fixed. But we're not encouraging *new* bugs. Linus
On 03/04, Linus Torvalds wrote: > > On Tue, 4 Mar 2025 at 05:45, Oleg Nesterov <oleg@redhat.com> wrote: > > > > Don't we need the trivial one-liner below anyway? > > See this email of mine: > > https://lore.kernel.org/all/CAHk-=wiCRwRFi0kGwd_Uv+Xv4HOB-ivHyUp9it6CNSmrKT4gOA@mail.gmail.com/ > > and the last paragraph in particular. > > The whole "poll_usage" thing is a kernel *hack* to deal with broken > user space that expected garbage semantics that aren't real, and were > never real. Yes agreed. But we can make this hack more understandable. But as I said, this is off-topic right now. > introduced that completely bogus hack to say "ok, we'll send these > completely wrong extraneous events despite the fact that nothing has > changed, because some broken user space program was written to expect > them". Yes, but since we already have this hack: > That program is buggy, and we're not adding new hacks for new bugs. Yes, but see below... > If you ask for an edge-triggered EPOLL event, you get an *edge* > triggered EPOLL event. And there is no edge - the pipe was already > readable, no edge anywhere in sight. Yes, the pipe was already readable before before fork, but this condition was already "consumed" by the 1st epoll_wait(EPOLLET). Please see below. > If anything, we might consider removing the crazy "poll_usage" hack in > the (probably futile) hope that user space has been fixed. This would be the best option ;) Until then: I agree that my test case is "buggy", but afaics it is not buggier than userspace programs which rely on the unconditional kill_fasync()'s in pipe_read/pipe_write? So. anon_pipe_write() does if (was_empty) wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); before wait_event(pipe->wr_wait), but before return it does if (was_empty || pipe->poll_usage) wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); and this looks confusing to me. If pipe_write() doesn't take poll_usage into account before wait_event(wr_wait), then it doesn't need kill_fasync() too? So I won't argue, but why not make both cases more consistent? Oleg.
On Tue, 4 Mar 2025 at 09:32, Oleg Nesterov <oleg@redhat.com> wrote: > > I agree that my test case is "buggy", but afaics it is not buggier than > userspace programs which rely on the unconditional kill_fasync()'s in > pipe_read/pipe_write? I'm not convinced any such users actually exist. The reason kill_fasync() is unconditional is that it's cheap. The normal situation is "nobody there", and we test that without any locking. So we've never bothered to make any changes to that path, and there's never been any real reason to have any "was_empty" like conditionals. Linus
diff --git a/fs/pipe.c b/fs/pipe.c index b0641f75b1ba..8a32257cc74f 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -554,7 +554,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from) * become empty while we dropped the lock. */ mutex_unlock(&pipe->mutex); - if (was_empty) + if (was_empty || pipe->poll_usage) wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));