
pipe: honor IOCB_NOWAIT

Message ID cedfa436-47a3-7cbc-1948-75d0e28cfdc5@kernel.dk (mailing list archive)
State New, archived

Commit Message

Jens Axboe Sept. 7, 2020, 3:21 p.m. UTC
Pipe only looks at O_NONBLOCK for non-blocking operation, which means that
io_uring can't easily poll for it or attempt non-blocking issues. Check for
IOCB_NOWAIT in locking the pipe for reads and writes, and ditto when we
decide on whether or not to block or return -EAGAIN.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

If this is acceptable, then I can add S_ISFIFO to the whitelist on file
descriptors we can IOCB_NOWAIT try for, then poll if we get -EAGAIN
instead of using thread offload.

Comments

Al Viro Sept. 11, 2020, 1:12 a.m. UTC | #1
On Mon, Sep 07, 2020 at 09:21:02AM -0600, Jens Axboe wrote:
> Pipe only looks at O_NONBLOCK for non-blocking operation, which means that
> io_uring can't easily poll for it or attempt non-blocking issues. Check for
> IOCB_NOWAIT in locking the pipe for reads and writes, and ditto when we
> decide on whether or not to block or return -EAGAIN.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
> 
> If this is acceptable, then I can add S_ISFIFO to the whitelist on file
> descriptors we can IOCB_NOWAIT try for, then poll if we get -EAGAIN
> instead of using thread offload.

Will check.  In the meanwhile, blacklist eventpoll again.  Because your
attempts at "nonblocking" there had been both ugly as hell *AND* fail
to prevent blocking.  And frankly, I'm very tempted to rip that crap
out entirely.  Seriously, *look* at the code you've modified in
do_epoll_ctl().  And tell me why the hell grabbing ->mtx in that
function needs to be infested with trylocks, while the exact same mutex
taken in loop_check_proc() called under those is fine with mutex_lock().
Ditto for calls of vfs_poll() inside ep_insert(), GFP_KERNEL allocations
in ep_ptable_queue_proc(), synchronize_rcu() callable from ep_modify()
(from the same function), et sodding cetera.

No, this is _not_ an invitation to spread the same crap over even more
places in there; I just want to understand where that kind of voodoo
approach came from.  And that's directly relevant for this patch,
because it looks like the same kind of thing.

What is your semantics for IOCB_NOWAIT?  What should and what should _not_
be waited for?
Jens Axboe Sept. 11, 2020, 11:21 a.m. UTC | #2
On 9/10/20 7:12 PM, Al Viro wrote:
> On Mon, Sep 07, 2020 at 09:21:02AM -0600, Jens Axboe wrote:
>> Pipe only looks at O_NONBLOCK for non-blocking operation, which means that
>> io_uring can't easily poll for it or attempt non-blocking issues. Check for
>> IOCB_NOWAIT in locking the pipe for reads and writes, and ditto when we
>> decide on whether or not to block or return -EAGAIN.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> If this is acceptable, then I can add S_ISFIFO to the whitelist on file
>> descriptors we can IOCB_NOWAIT try for, then poll if we get -EAGAIN
>> instead of using thread offload.
> 
> Will check.

Thanks!

> In the meanwhile, blacklist eventpoll again.  Because your
> attempts at "nonblocking" there had been both ugly as hell *AND* fail
> to prevent blocking.  And frankly, I'm very tempted to rip that crap
> out entirely.  Seriously, *look* at the code you've modified in
> do_epoll_ctl().  And tell me why the hell grabbing ->mtx in that
> function needs to be infested with trylocks, while the exact same mutex
> taken in loop_check_proc() called under those is fine with mutex_lock().
> Ditto for calls of vfs_poll() inside ep_insert(), GFP_KERNEL allocations
> in ep_ptable_queue_proc(), synchronize_rcu() callable from ep_modify()
> (from the same function), et sodding cetera.

Ugh, missed the loop_check_proc() part :/

The original patch wasn't that pretty; in my defense, the whole thing is
pretty ugly to begin with. It'd need serious rewriting to become
palatable. If you want to yank the patch, I've got no problem with that.

> No, this is _not_ an invitation to spread the same crap over even more
> places in there; I just want to understand where that kind of voodoo
> approach came from.  And that's directly relevant for this patch,
> because it looks like the same kind of thing.

The pipe patch is way cleaner, and pretty much to the point. I don't
think that's a fair comparison.

> What is your semantics for IOCB_NOWAIT?  What should and what should _not_
> be waited for?

Basically it's: don't wait for space (on write) if the pipe is full, and
don't wait for data (on read) if it's empty. O_NONBLOCK for the
operation, instead of as a file state. The locking isn't strictly
required, and it's basically impossible to avoid various deeper-down
items like memory allocations potentially blocking, so...

Patch

diff --git a/fs/pipe.c b/fs/pipe.c
index 60dbee457143..3cee28e35985 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -82,6 +82,13 @@  void pipe_unlock(struct pipe_inode_info *pipe)
 }
 EXPORT_SYMBOL(pipe_unlock);
 
+static inline bool __pipe_lock_nonblock(struct pipe_inode_info *pipe)
+{
+	if (mutex_trylock(&pipe->mutex))
+		return true;
+	return false;
+}
+
 static inline void __pipe_lock(struct pipe_inode_info *pipe)
 {
 	mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
@@ -244,7 +251,12 @@  pipe_read(struct kiocb *iocb, struct iov_iter *to)
 		return 0;
 
 	ret = 0;
-	__pipe_lock(pipe);
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!__pipe_lock_nonblock(pipe))
+			return -EAGAIN;
+	} else {
+		__pipe_lock(pipe);
+	}
 
 	/*
 	 * We only wake up writers if the pipe was full when we started
@@ -344,7 +356,8 @@  pipe_read(struct kiocb *iocb, struct iov_iter *to)
 			break;
 		if (ret)
 			break;
-		if (filp->f_flags & O_NONBLOCK) {
+		if ((filp->f_flags & O_NONBLOCK) ||
+		    (iocb->ki_flags & IOCB_NOWAIT)) {
 			ret = -EAGAIN;
 			break;
 		}
@@ -432,7 +445,12 @@  pipe_write(struct kiocb *iocb, struct iov_iter *from)
 	if (unlikely(total_len == 0))
 		return 0;
 
-	__pipe_lock(pipe);
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		if (!__pipe_lock_nonblock(pipe))
+			return -EAGAIN;
+	} else {
+		__pipe_lock(pipe);
+	}
 
 	if (!pipe->readers) {
 		send_sig(SIGPIPE, current, 0);
@@ -554,7 +572,8 @@  pipe_write(struct kiocb *iocb, struct iov_iter *from)
 			continue;
 
 		/* Wait for buffer space to become available. */
-		if (filp->f_flags & O_NONBLOCK) {
+		if ((filp->f_flags & O_NONBLOCK) ||
+		    (iocb->ki_flags & IOCB_NOWAIT)) {
 			if (!ret)
 				ret = -EAGAIN;
 			break;