diff mbox series

[v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set

Message ID 4293faaf-8279-77e2-8b1a-aff765416980@I-love.SAKURA.ne.jp (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Headers show
Series [v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 0 this patch: 4
netdev/cc_maintainers warning 4 maintainers not CCed: edumazet@google.com davem@davemloft.net kuba@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 0 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tetsuo Handa Aug. 27, 2022, 6:11 a.m. UTC
syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
 from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.

Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_{read,write}(). However, since p9_fd_open() does
not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_{read,write}() when the file
descriptor refers to a pipe. In other words, pipe file descriptor needs
to be handled as if socket file descriptor. We somehow need to interrupt
kernel_{read,write}() on pipes.

If we can tolerate "possibility of breaking userspace program by setting
O_NONBLOCK flag on userspace-supplied file descriptors" and "possibility
of race window that userspace program clears O_NONBLOCK flag between after
automatically setting O_NONBLOCK flag and before calling
kernel_{read,write}()", we could automatically set O_NONBLOCK flag
immediately before calling kernel_{read,write}().

A different approach, which this patch is doing, is to surround
kernel_{read,write}() with set_thread_flag(TIF_SIGPENDING) and
recalc_sigpending(). This might be ugly and bit costly, but does not
touch userspace-supplied file descriptors.

Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
---
Although syzbot tested that this patch solves hung task problem, syzbot
cannot verify that this patch will not break functionality of p9 users.
Please test before applying this patch.

 net/9p/trans_fd.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

Comments

Christian Schoenebeck Sept. 1, 2022, 3:23 p.m. UTC | #1
On Samstag, 27. August 2022 08:11:48 CEST Tetsuo Handa wrote:
> syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
>  from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
> started kernel_read() from p9_fd_read() from p9_read_work() and/or
> kernel_write() from p9_fd_write() from p9_write_work() requests.
> 
> Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
> need to interrupt kernel_{read,write}(). However, since p9_fd_open() does
> not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
> p9_mux_poll_stop() needs to interrupt kernel_{read,write}() when the file
> descriptor refers to a pipe. In other words, pipe file descriptor needs
> to be handled as if socket file descriptor. We somehow need to interrupt
> kernel_{read,write}() on pipes.
> 
> If we can tolerate "possibility of breaking userspace program by setting
> O_NONBLOCK flag on userspace-supplied file descriptors" and "possibility
> of race window that userspace program clears O_NONBLOCK flag between after
> automatically setting O_NONBLOCK flag and before calling
> kernel_{read,write}()", we could automatically set O_NONBLOCK flag
> immediately before calling kernel_{read,write}().
> 
> A different approach, which this patch is doing, is to surround
> kernel_{read,write}() with set_thread_flag(TIF_SIGPENDING) and
> recalc_sigpending(). This might be ugly and bit costly, but does not
> touch userspace-supplied file descriptors.

So the intention in this alternative approach is to allow user space apps  
still being able to perform blocking I/O, while at the same time making the 
kernel thread interruptible to fix this hung task issue, correct?

> Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
> Reported-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+8b41a1365f1106fd0f33@syzkaller.appspotmail.com>
> ---
> Although syzbot tested that this patch solves hung task problem, syzbot
> cannot verify that this patch will not break functionality of p9 users.
> Please test before applying this patch.
> 
>  net/9p/trans_fd.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index e758978b44be..e2f4e3245a80 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -256,11 +256,13 @@ static int p9_fd_read(struct p9_client *client, void
> *v, int len) if (!ts)
>  		return -EREMOTEIO;
> 
> -	if (!(ts->rd->f_flags & O_NONBLOCK))
> -		p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
> -
>  	pos = ts->rd->f_pos;
> +	/* Force non-blocking read() even without O_NONBLOCK. */
> +	set_thread_flag(TIF_SIGPENDING);
>  	ret = kernel_read(ts->rd, v, len, &pos);
> +	spin_lock_irq(&current->sighand->siglock);
> +	recalc_sigpending();
> +	spin_unlock_irq(&current->sighand->siglock);

Is the recalc_sigpending() block here actually needed? The TIF_SIGPENDING flag 
is already cleared by net/9p/client.c, no?

>  	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
>  		client->status = Disconnected;
>  	return ret;
> @@ -423,10 +425,12 @@ static int p9_fd_write(struct p9_client *client, void
> *v, int len) if (!ts)
>  		return -EREMOTEIO;
> 
> -	if (!(ts->wr->f_flags & O_NONBLOCK))
> -		p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");
> -
> +	/* Force non-blocking write() even without O_NONBLOCK. */
> +	set_thread_flag(TIF_SIGPENDING);
>  	ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
> +	spin_lock_irq(&current->sighand->siglock);
> +	recalc_sigpending();
> +	spin_unlock_irq(&current->sighand->siglock);
>  	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
>  		client->status = Disconnected;
>  	return ret;
Tetsuo Handa Sept. 1, 2022, 10:25 p.m. UTC | #2
On 2022/09/02 0:23, Christian Schoenebeck wrote:
> So the intention in this alternative approach is to allow user space apps  
> still being able to perform blocking I/O, while at the same time making the 
> kernel thread interruptible to fix this hung task issue, correct?

Making the kernel thread "non-blocking" (rather than "interruptible") in order
not to be blocked on I/O on pipes.

Since kernel threads by default do not receive signals, being "interruptible"
or "killable" does not help (except for silencing khungtaskd warning). Being
"non-blocking" like I/O on sockets helps.

>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -256,11 +256,13 @@ static int p9_fd_read(struct p9_client *client, void
>> *v, int len) if (!ts)
>>  		return -EREMOTEIO;
>>
>> -	if (!(ts->rd->f_flags & O_NONBLOCK))
>> -		p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
>> -
>>  	pos = ts->rd->f_pos;
>> +	/* Force non-blocking read() even without O_NONBLOCK. */
>> +	set_thread_flag(TIF_SIGPENDING);
>>  	ret = kernel_read(ts->rd, v, len, &pos);
>> +	spin_lock_irq(&current->sighand->siglock);
>> +	recalc_sigpending();
>> +	spin_unlock_irq(&current->sighand->siglock);
> 
> Is the recalc_sigpending() block here actually needed? The TIF_SIGPENDING flag 
> is already cleared by net/9p/client.c, no?

This is actually needed.

The thread which processes this function is a kernel workqueue thread which
is supposed to process other functions (which might call "interruptible"
functions even if signals are not received by default).

The thread which currently clearing the TIF_SIGPENDING flag is a user process
(which are calling "killable" functions from syscall context but effectively
"uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
(like p9_client_rpc() does) is very bad and needs to be avoided...
Dominique Martinet Sept. 3, 2022, 11:39 p.m. UTC | #3
Thanks for the patch and sorry for the slow reply

v1 vs v2: my take is that I think v1 is easier to understand, and if you
pass a fd to be used as kernel end for 9p you shouldn't also be messing
with it so it's fair game to make it O_NONBLOCK -- we're reading and
writing to these things, the fds shouldn't be used by the caller after
the mount syscall.

Is there any reason you spent time working on v2, or is that just
theorical for not messing with userland fd ?

unless there's any reason I'll try to find time to test v1 and queue it
for 6.1

Tetsuo Handa wrote on Fri, Sep 02, 2022 at 07:25:30AM +0900:
> On 2022/09/02 0:23, Christian Schoenebeck wrote:
> > So the intention in this alternative approach is to allow user space apps  
> > still being able to perform blocking I/O, while at the same time making the 
> > kernel thread interruptible to fix this hung task issue, correct?
> 
> Making the kernel thread "non-blocking" (rather than "interruptible") in order
> not to be blocked on I/O on pipes.
> 
> Since kernel threads by default do not receive signals, being "interruptible"
> or "killable" does not help (except for silencing khungtaskd warning). Being
> "non-blocking" like I/O on sockets helps.

I'm still not 100% sure I understand the root of the deadlock, but I can
agree the worker thread shouldn't block.

We seem to check for EAGAIN where kernel_read/write end up being called
and there's a poll for scheduling so it -should- work, but I assume this
hasn't been tested much and might take a bit of time to test.


> The thread which currently clearing the TIF_SIGPENDING flag is a user process
> (which are calling "killable" functions from syscall context but effectively
> "uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
> By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
> (like p9_client_rpc() does) is very bad and needs to be avoided...

Yes, I really wish we could make this go away.
I started work to make the cancel (flush) asynchronous, but it
introduced a regression I never had (and still don't have) time to
figure out... If you have motivation to take over, the patches I sent
are here:
https://lore.kernel.org/all/20181217110111.GB17466@nautica/T/
(unfortunately some refactoring happened and they no longer apply, but
the logic should be mostly sane)


Thanks,
--
Dominique
Tetsuo Handa Sept. 4, 2022, 12:27 a.m. UTC | #4
On 2022/09/04 8:39, Dominique Martinet wrote:
> Is there any reason you spent time working on v2, or is that just
> theorical for not messing with userland fd ?

Just theoretical for not messing with userland fd, for programs generated
by fuzzers might use fds passed to the mount() syscall. I imagined that
syzbot again reports this problem when it started playing with fcntl().

For robustness, not messing with userland fd is the better. ;-)

> 
> unless there's any reason I'll try to find time to test v1 and queue it
> for 6.1

OK.

> We seem to check for EAGAIN where kernel_read/write end up being called
> and there's a poll for scheduling so it -should- work, but I assume this
> hasn't been tested much and might take a bit of time to test.

Right. Since the I/O in kernel side is poll based multiplexing, forcing
non-blocking I/O -should- work. (But I can't test e.g. changes in CPU time
usage because I don't have environment to test. I assume that poll based
multiplexing saves us from doing busy looping.)

We are currently checking for ERESTARTSYS and EAGAIN. The former is for
non-socket fds which do not have O_NONBLOCK flag, and the latter is for
socket fds which have O_NONBLOCK flag. If we enforce O_NONBLOCK flag,
the former will become redundant. I think we can remove the former check
after you tested that setting O_NONBLOCK flag on non-socket fds does not break.
Dominique Martinet Oct. 7, 2022, 1:40 a.m. UTC | #5
Tetsuo Handa wrote on Sun, Sep 04, 2022 at 09:27:22AM +0900:
> On 2022/09/04 8:39, Dominique Martinet wrote:
> > Is there any reason you spent time working on v2, or is that just
> > theorical for not messing with userland fd ?
> 
> Just theoretical for not messing with userland fd, for programs generated
> by fuzzers might use fds passed to the mount() syscall. I imagined that
> syzbot again reports this problem when it started playing with fcntl().
> 
> For robustness, not messing with userland fd is the better. ;-)

By the way digging this back made me think about this a bit again.
My opinion hasn't really changed that if you want to shoot yourself in
the foot I don't think we're crossing any priviledge boundary here, but
we could probably prevent it by saying the mount call with close that fd
and somehow steal it? (drop the fget, close_fd after get_file perhaps?)

That should address your concern about robustess and syzbot will no
longer be able to play with fcntl on "our" end of the pipe. I think it's
fair to say that once you pass it to the kernel all bets are off, so
closing it for the userspace application could make sense, and the mount
already survives when short processes do the mount call and immediately
exit so it's not like we need that fd to be open...


What do you think?

(either way would be for 6.2, the patch is already good enough as is for
me)
--
Dominique
Tetsuo Handa Oct. 7, 2022, 11:52 a.m. UTC | #6
On 2022/10/07 10:40, Dominique Martinet wrote:
> Tetsuo Handa wrote on Sun, Sep 04, 2022 at 09:27:22AM +0900:
>> On 2022/09/04 8:39, Dominique Martinet wrote:
>>> Is there any reason you spent time working on v2, or is that just
>>> theorical for not messing with userland fd ?
>>
>> Just theoretical for not messing with userland fd, for programs generated
>> by fuzzers might use fds passed to the mount() syscall. I imagined that
>> syzbot again reports this problem when it started playing with fcntl().
>>
>> For robustness, not messing with userland fd is the better. ;-)
> 
> By the way digging this back made me think about this a bit again.
> My opinion hasn't really changed that if you want to shoot yourself in
> the foot I don't think we're crossing any priviledge boundary here, but
> we could probably prevent it by saying the mount call with close that fd
> and somehow steal it? (drop the fget, close_fd after get_file perhaps?)
> 
> That should address your concern about robustess and syzbot will no
> longer be able to play with fcntl on "our" end of the pipe. I think it's
> fair to say that once you pass it to the kernel all bets are off, so
> closing it for the userspace application could make sense, and the mount
> already survives when short processes do the mount call and immediately
> exit so it's not like we need that fd to be open...
> 
> 
> What do you think?

I found that pipe is using alloc_file_clone() which allocates "struct file"
instead of just incrementing "struct file"->f_count.

Then, can we add EXPORT_SYMBOL_GPL(alloc_file_clone) to fs/file_table.c and
use it like

  struct file *f;

  ts->rd = fget(rfd);
  if (!ts->rd)
    goto out_free_ts;
  if (!(ts->rd->f_mode & FMODE_READ))
    goto out_put_rd;
  f = alloc_file_clone(ts->rd, ts->rd->f_flags | O_NONBLOCK, ts->rd->f_op);
  if (IS_ERR(f))
    goto out_put_rd;
  fput(ts->rd);
  ts->rd = f;

  ts->wr = fget(wfd);
  if (!ts->wr)
    goto out_put_rd;
  if (!(ts->wr->f_mode & FMODE_WRITE))
    goto out_put_wr;
  f = alloc_file_clone(ts->wr, ts->wr->f_flags | O_NONBLOCK, ts->wr->f_op);
  if (IS_ERR(f))
    goto out_put_wr;
  fput(ts->wr);
  ts->wr = f;

 from p9_fd_open() for cloning "struct file" with O_NONBLOCK flag added?
Just an idea. I don't know whether alloc_file_clone() arguments are correct...
diff mbox series

Patch

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e758978b44be..e2f4e3245a80 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -256,11 +256,13 @@  static int p9_fd_read(struct p9_client *client, void *v, int len)
 	if (!ts)
 		return -EREMOTEIO;
 
-	if (!(ts->rd->f_flags & O_NONBLOCK))
-		p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
-
 	pos = ts->rd->f_pos;
+	/* Force non-blocking read() even without O_NONBLOCK. */
+	set_thread_flag(TIF_SIGPENDING);
 	ret = kernel_read(ts->rd, v, len, &pos);
+	spin_lock_irq(&current->sighand->siglock);
+	recalc_sigpending();
+	spin_unlock_irq(&current->sighand->siglock);
 	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
 		client->status = Disconnected;
 	return ret;
@@ -423,10 +425,12 @@  static int p9_fd_write(struct p9_client *client, void *v, int len)
 	if (!ts)
 		return -EREMOTEIO;
 
-	if (!(ts->wr->f_flags & O_NONBLOCK))
-		p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");
-
+	/* Force non-blocking write() even without O_NONBLOCK. */
+	set_thread_flag(TIF_SIGPENDING);
 	ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
+	spin_lock_irq(&current->sighand->siglock);
+	recalc_sigpending();
+	spin_unlock_irq(&current->sighand->siglock);
 	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
 		client->status = Disconnected;
 	return ret;