
[v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()

Message ID 9422b998-5bab-85cc-5416-3bb5cf6dd853@kernel.dk (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Series [v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 8 maintainers not CCed: asmadeus@codewreck.org pabeni@redhat.com linux_oss@crudebyte.com lucho@ionkov.net ericvh@gmail.com edumazet@google.com v9fs-developer@lists.sourceforge.net davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: networking block comments don't use an empty /* line, use /* Comment...
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jens Axboe Feb. 3, 2023, 4:04 p.m. UTC
signal_pending() really means that an exit to userspace is required to
clear the condition, as it could be either an actual signal, or it could
be TWA_SIGNAL based task_work that needs processing. The 9p client
does a recalc_sigpending() to take care of the former, but that still
leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
task_work pending, then we'll sit in a tight loop spinning as
signal_pending() remains true even after recalc_sigpending().

Move the signal_pending() logic into a helper that deals with both, and
return -ERESTARTSYS if the reason for signal_pending() being true is
that we have task_work to process.

Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
    that point. For one, we don't want to export task_work_run(), it's
    in-kernel only. And secondly, we need to ensure we have a sane state
    before running task_work. The latter did look fine before, but this
    should be saner. Tested that this fixes the report for me as well.
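
For illustration, the pre-patch retry pattern condenses to roughly the
following (abridged from the diff below, not verbatim): the flush path
clears TIF_SIGPENDING and loops, but TIF_NOTIFY_SIGNAL is never cleared,
so signal_pending() keeps returning true and the loop never exits.

again:
	if (signal_pending(current)) {
		sigpending = 1;
		clear_thread_flag(TIF_SIGPENDING);	/* classic signals only */
	} else {
		sigpending = 0;
	}

	err = c->trans_mod->request(c, req);
	/* ... interruptible wait for the reply; pending TWA_SIGNAL
	 * task_work makes it return -ERESTARTSYS ... */

	if (err == -ERESTARTSYS && c->status == Connected &&
	    type == P9_TFLUSH) {
		sigpending = 1;
		clear_thread_flag(TIF_SIGPENDING);	/* TIF_NOTIFY_SIGNAL untouched, */
		goto again;				/* so this spins forever */
	}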

Comments

Dominique Martinet Feb. 5, 2023, 10:02 a.m. UTC | #1
meta-comment: 9p is usually handled separately from netdev, I just saw
this by chance when Simon replied to v1 -- please cc
v9fs-developer@lists.sourceforge.net for v3 if there is one
(well, it's a bit of a weird tree because patches are sometimes taken
through -net...)

Also added Christian (virtio 9p) and Eric (second maintainer) to the To:
list for attention.


Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
> signal_pending() really means that an exit to userspace is required to
> clear the condition, as it could be either an actual signal, or it could
> be TWA_SIGNAL based task_work that needs processing. The 9p client
> does a recalc_sigpending() to take care of the former, but that still
> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
> task_work pending, then we'll sit in a tight loop spinning as
> signal_pending() remains true even after recalc_sigpending().
> 
> Move the signal_pending() logic into a helper that deals with both, and
> return -ERESTARTSYS if the reason for signal_pending() being true is
> that we have task_work to process.
> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>     that point. For one, we don't want to export task_work_run(), it's
>     in-kernel only. And secondly, we need to ensure we have a sane state
>     before running task_work. The latter did look fine before, but this
>     should be saner. Tested that this fixes the report for me as well.

Hmm, just bailing out here is a can of worms -- when we get the reply
from the server, depending on the transport, hell might break loose (zc
requests in particular on virtio will probably just access the memory
anyway... fd will consider it got a bogus reply and close the connection,
which is a lesser evil but still not appropriate)

We really need to get rid of that retry loop in the first place; the req
refcounting I added a couple of years ago was a first step towards async
flush, which will help with that, but the async flush code had a bug I
never found time to work out, so it never made it in and we need an
immediate fix.

... Just looking at the code and thinking out loud, sorry for rambling:
actually that
signal handling in virtio is already out of p9_virtio_zc_request() so
the pages are already unpinned by the time we do that flush, and I guess
it's not much worse -- refcounting will make it "mostly work" exactly as
it does now, as in the pages won't be freed until we actually get the
reply, so the pages can get moved underneath virtio which is bad but is
the same as right now, and I guess it's a net improvement?


I agree with your assessment that we can't use task_work_run(), I assume
it's also quite bad to just clear the flag?
I'm not familiar with these task_works at all, in which cases do they happen?
Would you be able to share an easy reproducer so that I/someone can try
on various transports?

If it's "rare enough" I'd say sacrificing the connection might make more
sense than a busy loop, but if it's becoming common I think we'll need
to spend some more time thinking about it...
It might be less effort to dig out my async flush commits if this becomes
too complicated, but I wish I could say I have time for it...

Thanks!

> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 622ec6a586ee..9caa66cbd5b7 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -652,6 +652,25 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
>  	return ERR_PTR(err);
>  }
>  
> +static int p9_sigpending(int *sigpending)
> +{
> +	*sigpending = 0;
> +
> +	if (!signal_pending(current))
> +		return 0;
> +
> +	/*
> +	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
> +	 * processed.
> +	 */
> +	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
> +		return -ERESTARTSYS;
> +
> +	*sigpending = 1;
> +	clear_thread_flag(TIF_SIGPENDING);
> +	return 0;
> +}
> +
>  /**
>   * p9_client_rpc - issue a request and wait for a response
>   * @c: client session
> @@ -687,12 +706,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>  	req->tc.zc = false;
>  	req->rc.zc = false;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->request(c, req);
>  	if (err < 0) {
> @@ -789,12 +805,9 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
>  	req->tc.zc = true;
>  	req->rc.zc = true;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->zc_request(c, req, uidata, uodata,
>  				       inlen, olen, in_hdrlen);
>
Jens Axboe Feb. 6, 2023, 8:19 p.m. UTC | #2
On 2/5/23 3:02 AM, Dominique Martinet wrote:
> meta-comment: 9p is usually handled separately from netdev, I just saw
> this by chance when Simon replied to v1 -- please cc
> v9fs-developer@lists.sourceforge.net for v3 if there is one
> (well, it's a bit of a weird tree because patches are sometimes taken
> through -net...)
> 
> Also added Christian (virtio 9p) and Eric (second maintainer) to the To:
> list for attention.

Thanks! I can send out a v3, but let's get the discussion sorted first.
The only change I want to make is the comment format, which is apparently
different in net/ than in most other spots in the kernel.

> Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
>> signal_pending() really means that an exit to userspace is required to
>> clear the condition, as it could be either an actual signal, or it could
>> be TWA_SIGNAL based task_work that needs processing. The 9p client
>> does a recalc_sigpending() to take care of the former, but that still
>> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
>> task_work pending, then we'll sit in a tight loop spinning as
>> signal_pending() remains true even after recalc_sigpending().
>>
>> Move the signal_pending() logic into a helper that deals with both, and
>> return -ERESTARTSYS if the reason for signal_pending() being true is
>> that we have task_work to process.
>> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
>> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>>     that point. For one, we don't want to export task_work_run(), it's
>>     in-kernel only. And secondly, we need to ensure we have a sane state
>>     before running task_work. The latter did look fine before, but this
>>     should be saner. Tested that this fixes the report for me as well.
> 
> Hmm, just bailing out here is a can of worms -- when we get the reply
> from the server, depending on the transport, hell might break loose (zc
> requests in particular on virtio will probably just access the memory
> anyway... fd will consider it got a bogus reply and close the connection,
> which is a lesser evil but still not appropriate)
> 
> We really need to get rid of that retry loop in the first place; the req
> refcounting I added a couple of years ago was a first step towards async
> flush, which will help with that, but the async flush code had a bug I
> never found time to work out, so it never made it in and we need an
> immediate fix.
> 
> ... Just looking at the code and thinking out loud, sorry for rambling:
> actually that
> signal handling in virtio is already out of p9_virtio_zc_request() so
> the pages are already unpinned by the time we do that flush, and I guess
> it's not much worse -- refcounting will make it "mostly work" exactly as
> it does now, as in the pages won't be freed until we actually get the
> reply, so the pages can get moved underneath virtio which is bad but is
> the same as right now, and I guess it's a net improvement?
> 
> 
> I agree with your assessment that we can't use task_work_run(), I assume
> it's also quite bad to just clear the flag?
> I'm not familiar with these task_works at all, in which cases do they happen?
> Would you be able to share an easy reproducer so that I/someone can try
> on various transports?

You can't just clear the flag without also running the task_work. Hence
it either needs to be done right there, or leave it pending and let the
exit to userspace take care of it.
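
For reference, the exit-to-userspace path is what normally consumes
TIF_NOTIFY_SIGNAL; condensed very roughly below (from kernel/entry/common.c
and kernel/signal.c -- a sketch, and the exact structure varies by kernel
version):

	/* exit_to_user_mode_loop(), sketch: */
	while (ti_work & EXIT_TO_USER_MODE_WORK) {
		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
			arch_do_signal_or_restart(regs);  /* reaches get_signal() */
		ti_work = read_thread_flags();
	}

	/* ...and inside get_signal(): */
	clear_notify_signal();			/* drops TIF_NOTIFY_SIGNAL */
	if (task_work_pending(current))
		task_work_run();		/* runs in this task's own context */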

> If it's "rare enough" I'd say sacrificing the connection might make more
> sense than a busy loop, but if it's becoming common I think we'll need
> to spend some more time thinking about it...
> It might be less effort to dig out my async flush commits if this becomes
> too complicated, but I wish I could say I have time for it...

It can be a number of different things - eg fput() will do it. The
particular case that I came across was io_uring, which will use
TWA_SIGNAL based task_work for retry operations (and other things). If
you use io_uring, then depending on how you set up the ring, it can be
quite common or may never happen. Dropping the connection because
task_work is pending is not a viable solution, I'm afraid.
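
For context, queueing such work looks roughly like the sketch below.
task_work_add() and TWA_SIGNAL are the real kernel API; the callback and
its setup are a made-up illustration, not io_uring's actual code.

	#include <linux/task_work.h>

	static void retry_cb(struct callback_head *cb)
	{
		/* Runs later, from the target task's own context. */
	}

	static struct callback_head retry_work;

	static int queue_retry(struct task_struct *task)
	{
		init_task_work(&retry_work, retry_cb);
		/*
		 * TWA_SIGNAL sets TIF_NOTIFY_SIGNAL on @task and wakes it,
		 * which is why signal_pending() reads true there until the
		 * work has actually been run.
		 */
		return task_work_add(task, &retry_work, TWA_SIGNAL);
	}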
Dominique Martinet Feb. 6, 2023, 9:42 p.m. UTC | #3
Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
> > I agree with your assessment that we can't use task_work_run(), I assume
> > it's also quite bad to just clear the flag?
> > I'm not familiar with these task_works at all, in which cases do they happen?
> > Would you be able to share an easy reproducer so that I/someone can try
> > on various transports?
> 
> You can't just clear the flag without also running the task_work. Hence
> it either needs to be done right there, or leave it pending and let the
> exit to userspace take care of it.

Sorry I didn't develop that idea; the signal path resets the pending
signal when we're done, I assumed we could also reset the TWA_SIGNAL
flag when we're done flushing. That might take a while though, so it's
far from optimal.

> > If it's "rare enough" I'd say sacrificing the connection might make more
> > sense than a busy loop, but if it's becoming common I think we'll need
> > to spend some more time thinking about it...
> > It might be less effort to dig out my async flush commits if this become
> > too complicated, but I wish I could say I have time for it...
> 
> It can be a number of different things - eg fput() will do it.

Hm, schedule_delayed_work on the last fput, ok.
I was wondering what it had to do with the current 9p thread, but since
it's not scheduled on a particular cpu it can pick another cpu to wake
up, that makes sense -- although conceptually it feels rather bad to
interrupt a remote IO because of a local task that can be done later;
e.g. between having the fput wait a bit and cancelling a slow operation
like a 1MB write, I'd rather make the fput wait.
Do you know why that signal/interrupt is needed in the first place?

> The particular case that I came across was io_uring, which will use
> TWA_SIGNAL based task_work for retry operations (and other things). If
> you use io_uring, then depending on how you set up the ring, it can be
> quite common or may never happen. Dropping the connection because
> task_work is pending is not a viable solution, I'm afraid.

Thanks for confirming that it's perfectly normal, let's not drop
connections :)

My preferred approach is still to try and restore the async flush code,
but that will take a while -- it's not something that'll work right away
and I want some tests so it won't be ready for this merge window.
If we can have some sort of workaround until then it'll probably be for
the best, but I don't have any other idea (than temporarily clearing the
flag) at this point.

I'll set up some uring IO on 9p and see if I can produce these.
Jens Axboe Feb. 6, 2023, 9:56 p.m. UTC | #4
On 2/6/23 2:42 PM, Dominique Martinet wrote:
> Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
>>> I agree with your assessment that we can't use task_work_run(), I assume
>>> it's also quite bad to just clear the flag?
>>> I'm not familiar with these task_works at all, in which cases do they happen?
>>> Would you be able to share an easy reproducer so that I/someone can try
>>> on various transports?
>>
>> You can't just clear the flag without also running the task_work. Hence
>> it either needs to be done right there, or leave it pending and let the
>> exit to userspace take care of it.
> 
> Sorry I didn't develop that idea; the signal path resets the pending
> signal when we're done, I assumed we could also reset the TWA_SIGNAL
> flag when we're done flushing. That might take a while though, so it's
> far from optimal.

Sure, if you set it again when done, then it will probably work just
fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
separately. An attempt at that at the end of this email, totally
untested, and I'm not certain it's a good idea at all (see below). Is
there a reason why we can't exit and get the task_work processed
instead? That'd be greatly preferable.

>>> If it's "rare enough" I'd say sacrificing the connection might make more
>>> sense than a busy loop, but if it's becoming common I think we'll need
>>> to spend some more time thinking about it...
>>> It might be less effort to dig out my async flush commits if this become
>>> too complicated, but I wish I could say I have time for it...
>>
>> It can be a number of different things - eg fput() will do it.
> 
> Hm, schedule_delayed_work on the last fput, ok.
> I was wondering what it had to do with the current 9p thread, but since
> it's not scheduled on a particular cpu it can pick another cpu to wake
> up, that makes sense -- although conceptually it feels rather bad to
> interrupt a remote IO because of a local task that can be done later;
> e.g. between having the fput wait a bit and cancelling a slow operation
> like a 1MB write, I'd rather make the fput wait.
> Do you know why that signal/interrupt is needed in the first place?

It's needed if the task is currently sleeping in the kernel, to abort a
sleeping loop. The task_work may contain actions that will result in the
sleep loop being satisfied and hence ending, which means it needs to be
processed. That's my worry with the check-and-clear, then reset state
solution.
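
Concretely, the sleeping loops in question have roughly the shape of
wait_event_interruptible() (a sketch, not verbatim); if the pending state
were quietly suppressed, the sleeper could miss the very task_work that
would satisfy its wait condition:

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
		if (condition)
			break;
		if (signal_pending(current)) {	/* true for TIF_NOTIFY_SIGNAL too */
			err = -ERESTARTSYS;	/* bail so task_work can run */
			break;
		}
		schedule();
	}
	finish_wait(&wq, &wait);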

>> The particular case that I came across was io_uring, which will use
>> TWA_SIGNAL based task_work for retry operations (and other things). If
>> you use io_uring, then depending on how you set up the ring, it can be
>> quite common or may never happen. Dropping the connection because
>> task_work is pending is not a viable solution, I'm afraid.
> 
> Thanks for confirming that it's perfectly normal, let's not drop
> connections :)
> 
> My preferred approach is still to try and restore the async flush code,
> but that will take a while -- it's not something that'll work right away
> and I want some tests so it won't be ready for this merge window.
> If we can have some sort of workaround until then it'll probably be for
> the best, but I don't have any other idea (than temporarily clearing the
> flag) at this point.
> 
> I'll set up some uring IO on 9p and see if I can produce these.

I'm attaching a test case. I don't think it's particularly useful, but
it does nicely demonstrate the infinite loop that 9p gets into if
there's task_work pending.
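
(The attachment itself is not reproduced in this archive view. A
hypothetical reproducer in the same spirit, using liburing -- the mount
path and the exact trigger sequence are illustrative assumptions, not
the actual attached test case:)

	#include <fcntl.h>
	#include <liburing.h>
	#include <unistd.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		char buf[4096];
		int fd;

		if (io_uring_queue_init(8, &ring, 0) < 0)
			return 1;
		/* file on a 9p mount; path is illustrative */
		fd = open("/mnt/9p/testfile", O_RDONLY);
		if (fd < 0)
			return 1;

		/* async read; io_uring may punt completion/retry work to
		 * TWA_SIGNAL task_work on this task */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
		io_uring_submit(&ring);

		/* a synchronous 9p op issued while that task_work is still
		 * pending is where the pre-patch client could spin */
		(void)pread(fd, buf, sizeof(buf), 0);

		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
		close(fd);
		io_uring_queue_exit(&ring);
		return 0;
	}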
Jens Axboe Feb. 6, 2023, 9:58 p.m. UTC | #5
>> Sorry I didn't develop that idea; the signal path resets the pending
>> signal when we're done, I assumed we could also reset the TWA_SIGNAL
>> flag when we're done flushing. That might take a while though, so it's
>> far from optimal.
> 
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. An attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.
Forgot to include it, but as mentioned, don't think it's a sane idea...


diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..e4ff2773e00b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,33 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static void p9_clear_sigpending(int *sigpending, int *notifypending)
+{
+	if (signal_pending(current)) {
+		*sigpending = test_thread_flag(TIF_SIGPENDING);
+		if (*sigpending)
+			clear_thread_flag(TIF_SIGPENDING);
+		*notifypending = test_thread_flag(TIF_NOTIFY_SIGNAL);
+		if (*notifypending)
+			clear_thread_flag(TIF_NOTIFY_SIGNAL);
+	} else {
+		*sigpending = *notifypending = 0;
+	}
+}
+
+static void p9_reset_sigpending(int sigpending, int notifypending)
+{
+	unsigned long flags;
+
+	if (sigpending) {
+		spin_lock_irqsave(&current->sighand->siglock, flags);
+		recalc_sigpending();
+		spin_unlock_irqrestore(&current->sighand->siglock, flags);
+	}
+	if (notifypending)
+		set_tsk_thread_flag(current, TIF_NOTIFY_SIGNAL);
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -665,8 +692,7 @@ static struct p9_req_t *
 p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 	/* Passing zero for tsize/rsize to p9_client_prepare_req() tells it to
 	 * auto determine an appropriate (small) request/response size
@@ -687,12 +713,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -714,8 +735,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 
 	if (err == -ERESTARTSYS && c->status == Connected &&
 	    type == P9_TFLUSH) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 		goto again;
 	}
 
@@ -725,8 +745,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -736,11 +755,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 	if (err < 0)
 		goto reterr;
 
@@ -773,8 +788,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 					 const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 
 	va_start(ap, fmt);
@@ -789,12 +803,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);
@@ -810,8 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -821,11 +829,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 	if (err < 0)
 		goto reterr;
Dominique Martinet Feb. 6, 2023, 10:29 p.m. UTC | #6
Jens Axboe wrote on Mon, Feb 06, 2023 at 02:56:57PM -0700:
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. An attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.

No good reason aside from "it's not ready", but in the current code things
will probably get weird.
I actually misremembered the tag lookup for trans_fd: since we're not
freeing the tag yet, the lookup will work and the connection might not be
dropped (just reading into a buffer then freeing it in the cb without
any further processing). But even if my refcounting works better than I
thought, you'll end up with the IO being replayed while the server is
still processing the first one.
This is unlikely, but for example this could happen: 

- first write [0;1MB]
- write is interrupted before server handled it
   - write replayed and handled, userspace continues to...
   - second write [1MB-4k;1MB]
- first write handled by the server, overwriting the second write

And who doesn't enjoy a silent corruption for breakfast?


> > Hm, schedule_delayed_work on the last fput, ok.
> > I was wondering what it had to do with the current 9p thread, but since
> > it's not scheduled on a particular cpu it can pick another cpu to wake
> > up, that makes sense -- although conceptually it feels rather bad to
> > interrupt a remote IO because of a local task that can be done later;
> > e.g. between having the fput wait a bit and cancelling a slow operation
> > like a 1MB write, I'd rather make the fput wait.
> > Do you know why that signal/interrupt is needed in the first place?
> 
> It's needed if the task is currently sleeping in the kernel, to abort a
> sleeping loop. The task_work may contain actions that will result in the
> sleep loop being satisfied and hence ending, which means it needs to be
> processed. That's my worry with the check-and-clear, then reset state
> solution.

I see, the sleeping loop might not wake up until the signal is handled,
but it won't be handled if we don't get out.
Not bailing out on SIGKILL is bad enough, but that's possibly much worse
indeed... And that also means the busy loop isn't any better; I was
wondering how it was noticed if it was just a few busy checks, but in
that case temporarily clearing the flag won't get out either, so
that's not even a workaround.

I assume that also explains why it wants that task, and cannot just run
from an idle context -- it's not just any worker task, it has to be in
the process context? (sorry for using you as a rubber duck...)

> > I'll set up some uring IO on 9p and see if I can produce these.
> 
> I'm attaching a test case. I don't think it's particularly useful, but
> it does nicely demonstrate the infinite loop that 9p gets into if
> there's task_work pending.

Thanks, that helps!
I might not have time until weekend but I'll definitely look at it.
Jens Axboe Feb. 6, 2023, 10:56 p.m. UTC | #7
On 2/6/23 3:29 PM, Dominique Martinet wrote:
>>> Hm, schedule_delayed_work on the last fput, ok.
>>> I was wondering what it had to do with the current 9p thread, but since
>>> it's not scheduled on a particular cpu it can pick another cpu to wake
>>> up, that makes sense -- although conceptually it feels rather bad to
>>> interrupt a remote IO because of a local task that can be done later;
>>> e.g. between having the fput wait a bit and cancelling a slow operation
>>> like a 1MB write, I'd rather make the fput wait.
>>> Do you know why that signal/interrupt is needed in the first place?
>>
>> It's needed if the task is currently sleeping in the kernel, to abort a
>> sleeping loop. The task_work may contain actions that will result in the
>> sleep loop being satisfied and hence ending, which means it needs to be
>> processed. That's my worry with the check-and-clear, then reset state
>> solution.
> 
> I see, the sleeping loop might not wake up until the signal is handled,
> but it won't be handled if we don't get out.

Exactly

> Not bailing out on SIGKILL is bad enough, but that's possibly much worse
> indeed... And that also means the busy loop isn't any better; I was
> wondering how it was noticed if it was just a few busy checks, but in
> that case temporarily clearing the flag won't get out either, so
> that's not even a workaround.
> 
> I assume that also explains why it wants that task, and cannot just run
> from an idle context -- it's not just any worker task, it has to be in
> the process context? (sorry for using you as a rubber duck...)

Right, it needs to run in the context of the right task. So we can't
just punt it out-of-line to something else, which would obviously also
solve that dependency loop.

>>> I'll set up some uring IO on 9p and see if I can produce these.
>>
>> I'm attaching a test case. I don't think it's particularly useful, but
>> it does nicely demonstrate the infinite loop that 9p gets into if
>> there's task_work pending.
> 
> Thanks, that helps!
> I might not have time until weekend but I'll definitely look at it.

Sounds good, thanks! I'll consider my patch abandoned and wait for what
you have.

Patch

diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..9caa66cbd5b7 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,25 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static int p9_sigpending(int *sigpending)
+{
+	*sigpending = 0;
+
+	if (!signal_pending(current))
+		return 0;
+
+	/*
+	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
+	 * processed.
+	 */
+	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+		return -ERESTARTSYS;
+
+	*sigpending = 1;
+	clear_thread_flag(TIF_SIGPENDING);
+	return 0;
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -687,12 +706,9 @@  p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -789,12 +805,9 @@  static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);