Message ID | 9422b998-5bab-85cc-5416-3bb5cf6dd853@kernel.dk (mailing list archive)
---|---
State | Not Applicable
Delegated to: | Netdev Maintainers
Series | [v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
meta-comment: 9p is usually handled separately from netdev, I just saw
this by chance when Simon replied to v1 -- please cc
v9fs-developer@lists.sourceforge.net for v3 if there is one
(well, it's a bit of a weird tree because patches are sometimes taken
through -net...)

Also added Christian (virtio 9p) and Eric (second maintainer) to the
To: list for attention.

Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
> signal_pending() really means that an exit to userspace is required to
> clear the condition, as it could be either an actual signal, or it could
> be TWA_SIGNAL based task_work that needs processing. The 9p client
> does a recalc_sigpending() to take care of the former, but that still
> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
> task_work pending, then we'll sit in a tight loop spinning as
> signal_pending() remains true even after recalc_sigpending().
>
> Move the signal_pending() logic into a helper that deals with both, and
> return -ERESTARTSYS if the reason for signal_pending() being true is
> that we have task_work to process.
>
> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>     that point. For one, we don't want to export task_work_run(), it's
>     in-kernel only. And secondly, we need to ensure we have a sane state
>     before running task_work. The latter did look fine before, but this
>     should be saner. Tested that this also fixes the report for me.

Hmm, just bailing out here is a can of worms -- when we get the reply
from the server, depending on the transport, hell might break loose (zc
requests in particular on virtio will probably just access the memory
anyway... fd will consider it got a bogus reply and close the
connection, which is a lesser evil but still not appropriate)

We really need to get rid of that retry loop in the first place, and the
req refcounting I added a couple of years ago was a first step towards
async flush, which will help with that, but the async flush code had a
bug I never found time to work out, so it never made it, and we need an
immediate fix.

... Just looking at the code and thinking out loud, sorry for rambling:
actually that signal handling in virtio is already out of
p9_virtio_zc_request(), so the pages are already unpinned by the time we
do that flush, and I guess it's not much worse -- refcounting will make
it "mostly work" exactly as it does now, as in the pages won't be freed
until we actually get the reply, so the pages can get moved underneath
virtio, which is bad but is the same as right now, and I guess it's a
net improvement?

I agree with your assessment that we can't use task_work_run(); I assume
it's also quite bad to just clear the flag?
I'm not familiar with these task_works at all -- in which cases do they
happen? Would you be able to share an easy reproducer so that I/someone
can try it on various transports?

If it's "rare enough" I'd say sacrificing the connection might make more
sense than a busy loop, but if it's becoming common I think we'll need
to spend some more time thinking about it...
It might be less effort to dig out my async flush commits if this becomes
too complicated, but I wish I could say I have time for it...

Thanks!
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 622ec6a586ee..9caa66cbd5b7 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -652,6 +652,25 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
>  	return ERR_PTR(err);
>  }
>  
> +static int p9_sigpending(int *sigpending)
> +{
> +	*sigpending = 0;
> +
> +	if (!signal_pending(current))
> +		return 0;
> +
> +	/*
> +	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
> +	 * processed.
> +	 */
> +	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
> +		return -ERESTARTSYS;
> +
> +	*sigpending = 1;
> +	clear_thread_flag(TIF_SIGPENDING);
> +	return 0;
> +}
> +
>  /**
>   * p9_client_rpc - issue a request and wait for a response
>   * @c: client session
> @@ -687,12 +706,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>  	req->tc.zc = false;
>  	req->rc.zc = false;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->request(c, req);
>  	if (err < 0) {
> @@ -789,12 +805,9 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
>  	req->tc.zc = true;
>  	req->rc.zc = true;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->zc_request(c, req, uidata, uodata,
>  				       inlen, olen, in_hdrlen);
>
On 2/5/23 3:02 AM, Dominique Martinet wrote:
> meta-comment: 9p is usually handled separately from netdev, I just saw
> this by chance when Simon replied to v1 -- please cc
> v9fs-developer@lists.sourceforge.net for v3 if there is one
> (well, it's a bit of a weird tree because patches are sometimes taken
> through -net...)
>
> Also added Christian (virtio 9p) and Eric (second maintainer) to the
> To: list for attention.

Thanks! I can send out a v3, but let's get the discussion sorted first.
The only change I want to make is the comment format, which apparently
is different in net/ than in most other spots in the kernel.

> Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
>> signal_pending() really means that an exit to userspace is required to
>> clear the condition, as it could be either an actual signal, or it could
>> be TWA_SIGNAL based task_work that needs processing. The 9p client
>> does a recalc_sigpending() to take care of the former, but that still
>> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
>> task_work pending, then we'll sit in a tight loop spinning as
>> signal_pending() remains true even after recalc_sigpending().
>>
>> Move the signal_pending() logic into a helper that deals with both, and
>> return -ERESTARTSYS if the reason for signal_pending() being true is
>> that we have task_work to process.
>>
>> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
>> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>>     that point. For one, we don't want to export task_work_run(), it's
>>     in-kernel only. And secondly, we need to ensure we have a sane state
>>     before running task_work. The latter did look fine before, but this
>>     should be saner. Tested that this also fixes the report for me.
>
> Hmm, just bailing out here is a can of worms -- when we get the reply
> from the server, depending on the transport, hell might break loose (zc
> requests in particular on virtio will probably just access the memory
> anyway... fd will consider it got a bogus reply and close the
> connection, which is a lesser evil but still not appropriate)
>
> We really need to get rid of that retry loop in the first place, and the
> req refcounting I added a couple of years ago was a first step towards
> async flush, which will help with that, but the async flush code had a
> bug I never found time to work out, so it never made it, and we need an
> immediate fix.
>
> ... Just looking at the code and thinking out loud, sorry for rambling:
> actually that signal handling in virtio is already out of
> p9_virtio_zc_request(), so the pages are already unpinned by the time we
> do that flush, and I guess it's not much worse -- refcounting will make
> it "mostly work" exactly as it does now, as in the pages won't be freed
> until we actually get the reply, so the pages can get moved underneath
> virtio, which is bad but is the same as right now, and I guess it's a
> net improvement?
>
> I agree with your assessment that we can't use task_work_run(); I assume
> it's also quite bad to just clear the flag?
> I'm not familiar with these task_works at all -- in which cases do they
> happen? Would you be able to share an easy reproducer so that I/someone
> can try it on various transports?

You can't just clear the flag without also running the task_work. Hence
it either needs to be done right there, or left pending so that the exit
to userspace takes care of it.
> If it's "rare enough" I'd say sacrificing the connection might make more > sense than a busy loop, but if it's becoming common I think we'll need > to spend some more time thinking about it... > It might be less effort to dig out my async flush commits if this become > too complicated, but I wish I could say I have time for it... It can be a number of different things - eg fput() will do it. The particular case that I came across was io_uring, which will use TWA_SIGNAL based task_work for retry operations (and other things). If you use io_uring, and depending on how you setup the ring, it can be quite common or will never happen. Dropping the connection task_work being pending is not a viable solution, I'm afraid.
Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
> > I agree with your assessment that we can't use task_work_run(); I assume
> > it's also quite bad to just clear the flag?
> > I'm not familiar with these task_works at all -- in which cases do they
> > happen? Would you be able to share an easy reproducer so that I/someone
> > can try it on various transports?
>
> You can't just clear the flag without also running the task_work. Hence
> it either needs to be done right there, or left pending so that the exit
> to userspace takes care of it.

Sorry, I didn't develop that idea; the signal path resets the pending
signal when we're done, so I assumed we could also reset the TWA_SIGNAL
flag when we're done flushing. That might take a while though, so it's
far from optimal.

> > If it's "rare enough" I'd say sacrificing the connection might make more
> > sense than a busy loop, but if it's becoming common I think we'll need
> > to spend some more time thinking about it...
> > It might be less effort to dig out my async flush commits if this becomes
> > too complicated, but I wish I could say I have time for it...
>
> It can be a number of different things - eg fput() will do it.

Hm, schedule_delayed_work on the last fput, ok.
I was wondering what it had to do with the current 9p thread, but since
it's not scheduled on a particular cpu it can pick another cpu to wake
up, that makes sense -- although conceptually it feels rather bad to
interrupt a remote IO because of a local task that can be done later;
e.g. given the choice between having the fput wait a bit, or cancelling
a slow operation like a 1MB write, I'd rather make the fput wait.
Do you know why that signal/interrupt is needed in the first place?

> The particular case that I came across was io_uring, which will use
> TWA_SIGNAL based task_work for retry operations (and other things). If
> you use io_uring, then depending on how you set up the ring, it can be
> quite common or will never happen. Dropping the connection because
> task_work is pending is not a viable solution, I'm afraid.

Thanks for confirming that it's perfectly normal; let's not drop
connections :)

My preferred approach is still to try and restore the async flush code,
but that will take a while -- it's not something that'll work right away
and I want some tests, so it won't be ready for this merge window.
If we can have some sort of workaround until then it'll probably be for
the best, but I don't have any other idea (than temporarily clearing the
flag) at this point.

I'll set up some uring IO on 9p and see if I can reproduce this.
On 2/6/23 2:42 PM, Dominique Martinet wrote:
> Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
>>> I agree with your assessment that we can't use task_work_run(); I assume
>>> it's also quite bad to just clear the flag?
>>> I'm not familiar with these task_works at all -- in which cases do they
>>> happen? Would you be able to share an easy reproducer so that I/someone
>>> can try it on various transports?
>>
>> You can't just clear the flag without also running the task_work. Hence
>> it either needs to be done right there, or left pending so that the exit
>> to userspace takes care of it.
>
> Sorry, I didn't develop that idea; the signal path resets the pending
> signal when we're done, so I assumed we could also reset the TWA_SIGNAL
> flag when we're done flushing. That might take a while though, so it's
> far from optimal.

Sure, if you set it again when done, then it will probably work just
fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
separately. There's an attempt at that at the end of this email, totally
untested, and I'm not certain it's a good idea at all (see below). Is
there a reason why we can't exit and get the task_work processed
instead? That'd be greatly preferable.

>>> If it's "rare enough" I'd say sacrificing the connection might make more
>>> sense than a busy loop, but if it's becoming common I think we'll need
>>> to spend some more time thinking about it...
>>> It might be less effort to dig out my async flush commits if this becomes
>>> too complicated, but I wish I could say I have time for it...
>>
>> It can be a number of different things - eg fput() will do it.
>
> Hm, schedule_delayed_work on the last fput, ok.
> I was wondering what it had to do with the current 9p thread, but since
> it's not scheduled on a particular cpu it can pick another cpu to wake
> up, that makes sense -- although conceptually it feels rather bad to
> interrupt a remote IO because of a local task that can be done later;
> e.g. given the choice between having the fput wait a bit, or cancelling
> a slow operation like a 1MB write, I'd rather make the fput wait.
> Do you know why that signal/interrupt is needed in the first place?

It's needed if the task is currently sleeping in the kernel, to abort a
sleeping loop. The task_work may contain actions that will result in the
sleep loop being satisfied and hence ending, which means it needs to be
processed. That's my worry with the check-and-clear, then reset-state
solution.

>> The particular case that I came across was io_uring, which will use
>> TWA_SIGNAL based task_work for retry operations (and other things). If
>> you use io_uring, then depending on how you set up the ring, it can be
>> quite common or will never happen. Dropping the connection because
>> task_work is pending is not a viable solution, I'm afraid.
>
> Thanks for confirming that it's perfectly normal; let's not drop
> connections :)
>
> My preferred approach is still to try and restore the async flush code,
> but that will take a while -- it's not something that'll work right away
> and I want some tests, so it won't be ready for this merge window.
> If we can have some sort of workaround until then it'll probably be for
> the best, but I don't have any other idea (than temporarily clearing the
> flag) at this point.
>
> I'll set up some uring IO on 9p and see if I can reproduce this.

I'm attaching a test case. I don't think it's particularly useful, but
it does nicely demonstrate the infinite loop that 9p gets into if
there's task_work pending.
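[The attachment itself is not preserved in this archive. As a rough idea
only, a liburing-based reproducer might take a shape like the sketch
below; the mount path is invented, and the sketch is not guaranteed to
trigger the loop by itself -- it just shows the ingredients: pending
io_uring task_work plus a blocking 9p request on the same task.]

#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd, ret;

	/* Hypothetical path: a file on a 9p mount. */
	fd = open("/mnt/9p/testfile", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	ret = io_uring_queue_init(8, &ring, 0);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* Queue an async read; its completion/retry path can post
	 * TWA_SIGNAL task_work against this task. */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe)
		return 1;
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/* A blocking 9p request issued while that task_work is pending
	 * is where p9_client_rpc() used to spin on signal_pending(). */
	if (pread(fd, buf, sizeof(buf), 0) < 0)
		perror("pread");

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!ret)
		io_uring_cqe_seen(&ring, cqe);
	io_uring_queue_exit(&ring);
	close(fd);
	return 0;
}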
>> Sorry, I didn't develop that idea; the signal path resets the pending
>> signal when we're done, so I assumed we could also reset the TWA_SIGNAL
>> flag when we're done flushing. That might take a while though, so it's
>> far from optimal.
>
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. There's an attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.

Forgot to include it, but as mentioned, I don't think it's a sane idea...

diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..e4ff2773e00b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,33 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static void p9_clear_sigpending(int *sigpending, int *notifypending)
+{
+	if (signal_pending(current)) {
+		*sigpending = test_thread_flag(TIF_SIGPENDING);
+		if (*sigpending)
+			clear_thread_flag(TIF_SIGPENDING);
+		*notifypending = test_thread_flag(TIF_NOTIFY_SIGNAL);
+		if (*notifypending)
+			clear_thread_flag(TIF_NOTIFY_SIGNAL);
+	} else {
+		*sigpending = *notifypending = 0;
+	}
+}
+
+static void p9_reset_sigpending(int sigpending, int notifypending)
+{
+	unsigned long flags;
+
+	if (sigpending) {
+		spin_lock_irqsave(&current->sighand->siglock, flags);
+		recalc_sigpending();
+		spin_unlock_irqrestore(&current->sighand->siglock, flags);
+	}
+	if (notifypending)
+		set_tsk_thread_flag(current, TIF_NOTIFY_SIGNAL);
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -665,8 +692,7 @@ static struct p9_req_t *
 p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 	/* Passing zero for tsize/rsize to p9_client_prepare_req() tells it to
 	 * auto determine an appropriate (small) request/response size
@@ -687,12 +713,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -714,8 +735,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 
 	if (err == -ERESTARTSYS && c->status == Connected &&
 	    type == P9_TFLUSH) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 		goto again;
 	}
 
@@ -725,8 +745,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -736,11 +755,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 
 	if (err < 0)
 		goto reterr;
@@ -773,8 +788,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 					 const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 
 	va_start(ap, fmt);
@@ -789,12 +803,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);
@@ -810,8 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -821,11 +829,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 
 	if (err < 0)
 		goto reterr;
Jens Axboe wrote on Mon, Feb 06, 2023 at 02:56:57PM -0700:
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. There's an attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.

No good reason aside from "it's not ready", but in the current code
things will probably get weird.
I actually misremembered the tag lookup for trans_fd: since we're not
freeing the tag yet, the lookup will work and the connection might not
be dropped (it just reads into a buffer and frees it in the callback
without any further processing). But even if my refcounting works better
than I thought, you'll end up with the IO being replayed while the
server is still processing the first one.

This is unlikely, but for example this could happen:
 - first write [0; 1MB]
 - write is interrupted before the server handled it
 - write is replayed and handled, userspace continues to...
 - second write [1MB-4k; 1MB]
 - first write is handled by the server, overwriting the second write

And who doesn't enjoy a silent corruption for breakfast?

> > Hm, schedule_delayed_work on the last fput, ok.
> > I was wondering what it had to do with the current 9p thread, but since
> > it's not scheduled on a particular cpu it can pick another cpu to wake
> > up, that makes sense -- although conceptually it feels rather bad to
> > interrupt a remote IO because of a local task that can be done later;
> > e.g. given the choice between having the fput wait a bit, or cancelling
> > a slow operation like a 1MB write, I'd rather make the fput wait.
> > Do you know why that signal/interrupt is needed in the first place?
>
> It's needed if the task is currently sleeping in the kernel, to abort a
> sleeping loop. The task_work may contain actions that will result in the
> sleep loop being satisfied and hence ending, which means it needs to be
> processed. That's my worry with the check-and-clear, then reset-state
> solution.

I see: the sleeping loop might not wake up until the signal is handled,
but it won't be handled if we don't get out.
Not bailing out on sigkill is bad enough, but that's possibly much worse
indeed... And that also means the busy loop isn't any better. I was
wondering how it got noticed if it was just a few busy checks, but in
that case just temporarily clearing the flag won't get us out either, so
that's not even a workaround.

I assume that also explains why it wants that particular task, and
cannot just run from an idle context -- it's not just any worker task,
it has to be the process context? (sorry for using you as a rubber
duck...)

> > I'll set up some uring IO on 9p and see if I can reproduce this.
>
> I'm attaching a test case. I don't think it's particularly useful, but
> it does nicely demonstrate the infinite loop that 9p gets into if
> there's task_work pending.

Thanks, that helps!
I might not have time until the weekend, but I'll definitely look at it.
On 2/6/23 3:29 PM, Dominique Martinet wrote:
>>> Hm, schedule_delayed_work on the last fput, ok.
>>> I was wondering what it had to do with the current 9p thread, but since
>>> it's not scheduled on a particular cpu it can pick another cpu to wake
>>> up, that makes sense -- although conceptually it feels rather bad to
>>> interrupt a remote IO because of a local task that can be done later;
>>> e.g. given the choice between having the fput wait a bit, or cancelling
>>> a slow operation like a 1MB write, I'd rather make the fput wait.
>>> Do you know why that signal/interrupt is needed in the first place?
>>
>> It's needed if the task is currently sleeping in the kernel, to abort a
>> sleeping loop. The task_work may contain actions that will result in the
>> sleep loop being satisfied and hence ending, which means it needs to be
>> processed. That's my worry with the check-and-clear, then reset-state
>> solution.
>
> I see: the sleeping loop might not wake up until the signal is handled,
> but it won't be handled if we don't get out.

Exactly.

> Not bailing out on sigkill is bad enough, but that's possibly much worse
> indeed... And that also means the busy loop isn't any better. I was
> wondering how it got noticed if it was just a few busy checks, but in
> that case just temporarily clearing the flag won't get us out either, so
> that's not even a workaround.
>
> I assume that also explains why it wants that particular task, and
> cannot just run from an idle context -- it's not just any worker task,
> it has to be the process context? (sorry for using you as a rubber
> duck...)

Right, it needs to run in the context of the right task. So we can't
just punt it out-of-line to something else, which would obviously also
solve that dependency loop.

>>> I'll set up some uring IO on 9p and see if I can reproduce this.
>>
>> I'm attaching a test case. I don't think it's particularly useful, but
>> it does nicely demonstrate the infinite loop that 9p gets into if
>> there's task_work pending.
>
> Thanks, that helps!
> I might not have time until the weekend, but I'll definitely look at it.

Sounds good, thanks! I'll consider my patch abandoned and wait for what
you have.
diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..9caa66cbd5b7 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,25 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static int p9_sigpending(int *sigpending)
+{
+	*sigpending = 0;
+
+	if (!signal_pending(current))
+		return 0;
+
+	/*
+	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
+	 * processed.
+	 */
+	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+		return -ERESTARTSYS;
+
+	*sigpending = 1;
+	clear_thread_flag(TIF_SIGPENDING);
+	return 0;
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -687,12 +706,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -789,12 +805,9 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);
signal_pending() really means that an exit to userspace is required to
clear the condition, as it could be either an actual signal, or it could
be TWA_SIGNAL based task_work that needs processing. The 9p client
does a recalc_sigpending() to take care of the former, but that still
leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
task_work pending, then we'll sit in a tight loop spinning as
signal_pending() remains true even after recalc_sigpending().

Move the signal_pending() logic into a helper that deals with both, and
return -ERESTARTSYS if the reason for signal_pending() being true is
that we have task_work to process.

Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
    that point. For one, we don't want to export task_work_run(), it's
    in-kernel only. And secondly, we need to ensure we have a sane state
    before running task_work. The latter did look fine before, but this
    should be saner. Tested that this also fixes the report for me.
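[For reference, the reason recalc_sigpending() alone cannot clear the
condition: signal_pending() checks two separate thread flags, roughly
along these lines -- a paraphrase of include/linux/sched/signal.h from
this era, not part of the patch itself.]

static inline int signal_pending(struct task_struct *p)
{
	/*
	 * TIF_NOTIFY_SIGNAL isn't really a signal, but it requires the
	 * same behavior of breaking out of interruptible sleeps so that
	 * the queued task_work can be run.
	 */
	if (unlikely(test_tsk_thread_flag(p, TIF_NOTIFY_SIGNAL)))
		return 1;
	/* Only this half is recomputed by recalc_sigpending(). */
	return task_sigpending(p);
}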