9pfs: don't try to flush self and avoid QEMU hang on reset

Message ID 148968198512.5555.1880820193606077571.stgit@bahia (mailing list archive)
State New, archived

Commit Message

Greg Kurz March 16, 2017, 4:33 p.m. UTC
According to the 9P spec [*], when a client wants to cancel a pending I/O
request identified by a given tag (uint16), it must send a Tflush message
and wait for the server to respond with an Rflush message before reusing
this tag for another I/O. The server may still send a completion message
for the I/O if it wasn't actually cancelled, but the Rflush message must
arrive after that.

QEMU hence waits for the flushed PDU to complete before sending the Rflush
message back to the client.
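
To make the exchange concrete, here is a rough sketch of the two flush
messages. The type numbers and field layout follow the man page referenced
below; the C structs and names are purely illustrative, a real server
marshals the fields one by one rather than casting structs:

#include <stdint.h>

/* 9P message type numbers for flush, per the Plan 9 manual. */
enum { P9_TFLUSH = 108, P9_RFLUSH = 109 };

/* Tflush: size[4] Tflush tag[2] oldtag[2]
 * "Cancel the request I sent earlier under 'oldtag'."  The client must
 * not reuse oldtag until the matching Rflush has been received. */
struct p9_tflush {
    uint32_t size;    /* total message length (little-endian on the wire) */
    uint8_t  type;    /* P9_TFLUSH */
    uint16_t tag;     /* tag of this flush request itself */
    uint16_t oldtag;  /* tag of the request to cancel */
};

/* Rflush: size[4] Rflush tag[2]
 * Carries no status at all: a flush always "succeeds", its only purpose
 * is to tell the client that oldtag may be reused. */
struct p9_rflush {
    uint32_t size;
    uint8_t  type;    /* P9_RFLUSH */
    uint16_t tag;     /* echoes the Tflush tag */
};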

If a client sends 'Tflush tag oldtag' with tag == oldtag, QEMU will then
allocate a PDU identified by tag, find that same PDU in the PDU list and
wait for it to complete... i.e. wait for a completion that will never
happen. This causes a tag and ring slot leak in the guest, and a PDU leak
in QEMU, all of them bounded by the maximum number of PDUs (128). Worse,
it causes QEMU to hang on device reset, since v9fs_reset() wants to drain
all pending I/O.
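
A stripped-down model of the tag lookup makes the failure mode easy to
see. Everything below is a toy sketch with made-up names (struct pdu,
find_cancel_pdu) built on the BSD sys/queue.h list macros; QEMU's actual
code uses its own QLIST macros and V9fsPDU structure:

#include <stdint.h>
#include <stdio.h>
#include <sys/queue.h>

/* Toy stand-in for a server-side request: just a tag and list linkage. */
struct pdu {
    uint16_t tag;
    LIST_ENTRY(pdu) next;
};
LIST_HEAD(pdu_list, pdu);

/* Find the in-flight request that a Tflush wants cancelled. */
static struct pdu *find_cancel_pdu(struct pdu_list *active, uint16_t oldtag)
{
    struct pdu *p;
    LIST_FOREACH(p, active, next) {
        if (p->tag == oldtag) {
            return p;
        }
    }
    return NULL;
}

int main(void)
{
    struct pdu_list active = LIST_HEAD_INITIALIZER(active);
    struct pdu flush_pdu = { .tag = 42 };  /* the Tflush request itself */

    /* Like any other request, the flush PDU sits on the active list
     * while it is being handled. */
    LIST_INSERT_HEAD(&active, &flush_pdu, next);

    /* Broken client: Tflush tag=42 oldtag=42. */
    struct pdu *cancel = find_cancel_pdu(&active, 42);

    /* The lookup returns the flush PDU itself; waiting for it to
     * complete before answering it can never make progress. */
    printf("lookup found the flush PDU itself: %s\n",
           cancel == &flush_pdu ? "yes" : "no");
    return 0;
}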

This insane behavior likely denotes a bug in the client, and it would
deserve an Rerror message to be sent back. Unfortunately, the protocol
allows it and requires all flush requests to succeed (only an Rflush
response is expected).

The only option is to detect when we have to handle a self-referencing
flush request and report success to the client right away.
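
In terms of the model above, the fix boils down to one extra test before
committing to wait. Again a sketch with hypothetical helpers
(wait_for_completion, reply_rflush); the real change is the one-line
condition shown in the diff at the bottom of this page:

#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-ins; in QEMU these would be the V9fsPDU and the coroutine
 * machinery around it.  All names below are made up for illustration. */
struct pdu {
    unsigned short tag;
    bool cancelled;
};

static void wait_for_completion(struct pdu *victim)
{
    /* Placeholder: the real server suspends the flush coroutine here
     * until 'victim' completes. */
    printf("waiting for tag %u to complete\n", victim->tag);
}

static void reply_rflush(struct pdu *pdu)
{
    printf("Rflush tag %u\n", pdu->tag);
}

/* Flush path with the self-reference guard applied. */
static void handle_tflush(struct pdu *pdu, struct pdu *cancel_pdu)
{
    if (cancel_pdu && cancel_pdu != pdu) {
        /* Genuine cancellation: mark the victim, wait for it to finish,
         * and only then acknowledge the Tflush. */
        cancel_pdu->cancelled = true;
        wait_for_completion(cancel_pdu);
    }
    /* Self-referencing flush (cancel_pdu == pdu) or unknown oldtag:
     * there is nothing to wait for, so report success immediately. */
    reply_rflush(pdu);
}

int main(void)
{
    struct pdu flush = { .tag = 42 };

    /* Broken client: the lookup resolved oldtag to the flush PDU itself.
     * With the guard, we skip the wait and just send Rflush. */
    handle_tflush(&flush, &flush);
    return 0;
}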

[*] http://man.cat-v.org/plan_9/5/flush

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/9pfs/9p.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Blake March 21, 2017, 2:01 p.m. UTC | #1
On 03/16/2017 11:33 AM, Greg Kurz wrote:
> According to the 9P spec [*], when a client wants to cancel a pending I/O
> request identified by a given tag (uint16), it must send a Tflush message
> and wait for the server to respond with a Rflush message before reusing this
> tag for another I/O. The server may still send a completion message for the
> I/O if it wasn't actually cancelled but the Rflush message must arrive after
> that.
> 
> QEMU hence waits for the flushed PDU to complete before sending the Rflush
> message back to the client.
> 
> If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then
> allocate a PDU identified by tag, find it in the PDU list and wait for
> this same PDU to complete... i.e. wait for a completion that will never
> happen. This causes a tag and ring slot leak in the guest, and a PDU
> leak in QEMU, all of them limited by the maximal number of PDUs (128).
> But, worse, this causes QEMU to hang on device reset since v9fs_reset()
> wants to drain all pending I/O.
> 
> This insane behavior is likely to denote a bug in the client, and it would
> deserve an Rerror message to be sent back. Unfortunately, the protocol
> allows it and requires all flush requests to suceed (only a Tflush response

s/suceed/succeed/

> is expected).
> 
> The only option is to detect when we have to handle a self-referencing
> flush request and report success to the client right away.
> 
> [*] http://man.cat-v.org/plan_9/5/flush
> 
> Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/9pfs/9p.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
Greg Kurz March 21, 2017, 2:42 p.m. UTC | #2
On Tue, 21 Mar 2017 09:01:50 -0500
Eric Blake <eblake@redhat.com> wrote:

> On 03/16/2017 11:33 AM, Greg Kurz wrote:
> > According to the 9P spec [*], when a client wants to cancel a pending I/O
> > request identified by a given tag (uint16), it must send a Tflush message
> > and wait for the server to respond with a Rflush message before reusing this
> > tag for another I/O. The server may still send a completion message for the
> > I/O if it wasn't actually cancelled but the Rflush message must arrive after
> > that.
> > 
> > QEMU hence waits for the flushed PDU to complete before sending the Rflush
> > message back to the client.
> > 
> > If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then
> > allocate a PDU identified by tag, find it in the PDU list and wait for
> > this same PDU to complete... i.e. wait for a completion that will never
> > happen. This causes a tag and ring slot leak in the guest, and a PDU
> > leak in QEMU, all of them limited by the maximal number of PDUs (128).
> > But, worse, this causes QEMU to hang on device reset since v9fs_reset()
> > wants to drain all pending I/O.
> > 
> > This insane behavior is likely to denote a bug in the client, and it would
> > deserve an Rerror message to be sent back. Unfortunately, the protocol
> > allows it and requires all flush requests to suceed (only a Tflush response  
> 
> s/suceed/succeed/
> 
> > is expected).
> > 
> > The only option is to detect when we have to handle a self-referencing
> > flush request and report success to the client right away.
> > 
> > [*] http://man.cat-v.org/plan_9/5/flush
> > 
> > Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  hw/9pfs/9p.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >   
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Oh, I've sent a v2 of this patch (it error_report()s a warning) and it is
actually part of the pull request I sent earlier today... dunno how to get
your Reviewed-by: added there.

Thanks.

--
Greg
Eric Blake March 21, 2017, 3:42 p.m. UTC | #3
On 03/21/2017 09:42 AM, Greg Kurz wrote:

>>> This insane behavior is likely to denote a bug in the client, and it would
>>> deserve an Rerror message to be sent back. Unfortunately, the protocol
>>> allows it and requires all flush requests to suceed (only a Tflush response  
>>
>> s/suceed/succeed/

The pull request still has the typo.

>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
> 
> Oh, I've sent a v2 for this patch (error_report() a warning) and it is
> actually part of the pull request I've sent earlier today... dunno how
> to have your Reviewed-by: added there.

If you really want it, send a v2 pull request before Peter merges v1
(and an explicit NACK on the v1 cover letter will make your intentions
clear).  But at this point, I'm fine if the v1 pull request goes in
untouched.

Patch

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 76c9247c777d..e20417fbb0db 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -2369,7 +2369,7 @@ static void coroutine_fn v9fs_flush(void *opaque)
             break;
         }
     }
-    if (cancel_pdu) {
+    if (cancel_pdu && cancel_pdu != pdu) {
         cancel_pdu->cancelled = 1;
         /*
          * Wait for pdu to complete.