diff mbox series

[2/2] virtio-blk: Ensure no requests in virtqueues before deleting vqs.

Message ID 20240122110722.690223-3-yi.sun@unisoc.com (mailing list archive)
State New, archived
Headers show
Series Fix requests loss during virtio-blk device suspend | expand

Commit Message

Yi Sun Jan. 22, 2024, 11:07 a.m. UTC
Ensure no remaining requests in virtqueues before resetting vdev and
deleting virtqueues. Otherwise these requests will never be completed.
It may cause the system to become unresponsive. So it is better to place
blk_mq_quiesce_queue() in front of virtio_reset_device().

Function blk_mq_quiesce_queue() can ensure that requests have become
in_flight status, but it cannot guarantee that requests have been
processed by the device. Virtqueues should never be deleted before
all requests become complete status.

New function blk_mq_tagset_wait_request_completed() ensure that all
requests in virtqueues become complete status.

Signed-off-by: Yi Sun <yi.sun@unisoc.com>
---
 drivers/block/virtio_blk.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Stefan Hajnoczi Jan. 22, 2024, 3:42 p.m. UTC | #1
On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote:
> Ensure no remaining requests in virtqueues before resetting vdev and
> deleting virtqueues. Otherwise these requests will never be completed.
> It may cause the system to become unresponsive. So it is better to place
> blk_mq_quiesce_queue() in front of virtio_reset_device().

QEMU's virtio-blk device implementation completes all requests during
device reset. Most device implementations have to do the same to avoid
leaving dangling requests in flight across device reset.

Which device implementation are you using and why is it safe for the
device is simply drop requests across device reset?

Stefan
yi sun Jan. 23, 2024, 3:27 a.m. UTC | #2
On Mon, Jan 22, 2024 at 11:43 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote:
> > Ensure no remaining requests in virtqueues before resetting vdev and
> > deleting virtqueues. Otherwise these requests will never be completed.
> > It may cause the system to become unresponsive. So it is better to place
> > blk_mq_quiesce_queue() in front of virtio_reset_device().
>
> QEMU's virtio-blk device implementation completes all requests during
> device reset. Most device implementations have to do the same to avoid
> leaving dangling requests in flight across device reset.
>
> Which device implementation are you using and why is it safe for the
> device is simply drop requests across device reset?
>
> Stefan

Virtio-blk device implementation completes all requests during device reset, but
this can only ensure that the device has finished using virtqueue. We should
also consider the driver's use of virtqueue.

I caught such an example. Before del_vqs, the request had been processed by
the device, but it had not been processed by the driver. Although I am
using kernel5.4,
I think this problem may also occur with the latest version of kernel.

The debug code I added is as follows:
virtblk_freeze()
{
        vdev reset();
        quiesce queue();
        if (virtqueue->num_free != 1024) //1024 is the init value.
                BUG_ON(1);
        vdev del_vqs();
}

BUG_ON triggered the dump, the analysis is as follows:

There is one request left in request_queue.
crash_arm64> struct request ffffff81f0560000 | grep -e state -e __data_len
  __data_len = 20480,
  state = MQ_RQ_IN_FLIGHT,

crash_arm64> vring_virtqueue.packed,last_used_idx,broken,vq 0xffffff8086f92900 |
grep -e num -e used_wrap_counter -e last_used_idx -e broken -e
num_free -e desc_state -e "desc ="
        num = 1024,
        desc = 0xffffff8085ff8000,
      used_wrap_counter = false,
      desc_state = 0xffffff8085610000,
  last_used_idx = 487,
  broken = false,
    num_free = 1017,

Find desc based on last_used_idx. Through flags, we can know that the request
has been processed by the device, but it is still in flight state
because it has not
had time to run virtblk_done().
crash_arm> vring_packed_desc ffffff8085ff9e70
struct vring_packed_desc {
  addr = 10474619192,
  len = 20481,
  id = 667,
  flags = 2
}

I'm using a closed source virtual machine, so I can't see the source
code for it,
but I'm guessing it's similar to qemu.

After the device completes the request, we must also ensure that the driver can
complete the request in virtblk_done().
Stefan Hajnoczi Jan. 23, 2024, 3:09 p.m. UTC | #3
Hi Michael,
This could potentially affect other VIRTIO drivers too. Please see
below.

On Tue, Jan 23, 2024 at 11:27:40AM +0800, yi sun wrote:
> On Mon, Jan 22, 2024 at 11:43 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote:
> > > Ensure no remaining requests in virtqueues before resetting vdev and
> > > deleting virtqueues. Otherwise these requests will never be completed.
> > > It may cause the system to become unresponsive. So it is better to place
> > > blk_mq_quiesce_queue() in front of virtio_reset_device().
> >
> > QEMU's virtio-blk device implementation completes all requests during
> > device reset. Most device implementations have to do the same to avoid
> > leaving dangling requests in flight across device reset.
> >
> > Which device implementation are you using and why is it safe for the
> > device is simply drop requests across device reset?
> >
> > Stefan
> 
> Virtio-blk device implementation completes all requests during device reset, but
> this can only ensure that the device has finished using virtqueue. We should
> also consider the driver's use of virtqueue.
> 
> I caught such an example. Before del_vqs, the request had been processed by
> the device, but it had not been processed by the driver. Although I am
> using kernel5.4,
> I think this problem may also occur with the latest version of kernel.
> 
> The debug code I added is as follows:
> virtblk_freeze()
> {
>         vdev reset();
>         quiesce queue();
>         if (virtqueue->num_free != 1024) //1024 is the init value.
>                 BUG_ON(1);
>         vdev del_vqs();
> }
> 
> BUG_ON triggered the dump, the analysis is as follows:
> 
> There is one request left in request_queue.
> crash_arm64> struct request ffffff81f0560000 | grep -e state -e __data_len
>   __data_len = 20480,
>   state = MQ_RQ_IN_FLIGHT,
> 
> crash_arm64> vring_virtqueue.packed,last_used_idx,broken,vq 0xffffff8086f92900 |
> grep -e num -e used_wrap_counter -e last_used_idx -e broken -e
> num_free -e desc_state -e "desc ="
>         num = 1024,
>         desc = 0xffffff8085ff8000,
>       used_wrap_counter = false,
>       desc_state = 0xffffff8085610000,
>   last_used_idx = 487,
>   broken = false,
>     num_free = 1017,
> 
> Find desc based on last_used_idx. Through flags, we can know that the request
> has been processed by the device, but it is still in flight state
> because it has not
> had time to run virtblk_done().
> crash_arm> vring_packed_desc ffffff8085ff9e70
> struct vring_packed_desc {
>   addr = 10474619192,
>   len = 20481,
>   id = 667,
>   flags = 2
> }
> 
> I'm using a closed source virtual machine, so I can't see the source
> code for it,
> but I'm guessing it's similar to qemu.
> 
> After the device completes the request, we must also ensure that the driver can
> complete the request in virtblk_done().
> 

Okay, I think your approach of waiting for requests before
virtio_device_reset() makes sense. blk_mq_complete_request() is async
(might be deferred to an IPI or softirq) so it's not enough for
virtblk_done() to run before virtio_device_reset() returns. There is no
guarantee that virtblk_request_done() runs before virtio_device_reset()
returns.

A separate issue about virtio_device_reset():

Are you using virtio-mmio? virtio-mmio's vm_reset() doesn't offer the
same guarantees as virtio-pci's reset functions. virtio-pci guarantees
that irqs sent by the device during the reset operation complete and the
irq handlers run before virtio_device_reset() returns. virtio-mmio does
not.

(On top of this, virtio_device_reset() now has
CONFIG_VIRTIO_HARDEN_NOTIFICATION which changes the semantics. Drivers
cannot expect to complete any in-flight requests in
virtio_device_reset() when complied with this config option.)

Other drivers may be affected by these inconsistent
virtio_device_reset() semantics. I haven't audited the code.

Stefan
yi sun Jan. 24, 2024, 7:38 a.m. UTC | #4
Yes, I'm using virtio-mmio.

On Tue, Jan 23, 2024 at 11:09 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Hi Michael,
> This could potentially affect other VIRTIO drivers too. Please see
> below.
>
> On Tue, Jan 23, 2024 at 11:27:40AM +0800, yi sun wrote:
> > On Mon, Jan 22, 2024 at 11:43 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Mon, Jan 22, 2024 at 07:07:22PM +0800, Yi Sun wrote:
> > > > Ensure no remaining requests in virtqueues before resetting vdev and
> > > > deleting virtqueues. Otherwise these requests will never be completed.
> > > > It may cause the system to become unresponsive. So it is better to place
> > > > blk_mq_quiesce_queue() in front of virtio_reset_device().
> > >
> > > QEMU's virtio-blk device implementation completes all requests during
> > > device reset. Most device implementations have to do the same to avoid
> > > leaving dangling requests in flight across device reset.
> > >
> > > Which device implementation are you using and why is it safe for the
> > > device is simply drop requests across device reset?
> > >
> > > Stefan
> >
> > Virtio-blk device implementation completes all requests during device reset, but
> > this can only ensure that the device has finished using virtqueue. We should
> > also consider the driver's use of virtqueue.
> >
> > I caught such an example. Before del_vqs, the request had been processed by
> > the device, but it had not been processed by the driver. Although I am
> > using kernel5.4,
> > I think this problem may also occur with the latest version of kernel.
> >
> > The debug code I added is as follows:
> > virtblk_freeze()
> > {
> >         vdev reset();
> >         quiesce queue();
> >         if (virtqueue->num_free != 1024) //1024 is the init value.
> >                 BUG_ON(1);
> >         vdev del_vqs();
> > }
> >
> > BUG_ON triggered the dump, the analysis is as follows:
> >
> > There is one request left in request_queue.
> > crash_arm64> struct request ffffff81f0560000 | grep -e state -e __data_len
> >   __data_len = 20480,
> >   state = MQ_RQ_IN_FLIGHT,
> >
> > crash_arm64> vring_virtqueue.packed,last_used_idx,broken,vq 0xffffff8086f92900 |
> > grep -e num -e used_wrap_counter -e last_used_idx -e broken -e
> > num_free -e desc_state -e "desc ="
> >         num = 1024,
> >         desc = 0xffffff8085ff8000,
> >       used_wrap_counter = false,
> >       desc_state = 0xffffff8085610000,
> >   last_used_idx = 487,
> >   broken = false,
> >     num_free = 1017,
> >
> > Find desc based on last_used_idx. Through flags, we can know that the request
> > has been processed by the device, but it is still in flight state
> > because it has not
> > had time to run virtblk_done().
> > crash_arm> vring_packed_desc ffffff8085ff9e70
> > struct vring_packed_desc {
> >   addr = 10474619192,
> >   len = 20481,
> >   id = 667,
> >   flags = 2
> > }
> >
> > I'm using a closed source virtual machine, so I can't see the source
> > code for it,
> > but I'm guessing it's similar to qemu.
> >
> > After the device completes the request, we must also ensure that the driver can
> > complete the request in virtblk_done().
> >
>
> Okay, I think your approach of waiting for requests before
> virtio_device_reset() makes sense. blk_mq_complete_request() is async
> (might be deferred to an IPI or softirq) so it's not enough for
> virtblk_done() to run before virtio_device_reset() returns. There is no
> guarantee that virtblk_request_done() runs before virtio_device_reset()
> returns.
>
> A separate issue about virtio_device_reset():
>
> Are you using virtio-mmio? virtio-mmio's vm_reset() doesn't offer the
> same guarantees as virtio-pci's reset functions. virtio-pci guarantees
> that irqs sent by the device during the reset operation complete and the
> irq handlers run before virtio_device_reset() returns. virtio-mmio does
> not.
>
> (On top of this, virtio_device_reset() now has
> CONFIG_VIRTIO_HARDEN_NOTIFICATION which changes the semantics. Drivers
> cannot expect to complete any in-flight requests in
> virtio_device_reset() when complied with this config option.)
>
> Other drivers may be affected by these inconsistent
> virtio_device_reset() semantics. I haven't audited the code.
>
> Stefan
diff mbox series

Patch

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 3b6b9abb8ce1..380f009953dd 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1595,14 +1595,16 @@  static int virtblk_freeze(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
 
+	/* Ensure no requests in virtqueues before deleting vqs. */
+	blk_mq_quiesce_queue(vblk->disk->queue);
+	blk_mq_tagset_wait_request_completed(vblk->disk->queue->tag_set);
+
 	/* Ensure we don't receive any more interrupts */
 	virtio_reset_device(vdev);
 
 	/* Make sure no work handler is accessing the device. */
 	flush_work(&vblk->config_work);
 
-	blk_mq_quiesce_queue(vblk->disk->queue);
-
 	vdev->config->del_vqs(vdev);
 	kfree(vblk->vqs);