diff mbox

[1/3] block: detach devices from DriveInfo at unrealize time

Message ID 1456151945-11225-2-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paolo Bonzini Feb. 22, 2016, 2:39 p.m. UTC
Instead of delaying blk_detach_dev and blockdev_auto_del until
the object is finalized and properties are released, do that
as soon as possible.

This patch replaces blockdev_mark_auto_del calls with blk_detach_dev
and blockdev_del_drive (the latter is a combination of the former
blockdev_mark_auto_del and blockdev_auto_del).

release_drive's call to blockdev_auto_del can then be removed completely.
This is of course okay in the case where the device has been unrealized
before and unrealize took care of calling blockdev_del_drive.  However,
it is also okay if the device has failed to be realized.  In that case,
blockdev_mark_auto_del was never called (because it is called during
unrealize) and thus release_drive's blockdev_auto_del call did nothing.
The drive-del-test qtest covers this case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 blockdev.c                       | 21 +++++----------------
 hw/block/virtio-blk.c            |  4 +++-
 hw/block/xen_disk.c              |  1 +
 hw/core/qdev-properties-system.c |  8 ++++++--
 hw/ide/piix.c                    |  3 +++
 hw/scsi/scsi-bus.c               |  4 +++-
 hw/usb/dev-storage.c             |  3 ++-
 include/sysemu/blockdev.h        |  4 +---
 8 files changed, 24 insertions(+), 24 deletions(-)

Comments

Markus Armbruster March 21, 2016, 3:13 p.m. UTC | #1
Quick recap of the analysis of the status quo I sent last week:

* virtio-blk and SCSI devices delete their block backend on unplug,
  except when it was created with blockdev-add.  No other device deletes
  backends.

* drive_del deletes a backend when it can, else it closes and hides it.

* drive_del has been a fertile source of headaches.  Care advised.

Paolo Bonzini <pbonzini@redhat.com> writes:

> Instead of delaying blk_detach_dev and blockdev_auto_del until
> the object is finalized and properties are released, do that
> as soon as possible.
>
> This patch replaces blockdev_mark_auto_del calls with blk_detach_dev
> and blockdev_del_drive (the latter is a combination of the former
> blockdev_mark_auto_del and blockdev_auto_del).
>
> release_drive's call to blockdev_auto_del can then be removed completely.
> This is of course okay in the case where the device has been unrealized
> before and unrealize took care of calling blockdev_del_drive.  However,
> it is also okay if the device has failed to be realized.  In that case,
> blockdev_mark_auto_del was never called (because it is called during
> unrealize) and thus release_drive's blockdev_auto_del call did nothing.
> The drive-del-test qtest covers this case.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  blockdev.c                       | 21 +++++----------------
>  hw/block/virtio-blk.c            |  4 +++-
>  hw/block/xen_disk.c              |  1 +
>  hw/core/qdev-properties-system.c |  8 ++++++--
>  hw/ide/piix.c                    |  3 +++
>  hw/scsi/scsi-bus.c               |  4 +++-
>  hw/usb/dev-storage.c             |  3 ++-
>  include/sysemu/blockdev.h        |  4 +---
>  8 files changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index 1f73478..2dfb2d8 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -114,20 +114,16 @@ void override_max_devs(BlockInterfaceType type, int max_devs)
>  /*
>   * We automatically delete the drive when a device using it gets
>   * unplugged.  Questionable feature, but we can't just drop it.
> - * Device models call blockdev_mark_auto_del() to schedule the
> - * automatic deletion, and generic qdev code calls blockdev_auto_del()
> - * when deletion is actually safe.
> + * Device models call blockdev_del_drive() to schedule the
> + * automatic deletion, and generic block layer code uses the
> + * refcount to do the deletion when it is actually safe.
>   */
> -void blockdev_mark_auto_del(BlockBackend *blk)
> +void blockdev_del_drive(BlockBackend *blk)
>  {
>      DriveInfo *dinfo = blk_legacy_dinfo(blk);
>      BlockDriverState *bs = blk_bs(blk);
>      AioContext *aio_context;
>  
> -    if (!dinfo) {
> -        return;
> -    }
> -
>      if (bs) {
>          aio_context = bdrv_get_aio_context(bs);
>          aio_context_acquire(aio_context);
> @@ -139,14 +135,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
>          aio_context_release(aio_context);
>      }
>  
> -    dinfo->auto_del = 1;
> -}
> -
> -void blockdev_auto_del(BlockBackend *blk)
> -{
> -    DriveInfo *dinfo = blk_legacy_dinfo(blk);
> -
> -    if (dinfo && dinfo->auto_del) {
> +    if (dinfo) {
>          blk_unref(blk);
>      }
>  }

Initially (commit 14bafc5), blockdev_mark_auto_del() and
blockdev_auto_del() simply did what their names promised:

    void blockdev_mark_auto_del(BlockDriverState *bs)
    {
        DriveInfo *dinfo = drive_get_by_blockdev(bs);

        dinfo->auto_del = 1;
    }

    void blockdev_auto_del(BlockDriverState *bs)
    {
        DriveInfo *dinfo = drive_get_by_blockdev(bs);
    
        if (dinfo->auto_del) {
            drive_uninit(dinfo);
        }
    }

Since we didn't want to perpetuate the "automatic deletion" wart for
backends created with the new (& still experimental) blockdev-add, we
prepended

    if (!dinfo) {
        return;
    }

to the marking in commit 2d246f0 and 26f8b3a.

Meanwhile, commit 12bde0e added block job cancellation:

    block: cancel jobs when a device is ready to go away
    
    We do not want jobs to keep a device busy for a possibly very long
    time, and management could become confused because they thought a
    device was not even there anymore.  So, cancel long-running jobs
    as soon as their device is going to disappear.

The automatic block job cancellation is an extension of the automatic
deletion wart.  We cancel exactly when we schedule the warty deletion.
Note that this made blockdev_mark_auto_del() do more than its name
promises.  I never liked that, and always wondered why we don't cancel
in blockdev_auto_del() instead.

Also meanwhile, blockdev_auto_del() slowly morphed from "delete" to
"drop a reference".

Anyway, code before your patch, with // additional annotations

    void blockdev_mark_auto_del(BlockBackend *blk)
    {
        DriveInfo *dinfo = blk_legacy_dinfo(blk);
        BlockDriverState *bs = blk_bs(blk);
        AioContext *aio_context;

        // Limit the auto-deletion wart to pre-blockdev-add
        if (!dinfo) {
            return;
        }

        // Warty automatic job cancellation
        if (bs) {
            aio_context = bdrv_get_aio_context(bs);
            aio_context_acquire(aio_context);

            if (bs->job) {
                block_job_cancel(bs->job);
            }

            aio_context_release(aio_context);
        }

        // Schedule warty automatic deletion
        dinfo->auto_del = 1;
    }

    void blockdev_auto_del(BlockBackend *blk)
    {
        DriveInfo *dinfo = blk_legacy_dinfo(blk);

        // Execute scheduled warty automatic deletion, if any
        // This drops the reference block-backend.c holds in trust for
        // lookup by name.  The device already dropped its own
        // reference.  The backend is deleted unless more references
        // exist (not sure that's possible).  If they do, management
        // applications that expect auto-deletion may get confused.
        if (dinfo && dinfo->auto_del) {
            blk_unref(blk);
        }
    }

Code after your patch:

    void blockdev_del_drive(BlockBackend *blk)
    {
        DriveInfo *dinfo = blk_legacy_dinfo(blk);
        BlockDriverState *bs = blk_bs(blk);
        AioContext *aio_context;

        // warty automatic job cancellation
        if (bs) {
            aio_context = bdrv_get_aio_context(bs);
            aio_context_acquire(aio_context);

            if (bs->job) {
                block_job_cancel(bs->job);
            }

            aio_context_release(aio_context);
        }

        if (dinfo) {
            // Drop the reference held in trust for lookup by name.  The
            // device still holds another one (if qdevified, the
            // property holds it).  Unless more references exist, the
            // backend will be auto-deleted when the device drops its
            // reference.
            blk_unref(blk);
        }
    }

Instead of delaying the unref to blockdev_auto_del(), which made sense
back when it was a hard delete, you unref right away, leaving
blockdev_auto_del() with nothing to do.  Swaps the order of the two
unrefs, but that's just fine.

We now cancel even when !dinfo, i.e. even when we won't delete.  Are you
sure that's correct?  If it is, then it needs to be explained in the
commit message.

> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index c427698..0582787 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -945,7 +945,9 @@ static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
>      s->dataplane = NULL;
>      qemu_del_vm_change_state_handler(s->change);
>      unregister_savevm(dev, "virtio-blk", s);
> -    blockdev_mark_auto_del(s->blk);
> +    blk_detach_dev(s->blk, dev);
> +    blockdev_del_drive(s->blk);
> +    s->blk = NULL;
>      virtio_cleanup(vdev);
>  }

Before your patch, we leave finalization of the property to its
release() callback release_drive(), as we should.  All we do here is
schedule warty deletion.  And that we must do here, because only here we
know that warty deletion is wanted.

Your patch inserts a copy of release_drive() and hacks it up a bit.  Two
hunks down, release_drive() gets hacked up to conditionally avoid
repeating the job.

This feels rather dirty to me.

>  
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 7bd5bde..39a72e4 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1041,6 +1041,7 @@ static void blk_disconnect(struct XenDevice *xendev)
>  
>      if (blkdev->blk) {
>          blk_detach_dev(blkdev->blk, blkdev);
> +        blockdev_del_drive(blkdev->blk);
>          blk_unref(blkdev->blk);
>          blkdev->blk = NULL;
>      }

This is a non-qdevified device, where the link to the backend is not a
property, and the link to the backend has to be dismantled by the device
itself.

I believe inserting blockdev_del_drive() extends the automatic deletion
wart to this device.  That's an incompatible change, isn't it?

> diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
> index e10cede..469ba8a 100644
> --- a/hw/core/qdev-properties-system.c
> +++ b/hw/core/qdev-properties-system.c
> @@ -101,9 +101,13 @@ static void release_drive(Object *obj, const char *name, void *opaque)
>      Property *prop = opaque;
>      BlockBackend **ptr = qdev_get_prop_ptr(dev, prop);
>  
> -    if (*ptr) {
> +    if (*ptr && blk_get_attached_dev(*ptr) != NULL) {
> +        /* Unrealize has already called blk_detach_dev and blockdev_del_drive
> +         * if the device has been realized; in that case, blk_get_attached_dev
> +         * returns NULL.  Thus, we get here if the device failed to realize,
> +         * and the -drive must not be released.
> +         */
>          blk_detach_dev(*ptr, dev);
> -        blockdev_auto_del(*ptr);
>      }
>  }

Two changes:

* The change to the condition suppresses the code you copied to
  unrealize() methods when it already ran there.

* blockdev_auto_del() is gone.

> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index df46147..cf8fa58 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -182,6 +182,9 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
>              if (ds) {
>                  blk_detach_dev(blk, ds);
>              }
> +            if (pci_ide->bus[di->bus].ifs[di->unit].blk) {
> +                blockdev_del_drive(blk);
> +            }
>              pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
>              if (!(i % 2)) {
>                  idedev = pci_ide->bus[di->bus].master;

Same comment as for xen_disk.c.

> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index 6dcdbc0..3b2b766 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -214,7 +214,9 @@ static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
>      }
>  
>      scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));
> -    blockdev_mark_auto_del(dev->conf.blk);
> +    blk_detach_dev(dev->conf.blk, qdev);
> +    blockdev_del_drive(dev->conf.blk);
> +    dev->conf.blk = NULL;
>  }
>  
>  /* handle legacy '-drive if=scsi,...' cmd line args */

Same comment as for virtio-blk.c.

> diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
> index 5ae0424..1c00211 100644
> --- a/hw/usb/dev-storage.c
> +++ b/hw/usb/dev-storage.c
> @@ -643,7 +643,8 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
>       * blockdev, or else scsi_bus_legacy_add_drive() dies when it
>       * attaches again.
>       *
> -     * The hack is probably a bad idea.
> +     * The hack is probably a bad idea.  Anyway, this is why this does not
> +     * call blockdev_del_drive.
>       */
>      blk_detach_dev(blk, &s->dev.qdev);
>      s->conf.blk = NULL;

Note that other qdevified block devices (such as nvme) are *not*
touched.  Warty auto deletion is extended only to some, but not all
cases.

> diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
> index b06a060..ae7ad67 100644
> --- a/include/sysemu/blockdev.h
> +++ b/include/sysemu/blockdev.h
> @@ -14,8 +14,7 @@
>  #include "qapi/error.h"
>  #include "qemu/queue.h"
>  
> -void blockdev_mark_auto_del(BlockBackend *blk);
> -void blockdev_auto_del(BlockBackend *blk);
> +void blockdev_del_drive(BlockBackend *blk);
>  
>  typedef enum {
>      IF_DEFAULT = -1,            /* for use with drive_add() only */
> @@ -34,7 +33,6 @@ struct DriveInfo {
>      BlockInterfaceType type;
>      int bus;
>      int unit;
> -    int auto_del;               /* see blockdev_mark_auto_del() */
>      bool is_default;            /* Added by default_drive() ?  */
>      int media_cd;
>      int cyls, heads, secs, trans;
Paolo Bonzini March 21, 2016, 3:31 p.m. UTC | #2
On 21/03/2016 16:13, Markus Armbruster wrote:
> Meanwhile, commit 12bde0e added block job cancellation:
> 
>     block: cancel jobs when a device is ready to go away
>     
>     We do not want jobs to keep a device busy for a possibly very long
>     time, and management could become confused because they thought a
>     device was not even there anymore.  So, cancel long-running jobs
>     as soon as their device is going to disappear.
> 
> The automatic block job cancellation is an extension of the automatic
> deletion wart.  We cancel exactly when we schedule the warty deletion.
> Note that this made blockdev_mark_auto_del() do more than its name
> promises.  I never liked that, and always wondered why we don't cancel
> in blockdev_auto_del() instead.

Because management would fall prey of exactly the bug we're trying to
fix.  For example by getting a BLOCK_JOB_CANCELLED event for a block
device that (according to the earlier DEVICE_DELETED event) should have
gone away already.

> Instead of delaying the unref to blockdev_auto_del(), which made sense
> back when it was a hard delete, you unref right away, leaving
> blockdev_auto_del() with nothing to do.  Swaps the order of the two
> unrefs, but that's just fine.
> 
> We now cancel even when !dinfo, i.e. even when we won't delete.  Are you
> sure that's correct?  If it is, then it needs to be explained in the
> commit message.

Will avoid that.

>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>> index c427698..0582787 100644
>> --- a/hw/block/virtio-blk.c
>> +++ b/hw/block/virtio-blk.c
>> @@ -945,7 +945,9 @@ static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
>>      s->dataplane = NULL;
>>      qemu_del_vm_change_state_handler(s->change);
>>      unregister_savevm(dev, "virtio-blk", s);
>> -    blockdev_mark_auto_del(s->blk);
>> +    blk_detach_dev(s->blk, dev);
>> +    blockdev_del_drive(s->blk);
>> +    s->blk = NULL;
>>      virtio_cleanup(vdev);
>>  }
> 
> Before your patch, we leave finalization of the property to its
> release() callback release_drive(), as we should.  All we do here is
> schedule warty deletion.  And that we must do here, because only here we
> know that warty deletion is wanted.
> 
> Your patch inserts a copy of release_drive() and hacks it up a bit.  Two
> hunks down, release_drive() gets hacked up to conditionally avoid
> repeating the job.
> 
> This feels rather dirty to me.

The other possibility is to make blk_detach_dev do nothing if blk->dev
== NULL, i.e. make it idempotent.  On one hand, who doesn't like
idempotency; on the other hand, removing an assertion is also dirty.

I chose the easy way here (changing as fewer contracts as possible).

>> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
>> index 7bd5bde..39a72e4 100644
>> --- a/hw/block/xen_disk.c
>> +++ b/hw/block/xen_disk.c
>> @@ -1041,6 +1041,7 @@ static void blk_disconnect(struct XenDevice *xendev)
>>  
>>      if (blkdev->blk) {
>>          blk_detach_dev(blkdev->blk, blkdev);
>> +        blockdev_del_drive(blkdev->blk);
>>          blk_unref(blkdev->blk);
>>          blkdev->blk = NULL;
>>      }
> 
> This is a non-qdevified device, where the link to the backend is not a
> property, and the link to the backend has to be dismantled by the device
> itself.
> 
> I believe inserting blockdev_del_drive() extends the automatic deletion
> wart to this device.  That's an incompatible change, isn't it?

This is why I wanted a careful review. :)  I can surely get rid of it.

>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>> index df46147..cf8fa58 100644
>> --- a/hw/ide/piix.c
>> +++ b/hw/ide/piix.c
>> @@ -182,6 +182,9 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
>>              if (ds) {
>>                  blk_detach_dev(blk, ds);
>>              }
>> +            if (pci_ide->bus[di->bus].ifs[di->unit].blk) {
>> +                blockdev_del_drive(blk);
>> +            }
>>              pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
>>              if (!(i % 2)) {
>>                  idedev = pci_ide->bus[di->bus].master;
> 
> Same comment as for xen_disk.c.

Same here.

>> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>> index 6dcdbc0..3b2b766 100644
>> --- a/hw/scsi/scsi-bus.c
>> +++ b/hw/scsi/scsi-bus.c
>> @@ -214,7 +214,9 @@ static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
>>      }
>>  
>>      scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));
>> -    blockdev_mark_auto_del(dev->conf.blk);
>> +    blk_detach_dev(dev->conf.blk, qdev);
>> +    blockdev_del_drive(dev->conf.blk);
>> +    dev->conf.blk = NULL;
>>  }
>>  
>>  /* handle legacy '-drive if=scsi,...' cmd line args */
> 
> Same comment as for virtio-blk.c.

Same answer...

>> diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
>> index 5ae0424..1c00211 100644
>> --- a/hw/usb/dev-storage.c
>> +++ b/hw/usb/dev-storage.c
>> @@ -643,7 +643,8 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
>>       * blockdev, or else scsi_bus_legacy_add_drive() dies when it
>>       * attaches again.
>>       *
>> -     * The hack is probably a bad idea.
>> +     * The hack is probably a bad idea.  Anyway, this is why this does not
>> +     * call blockdev_del_drive.
>>       */
>>      blk_detach_dev(blk, &s->dev.qdev);
>>      s->conf.blk = NULL;
> 
> Note that other qdevified block devices (such as nvme) are *not*
> touched.  Warty auto deletion is extended only to some, but not all
> cases.

I wonder if we actually _should_ extend to all of them, i.e. which way
is the bug.  That would of course change what to do with Xen and IDE.

Paolo
Markus Armbruster March 21, 2016, 5:19 p.m. UTC | #3
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 21/03/2016 16:13, Markus Armbruster wrote:
>> Meanwhile, commit 12bde0e added block job cancellation:
>> 
>>     block: cancel jobs when a device is ready to go away
>>     
>>     We do not want jobs to keep a device busy for a possibly very long
>>     time, and management could become confused because they thought a
>>     device was not even there anymore.  So, cancel long-running jobs
>>     as soon as their device is going to disappear.
>> 
>> The automatic block job cancellation is an extension of the automatic
>> deletion wart.  We cancel exactly when we schedule the warty deletion.
>> Note that this made blockdev_mark_auto_del() do more than its name
>> promises.  I never liked that, and always wondered why we don't cancel
>> in blockdev_auto_del() instead.
>
> Because management would fall prey of exactly the bug we're trying to
> fix.  For example by getting a BLOCK_JOB_CANCELLED event for a block
> device that (according to the earlier DEVICE_DELETED event) should have
> gone away already.
>
>> Instead of delaying the unref to blockdev_auto_del(), which made sense
>> back when it was a hard delete, you unref right away, leaving
>> blockdev_auto_del() with nothing to do.  Swaps the order of the two
>> unrefs, but that's just fine.
>> 
>> We now cancel even when !dinfo, i.e. even when we won't delete.  Are you
>> sure that's correct?  If it is, then it needs to be explained in the
>> commit message.
>
> Will avoid that.
>
>>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
>>> index c427698..0582787 100644
>>> --- a/hw/block/virtio-blk.c
>>> +++ b/hw/block/virtio-blk.c
>>> @@ -945,7 +945,9 @@ static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
>>>      s->dataplane = NULL;
>>>      qemu_del_vm_change_state_handler(s->change);
>>>      unregister_savevm(dev, "virtio-blk", s);
>>> -    blockdev_mark_auto_del(s->blk);
>>> +    blk_detach_dev(s->blk, dev);
>>> +    blockdev_del_drive(s->blk);
>>> +    s->blk = NULL;
>>>      virtio_cleanup(vdev);
>>>  }
>> 
>> Before your patch, we leave finalization of the property to its
>> release() callback release_drive(), as we should.  All we do here is
>> schedule warty deletion.  And that we must do here, because only here we
>> know that warty deletion is wanted.
>> 
>> Your patch inserts a copy of release_drive() and hacks it up a bit.  Two
>> hunks down, release_drive() gets hacked up to conditionally avoid
>> repeating the job.
>> 
>> This feels rather dirty to me.
>
> The other possibility is to make blk_detach_dev do nothing if blk->dev
> == NULL, i.e. make it idempotent.  On one hand, who doesn't like
> idempotency; on the other hand, removing an assertion is also dirty.
>
> I chose the easy way here (changing as fewer contracts as possible).

Why can't we keep the work in the property release() method
release_drive()?

The only reason blockdev_mark_auto_del() isn't there is that the device
decides whether to call it, not the property.

>>> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
>>> index 7bd5bde..39a72e4 100644
>>> --- a/hw/block/xen_disk.c
>>> +++ b/hw/block/xen_disk.c
>>> @@ -1041,6 +1041,7 @@ static void blk_disconnect(struct XenDevice *xendev)
>>>  
>>>      if (blkdev->blk) {
>>>          blk_detach_dev(blkdev->blk, blkdev);
>>> +        blockdev_del_drive(blkdev->blk);
>>>          blk_unref(blkdev->blk);
>>>          blkdev->blk = NULL;
>>>      }
>> 
>> This is a non-qdevified device, where the link to the backend is not a
>> property, and the link to the backend has to be dismantled by the device
>> itself.
>> 
>> I believe inserting blockdev_del_drive() extends the automatic deletion
>> wart to this device.  That's an incompatible change, isn't it?
>
> This is why I wanted a careful review. :)  I can surely get rid of it.
>
>>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>>> index df46147..cf8fa58 100644
>>> --- a/hw/ide/piix.c
>>> +++ b/hw/ide/piix.c
>>> @@ -182,6 +182,9 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
>>>              if (ds) {
>>>                  blk_detach_dev(blk, ds);
>>>              }
>>> +            if (pci_ide->bus[di->bus].ifs[di->unit].blk) {
>>> +                blockdev_del_drive(blk);
>>> +            }
>>>              pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
>>>              if (!(i % 2)) {
>>>                  idedev = pci_ide->bus[di->bus].master;
>> 
>> Same comment as for xen_disk.c.
>
> Same here.
>
>>> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
>>> index 6dcdbc0..3b2b766 100644
>>> --- a/hw/scsi/scsi-bus.c
>>> +++ b/hw/scsi/scsi-bus.c
>>> @@ -214,7 +214,9 @@ static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
>>>      }
>>>  
>>>      scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));
>>> -    blockdev_mark_auto_del(dev->conf.blk);
>>> +    blk_detach_dev(dev->conf.blk, qdev);
>>> +    blockdev_del_drive(dev->conf.blk);
>>> +    dev->conf.blk = NULL;
>>>  }
>>>  
>>>  /* handle legacy '-drive if=scsi,...' cmd line args */
>> 
>> Same comment as for virtio-blk.c.
>
> Same answer...
>
>>> diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
>>> index 5ae0424..1c00211 100644
>>> --- a/hw/usb/dev-storage.c
>>> +++ b/hw/usb/dev-storage.c
>>> @@ -643,7 +643,8 @@ static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
>>>       * blockdev, or else scsi_bus_legacy_add_drive() dies when it
>>>       * attaches again.
>>>       *
>>> -     * The hack is probably a bad idea.
>>> +     * The hack is probably a bad idea.  Anyway, this is why this does not
>>> +     * call blockdev_del_drive.
>>>       */
>>>      blk_detach_dev(blk, &s->dev.qdev);
>>>      s->conf.blk = NULL;
>> 
>> Note that other qdevified block devices (such as nvme) are *not*
>> touched.  Warty auto deletion is extended only to some, but not all
>> cases.
>
> I wonder if we actually _should_ extend to all of them, i.e. which way
> is the bug.  That would of course change what to do with Xen and IDE.

Certainly debatable.

Current wart: virtio-blk and SCSI devices auto-delete block backends
with DriveInfo.

Possible alternate wart: all devices auto-delete block backends with
DriveInfo.  This is more regular, but the more regular wart is still a
wart.

I think only if some our users actually expect the alternate wart can we
seriosuly consider switching, because then we have to choose between two
breakages anyway:

* We can stick to the current wart, and leave these users broken.

* We can switch to the alternate wart, unbreak these users, and break
  the users that expect the current wart.

Without further evidence on who expects what, I'd stick to the current
wart.
Paolo Bonzini March 21, 2016, 5:30 p.m. UTC | #4
On 21/03/2016 18:19, Markus Armbruster wrote:
>> >
>> > The other possibility is to make blk_detach_dev do nothing if blk->dev
>> > == NULL, i.e. make it idempotent.  On one hand, who doesn't like
>> > idempotency; on the other hand, removing an assertion is also dirty.
>> >
>> > I chose the easy way here (changing as fewer contracts as possible).
> Why can't we keep the work in the property release() method
> release_drive()?
> 
> The only reason blockdev_mark_auto_del() isn't there is that the device
> decides whether to call it, not the property.

DEVICE_DELETED is currently sent right after setting unrealized to false
(see device_unparent), and you cannnot send it later than that.  In
particular release_drive would mean sending the drive when properties
are removed in instance_finalize; by that time you don't have anymore a
QOM path to include in the event.

Paolo
Paolo Bonzini March 22, 2016, 10:15 p.m. UTC | #5
On 21/03/2016 18:19, Markus Armbruster wrote:
> I think only if some our users actually expect the alternate wart can we
> seriosuly consider switching, because then we have to choose between two
> breakages anyway:
> 
> * We can stick to the current wart, and leave these users broken.
> 
> * We can switch to the alternate wart, unbreak these users, and break
>   the users that expect the current wart.
> 
> Without further evidence on who expects what, I'd stick to the current
> wart.

I certainly would expect nvme to behave the same as virtio-blk, for one.

Paolo
Markus Armbruster March 23, 2016, 8:35 a.m. UTC | #6
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 21/03/2016 18:19, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> On 21/03/2016 16:13, Markus Armbruster wrote:
>>>> Before your patch, we leave finalization of the property to its
>>>> release() callback release_drive(), as we should.  All we do here is
>>>> schedule warty deletion.  And that we must do here, because only here we
>>>> know that warty deletion is wanted.
>>>> 
>>>> Your patch inserts a copy of release_drive() and hacks it up a bit.  Two
>>>> hunks down, release_drive() gets hacked up to conditionally avoid
>>>> repeating the job.
>>>> 
>>>> This feels rather dirty to me.
>>>
>>> The other possibility is to make blk_detach_dev do nothing if blk->dev
>>> == NULL, i.e. make it idempotent.  On one hand, who doesn't like
>>> idempotency; on the other hand, removing an assertion is also dirty.
>>>
>>> I chose the easy way here (changing as fewer contracts as possible).
>>
>> Why can't we keep the work in the property release() method
>> release_drive()?
>> 
>> The only reason blockdev_mark_auto_del() isn't there is that the device

s/isn't there/exists/ (oops)

>> decides whether to call it, not the property.
>
> DEVICE_DELETED is currently sent right after setting unrealized to false
> (see device_unparent), and you cannnot send it later than that.  In
> particular release_drive would mean sending the drive when properties
> are removed in instance_finalize; by that time you don't have anymore a
> QOM path to include in the event.

I see.  To delay DEVICE_DELETED, we'd have to save the QOM path, and
that would be bothersome.

Still, copying code from property to devices with that property is
undesirable.  It's not that bad in this patch, because we copy only to
the devices that do warty backend deletion, and that's just the two
places where we call blockdev_mark_auto_del() now.  However, it gets
worse if we decide to extend warty backend deletion to *all* devices:
more places, and a new need to consistently copy it to every new user of
the drive property.

When you find yourself copying code from a property callback into every
device using it, the real problem might be you're missing a callback.
In this case, one that runs at unrealize time.  The existing release()
runs at finalize time.
Paolo Bonzini March 23, 2016, 9:35 a.m. UTC | #7
On 23/03/2016 09:35, Markus Armbruster wrote:
>> by that time you don't have anymore a QOM path to include in the event.
>
> I see.  To delay DEVICE_DELETED, we'd have to save the QOM path, and
> that would be bothersome.

Not just that, the QOM path goes away at the time we currently raise
DEVICE_DELETED.  If we delay it to finalization, it might even have been
reused.

> Still, copying code from property to devices with that property is
> undesirable.  It's not that bad in this patch, because we copy only to
> the devices that do warty backend deletion, and that's just the two
> places where we call blockdev_mark_auto_del() now.  However, it gets
> worse if we decide to extend warty backend deletion to *all* devices:
> more places, and a new need to consistently copy it to every new user of
> the drive property.
> 
> When you find yourself copying code from a property callback into every
> device using it, the real problem might be you're missing a callback.
> In this case, one that runs at unrealize time.  The existing release()
> runs at finalize time.

That's certainly a good thing to do if we decide to extend autodeletion
to the NVMe and SD devices.

Paolo
diff mbox

Patch

diff --git a/blockdev.c b/blockdev.c
index 1f73478..2dfb2d8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -114,20 +114,16 @@  void override_max_devs(BlockInterfaceType type, int max_devs)
 /*
  * We automatically delete the drive when a device using it gets
  * unplugged.  Questionable feature, but we can't just drop it.
- * Device models call blockdev_mark_auto_del() to schedule the
- * automatic deletion, and generic qdev code calls blockdev_auto_del()
- * when deletion is actually safe.
+ * Device models call blockdev_del_drive() to schedule the
+ * automatic deletion, and generic block layer code uses the
+ * refcount to do the deletion when it is actually safe.
  */
-void blockdev_mark_auto_del(BlockBackend *blk)
+void blockdev_del_drive(BlockBackend *blk)
 {
     DriveInfo *dinfo = blk_legacy_dinfo(blk);
     BlockDriverState *bs = blk_bs(blk);
     AioContext *aio_context;
 
-    if (!dinfo) {
-        return;
-    }
-
     if (bs) {
         aio_context = bdrv_get_aio_context(bs);
         aio_context_acquire(aio_context);
@@ -139,14 +135,7 @@  void blockdev_mark_auto_del(BlockBackend *blk)
         aio_context_release(aio_context);
     }
 
-    dinfo->auto_del = 1;
-}
-
-void blockdev_auto_del(BlockBackend *blk)
-{
-    DriveInfo *dinfo = blk_legacy_dinfo(blk);
-
-    if (dinfo && dinfo->auto_del) {
+    if (dinfo) {
         blk_unref(blk);
     }
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index c427698..0582787 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -945,7 +945,9 @@  static void virtio_blk_device_unrealize(DeviceState *dev, Error **errp)
     s->dataplane = NULL;
     qemu_del_vm_change_state_handler(s->change);
     unregister_savevm(dev, "virtio-blk", s);
-    blockdev_mark_auto_del(s->blk);
+    blk_detach_dev(s->blk, dev);
+    blockdev_del_drive(s->blk);
+    s->blk = NULL;
     virtio_cleanup(vdev);
 }
 
diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 7bd5bde..39a72e4 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1041,6 +1041,7 @@  static void blk_disconnect(struct XenDevice *xendev)
 
     if (blkdev->blk) {
         blk_detach_dev(blkdev->blk, blkdev);
+        blockdev_del_drive(blkdev->blk);
         blk_unref(blkdev->blk);
         blkdev->blk = NULL;
     }
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index e10cede..469ba8a 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -101,9 +101,13 @@  static void release_drive(Object *obj, const char *name, void *opaque)
     Property *prop = opaque;
     BlockBackend **ptr = qdev_get_prop_ptr(dev, prop);
 
-    if (*ptr) {
+    if (*ptr && blk_get_attached_dev(*ptr) != NULL) {
+        /* Unrealize has already called blk_detach_dev and blockdev_del_drive
+         * if the device has been realized; in that case, blk_get_attached_dev
+         * returns NULL.  Thus, we get here if the device failed to realize,
+         * and the -drive must not be released.
+         */
         blk_detach_dev(*ptr, dev);
-        blockdev_auto_del(*ptr);
     }
 }
 
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index df46147..cf8fa58 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -182,6 +182,9 @@  int pci_piix3_xen_ide_unplug(DeviceState *dev)
             if (ds) {
                 blk_detach_dev(blk, ds);
             }
+            if (pci_ide->bus[di->bus].ifs[di->unit].blk) {
+                blockdev_del_drive(blk);
+            }
             pci_ide->bus[di->bus].ifs[di->unit].blk = NULL;
             if (!(i % 2)) {
                 idedev = pci_ide->bus[di->bus].master;
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 6dcdbc0..3b2b766 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -214,7 +214,9 @@  static void scsi_qdev_unrealize(DeviceState *qdev, Error **errp)
     }
 
     scsi_device_purge_requests(dev, SENSE_CODE(NO_SENSE));
-    blockdev_mark_auto_del(dev->conf.blk);
+    blk_detach_dev(dev->conf.blk, qdev);
+    blockdev_del_drive(dev->conf.blk);
+    dev->conf.blk = NULL;
 }
 
 /* handle legacy '-drive if=scsi,...' cmd line args */
diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c
index 5ae0424..1c00211 100644
--- a/hw/usb/dev-storage.c
+++ b/hw/usb/dev-storage.c
@@ -643,7 +643,8 @@  static void usb_msd_realize_storage(USBDevice *dev, Error **errp)
      * blockdev, or else scsi_bus_legacy_add_drive() dies when it
      * attaches again.
      *
-     * The hack is probably a bad idea.
+     * The hack is probably a bad idea.  Anyway, this is why this does not
+     * call blockdev_del_drive.
      */
     blk_detach_dev(blk, &s->dev.qdev);
     s->conf.blk = NULL;
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index b06a060..ae7ad67 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -14,8 +14,7 @@ 
 #include "qapi/error.h"
 #include "qemu/queue.h"
 
-void blockdev_mark_auto_del(BlockBackend *blk);
-void blockdev_auto_del(BlockBackend *blk);
+void blockdev_del_drive(BlockBackend *blk);
 
 typedef enum {
     IF_DEFAULT = -1,            /* for use with drive_add() only */
@@ -34,7 +33,6 @@  struct DriveInfo {
     BlockInterfaceType type;
     int bus;
     int unit;
-    int auto_del;               /* see blockdev_mark_auto_del() */
     bool is_default;            /* Added by default_drive() ?  */
     int media_cd;
     int cyls, heads, secs, trans;