mbox series

[v2,0/3] HMP/snapshot changes - do not use ID anymore

Message ID	20180906111107.30684-1-danielhb413@gmail.com (mailing list archive)
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Daniel Henrique Barboza <danielhb413@gmail.com> To: qemu-devel@nongnu.org Date: Thu, 6 Sep 2018 08:11:04 -0300 Message-Id: <20180906111107.30684-1-danielhb413@gmail.com> Subject: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore Precedence: list Cc: kwolf@redhat.com, Daniel Henrique Barboza <danielhb413@gmail.com>, armbru@redhat.com, dgilbert@redhat.com, muriloo@linux.ibm.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	HMP/snapshot changes - do not use ID anymore \| expand [v2,0/3] HMP/snapshot changes - do not use ID anymore [v2,1/3] block/snapshot.c: eliminate use of ID input in snapshot operations [v2,2/3] block/snapshot: remove bdrv_snapshot_delete_by_id_or_name [v2,3/3] qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call

Message ID

20180906111107.30684-1-danielhb413@gmail.com (mailing list archive)

Headers

From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: qemu-devel@nongnu.org
Date: Thu,  6 Sep 2018 08:11:04 -0300
Message-Id: <20180906111107.30684-1-danielhb413@gmail.com>
Subject: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID
 anymore
Precedence: list
Cc: kwolf@redhat.com, Daniel Henrique Barboza <danielhb413@gmail.com>,
	armbru@redhat.com, dgilbert@redhat.com, muriloo@linux.ibm.com,
	mreitz@redhat.com
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Series

HMP/snapshot changes - do not use ID anymore | expand

Message

Daniel Henrique Barboza Sept. 6, 2018, 11:11 a.m. UTC

changes in v2:
- removed the "RFC" marker;
- added a new patch (patch 2) that removes
bdrv_snapshot_delete_by_id_or_name from the code;
- made changes in patch 1 as suggested by Murilo;
- previous patch set link:
https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html


It is not uncommon to see bugs being opened by testers that attempt to
create VM snapshots using HMP. It turns out that "0" and "1" are quite
common snapshot names and they trigger a lot of bugs. I gave an example
in the commit message of patch 1, but to sum up here: QEMU treats the
input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
is documented as such, but this can lead to strange situations.

Given that it is strange for an API to consider a parameter to be 2 fields
at the same time, and inadvently treating them as one or the other, and
that removing the ID field is too drastic, my idea here is to keep the
ID field for internal control, but do not let the user set it.

I guess there's room for discussion about considering this change an API
change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
but simplifying the meaning of the parameters of savevm/loadvm/delvm.


Daniel Henrique Barboza (3):
  block/snapshot.c: eliminate use of ID input in snapshot operations
  block/snapshot: remove bdrv_snapshot_delete_by_id_or_name
  qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call

 block/qcow2-snapshot.c   |  5 -----
 block/snapshot.c         | 25 +++----------------------
 hmp-commands.hx          | 24 ++++++++++++------------
 include/block/snapshot.h |  3 ---
 qemu-img.c               | 15 +++++++++++----
 5 files changed, 26 insertions(+), 46 deletions(-)

Comments

Dr. David Alan Gilbert Sept. 21, 2018, 12:29 p.m. UTC | #1

* Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
> changes in v2:
> - removed the "RFC" marker;
> - added a new patch (patch 2) that removes
> bdrv_snapshot_delete_by_id_or_name from the code;
> - made changes in patch 1 as suggested by Murilo;
> - previous patch set link:
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> 
> 
> It is not uncommon to see bugs being opened by testers that attempt to
> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> common snapshot names and they trigger a lot of bugs. I gave an example
> in the commit message of patch 1, but to sum up here: QEMU treats the
> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> is documented as such, but this can lead to strange situations.
> 
> Given that it is strange for an API to consider a parameter to be 2 fields
> at the same time, and inadvently treating them as one or the other, and
> that removing the ID field is too drastic, my idea here is to keep the
> ID field for internal control, but do not let the user set it.
> 
> I guess there's room for discussion about considering this change an API
> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> but simplifying the meaning of the parameters of savevm/loadvm/delvm.

Can you clarify a couple of things:
  a) What is it about libvirt's use that means it's OK ? Does it never
use numeric tags?
  b) After this series are you always guaranteed to be able to fix
any existing oddly named snapshots?

Dave

> 
> Daniel Henrique Barboza (3):
>   block/snapshot.c: eliminate use of ID input in snapshot operations
>   block/snapshot: remove bdrv_snapshot_delete_by_id_or_name
>   qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call
> 
>  block/qcow2-snapshot.c   |  5 -----
>  block/snapshot.c         | 25 +++----------------------
>  hmp-commands.hx          | 24 ++++++++++++------------
>  include/block/snapshot.h |  3 ---
>  qemu-img.c               | 15 +++++++++++----
>  5 files changed, 26 insertions(+), 46 deletions(-)
> 
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Daniel Henrique Barboza Sept. 24, 2018, 8:48 p.m. UTC | #2

Hey David,

On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
>> changes in v2:
>> - removed the "RFC" marker;
>> - added a new patch (patch 2) that removes
>> bdrv_snapshot_delete_by_id_or_name from the code;
>> - made changes in patch 1 as suggested by Murilo;
>> - previous patch set link:
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>
>>
>> It is not uncommon to see bugs being opened by testers that attempt to
>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>> common snapshot names and they trigger a lot of bugs. I gave an example
>> in the commit message of patch 1, but to sum up here: QEMU treats the
>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>> is documented as such, but this can lead to strange situations.
>>
>> Given that it is strange for an API to consider a parameter to be 2 fields
>> at the same time, and inadvently treating them as one or the other, and
>> that removing the ID field is too drastic, my idea here is to keep the
>> ID field for internal control, but do not let the user set it.
>>
>> I guess there's room for discussion about considering this change an API
>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> Can you clarify a couple of things:
>    a) What is it about libvirt's use that means it's OK ? Does it never
> use numeric tags?

I am glad you asked because my understanding in how Libvirt was dealing
with snapshots was wrong, and I just looked into it further to answer your
question. Luckily, this series fixes the situation for Libvirt as well.

I was misled by the fact that Libvirt does not show the same symptoms we 
see in
QEMU of this problem, but the bug is there. Here's a quick test with 
Libvirt with
"0" and "1" as snapshot names, considering a VM named with no snapshots,
using QEMU 2.12 and Libvirt 4.0.0:

- create the "0" snapshot:

$ sudo virsh snapshot-create-as --name 0 dhb
Domain snapshot 0 created

$ sudo virsh snapshot-list dhb
Name Creation Time State
------------------------------------------------------------
0 2018-09-24 15:47:56 -0400 running

$ sudo virsh qemu-monitor-command dhb --hmp info snapshots
List of snapshots present on all disks:
ID TAG VM SIZE DATE VM CLOCK
--    0  405M 2018-09-24 15:47:56 00:04:20.867


- created the "1" snapshot. Here, Libvirt shows both snapshots with 
snapshot-list,
but the snapshot was erased inside QEMU:

$ sudo virsh snapshot-create-as --name 1 dhb
Domain snapshot 1 created
$
$ sudo virsh snapshot-list dhb
Name Creation Time State
------------------------------------------------------------
0 2018-09-24 15:47:56 -0400 running
1 2018-09-24 15:50:09 -0400 running

$ sudo virsh qemu-monitor-command dhb --hmp info snapshots
List of snapshots present on all disks:
ID TAG VM SIZE DATE VM CLOCK
--    1  404M 2018-09-24 15:50:10 00:05:36.226


This is where I stopped checking out Libvirt at first, concluding 
wrongly that it
was immune to the bug.

Libvirt does not throw an error when trying to apply snapshot 0. In 
fact, it acts
like everything went fine:

$ sudo virsh snapshot-revert dhb 0

$ echo $?
0

Reverting back to snapshot "1" works as intended, restoring the VM state 
when it
was created.


(perhaps this is something we should let Libvirt be aware of ...)



This series fixes this behavior because the snapshot 0 isn't erased from 
QEMU, allowing
Libvirt to successfully restore it.


>    b) After this series are you always guaranteed to be able to fix
> any existing oddly named snapshots?

The oddly named snapshots that already exists can be affected by the 
semantic
change in loadvm and delvm, in a way that the user can't load/del using 
the snap
ID, just the tag. Aside from that, I don't see any side effects with 
existing
snapshots and this patch series.


Thanks,


Daniel


>
> Dave
>
>> Daniel Henrique Barboza (3):
>>    block/snapshot.c: eliminate use of ID input in snapshot operations
>>    block/snapshot: remove bdrv_snapshot_delete_by_id_or_name
>>    qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call
>>
>>   block/qcow2-snapshot.c   |  5 -----
>>   block/snapshot.c         | 25 +++----------------------
>>   hmp-commands.hx          | 24 ++++++++++++------------
>>   include/block/snapshot.h |  3 ---
>>   qemu-img.c               | 15 +++++++++++----
>>   5 files changed, 26 insertions(+), 46 deletions(-)
>>
>> -- 
>> 2.17.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Daniel Henrique Barboza Oct. 8, 2018, 6:13 p.m. UTC | #3

ping

On 9/6/18 8:11 AM, Daniel Henrique Barboza wrote:
> changes in v2:
> - removed the "RFC" marker;
> - added a new patch (patch 2) that removes
> bdrv_snapshot_delete_by_id_or_name from the code;
> - made changes in patch 1 as suggested by Murilo;
> - previous patch set link:
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>
>
> It is not uncommon to see bugs being opened by testers that attempt to
> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> common snapshot names and they trigger a lot of bugs. I gave an example
> in the commit message of patch 1, but to sum up here: QEMU treats the
> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> is documented as such, but this can lead to strange situations.
>
> Given that it is strange for an API to consider a parameter to be 2 fields
> at the same time, and inadvently treating them as one or the other, and
> that removing the ID field is too drastic, my idea here is to keep the
> ID field for internal control, but do not let the user set it.
>
> I guess there's room for discussion about considering this change an API
> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>
>
> Daniel Henrique Barboza (3):
>    block/snapshot.c: eliminate use of ID input in snapshot operations
>    block/snapshot: remove bdrv_snapshot_delete_by_id_or_name
>    qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call
>
>   block/qcow2-snapshot.c   |  5 -----
>   block/snapshot.c         | 25 +++----------------------
>   hmp-commands.hx          | 24 ++++++++++++------------
>   include/block/snapshot.h |  3 ---
>   qemu-img.c               | 15 +++++++++++----
>   5 files changed, 26 insertions(+), 46 deletions(-)
>

Markus Armbruster Oct. 9, 2018, 5:34 p.m. UTC | #4

Cc: libvir-list for review of the compatibility argument below.

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> Hey David,
>
> On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
>> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
>>> changes in v2:
>>> - removed the "RFC" marker;
>>> - added a new patch (patch 2) that removes
>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>> - made changes in patch 1 as suggested by Murilo;
>>> - previous patch set link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>
>>>
>>> It is not uncommon to see bugs being opened by testers that attempt to
>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>> is documented as such, but this can lead to strange situations.
>>>
>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>> at the same time, and inadvently treating them as one or the other, and
>>> that removing the ID field is too drastic, my idea here is to keep the
>>> ID field for internal control, but do not let the user set it.
>>>
>>> I guess there's room for discussion about considering this change an API
>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>> Can you clarify a couple of things:
>>    a) What is it about libvirt's use that means it's OK ? Does it never
>> use numeric tags?
>
> I am glad you asked because my understanding in how Libvirt was dealing
> with snapshots was wrong, and I just looked into it further to answer your
> question. Luckily, this series fixes the situation for Libvirt as well.
>
> I was misled by the fact that Libvirt does not show the same symptoms
> we see in
> QEMU of this problem, but the bug is there. Here's a quick test with
> Libvirt with
> "0" and "1" as snapshot names, considering a VM named with no snapshots,
> using QEMU 2.12 and Libvirt 4.0.0:
>
> - create the "0" snapshot:
>
> $ sudo virsh snapshot-create-as --name 0 dhb
> Domain snapshot 0 created
>
> $ sudo virsh snapshot-list dhb
> Name Creation Time State
> ------------------------------------------------------------
> 0 2018-09-24 15:47:56 -0400 running
>
> $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> List of snapshots present on all disks:
> ID TAG VM SIZE DATE VM CLOCK
> --    0  405M 2018-09-24 15:47:56 00:04:20.867
>
>
> - created the "1" snapshot. Here, Libvirt shows both snapshots with
> snapshot-list,
> but the snapshot was erased inside QEMU:
>
> $ sudo virsh snapshot-create-as --name 1 dhb
> Domain snapshot 1 created
> $
> $ sudo virsh snapshot-list dhb
> Name Creation Time State
> ------------------------------------------------------------
> 0 2018-09-24 15:47:56 -0400 running
> 1 2018-09-24 15:50:09 -0400 running
>
> $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> List of snapshots present on all disks:
> ID TAG VM SIZE DATE VM CLOCK
> --    1  404M 2018-09-24 15:50:10 00:05:36.226
>
>
> This is where I stopped checking out Libvirt at first, concluding
> wrongly that it
> was immune to the bug.
>
> Libvirt does not throw an error when trying to apply snapshot 0. In
> fact, it acts
> like everything went fine:
>
> $ sudo virsh snapshot-revert dhb 0
>
> $ echo $?
> 0

Is that a libvirt bug?

> Reverting back to snapshot "1" works as intended, restoring the VM
> state when it
> was created.
>
>
> (perhaps this is something we should let Libvirt be aware of ...)
>
>
>
> This series fixes this behavior because the snapshot 0 isn't erased
> from QEMU, allowing
> Libvirt to successfully restore it.
>
>
>>    b) After this series are you always guaranteed to be able to fix
>> any existing oddly named snapshots?
>
> The oddly named snapshots that already exists can be affected by the
> semantic
> change in loadvm and delvm, in a way that the user can't load/del
> using the snap
> ID, just the tag. Aside from that, I don't see any side effects with
> existing
> snapshots and this patch series.

Do all snapshots have a tag that is unique within their image?  Even
snapshots created by old versions of QEMU?

Peter Krempa Oct. 10, 2018, 7:56 a.m. UTC | #5

On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> Cc: libvir-list for review of the compatibility argument below.
> 
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> 
> > Hey David,
> >
> > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
> >>> changes in v2:
> >>> - removed the "RFC" marker;
> >>> - added a new patch (patch 2) that removes
> >>> bdrv_snapshot_delete_by_id_or_name from the code;
> >>> - made changes in patch 1 as suggested by Murilo;
> >>> - previous patch set link:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >>>
> >>>
> >>> It is not uncommon to see bugs being opened by testers that attempt to
> >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> >>> common snapshot names and they trigger a lot of bugs. I gave an example
> >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> >>> is documented as such, but this can lead to strange situations.
> >>>
> >>> Given that it is strange for an API to consider a parameter to be 2 fields
> >>> at the same time, and inadvently treating them as one or the other, and
> >>> that removing the ID field is too drastic, my idea here is to keep the
> >>> ID field for internal control, but do not let the user set it.
> >>>
> >>> I guess there's room for discussion about considering this change an API
> >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> >> Can you clarify a couple of things:
> >>    a) What is it about libvirt's use that means it's OK ? Does it never
> >> use numeric tags?
> >
> > I am glad you asked because my understanding in how Libvirt was dealing
> > with snapshots was wrong, and I just looked into it further to answer your
> > question. Luckily, this series fixes the situation for Libvirt as well.
> >
> > I was misled by the fact that Libvirt does not show the same symptoms
> > we see in
> > QEMU of this problem, but the bug is there. Here's a quick test with
> > Libvirt with
> > "0" and "1" as snapshot names, considering a VM named with no snapshots,
> > using QEMU 2.12 and Libvirt 4.0.0:
> >
> > - create the "0" snapshot:
> >
> > $ sudo virsh snapshot-create-as --name 0 dhb
> > Domain snapshot 0 created
> >
> > $ sudo virsh snapshot-list dhb
> > Name Creation Time State
> > ------------------------------------------------------------
> > 0 2018-09-24 15:47:56 -0400 running
> >
> > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > List of snapshots present on all disks:
> > ID TAG VM SIZE DATE VM CLOCK
> > --    0  405M 2018-09-24 15:47:56 00:04:20.867
> >
> >
> > - created the "1" snapshot. Here, Libvirt shows both snapshots with
> > snapshot-list,
> > but the snapshot was erased inside QEMU:
> >
> > $ sudo virsh snapshot-create-as --name 1 dhb
> > Domain snapshot 1 created
> > $
> > $ sudo virsh snapshot-list dhb
> > Name Creation Time State
> > ------------------------------------------------------------
> > 0 2018-09-24 15:47:56 -0400 running
> > 1 2018-09-24 15:50:09 -0400 running
> >
> > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > List of snapshots present on all disks:
> > ID TAG VM SIZE DATE VM CLOCK
> > --    1  404M 2018-09-24 15:50:10 00:05:36.226
> >
> >
> > This is where I stopped checking out Libvirt at first, concluding
> > wrongly that it
> > was immune to the bug.
> >
> > Libvirt does not throw an error when trying to apply snapshot 0. In
> > fact, it acts
> > like everything went fine:
> >
> > $ sudo virsh snapshot-revert dhb 0
> >
> > $ echo $?
> > 0
> 
> Is that a libvirt bug?

Probably yes. The error handling from HMP sucks and can't really be
fixed in all cases. This is for the handler which calls "loadvm":

    if (strstr(reply, "No block device supports snapshots") != NULL) {
        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                       _("this domain does not have a device to load snapshots"));
        goto cleanup;
    } else if (strstr(reply, "Could not find snapshot") != NULL) {
        virReportError(VIR_ERR_OPERATION_INVALID,
                       _("the snapshot '%s' does not exist, and was not loaded"),
                       name);
        goto cleanup;
    } else if (strstr(reply, "Snapshots not supported on device") != NULL) {
        virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
        goto cleanup;
    } else if (strstr(reply, "Could not open VM state file") != NULL) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
        goto cleanup;
    } else if (strstr(reply, "Error") != NULL
             && strstr(reply, "while loading VM state") != NULL) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
        goto cleanup;
    } else if (strstr(reply, "Error") != NULL
             && strstr(reply, "while activating snapshot on") != NULL) {
        virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
        goto cleanup;
    }

If the above does not match the reported error, we report success. The
same problem can happen with any of the 5 [1] HMP command handlers we
still have.

Note that a similar abomination is also for the code which calls
"savevm".

[1] We technically have 6 HMP handlers, but "cpu_set" is not used if yoy
have a QEMU younger than 3 years. Soon also "drive_add" and "drive_del"
will be replaced, so we'll be stuck with the 3 internal snapshot ones.
> 
> > Reverting back to snapshot "1" works as intended, restoring the VM
> > state when it
> > was created.
> >
> >
> > (perhaps this is something we should let Libvirt be aware of ...)

The error message from qemu which was wrongly ignored by qemu can be
found in the libvirtd debug log. It would be helpful if you could
provide it.

> >
> >
> >
> > This series fixes this behavior because the snapshot 0 isn't erased
> > from QEMU, allowing
> > Libvirt to successfully restore it.
> >
> >
> >>    b) After this series are you always guaranteed to be able to fix
> >> any existing oddly named snapshots?
> >
> > The oddly named snapshots that already exists can be affected by the
> > semantic
> > change in loadvm and delvm, in a way that the user can't load/del
> > using the snap
> > ID, just the tag. Aside from that, I don't see any side effects with
> > existing
> > snapshots and this patch series.
> 
> Do all snapshots have a tag that is unique within their image?  Even
> snapshots created by old versions of QEMU?

I remember there was a discussion which regarded problems with
collisions between the name and the ID of the snapshot when dealing with
loadvm/delvm commands but I can't seem to find it currently. Note that
libvirt plainly issues loadvm/delvm $SNAPSHOTNAME. If the name is
numeric it might clash. Similarly for inactive vms via qemu-img.

Dr. David Alan Gilbert Oct. 11, 2018, 12:06 p.m. UTC | #6

* Peter Krempa (pkrempa@redhat.com) wrote:
> On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> > Cc: libvir-list for review of the compatibility argument below.
> > 
> > Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> > 
> > > Hey David,
> > >
> > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> > >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
> > >>> changes in v2:
> > >>> - removed the "RFC" marker;
> > >>> - added a new patch (patch 2) that removes
> > >>> bdrv_snapshot_delete_by_id_or_name from the code;
> > >>> - made changes in patch 1 as suggested by Murilo;
> > >>> - previous patch set link:
> > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > >>>
> > >>>
> > >>> It is not uncommon to see bugs being opened by testers that attempt to
> > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > >>> common snapshot names and they trigger a lot of bugs. I gave an example
> > >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > >>> is documented as such, but this can lead to strange situations.
> > >>>
> > >>> Given that it is strange for an API to consider a parameter to be 2 fields
> > >>> at the same time, and inadvently treating them as one or the other, and
> > >>> that removing the ID field is too drastic, my idea here is to keep the
> > >>> ID field for internal control, but do not let the user set it.
> > >>>
> > >>> I guess there's room for discussion about considering this change an API
> > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> > >> Can you clarify a couple of things:
> > >>    a) What is it about libvirt's use that means it's OK ? Does it never
> > >> use numeric tags?
> > >
> > > I am glad you asked because my understanding in how Libvirt was dealing
> > > with snapshots was wrong, and I just looked into it further to answer your
> > > question. Luckily, this series fixes the situation for Libvirt as well.
> > >
> > > I was misled by the fact that Libvirt does not show the same symptoms
> > > we see in
> > > QEMU of this problem, but the bug is there. Here's a quick test with
> > > Libvirt with
> > > "0" and "1" as snapshot names, considering a VM named with no snapshots,
> > > using QEMU 2.12 and Libvirt 4.0.0:
> > >
> > > - create the "0" snapshot:
> > >
> > > $ sudo virsh snapshot-create-as --name 0 dhb
> > > Domain snapshot 0 created
> > >
> > > $ sudo virsh snapshot-list dhb
> > > Name Creation Time State
> > > ------------------------------------------------------------
> > > 0 2018-09-24 15:47:56 -0400 running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID TAG VM SIZE DATE VM CLOCK
> > > --    0  405M 2018-09-24 15:47:56 00:04:20.867
> > >
> > >
> > > - created the "1" snapshot. Here, Libvirt shows both snapshots with
> > > snapshot-list,
> > > but the snapshot was erased inside QEMU:
> > >
> > > $ sudo virsh snapshot-create-as --name 1 dhb
> > > Domain snapshot 1 created
> > > $
> > > $ sudo virsh snapshot-list dhb
> > > Name Creation Time State
> > > ------------------------------------------------------------
> > > 0 2018-09-24 15:47:56 -0400 running
> > > 1 2018-09-24 15:50:09 -0400 running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID TAG VM SIZE DATE VM CLOCK
> > > --    1  404M 2018-09-24 15:50:10 00:05:36.226
> > >
> > >
> > > This is where I stopped checking out Libvirt at first, concluding
> > > wrongly that it
> > > was immune to the bug.
> > >
> > > Libvirt does not throw an error when trying to apply snapshot 0. In
> > > fact, it acts
> > > like everything went fine:
> > >
> > > $ sudo virsh snapshot-revert dhb 0
> > >
> > > $ echo $?
> > > 0
> > 
> > Is that a libvirt bug?
> 
> Probably yes. The error handling from HMP sucks and can't really be
> fixed in all cases. This is for the handler which calls "loadvm":
> 
>     if (strstr(reply, "No block device supports snapshots") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID, "%s",
>                        _("this domain does not have a device to load snapshots"));
>         goto cleanup;
>     } else if (strstr(reply, "Could not find snapshot") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID,
>                        _("the snapshot '%s' does not exist, and was not loaded"),
>                        name);
>         goto cleanup;
>     } else if (strstr(reply, "Snapshots not supported on device") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Could not open VM state file") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Error") != NULL
>              && strstr(reply, "while loading VM state") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Error") != NULL
>              && strstr(reply, "while activating snapshot on") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     }
> 
> If the above does not match the reported error, we report success. The
> same problem can happen with any of the 5 [1] HMP command handlers we
> still have.
> 
> Note that a similar abomination is also for the code which calls
> "savevm".

Would the following small qemu change make life a little safer:

diff --git a/hmp.c b/hmp.c
index 576253a01f..0729a8c7ed 100644
--- a/hmp.c
+++ b/hmp.c
@@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp)
 {
     assert(errp);
     if (*errp) {
-        error_report_err(*errp);
+        error_reportf_err(*errp, "Error: ");
     }
 }

changing:

No block device supports snapshots


to:

Error: No block device supports snapshots

so you could check for Error: at the start and know that you've hit some
unknown error?

Dave


> [1] We technically have 6 HMP handlers, but "cpu_set" is not used if yoy
> have a QEMU younger than 3 years. Soon also "drive_add" and "drive_del"
> will be replaced, so we'll be stuck with the 3 internal snapshot ones.
> > 
> > > Reverting back to snapshot "1" works as intended, restoring the VM
> > > state when it
> > > was created.
> > >
> > >
> > > (perhaps this is something we should let Libvirt be aware of ...)
> 
> The error message from qemu which was wrongly ignored by qemu can be
> found in the libvirtd debug log. It would be helpful if you could
> provide it.
> 
> > >
> > >
> > >
> > > This series fixes this behavior because the snapshot 0 isn't erased
> > > from QEMU, allowing
> > > Libvirt to successfully restore it.
> > >
> > >
> > >>    b) After this series are you always guaranteed to be able to fix
> > >> any existing oddly named snapshots?
> > >
> > > The oddly named snapshots that already exists can be affected by the
> > > semantic
> > > change in loadvm and delvm, in a way that the user can't load/del
> > > using the snap
> > > ID, just the tag. Aside from that, I don't see any side effects with
> > > existing
> > > snapshots and this patch series.
> > 
> > Do all snapshots have a tag that is unique within their image?  Even
> > snapshots created by old versions of QEMU?
> 
> I remember there was a discussion which regarded problems with
> collisions between the name and the ID of the snapshot when dealing with
> loadvm/delvm commands but I can't seem to find it currently. Note that
> libvirt plainly issues loadvm/delvm $SNAPSHOTNAME. If the name is
> numeric it might clash. Similarly for inactive vms via qemu-img.


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Peter Krempa Oct. 11, 2018, 12:35 p.m. UTC | #7

On Thu, Oct 11, 2018 at 13:06:14 +0100, Dr. David Alan Gilbert wrote:
> * Peter Krempa (pkrempa@redhat.com) wrote:
> > On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> > > Cc: libvir-list for review of the compatibility argument below.
> > > 
> > > Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> > > 
> > > > Hey David,
> > > >
> > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> > > >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote:
> > > >>> changes in v2:
> > > >>> - removed the "RFC" marker;
> > > >>> - added a new patch (patch 2) that removes
> > > >>> bdrv_snapshot_delete_by_id_or_name from the code;
> > > >>> - made changes in patch 1 as suggested by Murilo;
> > > >>> - previous patch set link:
> > > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > > >>>
> > > >>>
> > > >>> It is not uncommon to see bugs being opened by testers that attempt to
> > > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > > >>> common snapshot names and they trigger a lot of bugs. I gave an example
> > > >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> > > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > > >>> is documented as such, but this can lead to strange situations.
> > > >>>
> > > >>> Given that it is strange for an API to consider a parameter to be 2 fields
> > > >>> at the same time, and inadvently treating them as one or the other, and
> > > >>> that removing the ID field is too drastic, my idea here is to keep the
> > > >>> ID field for internal control, but do not let the user set it.
> > > >>>
> > > >>> I guess there's room for discussion about considering this change an API
> > > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> > > >> Can you clarify a couple of things:
> > > >>    a) What is it about libvirt's use that means it's OK ? Does it never
> > > >> use numeric tags?
> > > >
> > > > I am glad you asked because my understanding in how Libvirt was dealing
> > > > with snapshots was wrong, and I just looked into it further to answer your
> > > > question. Luckily, this series fixes the situation for Libvirt as well.
> > > >
> > > > I was misled by the fact that Libvirt does not show the same symptoms
> > > > we see in
> > > > QEMU of this problem, but the bug is there. Here's a quick test with
> > > > Libvirt with
> > > > "0" and "1" as snapshot names, considering a VM named with no snapshots,
> > > > using QEMU 2.12 and Libvirt 4.0.0:
> > > >
> > > > - create the "0" snapshot:
> > > >
> > > > $ sudo virsh snapshot-create-as --name 0 dhb
> > > > Domain snapshot 0 created
> > > >
> > > > $ sudo virsh snapshot-list dhb
> > > > Name Creation Time State
> > > > ------------------------------------------------------------
> > > > 0 2018-09-24 15:47:56 -0400 running
> > > >
> > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > > List of snapshots present on all disks:
> > > > ID TAG VM SIZE DATE VM CLOCK
> > > > --    0  405M 2018-09-24 15:47:56 00:04:20.867
> > > >
> > > >
> > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with
> > > > snapshot-list,
> > > > but the snapshot was erased inside QEMU:
> > > >
> > > > $ sudo virsh snapshot-create-as --name 1 dhb
> > > > Domain snapshot 1 created
> > > > $
> > > > $ sudo virsh snapshot-list dhb
> > > > Name Creation Time State
> > > > ------------------------------------------------------------
> > > > 0 2018-09-24 15:47:56 -0400 running
> > > > 1 2018-09-24 15:50:09 -0400 running
> > > >
> > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > > List of snapshots present on all disks:
> > > > ID TAG VM SIZE DATE VM CLOCK
> > > > --    1  404M 2018-09-24 15:50:10 00:05:36.226
> > > >
> > > >
> > > > This is where I stopped checking out Libvirt at first, concluding
> > > > wrongly that it
> > > > was immune to the bug.
> > > >
> > > > Libvirt does not throw an error when trying to apply snapshot 0. In
> > > > fact, it acts
> > > > like everything went fine:
> > > >
> > > > $ sudo virsh snapshot-revert dhb 0
> > > >
> > > > $ echo $?
> > > > 0
> > > 
> > > Is that a libvirt bug?
> > 
> > Probably yes. The error handling from HMP sucks and can't really be
> > fixed in all cases. This is for the handler which calls "loadvm":
> > 
> >     if (strstr(reply, "No block device supports snapshots") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> >                        _("this domain does not have a device to load snapshots"));
> >         goto cleanup;
> >     } else if (strstr(reply, "Could not find snapshot") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID,
> >                        _("the snapshot '%s' does not exist, and was not loaded"),
> >                        name);
> >         goto cleanup;
> >     } else if (strstr(reply, "Snapshots not supported on device") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Could not open VM state file") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Error") != NULL
> >              && strstr(reply, "while loading VM state") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Error") != NULL
> >              && strstr(reply, "while activating snapshot on") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     }
> > 
> > If the above does not match the reported error, we report success. The
> > same problem can happen with any of the 5 [1] HMP command handlers we
> > still have.
> > 
> > Note that a similar abomination is also for the code which calls
> > "savevm".
> 
> Would the following small qemu change make life a little safer:
> 
> diff --git a/hmp.c b/hmp.c
> index 576253a01f..0729a8c7ed 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp)
>  {
>      assert(errp);
>      if (*errp) {
> -        error_report_err(*errp);
> +        error_reportf_err(*errp, "Error: ");
>      }
>  }
> 
> changing:
> 
> No block device supports snapshots
> 
> 
> to:
> 
> Error: No block device supports snapshots
> 
> so you could check for Error: at the start and know that you've hit some
> unknown error?

That certainly would help, provided that the message is not localized in
any way. I'd be very glad to see it also for 'savevm' and 'delvm'.

It's a shame that drive_add and drive_del don't have that either, but
they will soon become unused.

Max Reitz Jan. 9, 2019, 2:10 p.m. UTC | #8

On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> changes in v2:
> - removed the "RFC" marker;
> - added a new patch (patch 2) that removes
> bdrv_snapshot_delete_by_id_or_name from the code;
> - made changes in patch 1 as suggested by Murilo;
> - previous patch set link:
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> 
> 
> It is not uncommon to see bugs being opened by testers that attempt to
> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> common snapshot names and they trigger a lot of bugs. I gave an example
> in the commit message of patch 1, but to sum up here: QEMU treats the
> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> is documented as such, but this can lead to strange situations.
> 
> Given that it is strange for an API to consider a parameter to be 2 fields
> at the same time, and inadvently treating them as one or the other, and
> that removing the ID field is too drastic, my idea here is to keep the
> ID field for internal control, but do not let the user set it.
> 
> I guess there's room for discussion about considering this change an API
> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> but simplifying the meaning of the parameters of savevm/loadvm/delvm.

(Yes, very late reply, I'm sorry...)

I do think it affects users of HMP, because right now you can delete
snapshots with their ID, and after this series you cannot.

I think we had a short discussion about just disallowing numeric
snapshot names.  How bad would that be?

Max

Kevin Wolf Jan. 9, 2019, 2:21 p.m. UTC | #9

Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> > changes in v2:
> > - removed the "RFC" marker;
> > - added a new patch (patch 2) that removes
> > bdrv_snapshot_delete_by_id_or_name from the code;
> > - made changes in patch 1 as suggested by Murilo;
> > - previous patch set link:
> > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > 
> > 
> > It is not uncommon to see bugs being opened by testers that attempt to
> > create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > common snapshot names and they trigger a lot of bugs. I gave an example
> > in the commit message of patch 1, but to sum up here: QEMU treats the
> > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > is documented as such, but this can lead to strange situations.
> > 
> > Given that it is strange for an API to consider a parameter to be 2 fields
> > at the same time, and inadvently treating them as one or the other, and
> > that removing the ID field is too drastic, my idea here is to keep the
> > ID field for internal control, but do not let the user set it.
> > 
> > I guess there's room for discussion about considering this change an API
> > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> 
> (Yes, very late reply, I'm sorry...)
> 
> I do think it affects users of HMP, because right now you can delete
> snapshots with their ID, and after this series you cannot.

Can there be snapshots that can't be identified by a snapshot name, but
only by their ID?

> I think we had a short discussion about just disallowing numeric
> snapshot names.  How bad would that be?

It would be incompatible with existing images and result in a more
complex snapshot identifier resolution. Why would it be any better?

Kevin

Max Reitz Jan. 9, 2019, 2:27 p.m. UTC | #10

On 09.01.19 15:21, Kevin Wolf wrote:
> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>> changes in v2:
>>> - removed the "RFC" marker;
>>> - added a new patch (patch 2) that removes
>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>> - made changes in patch 1 as suggested by Murilo;
>>> - previous patch set link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>
>>>
>>> It is not uncommon to see bugs being opened by testers that attempt to
>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>> is documented as such, but this can lead to strange situations.
>>>
>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>> at the same time, and inadvently treating them as one or the other, and
>>> that removing the ID field is too drastic, my idea here is to keep the
>>> ID field for internal control, but do not let the user set it.
>>>
>>> I guess there's room for discussion about considering this change an API
>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>
>> (Yes, very late reply, I'm sorry...)
>>
>> I do think it affects users of HMP, because right now you can delete
>> snapshots with their ID, and after this series you cannot.
> 
> Can there be snapshots that can't be identified by a snapshot name, but
> only by their ID?

I don't know, but what I meant is that if you have scripts to do all
this, you might have to adjust them with this change.

>> I think we had a short discussion about just disallowing numeric
>> snapshot names.  How bad would that be?
> 
> It would be incompatible with existing images and result in a more
> complex snapshot identifier resolution. Why would it be any better?

It wouldn't be incompatible with existing images if we'd just disallow
creating such snapshots.  I don't see how the identifier resolution
would be more complex.

I don't know if it'd be better.  I'd just find it simpler (for us, that
is -- for users, I'm not sure).

Max

Kevin Wolf Jan. 9, 2019, 2:48 p.m. UTC | #11

Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
> On 09.01.19 15:21, Kevin Wolf wrote:
> > Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> >> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> >>> changes in v2:
> >>> - removed the "RFC" marker;
> >>> - added a new patch (patch 2) that removes
> >>> bdrv_snapshot_delete_by_id_or_name from the code;
> >>> - made changes in patch 1 as suggested by Murilo;
> >>> - previous patch set link:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >>>
> >>>
> >>> It is not uncommon to see bugs being opened by testers that attempt to
> >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> >>> common snapshot names and they trigger a lot of bugs. I gave an example
> >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> >>> is documented as such, but this can lead to strange situations.
> >>>
> >>> Given that it is strange for an API to consider a parameter to be 2 fields
> >>> at the same time, and inadvently treating them as one or the other, and
> >>> that removing the ID field is too drastic, my idea here is to keep the
> >>> ID field for internal control, but do not let the user set it.
> >>>
> >>> I guess there's room for discussion about considering this change an API
> >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> >>
> >> (Yes, very late reply, I'm sorry...)
> >>
> >> I do think it affects users of HMP, because right now you can delete
> >> snapshots with their ID, and after this series you cannot.
> > 
> > Can there be snapshots that can't be identified by a snapshot name, but
> > only by their ID?
> 
> I don't know, but what I meant is that if you have scripts to do all
> this, you might have to adjust them with this change.

That's what the H in HMP means.

> >> I think we had a short discussion about just disallowing numeric
> >> snapshot names.  How bad would that be?
> > 
> > It would be incompatible with existing images and result in a more
> > complex snapshot identifier resolution. Why would it be any better?
> 
> It wouldn't be incompatible with existing images if we'd just disallow
> creating such snapshots.  I don't see how the identifier resolution
> would be more complex.
> 
> I don't know if it'd be better.  I'd just find it simpler (for us, that
> is -- for users, I'm not sure).

Identifier resolution A:
- Find a snapshot that has the given identifier as a name
- If no such snapshot exists, it is an error

Identifier resolution B:
- If the identifier is a number, find a snapshot that has the given
  identifier as its ID
- If the identifier is not a number, find a snapshot that has the given
  identifier as a name
- If no such snapshot exists, it is an error

Isn't it rather obvious that B is more complex than A?

Kevin

Max Reitz Jan. 9, 2019, 2:54 p.m. UTC | #12

On 09.01.19 15:48, Kevin Wolf wrote:
> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
>> On 09.01.19 15:21, Kevin Wolf wrote:
>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>>>> changes in v2:
>>>>> - removed the "RFC" marker;
>>>>> - added a new patch (patch 2) that removes
>>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>>>> - made changes in patch 1 as suggested by Murilo;
>>>>> - previous patch set link:
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>>>
>>>>>
>>>>> It is not uncommon to see bugs being opened by testers that attempt to
>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>>>> is documented as such, but this can lead to strange situations.
>>>>>
>>>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>>>> at the same time, and inadvently treating them as one or the other, and
>>>>> that removing the ID field is too drastic, my idea here is to keep the
>>>>> ID field for internal control, but do not let the user set it.
>>>>>
>>>>> I guess there's room for discussion about considering this change an API
>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>>>
>>>> (Yes, very late reply, I'm sorry...)
>>>>
>>>> I do think it affects users of HMP, because right now you can delete
>>>> snapshots with their ID, and after this series you cannot.
>>>
>>> Can there be snapshots that can't be identified by a snapshot name, but
>>> only by their ID?
>>
>> I don't know, but what I meant is that if you have scripts to do all
>> this, you might have to adjust them with this change.
> 
> That's what the H in HMP means.
> 
>>>> I think we had a short discussion about just disallowing numeric
>>>> snapshot names.  How bad would that be?
>>>
>>> It would be incompatible with existing images and result in a more
>>> complex snapshot identifier resolution. Why would it be any better?
>>
>> It wouldn't be incompatible with existing images if we'd just disallow
>> creating such snapshots.  I don't see how the identifier resolution
>> would be more complex.
>>
>> I don't know if it'd be better.  I'd just find it simpler (for us, that
>> is -- for users, I'm not sure).
> 
> Identifier resolution A:
> - Find a snapshot that has the given identifier as a name
> - If no such snapshot exists, it is an error
> 
> Identifier resolution B:
> - If the identifier is a number, find a snapshot that has the given
>   identifier as its ID
> - If the identifier is not a number, find a snapshot that has the given
>   identifier as a name
> - If no such snapshot exists, it is an error

No, my idea was to keep the resolution the same as it is; just to forbid
creating new snapshots with numeric names.  This would prevent users
from getting into the whole situation.

Max

Kevin Wolf Jan. 9, 2019, 3:13 p.m. UTC | #13

Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
> On 09.01.19 15:48, Kevin Wolf wrote:
> > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
> >> On 09.01.19 15:21, Kevin Wolf wrote:
> >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> >>>>> changes in v2:
> >>>>> - removed the "RFC" marker;
> >>>>> - added a new patch (patch 2) that removes
> >>>>> bdrv_snapshot_delete_by_id_or_name from the code;
> >>>>> - made changes in patch 1 as suggested by Murilo;
> >>>>> - previous patch set link:
> >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >>>>>
> >>>>>
> >>>>> It is not uncommon to see bugs being opened by testers that attempt to
> >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> >>>>> common snapshot names and they trigger a lot of bugs. I gave an example
> >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
> >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> >>>>> is documented as such, but this can lead to strange situations.
> >>>>>
> >>>>> Given that it is strange for an API to consider a parameter to be 2 fields
> >>>>> at the same time, and inadvently treating them as one or the other, and
> >>>>> that removing the ID field is too drastic, my idea here is to keep the
> >>>>> ID field for internal control, but do not let the user set it.
> >>>>>
> >>>>> I guess there's room for discussion about considering this change an API
> >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> >>>>
> >>>> (Yes, very late reply, I'm sorry...)
> >>>>
> >>>> I do think it affects users of HMP, because right now you can delete
> >>>> snapshots with their ID, and after this series you cannot.
> >>>
> >>> Can there be snapshots that can't be identified by a snapshot name, but
> >>> only by their ID?
> >>
> >> I don't know, but what I meant is that if you have scripts to do all
> >> this, you might have to adjust them with this change.
> > 
> > That's what the H in HMP means.
> > 
> >>>> I think we had a short discussion about just disallowing numeric
> >>>> snapshot names.  How bad would that be?
> >>>
> >>> It would be incompatible with existing images and result in a more
> >>> complex snapshot identifier resolution. Why would it be any better?
> >>
> >> It wouldn't be incompatible with existing images if we'd just disallow
> >> creating such snapshots.  I don't see how the identifier resolution
> >> would be more complex.
> >>
> >> I don't know if it'd be better.  I'd just find it simpler (for us, that
> >> is -- for users, I'm not sure).
> > 
> > Identifier resolution A:
> > - Find a snapshot that has the given identifier as a name
> > - If no such snapshot exists, it is an error
> > 
> > Identifier resolution B:
> > - If the identifier is a number, find a snapshot that has the given
> >   identifier as its ID
> > - If the identifier is not a number, find a snapshot that has the given
> >   identifier as a name
> > - If no such snapshot exists, it is an error
> 
> No, my idea was to keep the resolution the same as it is; just to forbid
> creating new snapshots with numeric names.  This would prevent users
> from getting into the whole situation.

That's the version with an even more complex resolution method C. :-)

I actually think the current behaviour is more confusing than helpful.
Without looking into the code or trying it out, I couldn't even tell
whether ID or name takes precedence if there is a matching snapshot for
both. Considering your proposal, it's probably the ID, but how should a
user know that? (If against all expectations documentation exists, it
doesn't count because nobody reads that.)

Kevin

Dr. David Alan Gilbert Jan. 9, 2019, 3:25 p.m. UTC | #14

* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
> > On 09.01.19 15:48, Kevin Wolf wrote:
> > > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
> > >> On 09.01.19 15:21, Kevin Wolf wrote:
> > >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> > >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> > >>>>> changes in v2:
> > >>>>> - removed the "RFC" marker;
> > >>>>> - added a new patch (patch 2) that removes
> > >>>>> bdrv_snapshot_delete_by_id_or_name from the code;
> > >>>>> - made changes in patch 1 as suggested by Murilo;
> > >>>>> - previous patch set link:
> > >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > >>>>>
> > >>>>>
> > >>>>> It is not uncommon to see bugs being opened by testers that attempt to
> > >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > >>>>> common snapshot names and they trigger a lot of bugs. I gave an example
> > >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
> > >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > >>>>> is documented as such, but this can lead to strange situations.
> > >>>>>
> > >>>>> Given that it is strange for an API to consider a parameter to be 2 fields
> > >>>>> at the same time, and inadvently treating them as one or the other, and
> > >>>>> that removing the ID field is too drastic, my idea here is to keep the
> > >>>>> ID field for internal control, but do not let the user set it.
> > >>>>>
> > >>>>> I guess there's room for discussion about considering this change an API
> > >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> > >>>>
> > >>>> (Yes, very late reply, I'm sorry...)
> > >>>>
> > >>>> I do think it affects users of HMP, because right now you can delete
> > >>>> snapshots with their ID, and after this series you cannot.
> > >>>
> > >>> Can there be snapshots that can't be identified by a snapshot name, but
> > >>> only by their ID?
> > >>
> > >> I don't know, but what I meant is that if you have scripts to do all
> > >> this, you might have to adjust them with this change.
> > > 
> > > That's what the H in HMP means.
> > > 
> > >>>> I think we had a short discussion about just disallowing numeric
> > >>>> snapshot names.  How bad would that be?
> > >>>
> > >>> It would be incompatible with existing images and result in a more
> > >>> complex snapshot identifier resolution. Why would it be any better?
> > >>
> > >> It wouldn't be incompatible with existing images if we'd just disallow
> > >> creating such snapshots.  I don't see how the identifier resolution
> > >> would be more complex.
> > >>
> > >> I don't know if it'd be better.  I'd just find it simpler (for us, that
> > >> is -- for users, I'm not sure).
> > > 
> > > Identifier resolution A:
> > > - Find a snapshot that has the given identifier as a name
> > > - If no such snapshot exists, it is an error
> > > 
> > > Identifier resolution B:
> > > - If the identifier is a number, find a snapshot that has the given
> > >   identifier as its ID
> > > - If the identifier is not a number, find a snapshot that has the given
> > >   identifier as a name
> > > - If no such snapshot exists, it is an error
> > 
> > No, my idea was to keep the resolution the same as it is; just to forbid
> > creating new snapshots with numeric names.  This would prevent users
> > from getting into the whole situation.
> 
> That's the version with an even more complex resolution method C. :-)
> 
> I actually think the current behaviour is more confusing than helpful.
> Without looking into the code or trying it out, I couldn't even tell
> whether ID or name takes precedence if there is a matching snapshot for
> both. Considering your proposal, it's probably the ID, but how should a
> user know that? (If against all expectations documentation exists, it
> doesn't count because nobody reads that.)

Would adding a flag to the HMP commands to make it explicit help?

Dave

> Kevin


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Markus Armbruster Jan. 9, 2019, 4:25 p.m. UTC | #15

Kevin Wolf <kwolf@redhat.com> writes:

> Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
>> On 09.01.19 15:48, Kevin Wolf wrote:
>> > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
>> >> On 09.01.19 15:21, Kevin Wolf wrote:
>> >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>> >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>> >>>>> changes in v2:
>> >>>>> - removed the "RFC" marker;
>> >>>>> - added a new patch (patch 2) that removes
>> >>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>> >>>>> - made changes in patch 1 as suggested by Murilo;
>> >>>>> - previous patch set link:
>> >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>> >>>>>
>> >>>>>
>> >>>>> It is not uncommon to see bugs being opened by testers that attempt to
>> >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>> >>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>> >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>> >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>> >>>>> is documented as such, but this can lead to strange situations.
>> >>>>>
>> >>>>> Given that it is strange for an API to consider a parameter to be 2 fields
>> >>>>> at the same time, and inadvently treating them as one or the other, and
>> >>>>> that removing the ID field is too drastic, my idea here is to keep the
>> >>>>> ID field for internal control, but do not let the user set it.
>> >>>>>
>> >>>>> I guess there's room for discussion about considering this change an API
>> >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>> >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>> >>>>
>> >>>> (Yes, very late reply, I'm sorry...)
>> >>>>
>> >>>> I do think it affects users of HMP, because right now you can delete
>> >>>> snapshots with their ID, and after this series you cannot.
>> >>>
>> >>> Can there be snapshots that can't be identified by a snapshot name, but
>> >>> only by their ID?
>> >>
>> >> I don't know, but what I meant is that if you have scripts to do all
>> >> this, you might have to adjust them with this change.
>> > 
>> > That's what the H in HMP means.
>> > 
>> >>>> I think we had a short discussion about just disallowing numeric
>> >>>> snapshot names.  How bad would that be?
>> >>>
>> >>> It would be incompatible with existing images and result in a more
>> >>> complex snapshot identifier resolution. Why would it be any better?
>> >>
>> >> It wouldn't be incompatible with existing images if we'd just disallow
>> >> creating such snapshots.  I don't see how the identifier resolution
>> >> would be more complex.
>> >>
>> >> I don't know if it'd be better.  I'd just find it simpler (for us, that
>> >> is -- for users, I'm not sure).
>> > 
>> > Identifier resolution A:
>> > - Find a snapshot that has the given identifier as a name
>> > - If no such snapshot exists, it is an error
>> > 
>> > Identifier resolution B:
>> > - If the identifier is a number, find a snapshot that has the given
>> >   identifier as its ID
>> > - If the identifier is not a number, find a snapshot that has the given
>> >   identifier as a name
>> > - If no such snapshot exists, it is an error
>> 
>> No, my idea was to keep the resolution the same as it is; just to forbid
>> creating new snapshots with numeric names.  This would prevent users
>> from getting into the whole situation.
>
> That's the version with an even more complex resolution method C. :-)
>
> I actually think the current behaviour is more confusing than helpful.
> Without looking into the code or trying it out, I couldn't even tell
> whether ID or name takes precedence if there is a matching snapshot for
> both.

Been there, done that, more than once.

>       Considering your proposal, it's probably the ID, but how should a
> user know that? (If against all expectations documentation exists, it
> doesn't count because nobody reads that.)

In this case, probably for the better --- I'd expect documentation of
this mess (if any) to be rather losely related to the code.

Max Reitz Jan. 9, 2019, 4:27 p.m. UTC | #16

On 09.01.19 16:13, Kevin Wolf wrote:
> Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
>> On 09.01.19 15:48, Kevin Wolf wrote:
>>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
>>>> On 09.01.19 15:21, Kevin Wolf wrote:
>>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>>>>>> changes in v2:
>>>>>>> - removed the "RFC" marker;
>>>>>>> - added a new patch (patch 2) that removes
>>>>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>>>>>> - made changes in patch 1 as suggested by Murilo;
>>>>>>> - previous patch set link:
>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>>>>>
>>>>>>>
>>>>>>> It is not uncommon to see bugs being opened by testers that attempt to
>>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>>>>>> is documented as such, but this can lead to strange situations.
>>>>>>>
>>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>>>>>> at the same time, and inadvently treating them as one or the other, and
>>>>>>> that removing the ID field is too drastic, my idea here is to keep the
>>>>>>> ID field for internal control, but do not let the user set it.
>>>>>>>
>>>>>>> I guess there's room for discussion about considering this change an API
>>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>>>>>
>>>>>> (Yes, very late reply, I'm sorry...)
>>>>>>
>>>>>> I do think it affects users of HMP, because right now you can delete
>>>>>> snapshots with their ID, and after this series you cannot.
>>>>>
>>>>> Can there be snapshots that can't be identified by a snapshot name, but
>>>>> only by their ID?
>>>>
>>>> I don't know, but what I meant is that if you have scripts to do all
>>>> this, you might have to adjust them with this change.
>>>
>>> That's what the H in HMP means.
>>>
>>>>>> I think we had a short discussion about just disallowing numeric
>>>>>> snapshot names.  How bad would that be?
>>>>>
>>>>> It would be incompatible with existing images and result in a more
>>>>> complex snapshot identifier resolution. Why would it be any better?
>>>>
>>>> It wouldn't be incompatible with existing images if we'd just disallow
>>>> creating such snapshots.  I don't see how the identifier resolution
>>>> would be more complex.
>>>>
>>>> I don't know if it'd be better.  I'd just find it simpler (for us, that
>>>> is -- for users, I'm not sure).
>>>
>>> Identifier resolution A:
>>> - Find a snapshot that has the given identifier as a name
>>> - If no such snapshot exists, it is an error
>>>
>>> Identifier resolution B:
>>> - If the identifier is a number, find a snapshot that has the given
>>>   identifier as its ID
>>> - If the identifier is not a number, find a snapshot that has the given
>>>   identifier as a name
>>> - If no such snapshot exists, it is an error
>>
>> No, my idea was to keep the resolution the same as it is; just to forbid
>> creating new snapshots with numeric names.  This would prevent users
>> from getting into the whole situation.
> 
> That's the version with an even more complex resolution method C. :-)

How so if the resolution method stays the same?  Because it already is
too complex?

If so, yes, that is an argument.  I was arguing for the simplest patch
instead of the simplest code, true.

> I actually think the current behaviour is more confusing than helpful.
> Without looking into the code or trying it out, I couldn't even tell
> whether ID or name takes precedence if there is a matching snapshot for
> both. Considering your proposal, it's probably the ID, but how should a
> user know that? (If against all expectations documentation exists, it
> doesn't count because nobody reads that.)

It isn't more confusing than it is right now.  With my proposal, all
current images are simply as confusing as they are right now (I think ID
takes precedence, yes), but if you create new snapshots, it's clear,
since you simply cannot create names that could be IDs.

Max

Kevin Wolf Jan. 9, 2019, 4:45 p.m. UTC | #17

Am 09.01.2019 um 17:27 hat Max Reitz geschrieben:
> On 09.01.19 16:13, Kevin Wolf wrote:
> > Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
> >> On 09.01.19 15:48, Kevin Wolf wrote:
> >>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
> >>>> On 09.01.19 15:21, Kevin Wolf wrote:
> >>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
> >>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> >>>>>>> changes in v2:
> >>>>>>> - removed the "RFC" marker;
> >>>>>>> - added a new patch (patch 2) that removes
> >>>>>>> bdrv_snapshot_delete_by_id_or_name from the code;
> >>>>>>> - made changes in patch 1 as suggested by Murilo;
> >>>>>>> - previous patch set link:
> >>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >>>>>>>
> >>>>>>>
> >>>>>>> It is not uncommon to see bugs being opened by testers that attempt to
> >>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> >>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example
> >>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
> >>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> >>>>>>> is documented as such, but this can lead to strange situations.
> >>>>>>>
> >>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields
> >>>>>>> at the same time, and inadvently treating them as one or the other, and
> >>>>>>> that removing the ID field is too drastic, my idea here is to keep the
> >>>>>>> ID field for internal control, but do not let the user set it.
> >>>>>>>
> >>>>>>> I guess there's room for discussion about considering this change an API
> >>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> >>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> >>>>>>
> >>>>>> (Yes, very late reply, I'm sorry...)
> >>>>>>
> >>>>>> I do think it affects users of HMP, because right now you can delete
> >>>>>> snapshots with their ID, and after this series you cannot.
> >>>>>
> >>>>> Can there be snapshots that can't be identified by a snapshot name, but
> >>>>> only by their ID?
> >>>>
> >>>> I don't know, but what I meant is that if you have scripts to do all
> >>>> this, you might have to adjust them with this change.
> >>>
> >>> That's what the H in HMP means.
> >>>
> >>>>>> I think we had a short discussion about just disallowing numeric
> >>>>>> snapshot names.  How bad would that be?
> >>>>>
> >>>>> It would be incompatible with existing images and result in a more
> >>>>> complex snapshot identifier resolution. Why would it be any better?
> >>>>
> >>>> It wouldn't be incompatible with existing images if we'd just disallow
> >>>> creating such snapshots.  I don't see how the identifier resolution
> >>>> would be more complex.
> >>>>
> >>>> I don't know if it'd be better.  I'd just find it simpler (for us, that
> >>>> is -- for users, I'm not sure).
> >>>
> >>> Identifier resolution A:
> >>> - Find a snapshot that has the given identifier as a name
> >>> - If no such snapshot exists, it is an error
> >>>
> >>> Identifier resolution B:
> >>> - If the identifier is a number, find a snapshot that has the given
> >>>   identifier as its ID
> >>> - If the identifier is not a number, find a snapshot that has the given
> >>>   identifier as a name
> >>> - If no such snapshot exists, it is an error
> >>
> >> No, my idea was to keep the resolution the same as it is; just to forbid
> >> creating new snapshots with numeric names.  This would prevent users
> >> from getting into the whole situation.
> > 
> > That's the version with an even more complex resolution method C. :-)
> 
> How so if the resolution method stays the same?  Because it already is
> too complex?
> 
> If so, yes, that is an argument.  I was arguing for the simplest patch
> instead of the simplest code, true.

Yes, because it already is too complex. Not even necessarily the code
(even though that's true as well), but most importantly the interface.

> > I actually think the current behaviour is more confusing than helpful.
> > Without looking into the code or trying it out, I couldn't even tell
> > whether ID or name takes precedence if there is a matching snapshot for
> > both. Considering your proposal, it's probably the ID, but how should a
> > user know that? (If against all expectations documentation exists, it
> > doesn't count because nobody reads that.)
> 
> It isn't more confusing than it is right now.  With my proposal, all
> current images are simply as confusing as they are right now (I think ID
> takes precedence, yes), but if you create new snapshots, it's clear,
> since you simply cannot create names that could be IDs.

I agree. But wasn't the goal of the patch to make it less confusing than
it is right now?

Kevin

Daniel Henrique Barboza Jan. 9, 2019, 4:57 p.m. UTC | #18

On 1/9/19 12:10 PM, Max Reitz wrote:
> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>> changes in v2:
>> - removed the "RFC" marker;
>> - added a new patch (patch 2) that removes
>> bdrv_snapshot_delete_by_id_or_name from the code;
>> - made changes in patch 1 as suggested by Murilo;
>> - previous patch set link:
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>
>>
>> It is not uncommon to see bugs being opened by testers that attempt to
>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>> common snapshot names and they trigger a lot of bugs. I gave an example
>> in the commit message of patch 1, but to sum up here: QEMU treats the
>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>> is documented as such, but this can lead to strange situations.
>>
>> Given that it is strange for an API to consider a parameter to be 2 fields
>> at the same time, and inadvently treating them as one or the other, and
>> that removing the ID field is too drastic, my idea here is to keep the
>> ID field for internal control, but do not let the user set it.
>>
>> I guess there's room for discussion about considering this change an API
>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> (Yes, very late reply, I'm sorry...)
>
> I do think it affects users of HMP, because right now you can delete
> snapshots with their ID, and after this series you cannot.

That's true. My idea here was simple: the user can't reliably exclude 
via snapshot ID today
because we're hiding the ID field in info snapshots:


     (qemu) savevm 0
     (qemu) info snapshots
     List of snapshots present on all disks:
     ID        TAG                 VM SIZE                DATE VM CLOCK
     --        0                      741M 2018-07-31 13:39:56 00:41:25.313


Thus, what will end up happening is that the user will be forced to use the
TAG of the snapshot since this is the only available information.


>
> I think we had a short discussion about just disallowing numeric
> snapshot names.  How bad would that be?


This was my first idea when evaluating what to do in this case. I gave 
it up because
I found it to be too extreme. People would start complaining "I was able 
to do
savevm 0 and now I can't".



>
> Max
>

Max Reitz Jan. 9, 2019, 4:58 p.m. UTC | #19

On 09.01.19 17:45, Kevin Wolf wrote:
> Am 09.01.2019 um 17:27 hat Max Reitz geschrieben:
>> On 09.01.19 16:13, Kevin Wolf wrote:
>>> Am 09.01.2019 um 15:54 hat Max Reitz geschrieben:
>>>> On 09.01.19 15:48, Kevin Wolf wrote:
>>>>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben:
>>>>>> On 09.01.19 15:21, Kevin Wolf wrote:
>>>>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>>>>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>>>>>>>> changes in v2:
>>>>>>>>> - removed the "RFC" marker;
>>>>>>>>> - added a new patch (patch 2) that removes
>>>>>>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>>>>>>>> - made changes in patch 1 as suggested by Murilo;
>>>>>>>>> - previous patch set link:
>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It is not uncommon to see bugs being opened by testers that attempt to
>>>>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>>>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>>>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>>>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>>>>>>>> is documented as such, but this can lead to strange situations.
>>>>>>>>>
>>>>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>>>>>>>> at the same time, and inadvently treating them as one or the other, and
>>>>>>>>> that removing the ID field is too drastic, my idea here is to keep the
>>>>>>>>> ID field for internal control, but do not let the user set it.
>>>>>>>>>
>>>>>>>>> I guess there's room for discussion about considering this change an API
>>>>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>>>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>>>>>>>
>>>>>>>> (Yes, very late reply, I'm sorry...)
>>>>>>>>
>>>>>>>> I do think it affects users of HMP, because right now you can delete
>>>>>>>> snapshots with their ID, and after this series you cannot.
>>>>>>>
>>>>>>> Can there be snapshots that can't be identified by a snapshot name, but
>>>>>>> only by their ID?
>>>>>>
>>>>>> I don't know, but what I meant is that if you have scripts to do all
>>>>>> this, you might have to adjust them with this change.
>>>>>
>>>>> That's what the H in HMP means.
>>>>>
>>>>>>>> I think we had a short discussion about just disallowing numeric
>>>>>>>> snapshot names.  How bad would that be?
>>>>>>>
>>>>>>> It would be incompatible with existing images and result in a more
>>>>>>> complex snapshot identifier resolution. Why would it be any better?
>>>>>>
>>>>>> It wouldn't be incompatible with existing images if we'd just disallow
>>>>>> creating such snapshots.  I don't see how the identifier resolution
>>>>>> would be more complex.
>>>>>>
>>>>>> I don't know if it'd be better.  I'd just find it simpler (for us, that
>>>>>> is -- for users, I'm not sure).
>>>>>
>>>>> Identifier resolution A:
>>>>> - Find a snapshot that has the given identifier as a name
>>>>> - If no such snapshot exists, it is an error
>>>>>
>>>>> Identifier resolution B:
>>>>> - If the identifier is a number, find a snapshot that has the given
>>>>>   identifier as its ID
>>>>> - If the identifier is not a number, find a snapshot that has the given
>>>>>   identifier as a name
>>>>> - If no such snapshot exists, it is an error
>>>>
>>>> No, my idea was to keep the resolution the same as it is; just to forbid
>>>> creating new snapshots with numeric names.  This would prevent users
>>>> from getting into the whole situation.
>>>
>>> That's the version with an even more complex resolution method C. :-)
>>
>> How so if the resolution method stays the same?  Because it already is
>> too complex?
>>
>> If so, yes, that is an argument.  I was arguing for the simplest patch
>> instead of the simplest code, true.
> 
> Yes, because it already is too complex. Not even necessarily the code
> (even though that's true as well), but most importantly the interface.
> 
>>> I actually think the current behaviour is more confusing than helpful.
>>> Without looking into the code or trying it out, I couldn't even tell
>>> whether ID or name takes precedence if there is a matching snapshot for
>>> both. Considering your proposal, it's probably the ID, but how should a
>>> user know that? (If against all expectations documentation exists, it
>>> doesn't count because nobody reads that.)
>>
>> It isn't more confusing than it is right now.  With my proposal, all
>> current images are simply as confusing as they are right now (I think ID
>> takes precedence, yes), but if you create new snapshots, it's clear,
>> since you simply cannot create names that could be IDs.
> 
> I agree. But wasn't the goal of the patch to make it less confusing than
> it is right now?

My approach would make it less confusing, too, just not for old images.
 But if you have images lying around with numeric snapshot names,
chances are you must have realized at some point that the ID takes
precedence.

But of course my approach would have to have benefits over this series,
as the latter exists already.  Hm.  The interface and code would remain
more complex, so that's a bad point.

A good point is that I think it's the less dangerous incompatible
change.  What's the worst that can happen if we disallow giving numeric
names?  You get an error with something that used to work.  Ouch, but
not horrible.

What's the worst that could happen with this series?  You used to give
all of your snapshots numeric names, but somehow you were able to deal
with it by always using the IDs when specifying existing snapshots.[1]
Now the behavior is changed and the name takes precedence -- so maybe
you delete the wrong snapshot or revert to the wrong one.  That's not so
good.

[1] I admit that it's more reasonable to assume you were just lucky it
worked, because for every snapshot, its name just happened to match its
ID.  If so, changing behavior doesn't bite you.


So I'm unsure.  I agree that this series is probably what we would have
wanted from the start, but I don't know if the change isn't a bit
adventurous.

Max

Max Reitz Jan. 9, 2019, 5:05 p.m. UTC | #20

On 09.01.19 17:57, Daniel Henrique Barboza wrote:
> 
> 
> On 1/9/19 12:10 PM, Max Reitz wrote:
>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>> changes in v2:
>>> - removed the "RFC" marker;
>>> - added a new patch (patch 2) that removes
>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>> - made changes in patch 1 as suggested by Murilo;
>>> - previous patch set link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>
>>>
>>> It is not uncommon to see bugs being opened by testers that attempt to
>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>> is documented as such, but this can lead to strange situations.
>>>
>>> Given that it is strange for an API to consider a parameter to be 2
>>> fields
>>> at the same time, and inadvently treating them as one or the other, and
>>> that removing the ID field is too drastic, my idea here is to keep the
>>> ID field for internal control, but do not let the user set it.
>>>
>>> I guess there's room for discussion about considering this change an API
>>> change or not. It doesn't affect users of HMP and it doesn't affect
>>> Libvirt,
>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>> (Yes, very late reply, I'm sorry...)
>>
>> I do think it affects users of HMP, because right now you can delete
>> snapshots with their ID, and after this series you cannot.
> 
> That's true. My idea here was simple: the user can't reliably exclude
> via snapshot ID today
> because we're hiding the ID field in info snapshots:
> 
> 
>     (qemu) savevm 0
>     (qemu) info snapshots
>     List of snapshots present on all disks:
>     ID        TAG                 VM SIZE                DATE VM CLOCK
>     --        0                      741M 2018-07-31 13:39:56 00:41:25.313
> 
> 
> Thus, what will end up happening is that the user will be forced to use the
> TAG of the snapshot since this is the only available information.

But you can get it through e.g. qemu-img info.

Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         0                      1.6M 2019-01-09 18:01:21   00:00:02.657

So it's not impossible to get.

>> I think we had a short discussion about just disallowing numeric
>> snapshot names.  How bad would that be?
> 
> 
> This was my first idea when evaluating what to do in this case. I gave
> it up because
> I found it to be too extreme. People would start complaining "I was able
> to do
> savevm 0 and now I can't".

True.  But it wouldn't be impossible to do, we'd need to deprecate
numeric names, print a warning for two releases, and then we can make it
an error.

Hm...  If we had a proper deprecation warning in this series, I suppose
it wouldn't be dangerous anymore.  Can we just print a warning whenever
the user specified an ID?  (i.e. if some snapshot's ID matches the
string given by the user and the snapshot's name does not)

Max

Eric Blake Jan. 9, 2019, 5:07 p.m. UTC | #21

On 1/9/19 10:57 AM, Daniel Henrique Barboza wrote:

>> I think we had a short discussion about just disallowing numeric
>> snapshot names.  How bad would that be?
> 
> 
> This was my first idea when evaluating what to do in this case. I gave
> it up because
> I found it to be too extreme. People would start complaining "I was able
> to do
> savevm 0 and now I can't".

I could live with that, especially if it serves to prevent even more
bugs of "I did this and now my image is broken" (I, for one, have known
about the confusion of breaking an image by using a number as a snapshot
name as far back as 2011). What I can't live with is:

"I have an old image where I did savevm 0, and now want to modernize the
image, but the new tools refuse to even let me read from name 0".

I also agree that the code is already so hairy, and the reason we have
punted on solving the issue (for 8 years now!) is because there are so
many corner cases to consider.  It also means that I'm reluctant to
review the patch, because it will require a significant chunk of time
and mental effort to ensure that whatever patch is proposed does not
break something important.  But in general, I'm glad that you are trying
to get the issue fixed, even if the conversation on HOW to fix it is
still undergoing a choice of bikeshed paint colors.

Kevin Wolf Jan. 9, 2019, 5:20 p.m. UTC | #22

Am 09.01.2019 um 18:05 hat Max Reitz geschrieben:
> On 09.01.19 17:57, Daniel Henrique Barboza wrote:
> > 
> > 
> > On 1/9/19 12:10 PM, Max Reitz wrote:
> >> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
> >>> changes in v2:
> >>> - removed the "RFC" marker;
> >>> - added a new patch (patch 2) that removes
> >>> bdrv_snapshot_delete_by_id_or_name from the code;
> >>> - made changes in patch 1 as suggested by Murilo;
> >>> - previous patch set link:
> >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> >>>
> >>>
> >>> It is not uncommon to see bugs being opened by testers that attempt to
> >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> >>> common snapshot names and they trigger a lot of bugs. I gave an example
> >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> >>> is documented as such, but this can lead to strange situations.
> >>>
> >>> Given that it is strange for an API to consider a parameter to be 2
> >>> fields
> >>> at the same time, and inadvently treating them as one or the other, and
> >>> that removing the ID field is too drastic, my idea here is to keep the
> >>> ID field for internal control, but do not let the user set it.
> >>>
> >>> I guess there's room for discussion about considering this change an API
> >>> change or not. It doesn't affect users of HMP and it doesn't affect
> >>> Libvirt,
> >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> >> (Yes, very late reply, I'm sorry...)
> >>
> >> I do think it affects users of HMP, because right now you can delete
> >> snapshots with their ID, and after this series you cannot.
> > 
> > That's true. My idea here was simple: the user can't reliably exclude
> > via snapshot ID today
> > because we're hiding the ID field in info snapshots:
> > 
> > 
> >     (qemu) savevm 0
> >     (qemu) info snapshots
> >     List of snapshots present on all disks:
> >     ID        TAG                 VM SIZE                DATE VM CLOCK
> >     --        0                      741M 2018-07-31 13:39:56 00:41:25.313
> > 
> > 
> > Thus, what will end up happening is that the user will be forced to use the
> > TAG of the snapshot since this is the only available information.
> 
> But you can get it through e.g. qemu-img info.
> 
> Snapshot list:
> ID        TAG                 VM SIZE                DATE       VM CLOCK
> 1         0                      1.6M 2019-01-09 18:01:21   00:00:02.657
> 
> So it's not impossible to get.

Is there a reason why we should display the ID at all when you can't use
it any more to identify snapshots?

> >> I think we had a short discussion about just disallowing numeric
> >> snapshot names.  How bad would that be?
> > 
> > 
> > This was my first idea when evaluating what to do in this case. I gave
> > it up because
> > I found it to be too extreme. People would start complaining "I was able
> > to do
> > savevm 0 and now I can't".
> 
> True.  But it wouldn't be impossible to do, we'd need to deprecate
> numeric names, print a warning for two releases, and then we can make it
> an error.

This. Is. HMP.

Not a stable ABI, no deprecation period of two releases.

> Hm...  If we had a proper deprecation warning in this series, I suppose
> it wouldn't be dangerous anymore.  Can we just print a warning whenever
> the user specified an ID?  (i.e. if some snapshot's ID matches the
> string given by the user and the snapshot's name does not)

How is it less of a problem when a user gets QEMU from the distro and
skips five releases? They will never see the warning. This is different
from QMP where things are fixed in libvirt, which is tested with every
single QEMU release.

Kevin

Daniel Henrique Barboza Jan. 9, 2019, 5:32 p.m. UTC | #23

On 1/9/19 3:05 PM, Max Reitz wrote:
> On 09.01.19 17:57, Daniel Henrique Barboza wrote:
>>
>> On 1/9/19 12:10 PM, Max Reitz wrote:
>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>>> changes in v2:
>>>> - removed the "RFC" marker;
>>>> - added a new patch (patch 2) that removes
>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>>> - made changes in patch 1 as suggested by Murilo;
>>>> - previous patch set link:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>>
>>>>
>>>> It is not uncommon to see bugs being opened by testers that attempt to
>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>>> is documented as such, but this can lead to strange situations.
>>>>
>>>> Given that it is strange for an API to consider a parameter to be 2
>>>> fields
>>>> at the same time, and inadvently treating them as one or the other, and
>>>> that removing the ID field is too drastic, my idea here is to keep the
>>>> ID field for internal control, but do not let the user set it.
>>>>
>>>> I guess there's room for discussion about considering this change an API
>>>> change or not. It doesn't affect users of HMP and it doesn't affect
>>>> Libvirt,
>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>> (Yes, very late reply, I'm sorry...)
>>>
>>> I do think it affects users of HMP, because right now you can delete
>>> snapshots with their ID, and after this series you cannot.
>> That's true. My idea here was simple: the user can't reliably exclude
>> via snapshot ID today
>> because we're hiding the ID field in info snapshots:
>>
>>
>>      (qemu) savevm 0
>>      (qemu) info snapshots
>>      List of snapshots present on all disks:
>>      ID        TAG                 VM SIZE                DATE VM CLOCK
>>      --        0                      741M 2018-07-31 13:39:56 00:41:25.313
>>
>>
>> Thus, what will end up happening is that the user will be forced to use the
>> TAG of the snapshot since this is the only available information.
> But you can get it through e.g. qemu-img info.
>
> Snapshot list:
> ID        TAG                 VM SIZE                DATE       VM CLOCK
> 1         0                      1.6M 2019-01-09 18:01:21   00:00:02.657
>
> So it's not impossible to get.

Hmpf .... should we obscure the ID in this case as well? I mean, why we
have the ID information here but not in "info snapshots"?


>>> I think we had a short discussion about just disallowing numeric
>>> snapshot names.  How bad would that be?
>>
>> This was my first idea when evaluating what to do in this case. I gave
>> it up because
>> I found it to be too extreme. People would start complaining "I was able
>> to do
>> savevm 0 and now I can't".
> True.  But it wouldn't be impossible to do, we'd need to deprecate
> numeric names, print a warning for two releases, and then we can make it
> an error.
>
> Hm...  If we had a proper deprecation warning in this series, I suppose
> it wouldn't be dangerous anymore.  Can we just print a warning whenever
> the user specified an ID?  (i.e. if some snapshot's ID matches the
> string given by the user and the snapshot's name does not)


I can live with that.



>
> Max
>

Max Reitz Jan. 9, 2019, 5:38 p.m. UTC | #24

On 09.01.19 18:20, Kevin Wolf wrote:
> Am 09.01.2019 um 18:05 hat Max Reitz geschrieben:
>> On 09.01.19 17:57, Daniel Henrique Barboza wrote:
>>>
>>>
>>> On 1/9/19 12:10 PM, Max Reitz wrote:
>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>>>> changes in v2:
>>>>> - removed the "RFC" marker;
>>>>> - added a new patch (patch 2) that removes
>>>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>>>> - made changes in patch 1 as suggested by Murilo;
>>>>> - previous patch set link:
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>>>
>>>>>
>>>>> It is not uncommon to see bugs being opened by testers that attempt to
>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>>>> is documented as such, but this can lead to strange situations.
>>>>>
>>>>> Given that it is strange for an API to consider a parameter to be 2
>>>>> fields
>>>>> at the same time, and inadvently treating them as one or the other, and
>>>>> that removing the ID field is too drastic, my idea here is to keep the
>>>>> ID field for internal control, but do not let the user set it.
>>>>>
>>>>> I guess there's room for discussion about considering this change an API
>>>>> change or not. It doesn't affect users of HMP and it doesn't affect
>>>>> Libvirt,
>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>>>> (Yes, very late reply, I'm sorry...)
>>>>
>>>> I do think it affects users of HMP, because right now you can delete
>>>> snapshots with their ID, and after this series you cannot.
>>>
>>> That's true. My idea here was simple: the user can't reliably exclude
>>> via snapshot ID today
>>> because we're hiding the ID field in info snapshots:
>>>
>>>
>>>     (qemu) savevm 0
>>>     (qemu) info snapshots
>>>     List of snapshots present on all disks:
>>>     ID        TAG                 VM SIZE                DATE VM CLOCK
>>>     --        0                      741M 2018-07-31 13:39:56 00:41:25.313
>>>
>>>
>>> Thus, what will end up happening is that the user will be forced to use the
>>> TAG of the snapshot since this is the only available information.
>>
>> But you can get it through e.g. qemu-img info.
>>
>> Snapshot list:
>> ID        TAG                 VM SIZE                DATE       VM CLOCK
>> 1         0                      1.6M 2019-01-09 18:01:21   00:00:02.657
>>
>> So it's not impossible to get.
> 
> Is there a reason why we should display the ID at all when you can't use
> it any more to identify snapshots?

I thought the long-term goal of this series really was to remove all
mentions of the ID, yes.

>>>> I think we had a short discussion about just disallowing numeric
>>>> snapshot names.  How bad would that be?
>>>
>>>
>>> This was my first idea when evaluating what to do in this case. I gave
>>> it up because
>>> I found it to be too extreme. People would start complaining "I was able
>>> to do
>>> savevm 0 and now I can't".
>>
>> True.  But it wouldn't be impossible to do, we'd need to deprecate
>> numeric names, print a warning for two releases, and then we can make it
>> an error.
> 
> This. Is. HMP.
> 
> Not a stable ABI, no deprecation period of two releases.

Well, if you want to do it.

This may be HMP, but this is also the only interface to savevm, so it's
not like users have a choice to use a more stable interface.  I know
that was a conscious decision, more or less, but I don't see why we need
to be so nasty when the hardest thing about doing a nice deprecation
would be to remember to make it an error in half a year.

>> Hm...  If we had a proper deprecation warning in this series, I suppose
>> it wouldn't be dangerous anymore.  Can we just print a warning whenever
>> the user specified an ID?  (i.e. if some snapshot's ID matches the
>> string given by the user and the snapshot's name does not)
> 
> How is it less of a problem when a user gets QEMU from the distro and
> skips five releases? They will never see the warning. This is different
> from QMP where things are fixed in libvirt, which is tested with every
> single QEMU release.

Well, then allowing users to accidentally specify the wrong snapshot
remains adventurous.

The thing is that this really is the only interface to savevm.  So while
it may be HMP, it probably won't be used only by plain human end users
who just don't want to deal with QMP.  I can imagine it is used with
scripts and by people who regularly update their qemu because they have
something important running on top of it.

<general whining>
Actually, to me what you're saying sounds more like "Our deprecation
policy is useless" to which I wholeheartedly agree.  I think we should
only remove things in major releases, and only if it was deprecated in
the previous major release already.  (So if you deprecate something in
4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can
remove it only in 6.0.)  From a user standpoint I really think we
deprecate stuff too irregularly.
</general whining>

OK, anyway, you're talking about stable distros.  But there are other
users, too, so yes, this is less of a problem because it is a problem
only for users of stable distros; so it's a problem for less users.


But anyway, if you think deprecation is futile here, then I maintain
that this series bears a bit of a risk.

Max

Eric Blake Jan. 9, 2019, 5:52 p.m. UTC | #25

On 1/9/19 11:38 AM, Max Reitz wrote:

> 
> <general whining>
> Actually, to me what you're saying sounds more like "Our deprecation
> policy is useless" to which I wholeheartedly agree.  I think we should
> only remove things in major releases, and only if it was deprecated in
> the previous major release already.  (So if you deprecate something in
> 4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can
> remove it only in 6.0.)  From a user standpoint I really think we
> deprecate stuff too irregularly.
> </general whining>

That's actually incorrect. Our current version numbering scheme is that
the major version number is NOT synonymous with major releases: we just
bump the major version number once per year, and ALL releases are on
equal footing with no one release being more major than others.  Thus, a
policy that (at least) 2 releases is needed for a deprecation is
consistent, where one that requires waiting for a bump in the major
version number (which is as short as one release and as long as 3, given
that we bump every year with about 3 releases per year) is the one that
is less predictable and less meaningful (why is waiting for January
better than waiting for 2 releases?).

Eric Blake Jan. 9, 2019, 5:55 p.m. UTC | #26

On 1/9/19 11:38 AM, Max Reitz wrote:

>>>>> I do think it affects users of HMP, because right now you can delete
>>>>> snapshots with their ID, and after this series you cannot.
>>>>

>> This. Is. HMP.
>>
>> Not a stable ABI, no deprecation period of two releases.
> 
> Well, if you want to do it.
> 
> This may be HMP, but this is also the only interface to savevm, so it's
> not like users have a choice to use a more stable interface.  I know
> that was a conscious decision, more or less, but I don't see why we need
> to be so nasty when the hardest thing about doing a nice deprecation
> would be to remember to make it an error in half a year.

Indeed, and libvirt IS using 'savevm' via HMP via QMP's
human-monitor-command, since there is no QMP counterpart for internal
snapshot.  Even though lately we consistently tell people that internal
snapshots are underdeveloped and you should use external snapshots, it
does not get away from the fact that libvirt has been using 'savevm' to
drive internal snapshots for years now, and that we MUST consider
back-compat and/or add an introspectible QMP interface before making
changes that would break libvirt.

Daniel Henrique Barboza Jan. 9, 2019, 6:19 p.m. UTC | #27

On 1/9/19 12:21 PM, Kevin Wolf wrote:
> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben:
>> On 06.09.18 13:11, Daniel Henrique Barboza wrote:
>>> changes in v2:
>>> - removed the "RFC" marker;
>>> - added a new patch (patch 2) that removes
>>> bdrv_snapshot_delete_by_id_or_name from the code;
>>> - made changes in patch 1 as suggested by Murilo;
>>> - previous patch set link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
>>>
>>>
>>> It is not uncommon to see bugs being opened by testers that attempt to
>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
>>> common snapshot names and they trigger a lot of bugs. I gave an example
>>> in the commit message of patch 1, but to sum up here: QEMU treats the
>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
>>> is documented as such, but this can lead to strange situations.
>>>
>>> Given that it is strange for an API to consider a parameter to be 2 fields
>>> at the same time, and inadvently treating them as one or the other, and
>>> that removing the ID field is too drastic, my idea here is to keep the
>>> ID field for internal control, but do not let the user set it.
>>>
>>> I guess there's room for discussion about considering this change an API
>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
>> (Yes, very late reply, I'm sorry...)
>>
>> I do think it affects users of HMP, because right now you can delete
>> snapshots with their ID, and after this series you cannot.
> Can there be snapshots that can't be identified by a snapshot name, but
> only by their ID?

Hmmm I would need to read my notes back when I was working on this series.
In a quick look at the code and playing around with HMP, what I can tell 
is that:

- there is no way for a snapshot to have an empty TAG - an auto-generated
name is generated and used as TAG.

-  if you create a snapshot using an existing TAG, the older snapshot is 
overwritten
without warning (would be nice to fix it to at least provide a warning 
message - I can
do it in this series if a respin is needed)

I do not have the expertise of the whole block layer to see further 
implications of
the changes made in this series, but as far as HMP goes, I don't think 
there's too much
to it other than what I've said in the cover letter. The goal here is to 
remove the
ambiguity of save/load/del vm commands that would interpret the argument
either as tag or ID, leading to situations in which snapshots ended up 
getting
overwritten because the tag matched an existing ID.

I am not against providing a warning message if the user tries to create 
a snapshot
using a numerical tag. In the end, although I said otherwise in the 
cover letter, changing
the meaning of an input is still an API change, and I'd rather error on 
the cautious side
and warn the callers about it. Even if this has no/little impact (please 
feel free to prove
me wrong) on them.

Daniel

>
>> I think we had a short discussion about just disallowing numeric
>> snapshot names.  How bad would that be?
> It would be incompatible with existing images and result in a more
> complex snapshot identifier resolution. Why would it be any better?
>
> Kevin

Kevin Wolf Jan. 9, 2019, 6:51 p.m. UTC | #28

Am 09.01.2019 um 18:55 hat Eric Blake geschrieben:
> On 1/9/19 11:38 AM, Max Reitz wrote:
> 
> >>>>> I do think it affects users of HMP, because right now you can delete
> >>>>> snapshots with their ID, and after this series you cannot.
> >>>>
> 
> >> This. Is. HMP.
> >>
> >> Not a stable ABI, no deprecation period of two releases.
> > 
> > Well, if you want to do it.
> > 
> > This may be HMP, but this is also the only interface to savevm, so it's
> > not like users have a choice to use a more stable interface.  I know
> > that was a conscious decision, more or less, but I don't see why we need
> > to be so nasty when the hardest thing about doing a nice deprecation
> > would be to remember to make it an error in half a year.
> 
> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> human-monitor-command, since there is no QMP counterpart for internal
> snapshot.  Even though lately we consistently tell people that internal
> snapshots are underdeveloped and you should use external snapshots, it
> does not get away from the fact that libvirt has been using 'savevm' to
> drive internal snapshots for years now, and that we MUST consider
> back-compat and/or add an introspectible QMP interface before making
> changes that would break libvirt.

Okay, so what does libvirt do when you request a snapshot with a
numerical name? Without having looked at the code, the best case I would
expect that it forbids them, and more realistically I suspect that we
may actually fix a bug for libvirt by changing the semantics.

Or does libvirt really use snapshot IDs rather than names?

Kevin

Kevin Wolf Jan. 9, 2019, 7 p.m. UTC | #29

Am 09.01.2019 um 18:52 hat Eric Blake geschrieben:
> On 1/9/19 11:38 AM, Max Reitz wrote:
> 
> > 
> > <general whining>
> > Actually, to me what you're saying sounds more like "Our deprecation
> > policy is useless" to which I wholeheartedly agree.

If you restrict it to "Our deprecation policy is useless for user-facing
changes", I might agree. I think the policy is fine for stuff like QMP
where only the management layer needs to be aware of the change. We
still have problems there, but these are not problems of the policy.

> > I think we should only remove things in major releases, and only if
> > it was deprecated in the previous major release already.  (So if you
> > deprecate something in 4.0, you can remove it in 5.0; but if you
> > deprecate it in 4.1, you can remove it only in 6.0.)  From a user
> > standpoint I really think we deprecate stuff too irregularly.
> > </general whining>
> 
> That's actually incorrect. Our current version numbering scheme is that
> the major version number is NOT synonymous with major releases: we just
> bump the major version number once per year, and ALL releases are on
> equal footing with no one release being more major than others.  Thus, a
> policy that (at least) 2 releases is needed for a deprecation is
> consistent, where one that requires waiting for a bump in the major
> version number (which is as short as one release and as long as 3, given
> that we bump every year with about 3 releases per year) is the one that
> is less predictable and less meaningful (why is waiting for January
> better than waiting for 2 releases?).

As someone who usually skips distro versions because he wants to have
the change and necessary adaption only once instead of two or three
times in the same timeframe, I can see some value for users in breaking
stable APIs only in defined versions.

At the same time, I can also see the value in being able to break stable
APIs without waiting for almost two years with Max' policy and bad
timing.

For software I develop I prefer the latter, for software that I use I
prefer the former. ;-)

Kevin

Eric Blake Jan. 9, 2019, 7:02 p.m. UTC | #30

On 1/9/19 12:51 PM, Kevin Wolf wrote:

>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
>> human-monitor-command, since there is no QMP counterpart for internal
>> snapshot.  Even though lately we consistently tell people that internal
>> snapshots are underdeveloped and you should use external snapshots, it
>> does not get away from the fact that libvirt has been using 'savevm' to
>> drive internal snapshots for years now, and that we MUST consider
>> back-compat and/or add an introspectible QMP interface before making
>> changes that would break libvirt.
> 
> Okay, so what does libvirt do when you request a snapshot with a
> numerical name? Without having looked at the code, the best case I would
> expect that it forbids them, and more realistically I suspect that we
> may actually fix a bug for libvirt by changing the semantics.
> 
> Or does libvirt really use snapshot IDs rather than names?

At the moment, libvirt does not place any restriction on internal
snapshot names, but passes the user's name through without thought of
whether it is an id or a name.

So yes, arguably tightening the semantics fixes a libvirt bug for
libvirt having allowed internal snapshots to get into an inconsistent
state.  But again, it falls in the category of "properly fixing this
requires a lot of auditing, time, mental power, and unit tests", which
makes it a rather daunting task to state for certain.

Kevin Wolf Jan. 10, 2019, 11:25 a.m. UTC | #31

Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
> On 1/9/19 12:51 PM, Kevin Wolf wrote:
> 
> >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> >> human-monitor-command, since there is no QMP counterpart for internal
> >> snapshot.  Even though lately we consistently tell people that internal
> >> snapshots are underdeveloped and you should use external snapshots, it
> >> does not get away from the fact that libvirt has been using 'savevm' to
> >> drive internal snapshots for years now, and that we MUST consider
> >> back-compat and/or add an introspectible QMP interface before making
> >> changes that would break libvirt.
> > 
> > Okay, so what does libvirt do when you request a snapshot with a
> > numerical name? Without having looked at the code, the best case I would
> > expect that it forbids them, and more realistically I suspect that we
> > may actually fix a bug for libvirt by changing the semantics.
> > 
> > Or does libvirt really use snapshot IDs rather than names?
> 
> At the moment, libvirt does not place any restriction on internal
> snapshot names, but passes the user's name through without thought of
> whether it is an id or a name.
> 
> So yes, arguably tightening the semantics fixes a libvirt bug for
> libvirt having allowed internal snapshots to get into an inconsistent
> state.

So there are two scenarios to consider with respect to breaking
backwards compatibility:

1. There may be code out there that relies on numeric arguments being
   interpreted as IDs. This code will break if we make this change and
   numeric snapshot names exist. That such code exists is speculation
   (even though plausible) and we don't know how widespread it is.

2. There may be code out there that blindly assumes that whatever it
   passes is interpreted as a name. Nobody considered that with a
   numeric snapshot name, it maybe misinterpreted as an ID. We know that
   this is true for libvirt, the single most used management tool for
   QEMU. More code like this may (and probably does) exist.

Essentially the same two categories exist for human users: those who
somehow found out that QEMU accepts IDs as monitor command arguments and
are using those (mitigated by not displaying IDs any more), and those
who are trapped because they wanted to access a numeric name, but
surprisingly got it interpreted as an ID. Both are speculation to some
degree, but my guess is that the latter group is larger.

Given all this, this is a no-brainer for me. We simplify the interface,
we simplify the code, we make things less confusing for manual users and
we fix the management tool that everybody uses. How could we not make
this change?

> But again, it falls in the category of "properly fixing this
> requires a lot of auditing, time, mental power, and unit tests", which
> makes it a rather daunting task to state for certain.

Fix the world before we can fix anything?

Kevin

Dr. David Alan Gilbert Jan. 10, 2019, 11:41 a.m. UTC | #32

* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
> > On 1/9/19 12:51 PM, Kevin Wolf wrote:
> > 
> > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> > >> human-monitor-command, since there is no QMP counterpart for internal
> > >> snapshot.  Even though lately we consistently tell people that internal
> > >> snapshots are underdeveloped and you should use external snapshots, it
> > >> does not get away from the fact that libvirt has been using 'savevm' to
> > >> drive internal snapshots for years now, and that we MUST consider
> > >> back-compat and/or add an introspectible QMP interface before making
> > >> changes that would break libvirt.
> > > 
> > > Okay, so what does libvirt do when you request a snapshot with a
> > > numerical name? Without having looked at the code, the best case I would
> > > expect that it forbids them, and more realistically I suspect that we
> > > may actually fix a bug for libvirt by changing the semantics.
> > > 
> > > Or does libvirt really use snapshot IDs rather than names?
> > 
> > At the moment, libvirt does not place any restriction on internal
> > snapshot names, but passes the user's name through without thought of
> > whether it is an id or a name.
> > 
> > So yes, arguably tightening the semantics fixes a libvirt bug for
> > libvirt having allowed internal snapshots to get into an inconsistent
> > state.
> 
> So there are two scenarios to consider with respect to breaking
> backwards compatibility:
> 
> 1. There may be code out there that relies on numeric arguments being
>    interpreted as IDs. This code will break if we make this change and
>    numeric snapshot names exist. That such code exists is speculation
>    (even though plausible) and we don't know how widespread it is.
> 
> 2. There may be code out there that blindly assumes that whatever it
>    passes is interpreted as a name. Nobody considered that with a
>    numeric snapshot name, it maybe misinterpreted as an ID. We know that
>    this is true for libvirt, the single most used management tool for
>    QEMU. More code like this may (and probably does) exist.
> 
> Essentially the same two categories exist for human users: those who
> somehow found out that QEMU accepts IDs as monitor command arguments and
> are using those (mitigated by not displaying IDs any more), and those
> who are trapped because they wanted to access a numeric name, but
> surprisingly got it interpreted as an ID. Both are speculation to some
> degree, but my guess is that the latter group is larger.
> 
> Given all this, this is a no-brainer for me. We simplify the interface,
> we simplify the code, we make things less confusing for manual users and
> we fix the management tool that everybody uses. How could we not make
> this change?
> 
> > But again, it falls in the category of "properly fixing this
> > requires a lot of auditing, time, mental power, and unit tests", which
> > makes it a rather daunting task to state for certain.
> 
> Fix the world before we can fix anything?

Can't we break this loop for the savevm command fairly easily; it's
currently documnted as:

savevm [tag|id] 

If we made that:

savevm [-t] [-i] [tag|id]

then:
  a) with neither -t or -i  it would behave in the same roulette way
     as it does in the moment, and it might be a tag or id

  b) with -t we'd explicitly treat the parameter as a tag and it
     would error if it wasn't found

  c) With -i we'd explicitly treat the parameter as an id and
     it would error if it wasn't found

Since we still allow (a) it doesn't break any existing code.

Dave


> Kevin


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Daniel Henrique Barboza Jan. 10, 2019, 1:03 p.m. UTC | #33

On 1/10/19 9:41 AM, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
>> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
>>> On 1/9/19 12:51 PM, Kevin Wolf wrote:
>>>
>>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
>>>>> human-monitor-command, since there is no QMP counterpart for internal
>>>>> snapshot.  Even though lately we consistently tell people that internal
>>>>> snapshots are underdeveloped and you should use external snapshots, it
>>>>> does not get away from the fact that libvirt has been using 'savevm' to
>>>>> drive internal snapshots for years now, and that we MUST consider
>>>>> back-compat and/or add an introspectible QMP interface before making
>>>>> changes that would break libvirt.
>>>> Okay, so what does libvirt do when you request a snapshot with a
>>>> numerical name? Without having looked at the code, the best case I would
>>>> expect that it forbids them, and more realistically I suspect that we
>>>> may actually fix a bug for libvirt by changing the semantics.
>>>>
>>>> Or does libvirt really use snapshot IDs rather than names?
>>> At the moment, libvirt does not place any restriction on internal
>>> snapshot names, but passes the user's name through without thought of
>>> whether it is an id or a name.
>>>
>>> So yes, arguably tightening the semantics fixes a libvirt bug for
>>> libvirt having allowed internal snapshots to get into an inconsistent
>>> state.
>> So there are two scenarios to consider with respect to breaking
>> backwards compatibility:
>>
>> 1. There may be code out there that relies on numeric arguments being
>>     interpreted as IDs. This code will break if we make this change and
>>     numeric snapshot names exist. That such code exists is speculation
>>     (even though plausible) and we don't know how widespread it is.
>>
>> 2. There may be code out there that blindly assumes that whatever it
>>     passes is interpreted as a name. Nobody considered that with a
>>     numeric snapshot name, it maybe misinterpreted as an ID. We know that
>>     this is true for libvirt, the single most used management tool for
>>     QEMU. More code like this may (and probably does) exist.
>>
>> Essentially the same two categories exist for human users: those who
>> somehow found out that QEMU accepts IDs as monitor command arguments and
>> are using those (mitigated by not displaying IDs any more), and those
>> who are trapped because they wanted to access a numeric name, but
>> surprisingly got it interpreted as an ID. Both are speculation to some
>> degree, but my guess is that the latter group is larger.
>>
>> Given all this, this is a no-brainer for me. We simplify the interface,
>> we simplify the code, we make things less confusing for manual users and
>> we fix the management tool that everybody uses. How could we not make
>> this change?
>>
>>> But again, it falls in the category of "properly fixing this
>>> requires a lot of auditing, time, mental power, and unit tests", which
>>> makes it a rather daunting task to state for certain.
>> Fix the world before we can fix anything?
> Can't we break this loop for the savevm command fairly easily; it's
> currently documnted as:
>
> savevm [tag|id]
>
> If we made that:
>
> savevm [-t] [-i] [tag|id]
>
> then:
>    a) with neither -t or -i  it would behave in the same roulette way
>       as it does in the moment, and it might be a tag or id

The roulette we have today is too confusing IMO. For starters, we're not 
creating
snapshot using ID at this moment. The ID is being auto-generated by doing
max_ID_found + 1.

Quick example: savevm 4 isn't creating a snapshot with ID=4 and an empty 
tag - it is
creating a snapshot with ID=ID_MAX + 1 and tag = 4. Let's say that 
ID_MAX was 9,
so this new snapshot would be ID=10, tag=4.

Now, the user wants to delete this snapshot. "delvm 4" will  first match 
ID, then tag.
If there is a snapshot with ID=4, it will be mistakenly erased. Note 
that this isn't
a bug - even with tweaks here and there, savevm/delvm (number) will always
be ambiguous because the API is allowing the user to interpret an input 
anyway
he/she wants. This is an API design issue.

The only way I see to reliably make this id|tag API to work is to always 
interpret
numerical arguments as ID. Then we can safely assume what the user wants 
to do.
We wouldn't even need the other 2 options you mentioned (-t and -i). But 
then,
as I've said somewhere in this thread, users would complain that "savevm 
4" isn't
doing what it was doing before. "savevm 4 is creating snapshots with 
auto-generated
tag instead of  tag=4". I can see the bugzillas already.



DHB



>
>    b) with -t we'd explicitly treat the parameter as a tag and it
>       would error if it wasn't found
>
>    c) With -i we'd explicitly treat the parameter as an id and
>       it would error if it wasn't found
>
> Since we still allow (a) it doesn't break any existing code.
>
> Dave
>
>
>> Kevin
>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Kevin Wolf Jan. 10, 2019, 3:11 p.m. UTC | #34

Am 10.01.2019 um 12:41 hat Dr. David Alan Gilbert geschrieben:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
> > > On 1/9/19 12:51 PM, Kevin Wolf wrote:
> > > 
> > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> > > >> human-monitor-command, since there is no QMP counterpart for internal
> > > >> snapshot.  Even though lately we consistently tell people that internal
> > > >> snapshots are underdeveloped and you should use external snapshots, it
> > > >> does not get away from the fact that libvirt has been using 'savevm' to
> > > >> drive internal snapshots for years now, and that we MUST consider
> > > >> back-compat and/or add an introspectible QMP interface before making
> > > >> changes that would break libvirt.
> > > > 
> > > > Okay, so what does libvirt do when you request a snapshot with a
> > > > numerical name? Without having looked at the code, the best case I would
> > > > expect that it forbids them, and more realistically I suspect that we
> > > > may actually fix a bug for libvirt by changing the semantics.
> > > > 
> > > > Or does libvirt really use snapshot IDs rather than names?
> > > 
> > > At the moment, libvirt does not place any restriction on internal
> > > snapshot names, but passes the user's name through without thought of
> > > whether it is an id or a name.
> > > 
> > > So yes, arguably tightening the semantics fixes a libvirt bug for
> > > libvirt having allowed internal snapshots to get into an inconsistent
> > > state.
> > 
> > So there are two scenarios to consider with respect to breaking
> > backwards compatibility:
> > 
> > 1. There may be code out there that relies on numeric arguments being
> >    interpreted as IDs. This code will break if we make this change and
> >    numeric snapshot names exist. That such code exists is speculation
> >    (even though plausible) and we don't know how widespread it is.
> > 
> > 2. There may be code out there that blindly assumes that whatever it
> >    passes is interpreted as a name. Nobody considered that with a
> >    numeric snapshot name, it maybe misinterpreted as an ID. We know that
> >    this is true for libvirt, the single most used management tool for
> >    QEMU. More code like this may (and probably does) exist.
> > 
> > Essentially the same two categories exist for human users: those who
> > somehow found out that QEMU accepts IDs as monitor command arguments and
> > are using those (mitigated by not displaying IDs any more), and those
> > who are trapped because they wanted to access a numeric name, but
> > surprisingly got it interpreted as an ID. Both are speculation to some
> > degree, but my guess is that the latter group is larger.
> > 
> > Given all this, this is a no-brainer for me. We simplify the interface,
> > we simplify the code, we make things less confusing for manual users and
> > we fix the management tool that everybody uses. How could we not make
> > this change?
> > 
> > > But again, it falls in the category of "properly fixing this
> > > requires a lot of auditing, time, mental power, and unit tests", which
> > > makes it a rather daunting task to state for certain.
> > 
> > Fix the world before we can fix anything?
> 
> Can't we break this loop for the savevm command fairly easily; it's
> currently documnted as:
> 
> savevm [tag|id] 
> 
> If we made that:
> 
> savevm [-t] [-i] [tag|id]
> 
> then:
>   a) with neither -t or -i  it would behave in the same roulette way
>      as it does in the moment, and it might be a tag or id
> 
>   b) with -t we'd explicitly treat the parameter as a tag and it
>      would error if it wasn't found
> 
>   c) With -i we'd explicitly treat the parameter as an id and
>      it would error if it wasn't found
> 
> Since we still allow (a) it doesn't break any existing code.

If you can explain why we need both tag and id?

And by keeping the current behaviour, we might not break hypothetically
existing correct code, but we leave currently actually existing broken
code like libvirt broken.

Kevin

Dr. David Alan Gilbert Jan. 10, 2019, 5:06 p.m. UTC | #35

* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 10.01.2019 um 12:41 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
> > > > On 1/9/19 12:51 PM, Kevin Wolf wrote:
> > > > 
> > > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> > > > >> human-monitor-command, since there is no QMP counterpart for internal
> > > > >> snapshot.  Even though lately we consistently tell people that internal
> > > > >> snapshots are underdeveloped and you should use external snapshots, it
> > > > >> does not get away from the fact that libvirt has been using 'savevm' to
> > > > >> drive internal snapshots for years now, and that we MUST consider
> > > > >> back-compat and/or add an introspectible QMP interface before making
> > > > >> changes that would break libvirt.
> > > > > 
> > > > > Okay, so what does libvirt do when you request a snapshot with a
> > > > > numerical name? Without having looked at the code, the best case I would
> > > > > expect that it forbids them, and more realistically I suspect that we
> > > > > may actually fix a bug for libvirt by changing the semantics.
> > > > > 
> > > > > Or does libvirt really use snapshot IDs rather than names?
> > > > 
> > > > At the moment, libvirt does not place any restriction on internal
> > > > snapshot names, but passes the user's name through without thought of
> > > > whether it is an id or a name.
> > > > 
> > > > So yes, arguably tightening the semantics fixes a libvirt bug for
> > > > libvirt having allowed internal snapshots to get into an inconsistent
> > > > state.
> > > 
> > > So there are two scenarios to consider with respect to breaking
> > > backwards compatibility:
> > > 
> > > 1. There may be code out there that relies on numeric arguments being
> > >    interpreted as IDs. This code will break if we make this change and
> > >    numeric snapshot names exist. That such code exists is speculation
> > >    (even though plausible) and we don't know how widespread it is.
> > > 
> > > 2. There may be code out there that blindly assumes that whatever it
> > >    passes is interpreted as a name. Nobody considered that with a
> > >    numeric snapshot name, it maybe misinterpreted as an ID. We know that
> > >    this is true for libvirt, the single most used management tool for
> > >    QEMU. More code like this may (and probably does) exist.
> > > 
> > > Essentially the same two categories exist for human users: those who
> > > somehow found out that QEMU accepts IDs as monitor command arguments and
> > > are using those (mitigated by not displaying IDs any more), and those
> > > who are trapped because they wanted to access a numeric name, but
> > > surprisingly got it interpreted as an ID. Both are speculation to some
> > > degree, but my guess is that the latter group is larger.
> > > 
> > > Given all this, this is a no-brainer for me. We simplify the interface,
> > > we simplify the code, we make things less confusing for manual users and
> > > we fix the management tool that everybody uses. How could we not make
> > > this change?
> > > 
> > > > But again, it falls in the category of "properly fixing this
> > > > requires a lot of auditing, time, mental power, and unit tests", which
> > > > makes it a rather daunting task to state for certain.
> > > 
> > > Fix the world before we can fix anything?
> > 
> > Can't we break this loop for the savevm command fairly easily; it's
> > currently documnted as:
> > 
> > savevm [tag|id] 
> > 
> > If we made that:
> > 
> > savevm [-t] [-i] [tag|id]
> > 
> > then:
> >   a) with neither -t or -i  it would behave in the same roulette way
> >      as it does in the moment, and it might be a tag or id
> > 
> >   b) with -t we'd explicitly treat the parameter as a tag and it
> >      would error if it wasn't found
> > 
> >   c) With -i we'd explicitly treat the parameter as an id and
> >      it would error if it wasn't found
> > 
> > Since we still allow (a) it doesn't break any existing code.
> 
> If you can explain why we need both tag and id?
> 
> And by keeping the current behaviour, we might not break hypothetically
> existing correct code, but we leave currently actually existing broken
> code like libvirt broken.

My only reason for leaving both tag & id was for the hypothetical
existing current code; my assumption adding the above would be that we
would then fix libvirt never to use (a),  probably always (b).

Dave
> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Eric Blake Jan. 10, 2019, 6:22 p.m. UTC | #36

On 1/10/19 11:06 AM, Dr. David Alan Gilbert wrote:

>>> savevm [-t] [-i] [tag|id]
>>>
>>> then:
>>>   a) with neither -t or -i  it would behave in the same roulette way
>>>      as it does in the moment, and it might be a tag or id
>>>
>>>   b) with -t we'd explicitly treat the parameter as a tag and it
>>>      would error if it wasn't found
>>>
>>>   c) With -i we'd explicitly treat the parameter as an id and
>>>      it would error if it wasn't found
>>>
>>> Since we still allow (a) it doesn't break any existing code.
>>
>> If you can explain why we need both tag and id?
>>
>> And by keeping the current behaviour, we might not break hypothetically
>> existing correct code, but we leave currently actually existing broken
>> code like libvirt broken.
> 
> My only reason for leaving both tag & id was for the hypothetical
> existing current code; my assumption adding the above would be that we
> would then fix libvirt never to use (a),  probably always (b).

How? HMP is not introspectible, so libvirt can't know if 'savevm -t'
works without trying it.

Kevin Wolf Jan. 11, 2019, 12:14 p.m. UTC | #37

Am 10.01.2019 um 19:22 hat Eric Blake geschrieben:
> On 1/10/19 11:06 AM, Dr. David Alan Gilbert wrote:
> 
> >>> savevm [-t] [-i] [tag|id]
> >>>
> >>> then:
> >>>   a) with neither -t or -i  it would behave in the same roulette way
> >>>      as it does in the moment, and it might be a tag or id
> >>>
> >>>   b) with -t we'd explicitly treat the parameter as a tag and it
> >>>      would error if it wasn't found
> >>>
> >>>   c) With -i we'd explicitly treat the parameter as an id and
> >>>      it would error if it wasn't found
> >>>
> >>> Since we still allow (a) it doesn't break any existing code.
> >>
> >> If you can explain why we need both tag and id?
> >>
> >> And by keeping the current behaviour, we might not break hypothetically
> >> existing correct code, but we leave currently actually existing broken
> >> code like libvirt broken.
> > 
> > My only reason for leaving both tag & id was for the hypothetical
> > existing current code; my assumption adding the above would be that we
> > would then fix libvirt never to use (a),  probably always (b).
> 
> How? HMP is not introspectible, so libvirt can't know if 'savevm -t'
> works without trying it.

It is introspectible in a way, 'help savevm'. Just that the result of
that isn't machine readable and not meant to be parsed by programs. But
that's just what happens when you use HMP.

I'm still against the proposal because it makes the interface more
complicated rather than simpler.

Kevin

Max Reitz Jan. 11, 2019, 12:35 p.m. UTC | #38

On 09.01.19 18:52, Eric Blake wrote:
> On 1/9/19 11:38 AM, Max Reitz wrote:
> 
>>
>> <general whining>
>> Actually, to me what you're saying sounds more like "Our deprecation
>> policy is useless" to which I wholeheartedly agree.  I think we should
>> only remove things in major releases, and only if it was deprecated in
>> the previous major release already.  (So if you deprecate something in
>> 4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can
>> remove it only in 6.0.)  From a user standpoint I really think we
>> deprecate stuff too irregularly.
>> </general whining>
> 
> That's actually incorrect. Our current version numbering scheme is that
> the major version number is NOT synonymous with major releases: we just
> bump the major version number once per year, and ALL releases are on
> equal footing with no one release being more major than others.  Thus, a
> policy that (at least) 2 releases is needed for a deprecation is
> consistent, where one that requires waiting for a bump in the major
> version number (which is as short as one release and as long as 3, given
> that we bump every year with about 3 releases per year) is the one that
> is less predictable and less meaningful (why is waiting for January
> better than waiting for 2 releases?).

This sounds to me like we just don't have major releases because our
deprecation policy doesn't make use of it.  If we only broke APIs when
bumping the major release number, then by definition that would be major
releases, no?

Also, if you %s/major release/x.0 release/g in my whining, the argument
remains the same.

And, yes, I dislike our versioning policy, too.  My whining may have
indicated that I like semver.

Max

Max Reitz Jan. 11, 2019, 1:22 p.m. UTC | #39

On 10.01.19 12:41, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
>> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
>>> On 1/9/19 12:51 PM, Kevin Wolf wrote:
>>>
>>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
>>>>> human-monitor-command, since there is no QMP counterpart for internal
>>>>> snapshot.  Even though lately we consistently tell people that internal
>>>>> snapshots are underdeveloped and you should use external snapshots, it
>>>>> does not get away from the fact that libvirt has been using 'savevm' to
>>>>> drive internal snapshots for years now, and that we MUST consider
>>>>> back-compat and/or add an introspectible QMP interface before making
>>>>> changes that would break libvirt.
>>>>
>>>> Okay, so what does libvirt do when you request a snapshot with a
>>>> numerical name? Without having looked at the code, the best case I would
>>>> expect that it forbids them, and more realistically I suspect that we
>>>> may actually fix a bug for libvirt by changing the semantics.
>>>>
>>>> Or does libvirt really use snapshot IDs rather than names?
>>>
>>> At the moment, libvirt does not place any restriction on internal
>>> snapshot names, but passes the user's name through without thought of
>>> whether it is an id or a name.
>>>
>>> So yes, arguably tightening the semantics fixes a libvirt bug for
>>> libvirt having allowed internal snapshots to get into an inconsistent
>>> state.
>>
>> So there are two scenarios to consider with respect to breaking
>> backwards compatibility:
>>
>> 1. There may be code out there that relies on numeric arguments being
>>    interpreted as IDs. This code will break if we make this change and
>>    numeric snapshot names exist. That such code exists is speculation
>>    (even though plausible) and we don't know how widespread it is.
>>
>> 2. There may be code out there that blindly assumes that whatever it
>>    passes is interpreted as a name. Nobody considered that with a
>>    numeric snapshot name, it maybe misinterpreted as an ID. We know that
>>    this is true for libvirt, the single most used management tool for
>>    QEMU. More code like this may (and probably does) exist.
>>
>> Essentially the same two categories exist for human users: those who
>> somehow found out that QEMU accepts IDs as monitor command arguments and
>> are using those (mitigated by not displaying IDs any more), and those
>> who are trapped because they wanted to access a numeric name, but
>> surprisingly got it interpreted as an ID. Both are speculation to some
>> degree, but my guess is that the latter group is larger.
>>
>> Given all this, this is a no-brainer for me. We simplify the interface,
>> we simplify the code, we make things less confusing for manual users and
>> we fix the management tool that everybody uses. How could we not make
>> this change?

So you're trying to make a bug fix out of this now to get around
deprecation?  To me, changing behavior in qemu to fix a bug in libvirt
doesn't equate to fixing a bug in qemu.  So let's try to find a real bug
in qemu.

>>> But again, it falls in the category of "properly fixing this
>>> requires a lot of auditing, time, mental power, and unit tests", which
>>> makes it a rather daunting task to state for certain.
>>
>> Fix the world before we can fix anything?
> 
> Can't we break this loop for the savevm command fairly easily; it's
> currently documnted as:
> 
> savevm [tag|id]

The bug starts here.  The code doesn't match the interface description,
because this to me as a user suggests that "tag" has precedence over
"id" (same for loadvm and delvm).  But let's go over the strange
behavior for each:

savevm deletes all snapshots where either tag or ID match.[1]  I think
that's buggy because it should replace only one snapshot, not
accidentally remove something totally different.  It then...  Tries to
find that snapshot? and overwrite it with the new one.  Which makes no
sense because it has just deleted that snapshot, if it existed.[2]
So savevm will then just create a new snapshot with the given tag.

So it seems clear that savevm does not accept IDs, but really just tags.
 It follows that it should not try to delete all snapshots where the ID
matches the given tag.

loadvm uses bdrv_all_find_snapshot() -> bdrv_snapshot_find() where the
ID takes precedence over the tag.  This does not seem to match the
interface description, as I've said above.  So the first thing to do
clearly is to switch the comparison order in bdrv_snapshot_find().  Or,
well, apply this series because it removes the ID comparison altogether.
 But that wouldn't be needed for a bug fix.

delvm directly invokes bdrv_all_delete_snapshot() and thus shares the
same woes as savevm (see [1]).

[1] Actually, this is not quite true.  bdrv_all_delete_snapshot()
invokes bdrv_snapshot_delete_by_id_or_name() if bdrv_snapshot_find()
returned true.  So far, so good.  But
bdrv_snapshot_delete_by_id_or_name() will first try to delete a snapshot
with matching ID, and only if that failed it will delete a snapshot with
matching tag.  So if you have two snapshots { "0": "foo", "1": "0" } and
then bdrv_all_delete_snapshot("0"), only "foo" will be deleted.  So this
function doesn't really delete all snapshots that match...

[2] It follows from [1] that this is not quite true as well.  Take the
above example and savevm "0".  It will then delete snapshot "foo" and
indeed overwrite snapshot "0" (ID "1").  But that's just stupid.  It
looks like we have specialized code for after we have deleted an
unrelated snapshot for no reason whatsoever.

So, yes, there are bugs in qemu, but to fix them, we would need to
switch the comparison order in bdrv_snapshot_find() and fix
bdrv_snapshot_delete_by_id_or_name() to always invoke
bdrv_snapshot_delete() for both cases (ID and tag).  And we should
probably remove the special code path in savevm for overwriting an
existing snapshot.

This would also alleviate the bug in libvirt (because the tag then
always takes precedence, like the interface description suggests).

Sure, this series would fix the bugs as well, but it does more than
that.  It really isn't just a bug fix.

> If we made that:
> 
> savevm [-t] [-i] [tag|id]
> 
> then:
>   a) with neither -t or -i  it would behave in the same roulette way
>      as it does in the moment, and it might be a tag or id
> 
>   b) with -t we'd explicitly treat the parameter as a tag and it
>      would error if it wasn't found
> 
>   c) With -i we'd explicitly treat the parameter as an id and
>      it would error if it wasn't found
> 
> Since we still allow (a) it doesn't break any existing code.

I agree with the others that while this might solve issues on the
libvirt side of things, I think we also want to confuse end users less.

Max

Kevin Wolf Jan. 11, 2019, 2:33 p.m. UTC | #40

Am 11.01.2019 um 14:22 hat Max Reitz geschrieben:
> On 10.01.19 12:41, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> >> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
> >>> On 1/9/19 12:51 PM, Kevin Wolf wrote:
> >>>
> >>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
> >>>>> human-monitor-command, since there is no QMP counterpart for internal
> >>>>> snapshot.  Even though lately we consistently tell people that internal
> >>>>> snapshots are underdeveloped and you should use external snapshots, it
> >>>>> does not get away from the fact that libvirt has been using 'savevm' to
> >>>>> drive internal snapshots for years now, and that we MUST consider
> >>>>> back-compat and/or add an introspectible QMP interface before making
> >>>>> changes that would break libvirt.
> >>>>
> >>>> Okay, so what does libvirt do when you request a snapshot with a
> >>>> numerical name? Without having looked at the code, the best case I would
> >>>> expect that it forbids them, and more realistically I suspect that we
> >>>> may actually fix a bug for libvirt by changing the semantics.
> >>>>
> >>>> Or does libvirt really use snapshot IDs rather than names?
> >>>
> >>> At the moment, libvirt does not place any restriction on internal
> >>> snapshot names, but passes the user's name through without thought of
> >>> whether it is an id or a name.
> >>>
> >>> So yes, arguably tightening the semantics fixes a libvirt bug for
> >>> libvirt having allowed internal snapshots to get into an inconsistent
> >>> state.
> >>
> >> So there are two scenarios to consider with respect to breaking
> >> backwards compatibility:
> >>
> >> 1. There may be code out there that relies on numeric arguments being
> >>    interpreted as IDs. This code will break if we make this change and
> >>    numeric snapshot names exist. That such code exists is speculation
> >>    (even though plausible) and we don't know how widespread it is.
> >>
> >> 2. There may be code out there that blindly assumes that whatever it
> >>    passes is interpreted as a name. Nobody considered that with a
> >>    numeric snapshot name, it maybe misinterpreted as an ID. We know that
> >>    this is true for libvirt, the single most used management tool for
> >>    QEMU. More code like this may (and probably does) exist.
> >>
> >> Essentially the same two categories exist for human users: those who
> >> somehow found out that QEMU accepts IDs as monitor command arguments and
> >> are using those (mitigated by not displaying IDs any more), and those
> >> who are trapped because they wanted to access a numeric name, but
> >> surprisingly got it interpreted as an ID. Both are speculation to some
> >> degree, but my guess is that the latter group is larger.
> >>
> >> Given all this, this is a no-brainer for me. We simplify the interface,
> >> we simplify the code, we make things less confusing for manual users and
> >> we fix the management tool that everybody uses. How could we not make
> >> this change?
> 
> So you're trying to make a bug fix out of this now to get around
> deprecation?  To me, changing behavior in qemu to fix a bug in libvirt
> doesn't equate to fixing a bug in qemu.  So let's try to find a real bug
> in qemu.

I'm not making this a bug fix, but a interface cleanup that fixes a bug
as a side effect, which makes it only more appealing. I'm also not sure
what exactly you mean by "get around deprecation".

If you mean the formal two releases deprecation period that is required
by our deprecation policy, it doesn't apply because this is HMP and not
a stable interface.

If you mean deprecation not because we must but because we're
considering actual users, then I have described how making the change
fixes things for more users than it could potentially break things. In
addition, I pointed out how a deprecation period is useless for
user-facing changes. I'm concluding that a "voluntary" deprecation for
two releases isn't helpful at all.

> >>> But again, it falls in the category of "properly fixing this
> >>> requires a lot of auditing, time, mental power, and unit tests", which
> >>> makes it a rather daunting task to state for certain.
> >>
> >> Fix the world before we can fix anything?
> > 
> > Can't we break this loop for the savevm command fairly easily; it's
> > currently documnted as:
> > 
> > savevm [tag|id]
> 
> The bug starts here.
> [... snipped long description of how horrible everything is ...]

Yes, it's horrible. Let's radically simplify the interface (and with it
the code) by throwing away the useless ID stuff. Which happens to be
exactly what this series was getting at.

There is no point in spending time for changing the current interface to
make a little more sense while keeping its fundamental insanity
unaddressed, then deprecate it, and then change it again to _actually_
make sense. Let's convert it directly to something that makes sense.

> So, yes, there are bugs in qemu, but to fix them, we would need to
> switch the comparison order in bdrv_snapshot_find() and fix
> bdrv_snapshot_delete_by_id_or_name() to always invoke
> bdrv_snapshot_delete() for both cases (ID and tag).  And we should
> probably remove the special code path in savevm for overwriting an
> existing snapshot.
> 
> This would also alleviate the bug in libvirt (because the tag then
> always takes precedence, like the interface description suggests).

Yes, at least tools which don't use IDs wouldn't be broken any more. But
you removed none of the insanity. The same way addressing by tag was
broken before, addressing by ID is broken now (previously, you need to
check first that no ID exists while you want to address a tag; now, you
need to check first that no tag exists while you want to address an ID).

In fact, if you're willing to go there, you break exactly the same users
that would be broken if we dropped IDs from the interface completely. So
why not do the full thing instead of stopping halfway? You're taking the
potential problems without the advantages you could get.

> Sure, this series would fix the bugs as well, but it does more than
> that.  It really isn't just a bug fix.

It was never meant to be a bug fix (unless, of course, you consider
insane interfaces a bug). But that it fixes a bug as a side effect
contributes to the fact that breaking compatibility has not only
downsides and that a deprecation period would be useless or even
counterproductive.

Kevin

Max Reitz Jan. 11, 2019, 3:23 p.m. UTC | #41

On 11.01.19 15:33, Kevin Wolf wrote:
> Am 11.01.2019 um 14:22 hat Max Reitz geschrieben:
>> On 10.01.19 12:41, Dr. David Alan Gilbert wrote:
>>> * Kevin Wolf (kwolf@redhat.com) wrote:
>>>> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben:
>>>>> On 1/9/19 12:51 PM, Kevin Wolf wrote:
>>>>>
>>>>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's
>>>>>>> human-monitor-command, since there is no QMP counterpart for internal
>>>>>>> snapshot.  Even though lately we consistently tell people that internal
>>>>>>> snapshots are underdeveloped and you should use external snapshots, it
>>>>>>> does not get away from the fact that libvirt has been using 'savevm' to
>>>>>>> drive internal snapshots for years now, and that we MUST consider
>>>>>>> back-compat and/or add an introspectible QMP interface before making
>>>>>>> changes that would break libvirt.
>>>>>>
>>>>>> Okay, so what does libvirt do when you request a snapshot with a
>>>>>> numerical name? Without having looked at the code, the best case I would
>>>>>> expect that it forbids them, and more realistically I suspect that we
>>>>>> may actually fix a bug for libvirt by changing the semantics.
>>>>>>
>>>>>> Or does libvirt really use snapshot IDs rather than names?
>>>>>
>>>>> At the moment, libvirt does not place any restriction on internal
>>>>> snapshot names, but passes the user's name through without thought of
>>>>> whether it is an id or a name.
>>>>>
>>>>> So yes, arguably tightening the semantics fixes a libvirt bug for
>>>>> libvirt having allowed internal snapshots to get into an inconsistent
>>>>> state.
>>>>
>>>> So there are two scenarios to consider with respect to breaking
>>>> backwards compatibility:
>>>>
>>>> 1. There may be code out there that relies on numeric arguments being
>>>>    interpreted as IDs. This code will break if we make this change and
>>>>    numeric snapshot names exist. That such code exists is speculation
>>>>    (even though plausible) and we don't know how widespread it is.
>>>>
>>>> 2. There may be code out there that blindly assumes that whatever it
>>>>    passes is interpreted as a name. Nobody considered that with a
>>>>    numeric snapshot name, it maybe misinterpreted as an ID. We know that
>>>>    this is true for libvirt, the single most used management tool for
>>>>    QEMU. More code like this may (and probably does) exist.
>>>>
>>>> Essentially the same two categories exist for human users: those who
>>>> somehow found out that QEMU accepts IDs as monitor command arguments and
>>>> are using those (mitigated by not displaying IDs any more), and those
>>>> who are trapped because they wanted to access a numeric name, but
>>>> surprisingly got it interpreted as an ID. Both are speculation to some
>>>> degree, but my guess is that the latter group is larger.
>>>>
>>>> Given all this, this is a no-brainer for me. We simplify the interface,
>>>> we simplify the code, we make things less confusing for manual users and
>>>> we fix the management tool that everybody uses. How could we not make
>>>> this change?
>>
>> So you're trying to make a bug fix out of this now to get around
>> deprecation?  To me, changing behavior in qemu to fix a bug in libvirt
>> doesn't equate to fixing a bug in qemu.  So let's try to find a real bug
>> in qemu.
> 
> I'm not making this a bug fix, but a interface cleanup that fixes a bug

which is not in qemu (at least the one you're describing)

> as a side effect, which makes it only more appealing. I'm also not sure
> what exactly you mean by "get around deprecation".
> 
> If you mean the formal two releases deprecation period that is required
> by our deprecation policy, it doesn't apply because this is HMP and not
> a stable interface.

Of course you're right, formally.  But that's just not a very good
argument to me, personally.  If deprecation is easy and helpful, there
is no reason to be nasty and skip it just because we are formally
allowed to do so.

Sure, the question is whether it is helpful.

> If you mean deprecation not because we must but because we're
> considering actual users, then I have described how making the change
> fixes things for more users than it could potentially break things

OK.

> In
> addition, I pointed out how a deprecation period is useless for
> user-facing changes.

You pointed that out for users of specific distros.  I myself use Fedora
(every release) and Arch, so as a user I do see all qemu versions.

> I'm concluding that a "voluntary" deprecation for
> two releases isn't helpful at all.

Yes, you can conclude that from your reasoning.  But I disagree with
your reasoning because you are only considering users such as yourself
who skip releases on purpose or users who use long-term distros.  Most
importantly we probably have professional end users that do update qemu
every release because they know the deprecation policy, who do not use
libvirt.

I agree that adding a warning has little impact.  But I also claim that
doing it is little effort.

I mean, whatever.  I'm really not that much against taking the series as-is.

>>>>> But again, it falls in the category of "properly fixing this
>>>>> requires a lot of auditing, time, mental power, and unit tests", which
>>>>> makes it a rather daunting task to state for certain.
>>>>
>>>> Fix the world before we can fix anything?
>>>
>>> Can't we break this loop for the savevm command fairly easily; it's
>>> currently documnted as:
>>>
>>> savevm [tag|id]
>>
>> The bug starts here.
>> [... snipped long description of how horrible everything is ...]
> 
> Yes, it's horrible. Let's radically simplify the interface (and with it
> the code) by throwing away the useless ID stuff. Which happens to be
> exactly what this series was getting at.
> 
> There is no point in spending time for changing the current interface to
> make a little more sense while keeping its fundamental insanity
> unaddressed, then deprecate it, and then change it again to _actually_
> make sense. Let's convert it directly to something that makes sense.

Of course we'd do the deprecation while fixing the bug. :-P

>> So, yes, there are bugs in qemu, but to fix them, we would need to
>> switch the comparison order in bdrv_snapshot_find() and fix
>> bdrv_snapshot_delete_by_id_or_name() to always invoke
>> bdrv_snapshot_delete() for both cases (ID and tag).  And we should
>> probably remove the special code path in savevm for overwriting an
>> existing snapshot.
>>
>> This would also alleviate the bug in libvirt (because the tag then
>> always takes precedence, like the interface description suggests).
> 
> Yes, at least tools which don't use IDs wouldn't be broken any more. But
> you removed none of the insanity. The same way addressing by tag was
> broken before, addressing by ID is broken now (previously, you need to
> check first that no ID exists while you want to address a tag; now, you
> need to check first that no tag exists while you want to address an ID).
> 
> In fact, if you're willing to go there, you break exactly the same users
> that would be broken if we dropped IDs from the interface completely. So
> why not do the full thing instead of stopping halfway? You're taking the
> potential problems without the advantages you could get.

Not quite true.  What I proposed here would still allow addressing
snapshots by ID.  This series completely removes that.

I only break things for users where the tag conflicts with the ID.

>> Sure, this series would fix the bugs as well, but it does more than
>> that.  It really isn't just a bug fix.
> 
> It was never meant to be a bug fix (unless, of course, you consider
> insane interfaces a bug). But that it fixes a bug as a side effect
> contributes to the fact that breaking compatibility has not only
> downsides and that a deprecation period would be useless or even
> counterproductive.

Sure, for this case, probably.

Max