Message ID | 20180906111107.30684-1-danielhb413@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | HMP/snapshot changes - do not use ID anymore | expand |
* Daniel Henrique Barboza (danielhb413@gmail.com) wrote: > changes in v2: > - removed the "RFC" marker; > - added a new patch (patch 2) that removes > bdrv_snapshot_delete_by_id_or_name from the code; > - made changes in patch 1 as suggested by Murilo; > - previous patch set link: > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > It is not uncommon to see bugs being opened by testers that attempt to > create VM snapshots using HMP. It turns out that "0" and "1" are quite > common snapshot names and they trigger a lot of bugs. I gave an example > in the commit message of patch 1, but to sum up here: QEMU treats the > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > is documented as such, but this can lead to strange situations. > > Given that it is strange for an API to consider a parameter to be 2 fields > at the same time, and inadvently treating them as one or the other, and > that removing the ID field is too drastic, my idea here is to keep the > ID field for internal control, but do not let the user set it. > > I guess there's room for discussion about considering this change an API > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > but simplifying the meaning of the parameters of savevm/loadvm/delvm. Can you clarify a couple of things: a) What is it about libvirt's use that means it's OK ? Does it never use numeric tags? b) After this series are you always guaranteed to be able to fix any existing oddly named snapshots? Dave > > Daniel Henrique Barboza (3): > block/snapshot.c: eliminate use of ID input in snapshot operations > block/snapshot: remove bdrv_snapshot_delete_by_id_or_name > qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call > > block/qcow2-snapshot.c | 5 ----- > block/snapshot.c | 25 +++---------------------- > hmp-commands.hx | 24 ++++++++++++------------ > include/block/snapshot.h | 3 --- > qemu-img.c | 15 +++++++++++---- > 5 files changed, 26 insertions(+), 46 deletions(-) > > -- > 2.17.1 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hey David, On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: > * Daniel Henrique Barboza (danielhb413@gmail.com) wrote: >> changes in v2: >> - removed the "RFC" marker; >> - added a new patch (patch 2) that removes >> bdrv_snapshot_delete_by_id_or_name from the code; >> - made changes in patch 1 as suggested by Murilo; >> - previous patch set link: >> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >> >> >> It is not uncommon to see bugs being opened by testers that attempt to >> create VM snapshots using HMP. It turns out that "0" and "1" are quite >> common snapshot names and they trigger a lot of bugs. I gave an example >> in the commit message of patch 1, but to sum up here: QEMU treats the >> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >> is documented as such, but this can lead to strange situations. >> >> Given that it is strange for an API to consider a parameter to be 2 fields >> at the same time, and inadvently treating them as one or the other, and >> that removing the ID field is too drastic, my idea here is to keep the >> ID field for internal control, but do not let the user set it. >> >> I guess there's room for discussion about considering this change an API >> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > Can you clarify a couple of things: > a) What is it about libvirt's use that means it's OK ? Does it never > use numeric tags? I am glad you asked because my understanding in how Libvirt was dealing with snapshots was wrong, and I just looked into it further to answer your question. Luckily, this series fixes the situation for Libvirt as well. I was misled by the fact that Libvirt does not show the same symptoms we see in QEMU of this problem, but the bug is there. Here's a quick test with Libvirt with "0" and "1" as snapshot names, considering a VM named with no snapshots, using QEMU 2.12 and Libvirt 4.0.0: - create the "0" snapshot: $ sudo virsh snapshot-create-as --name 0 dhb Domain snapshot 0 created $ sudo virsh snapshot-list dhb Name Creation Time State ------------------------------------------------------------ 0 2018-09-24 15:47:56 -0400 running $ sudo virsh qemu-monitor-command dhb --hmp info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- 0 405M 2018-09-24 15:47:56 00:04:20.867 - created the "1" snapshot. Here, Libvirt shows both snapshots with snapshot-list, but the snapshot was erased inside QEMU: $ sudo virsh snapshot-create-as --name 1 dhb Domain snapshot 1 created $ $ sudo virsh snapshot-list dhb Name Creation Time State ------------------------------------------------------------ 0 2018-09-24 15:47:56 -0400 running 1 2018-09-24 15:50:09 -0400 running $ sudo virsh qemu-monitor-command dhb --hmp info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- 1 404M 2018-09-24 15:50:10 00:05:36.226 This is where I stopped checking out Libvirt at first, concluding wrongly that it was immune to the bug. Libvirt does not throw an error when trying to apply snapshot 0. In fact, it acts like everything went fine: $ sudo virsh snapshot-revert dhb 0 $ echo $? 0 Reverting back to snapshot "1" works as intended, restoring the VM state when it was created. (perhaps this is something we should let Libvirt be aware of ...) This series fixes this behavior because the snapshot 0 isn't erased from QEMU, allowing Libvirt to successfully restore it. > b) After this series are you always guaranteed to be able to fix > any existing oddly named snapshots? The oddly named snapshots that already exists can be affected by the semantic change in loadvm and delvm, in a way that the user can't load/del using the snap ID, just the tag. Aside from that, I don't see any side effects with existing snapshots and this patch series. Thanks, Daniel > > Dave > >> Daniel Henrique Barboza (3): >> block/snapshot.c: eliminate use of ID input in snapshot operations >> block/snapshot: remove bdrv_snapshot_delete_by_id_or_name >> qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call >> >> block/qcow2-snapshot.c | 5 ----- >> block/snapshot.c | 25 +++---------------------- >> hmp-commands.hx | 24 ++++++++++++------------ >> include/block/snapshot.h | 3 --- >> qemu-img.c | 15 +++++++++++---- >> 5 files changed, 26 insertions(+), 46 deletions(-) >> >> -- >> 2.17.1 >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
ping On 9/6/18 8:11 AM, Daniel Henrique Barboza wrote: > changes in v2: > - removed the "RFC" marker; > - added a new patch (patch 2) that removes > bdrv_snapshot_delete_by_id_or_name from the code; > - made changes in patch 1 as suggested by Murilo; > - previous patch set link: > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > It is not uncommon to see bugs being opened by testers that attempt to > create VM snapshots using HMP. It turns out that "0" and "1" are quite > common snapshot names and they trigger a lot of bugs. I gave an example > in the commit message of patch 1, but to sum up here: QEMU treats the > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > is documented as such, but this can lead to strange situations. > > Given that it is strange for an API to consider a parameter to be 2 fields > at the same time, and inadvently treating them as one or the other, and > that removing the ID field is too drastic, my idea here is to keep the > ID field for internal control, but do not let the user set it. > > I guess there's room for discussion about considering this change an API > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > > Daniel Henrique Barboza (3): > block/snapshot.c: eliminate use of ID input in snapshot operations > block/snapshot: remove bdrv_snapshot_delete_by_id_or_name > qcow2-snapshot: remove redundant find_snapshot_by_id_and_name call > > block/qcow2-snapshot.c | 5 ----- > block/snapshot.c | 25 +++---------------------- > hmp-commands.hx | 24 ++++++++++++------------ > include/block/snapshot.h | 3 --- > qemu-img.c | 15 +++++++++++---- > 5 files changed, 26 insertions(+), 46 deletions(-) >
Cc: libvir-list for review of the compatibility argument below. Daniel Henrique Barboza <danielhb413@gmail.com> writes: > Hey David, > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote: >>> changes in v2: >>> - removed the "RFC" marker; >>> - added a new patch (patch 2) that removes >>> bdrv_snapshot_delete_by_id_or_name from the code; >>> - made changes in patch 1 as suggested by Murilo; >>> - previous patch set link: >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>> >>> >>> It is not uncommon to see bugs being opened by testers that attempt to >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>> common snapshot names and they trigger a lot of bugs. I gave an example >>> in the commit message of patch 1, but to sum up here: QEMU treats the >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>> is documented as such, but this can lead to strange situations. >>> >>> Given that it is strange for an API to consider a parameter to be 2 fields >>> at the same time, and inadvently treating them as one or the other, and >>> that removing the ID field is too drastic, my idea here is to keep the >>> ID field for internal control, but do not let the user set it. >>> >>> I guess there's room for discussion about considering this change an API >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >> Can you clarify a couple of things: >> a) What is it about libvirt's use that means it's OK ? Does it never >> use numeric tags? > > I am glad you asked because my understanding in how Libvirt was dealing > with snapshots was wrong, and I just looked into it further to answer your > question. Luckily, this series fixes the situation for Libvirt as well. > > I was misled by the fact that Libvirt does not show the same symptoms > we see in > QEMU of this problem, but the bug is there. Here's a quick test with > Libvirt with > "0" and "1" as snapshot names, considering a VM named with no snapshots, > using QEMU 2.12 and Libvirt 4.0.0: > > - create the "0" snapshot: > > $ sudo virsh snapshot-create-as --name 0 dhb > Domain snapshot 0 created > > $ sudo virsh snapshot-list dhb > Name Creation Time State > ------------------------------------------------------------ > 0 2018-09-24 15:47:56 -0400 running > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > List of snapshots present on all disks: > ID TAG VM SIZE DATE VM CLOCK > -- 0 405M 2018-09-24 15:47:56 00:04:20.867 > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with > snapshot-list, > but the snapshot was erased inside QEMU: > > $ sudo virsh snapshot-create-as --name 1 dhb > Domain snapshot 1 created > $ > $ sudo virsh snapshot-list dhb > Name Creation Time State > ------------------------------------------------------------ > 0 2018-09-24 15:47:56 -0400 running > 1 2018-09-24 15:50:09 -0400 running > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > List of snapshots present on all disks: > ID TAG VM SIZE DATE VM CLOCK > -- 1 404M 2018-09-24 15:50:10 00:05:36.226 > > > This is where I stopped checking out Libvirt at first, concluding > wrongly that it > was immune to the bug. > > Libvirt does not throw an error when trying to apply snapshot 0. In > fact, it acts > like everything went fine: > > $ sudo virsh snapshot-revert dhb 0 > > $ echo $? > 0 Is that a libvirt bug? > Reverting back to snapshot "1" works as intended, restoring the VM > state when it > was created. > > > (perhaps this is something we should let Libvirt be aware of ...) > > > > This series fixes this behavior because the snapshot 0 isn't erased > from QEMU, allowing > Libvirt to successfully restore it. > > >> b) After this series are you always guaranteed to be able to fix >> any existing oddly named snapshots? > > The oddly named snapshots that already exists can be affected by the > semantic > change in loadvm and delvm, in a way that the user can't load/del > using the snap > ID, just the tag. Aside from that, I don't see any side effects with > existing > snapshots and this patch series. Do all snapshots have a tag that is unique within their image? Even snapshots created by old versions of QEMU?
On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote: > Cc: libvir-list for review of the compatibility argument below. > > Daniel Henrique Barboza <danielhb413@gmail.com> writes: > > > Hey David, > > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: > >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote: > >>> changes in v2: > >>> - removed the "RFC" marker; > >>> - added a new patch (patch 2) that removes > >>> bdrv_snapshot_delete_by_id_or_name from the code; > >>> - made changes in patch 1 as suggested by Murilo; > >>> - previous patch set link: > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > >>> > >>> > >>> It is not uncommon to see bugs being opened by testers that attempt to > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > >>> common snapshot names and they trigger a lot of bugs. I gave an example > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > >>> is documented as such, but this can lead to strange situations. > >>> > >>> Given that it is strange for an API to consider a parameter to be 2 fields > >>> at the same time, and inadvently treating them as one or the other, and > >>> that removing the ID field is too drastic, my idea here is to keep the > >>> ID field for internal control, but do not let the user set it. > >>> > >>> I guess there's room for discussion about considering this change an API > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > >> Can you clarify a couple of things: > >> a) What is it about libvirt's use that means it's OK ? Does it never > >> use numeric tags? > > > > I am glad you asked because my understanding in how Libvirt was dealing > > with snapshots was wrong, and I just looked into it further to answer your > > question. Luckily, this series fixes the situation for Libvirt as well. > > > > I was misled by the fact that Libvirt does not show the same symptoms > > we see in > > QEMU of this problem, but the bug is there. Here's a quick test with > > Libvirt with > > "0" and "1" as snapshot names, considering a VM named with no snapshots, > > using QEMU 2.12 and Libvirt 4.0.0: > > > > - create the "0" snapshot: > > > > $ sudo virsh snapshot-create-as --name 0 dhb > > Domain snapshot 0 created > > > > $ sudo virsh snapshot-list dhb > > Name Creation Time State > > ------------------------------------------------------------ > > 0 2018-09-24 15:47:56 -0400 running > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > List of snapshots present on all disks: > > ID TAG VM SIZE DATE VM CLOCK > > -- 0 405M 2018-09-24 15:47:56 00:04:20.867 > > > > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with > > snapshot-list, > > but the snapshot was erased inside QEMU: > > > > $ sudo virsh snapshot-create-as --name 1 dhb > > Domain snapshot 1 created > > $ > > $ sudo virsh snapshot-list dhb > > Name Creation Time State > > ------------------------------------------------------------ > > 0 2018-09-24 15:47:56 -0400 running > > 1 2018-09-24 15:50:09 -0400 running > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > List of snapshots present on all disks: > > ID TAG VM SIZE DATE VM CLOCK > > -- 1 404M 2018-09-24 15:50:10 00:05:36.226 > > > > > > This is where I stopped checking out Libvirt at first, concluding > > wrongly that it > > was immune to the bug. > > > > Libvirt does not throw an error when trying to apply snapshot 0. In > > fact, it acts > > like everything went fine: > > > > $ sudo virsh snapshot-revert dhb 0 > > > > $ echo $? > > 0 > > Is that a libvirt bug? Probably yes. The error handling from HMP sucks and can't really be fixed in all cases. This is for the handler which calls "loadvm": if (strstr(reply, "No block device supports snapshots") != NULL) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("this domain does not have a device to load snapshots")); goto cleanup; } else if (strstr(reply, "Could not find snapshot") != NULL) { virReportError(VIR_ERR_OPERATION_INVALID, _("the snapshot '%s' does not exist, and was not loaded"), name); goto cleanup; } else if (strstr(reply, "Snapshots not supported on device") != NULL) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply); goto cleanup; } else if (strstr(reply, "Could not open VM state file") != NULL) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); goto cleanup; } else if (strstr(reply, "Error") != NULL && strstr(reply, "while loading VM state") != NULL) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); goto cleanup; } else if (strstr(reply, "Error") != NULL && strstr(reply, "while activating snapshot on") != NULL) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); goto cleanup; } If the above does not match the reported error, we report success. The same problem can happen with any of the 5 [1] HMP command handlers we still have. Note that a similar abomination is also for the code which calls "savevm". [1] We technically have 6 HMP handlers, but "cpu_set" is not used if yoy have a QEMU younger than 3 years. Soon also "drive_add" and "drive_del" will be replaced, so we'll be stuck with the 3 internal snapshot ones. > > > Reverting back to snapshot "1" works as intended, restoring the VM > > state when it > > was created. > > > > > > (perhaps this is something we should let Libvirt be aware of ...) The error message from qemu which was wrongly ignored by qemu can be found in the libvirtd debug log. It would be helpful if you could provide it. > > > > > > > > This series fixes this behavior because the snapshot 0 isn't erased > > from QEMU, allowing > > Libvirt to successfully restore it. > > > > > >> b) After this series are you always guaranteed to be able to fix > >> any existing oddly named snapshots? > > > > The oddly named snapshots that already exists can be affected by the > > semantic > > change in loadvm and delvm, in a way that the user can't load/del > > using the snap > > ID, just the tag. Aside from that, I don't see any side effects with > > existing > > snapshots and this patch series. > > Do all snapshots have a tag that is unique within their image? Even > snapshots created by old versions of QEMU? I remember there was a discussion which regarded problems with collisions between the name and the ID of the snapshot when dealing with loadvm/delvm commands but I can't seem to find it currently. Note that libvirt plainly issues loadvm/delvm $SNAPSHOTNAME. If the name is numeric it might clash. Similarly for inactive vms via qemu-img.
* Peter Krempa (pkrempa@redhat.com) wrote: > On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote: > > Cc: libvir-list for review of the compatibility argument below. > > > > Daniel Henrique Barboza <danielhb413@gmail.com> writes: > > > > > Hey David, > > > > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: > > >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote: > > >>> changes in v2: > > >>> - removed the "RFC" marker; > > >>> - added a new patch (patch 2) that removes > > >>> bdrv_snapshot_delete_by_id_or_name from the code; > > >>> - made changes in patch 1 as suggested by Murilo; > > >>> - previous patch set link: > > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > >>> > > >>> > > >>> It is not uncommon to see bugs being opened by testers that attempt to > > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > > >>> common snapshot names and they trigger a lot of bugs. I gave an example > > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > > >>> is documented as such, but this can lead to strange situations. > > >>> > > >>> Given that it is strange for an API to consider a parameter to be 2 fields > > >>> at the same time, and inadvently treating them as one or the other, and > > >>> that removing the ID field is too drastic, my idea here is to keep the > > >>> ID field for internal control, but do not let the user set it. > > >>> > > >>> I guess there's room for discussion about considering this change an API > > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > >> Can you clarify a couple of things: > > >> a) What is it about libvirt's use that means it's OK ? Does it never > > >> use numeric tags? > > > > > > I am glad you asked because my understanding in how Libvirt was dealing > > > with snapshots was wrong, and I just looked into it further to answer your > > > question. Luckily, this series fixes the situation for Libvirt as well. > > > > > > I was misled by the fact that Libvirt does not show the same symptoms > > > we see in > > > QEMU of this problem, but the bug is there. Here's a quick test with > > > Libvirt with > > > "0" and "1" as snapshot names, considering a VM named with no snapshots, > > > using QEMU 2.12 and Libvirt 4.0.0: > > > > > > - create the "0" snapshot: > > > > > > $ sudo virsh snapshot-create-as --name 0 dhb > > > Domain snapshot 0 created > > > > > > $ sudo virsh snapshot-list dhb > > > Name Creation Time State > > > ------------------------------------------------------------ > > > 0 2018-09-24 15:47:56 -0400 running > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > List of snapshots present on all disks: > > > ID TAG VM SIZE DATE VM CLOCK > > > -- 0 405M 2018-09-24 15:47:56 00:04:20.867 > > > > > > > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with > > > snapshot-list, > > > but the snapshot was erased inside QEMU: > > > > > > $ sudo virsh snapshot-create-as --name 1 dhb > > > Domain snapshot 1 created > > > $ > > > $ sudo virsh snapshot-list dhb > > > Name Creation Time State > > > ------------------------------------------------------------ > > > 0 2018-09-24 15:47:56 -0400 running > > > 1 2018-09-24 15:50:09 -0400 running > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > List of snapshots present on all disks: > > > ID TAG VM SIZE DATE VM CLOCK > > > -- 1 404M 2018-09-24 15:50:10 00:05:36.226 > > > > > > > > > This is where I stopped checking out Libvirt at first, concluding > > > wrongly that it > > > was immune to the bug. > > > > > > Libvirt does not throw an error when trying to apply snapshot 0. In > > > fact, it acts > > > like everything went fine: > > > > > > $ sudo virsh snapshot-revert dhb 0 > > > > > > $ echo $? > > > 0 > > > > Is that a libvirt bug? > > Probably yes. The error handling from HMP sucks and can't really be > fixed in all cases. This is for the handler which calls "loadvm": > > if (strstr(reply, "No block device supports snapshots") != NULL) { > virReportError(VIR_ERR_OPERATION_INVALID, "%s", > _("this domain does not have a device to load snapshots")); > goto cleanup; > } else if (strstr(reply, "Could not find snapshot") != NULL) { > virReportError(VIR_ERR_OPERATION_INVALID, > _("the snapshot '%s' does not exist, and was not loaded"), > name); > goto cleanup; > } else if (strstr(reply, "Snapshots not supported on device") != NULL) { > virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply); > goto cleanup; > } else if (strstr(reply, "Could not open VM state file") != NULL) { > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > goto cleanup; > } else if (strstr(reply, "Error") != NULL > && strstr(reply, "while loading VM state") != NULL) { > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > goto cleanup; > } else if (strstr(reply, "Error") != NULL > && strstr(reply, "while activating snapshot on") != NULL) { > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > goto cleanup; > } > > If the above does not match the reported error, we report success. The > same problem can happen with any of the 5 [1] HMP command handlers we > still have. > > Note that a similar abomination is also for the code which calls > "savevm". Would the following small qemu change make life a little safer: diff --git a/hmp.c b/hmp.c index 576253a01f..0729a8c7ed 100644 --- a/hmp.c +++ b/hmp.c @@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp) { assert(errp); if (*errp) { - error_report_err(*errp); + error_reportf_err(*errp, "Error: "); } } changing: No block device supports snapshots to: Error: No block device supports snapshots so you could check for Error: at the start and know that you've hit some unknown error? Dave > [1] We technically have 6 HMP handlers, but "cpu_set" is not used if yoy > have a QEMU younger than 3 years. Soon also "drive_add" and "drive_del" > will be replaced, so we'll be stuck with the 3 internal snapshot ones. > > > > > Reverting back to snapshot "1" works as intended, restoring the VM > > > state when it > > > was created. > > > > > > > > > (perhaps this is something we should let Libvirt be aware of ...) > > The error message from qemu which was wrongly ignored by qemu can be > found in the libvirtd debug log. It would be helpful if you could > provide it. > > > > > > > > > > > > > This series fixes this behavior because the snapshot 0 isn't erased > > > from QEMU, allowing > > > Libvirt to successfully restore it. > > > > > > > > >> b) After this series are you always guaranteed to be able to fix > > >> any existing oddly named snapshots? > > > > > > The oddly named snapshots that already exists can be affected by the > > > semantic > > > change in loadvm and delvm, in a way that the user can't load/del > > > using the snap > > > ID, just the tag. Aside from that, I don't see any side effects with > > > existing > > > snapshots and this patch series. > > > > Do all snapshots have a tag that is unique within their image? Even > > snapshots created by old versions of QEMU? > > I remember there was a discussion which regarded problems with > collisions between the name and the ID of the snapshot when dealing with > loadvm/delvm commands but I can't seem to find it currently. Note that > libvirt plainly issues loadvm/delvm $SNAPSHOTNAME. If the name is > numeric it might clash. Similarly for inactive vms via qemu-img. -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Thu, Oct 11, 2018 at 13:06:14 +0100, Dr. David Alan Gilbert wrote: > * Peter Krempa (pkrempa@redhat.com) wrote: > > On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote: > > > Cc: libvir-list for review of the compatibility argument below. > > > > > > Daniel Henrique Barboza <danielhb413@gmail.com> writes: > > > > > > > Hey David, > > > > > > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: > > > >> * Daniel Henrique Barboza (danielhb413@gmail.com) wrote: > > > >>> changes in v2: > > > >>> - removed the "RFC" marker; > > > >>> - added a new patch (patch 2) that removes > > > >>> bdrv_snapshot_delete_by_id_or_name from the code; > > > >>> - made changes in patch 1 as suggested by Murilo; > > > >>> - previous patch set link: > > > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > >>> > > > >>> > > > >>> It is not uncommon to see bugs being opened by testers that attempt to > > > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > > > >>> common snapshot names and they trigger a lot of bugs. I gave an example > > > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > > > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > > > >>> is documented as such, but this can lead to strange situations. > > > >>> > > > >>> Given that it is strange for an API to consider a parameter to be 2 fields > > > >>> at the same time, and inadvently treating them as one or the other, and > > > >>> that removing the ID field is too drastic, my idea here is to keep the > > > >>> ID field for internal control, but do not let the user set it. > > > >>> > > > >>> I guess there's room for discussion about considering this change an API > > > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > > > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > > >> Can you clarify a couple of things: > > > >> a) What is it about libvirt's use that means it's OK ? Does it never > > > >> use numeric tags? > > > > > > > > I am glad you asked because my understanding in how Libvirt was dealing > > > > with snapshots was wrong, and I just looked into it further to answer your > > > > question. Luckily, this series fixes the situation for Libvirt as well. > > > > > > > > I was misled by the fact that Libvirt does not show the same symptoms > > > > we see in > > > > QEMU of this problem, but the bug is there. Here's a quick test with > > > > Libvirt with > > > > "0" and "1" as snapshot names, considering a VM named with no snapshots, > > > > using QEMU 2.12 and Libvirt 4.0.0: > > > > > > > > - create the "0" snapshot: > > > > > > > > $ sudo virsh snapshot-create-as --name 0 dhb > > > > Domain snapshot 0 created > > > > > > > > $ sudo virsh snapshot-list dhb > > > > Name Creation Time State > > > > ------------------------------------------------------------ > > > > 0 2018-09-24 15:47:56 -0400 running > > > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > > List of snapshots present on all disks: > > > > ID TAG VM SIZE DATE VM CLOCK > > > > -- 0 405M 2018-09-24 15:47:56 00:04:20.867 > > > > > > > > > > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with > > > > snapshot-list, > > > > but the snapshot was erased inside QEMU: > > > > > > > > $ sudo virsh snapshot-create-as --name 1 dhb > > > > Domain snapshot 1 created > > > > $ > > > > $ sudo virsh snapshot-list dhb > > > > Name Creation Time State > > > > ------------------------------------------------------------ > > > > 0 2018-09-24 15:47:56 -0400 running > > > > 1 2018-09-24 15:50:09 -0400 running > > > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > > List of snapshots present on all disks: > > > > ID TAG VM SIZE DATE VM CLOCK > > > > -- 1 404M 2018-09-24 15:50:10 00:05:36.226 > > > > > > > > > > > > This is where I stopped checking out Libvirt at first, concluding > > > > wrongly that it > > > > was immune to the bug. > > > > > > > > Libvirt does not throw an error when trying to apply snapshot 0. In > > > > fact, it acts > > > > like everything went fine: > > > > > > > > $ sudo virsh snapshot-revert dhb 0 > > > > > > > > $ echo $? > > > > 0 > > > > > > Is that a libvirt bug? > > > > Probably yes. The error handling from HMP sucks and can't really be > > fixed in all cases. This is for the handler which calls "loadvm": > > > > if (strstr(reply, "No block device supports snapshots") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, "%s", > > _("this domain does not have a device to load snapshots")); > > goto cleanup; > > } else if (strstr(reply, "Could not find snapshot") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, > > _("the snapshot '%s' does not exist, and was not loaded"), > > name); > > goto cleanup; > > } else if (strstr(reply, "Snapshots not supported on device") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Could not open VM state file") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Error") != NULL > > && strstr(reply, "while loading VM state") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Error") != NULL > > && strstr(reply, "while activating snapshot on") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } > > > > If the above does not match the reported error, we report success. The > > same problem can happen with any of the 5 [1] HMP command handlers we > > still have. > > > > Note that a similar abomination is also for the code which calls > > "savevm". > > Would the following small qemu change make life a little safer: > > diff --git a/hmp.c b/hmp.c > index 576253a01f..0729a8c7ed 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp) > { > assert(errp); > if (*errp) { > - error_report_err(*errp); > + error_reportf_err(*errp, "Error: "); > } > } > > changing: > > No block device supports snapshots > > > to: > > Error: No block device supports snapshots > > so you could check for Error: at the start and know that you've hit some > unknown error? That certainly would help, provided that the message is not localized in any way. I'd be very glad to see it also for 'savevm' and 'delvm'. It's a shame that drive_add and drive_del don't have that either, but they will soon become unused.
On 06.09.18 13:11, Daniel Henrique Barboza wrote: > changes in v2: > - removed the "RFC" marker; > - added a new patch (patch 2) that removes > bdrv_snapshot_delete_by_id_or_name from the code; > - made changes in patch 1 as suggested by Murilo; > - previous patch set link: > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > It is not uncommon to see bugs being opened by testers that attempt to > create VM snapshots using HMP. It turns out that "0" and "1" are quite > common snapshot names and they trigger a lot of bugs. I gave an example > in the commit message of patch 1, but to sum up here: QEMU treats the > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > is documented as such, but this can lead to strange situations. > > Given that it is strange for an API to consider a parameter to be 2 fields > at the same time, and inadvently treating them as one or the other, and > that removing the ID field is too drastic, my idea here is to keep the > ID field for internal control, but do not let the user set it. > > I guess there's room for discussion about considering this change an API > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > but simplifying the meaning of the parameters of savevm/loadvm/delvm. (Yes, very late reply, I'm sorry...) I do think it affects users of HMP, because right now you can delete snapshots with their ID, and after this series you cannot. I think we had a short discussion about just disallowing numeric snapshot names. How bad would that be? Max
Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: > On 06.09.18 13:11, Daniel Henrique Barboza wrote: > > changes in v2: > > - removed the "RFC" marker; > > - added a new patch (patch 2) that removes > > bdrv_snapshot_delete_by_id_or_name from the code; > > - made changes in patch 1 as suggested by Murilo; > > - previous patch set link: > > https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > > > > It is not uncommon to see bugs being opened by testers that attempt to > > create VM snapshots using HMP. It turns out that "0" and "1" are quite > > common snapshot names and they trigger a lot of bugs. I gave an example > > in the commit message of patch 1, but to sum up here: QEMU treats the > > input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > > is documented as such, but this can lead to strange situations. > > > > Given that it is strange for an API to consider a parameter to be 2 fields > > at the same time, and inadvently treating them as one or the other, and > > that removing the ID field is too drastic, my idea here is to keep the > > ID field for internal control, but do not let the user set it. > > > > I guess there's room for discussion about considering this change an API > > change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > > but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > (Yes, very late reply, I'm sorry...) > > I do think it affects users of HMP, because right now you can delete > snapshots with their ID, and after this series you cannot. Can there be snapshots that can't be identified by a snapshot name, but only by their ID? > I think we had a short discussion about just disallowing numeric > snapshot names. How bad would that be? It would be incompatible with existing images and result in a more complex snapshot identifier resolution. Why would it be any better? Kevin
On 09.01.19 15:21, Kevin Wolf wrote: > Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>> changes in v2: >>> - removed the "RFC" marker; >>> - added a new patch (patch 2) that removes >>> bdrv_snapshot_delete_by_id_or_name from the code; >>> - made changes in patch 1 as suggested by Murilo; >>> - previous patch set link: >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>> >>> >>> It is not uncommon to see bugs being opened by testers that attempt to >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>> common snapshot names and they trigger a lot of bugs. I gave an example >>> in the commit message of patch 1, but to sum up here: QEMU treats the >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>> is documented as such, but this can lead to strange situations. >>> >>> Given that it is strange for an API to consider a parameter to be 2 fields >>> at the same time, and inadvently treating them as one or the other, and >>> that removing the ID field is too drastic, my idea here is to keep the >>> ID field for internal control, but do not let the user set it. >>> >>> I guess there's room for discussion about considering this change an API >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >> >> (Yes, very late reply, I'm sorry...) >> >> I do think it affects users of HMP, because right now you can delete >> snapshots with their ID, and after this series you cannot. > > Can there be snapshots that can't be identified by a snapshot name, but > only by their ID? I don't know, but what I meant is that if you have scripts to do all this, you might have to adjust them with this change. >> I think we had a short discussion about just disallowing numeric >> snapshot names. How bad would that be? > > It would be incompatible with existing images and result in a more > complex snapshot identifier resolution. Why would it be any better? It wouldn't be incompatible with existing images if we'd just disallow creating such snapshots. I don't see how the identifier resolution would be more complex. I don't know if it'd be better. I'd just find it simpler (for us, that is -- for users, I'm not sure). Max
Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: > On 09.01.19 15:21, Kevin Wolf wrote: > > Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: > >> On 06.09.18 13:11, Daniel Henrique Barboza wrote: > >>> changes in v2: > >>> - removed the "RFC" marker; > >>> - added a new patch (patch 2) that removes > >>> bdrv_snapshot_delete_by_id_or_name from the code; > >>> - made changes in patch 1 as suggested by Murilo; > >>> - previous patch set link: > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > >>> > >>> > >>> It is not uncommon to see bugs being opened by testers that attempt to > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > >>> common snapshot names and they trigger a lot of bugs. I gave an example > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > >>> is documented as such, but this can lead to strange situations. > >>> > >>> Given that it is strange for an API to consider a parameter to be 2 fields > >>> at the same time, and inadvently treating them as one or the other, and > >>> that removing the ID field is too drastic, my idea here is to keep the > >>> ID field for internal control, but do not let the user set it. > >>> > >>> I guess there's room for discussion about considering this change an API > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > >> > >> (Yes, very late reply, I'm sorry...) > >> > >> I do think it affects users of HMP, because right now you can delete > >> snapshots with their ID, and after this series you cannot. > > > > Can there be snapshots that can't be identified by a snapshot name, but > > only by their ID? > > I don't know, but what I meant is that if you have scripts to do all > this, you might have to adjust them with this change. That's what the H in HMP means. > >> I think we had a short discussion about just disallowing numeric > >> snapshot names. How bad would that be? > > > > It would be incompatible with existing images and result in a more > > complex snapshot identifier resolution. Why would it be any better? > > It wouldn't be incompatible with existing images if we'd just disallow > creating such snapshots. I don't see how the identifier resolution > would be more complex. > > I don't know if it'd be better. I'd just find it simpler (for us, that > is -- for users, I'm not sure). Identifier resolution A: - Find a snapshot that has the given identifier as a name - If no such snapshot exists, it is an error Identifier resolution B: - If the identifier is a number, find a snapshot that has the given identifier as its ID - If the identifier is not a number, find a snapshot that has the given identifier as a name - If no such snapshot exists, it is an error Isn't it rather obvious that B is more complex than A? Kevin
On 09.01.19 15:48, Kevin Wolf wrote: > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: >> On 09.01.19 15:21, Kevin Wolf wrote: >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>>>> changes in v2: >>>>> - removed the "RFC" marker; >>>>> - added a new patch (patch 2) that removes >>>>> bdrv_snapshot_delete_by_id_or_name from the code; >>>>> - made changes in patch 1 as suggested by Murilo; >>>>> - previous patch set link: >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>>>> >>>>> >>>>> It is not uncommon to see bugs being opened by testers that attempt to >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>>>> common snapshot names and they trigger a lot of bugs. I gave an example >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>>>> is documented as such, but this can lead to strange situations. >>>>> >>>>> Given that it is strange for an API to consider a parameter to be 2 fields >>>>> at the same time, and inadvently treating them as one or the other, and >>>>> that removing the ID field is too drastic, my idea here is to keep the >>>>> ID field for internal control, but do not let the user set it. >>>>> >>>>> I guess there's room for discussion about considering this change an API >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >>>> >>>> (Yes, very late reply, I'm sorry...) >>>> >>>> I do think it affects users of HMP, because right now you can delete >>>> snapshots with their ID, and after this series you cannot. >>> >>> Can there be snapshots that can't be identified by a snapshot name, but >>> only by their ID? >> >> I don't know, but what I meant is that if you have scripts to do all >> this, you might have to adjust them with this change. > > That's what the H in HMP means. > >>>> I think we had a short discussion about just disallowing numeric >>>> snapshot names. How bad would that be? >>> >>> It would be incompatible with existing images and result in a more >>> complex snapshot identifier resolution. Why would it be any better? >> >> It wouldn't be incompatible with existing images if we'd just disallow >> creating such snapshots. I don't see how the identifier resolution >> would be more complex. >> >> I don't know if it'd be better. I'd just find it simpler (for us, that >> is -- for users, I'm not sure). > > Identifier resolution A: > - Find a snapshot that has the given identifier as a name > - If no such snapshot exists, it is an error > > Identifier resolution B: > - If the identifier is a number, find a snapshot that has the given > identifier as its ID > - If the identifier is not a number, find a snapshot that has the given > identifier as a name > - If no such snapshot exists, it is an error No, my idea was to keep the resolution the same as it is; just to forbid creating new snapshots with numeric names. This would prevent users from getting into the whole situation. Max
Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: > On 09.01.19 15:48, Kevin Wolf wrote: > > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: > >> On 09.01.19 15:21, Kevin Wolf wrote: > >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: > >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: > >>>>> changes in v2: > >>>>> - removed the "RFC" marker; > >>>>> - added a new patch (patch 2) that removes > >>>>> bdrv_snapshot_delete_by_id_or_name from the code; > >>>>> - made changes in patch 1 as suggested by Murilo; > >>>>> - previous patch set link: > >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > >>>>> > >>>>> > >>>>> It is not uncommon to see bugs being opened by testers that attempt to > >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > >>>>> common snapshot names and they trigger a lot of bugs. I gave an example > >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the > >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > >>>>> is documented as such, but this can lead to strange situations. > >>>>> > >>>>> Given that it is strange for an API to consider a parameter to be 2 fields > >>>>> at the same time, and inadvently treating them as one or the other, and > >>>>> that removing the ID field is too drastic, my idea here is to keep the > >>>>> ID field for internal control, but do not let the user set it. > >>>>> > >>>>> I guess there's room for discussion about considering this change an API > >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > >>>> > >>>> (Yes, very late reply, I'm sorry...) > >>>> > >>>> I do think it affects users of HMP, because right now you can delete > >>>> snapshots with their ID, and after this series you cannot. > >>> > >>> Can there be snapshots that can't be identified by a snapshot name, but > >>> only by their ID? > >> > >> I don't know, but what I meant is that if you have scripts to do all > >> this, you might have to adjust them with this change. > > > > That's what the H in HMP means. > > > >>>> I think we had a short discussion about just disallowing numeric > >>>> snapshot names. How bad would that be? > >>> > >>> It would be incompatible with existing images and result in a more > >>> complex snapshot identifier resolution. Why would it be any better? > >> > >> It wouldn't be incompatible with existing images if we'd just disallow > >> creating such snapshots. I don't see how the identifier resolution > >> would be more complex. > >> > >> I don't know if it'd be better. I'd just find it simpler (for us, that > >> is -- for users, I'm not sure). > > > > Identifier resolution A: > > - Find a snapshot that has the given identifier as a name > > - If no such snapshot exists, it is an error > > > > Identifier resolution B: > > - If the identifier is a number, find a snapshot that has the given > > identifier as its ID > > - If the identifier is not a number, find a snapshot that has the given > > identifier as a name > > - If no such snapshot exists, it is an error > > No, my idea was to keep the resolution the same as it is; just to forbid > creating new snapshots with numeric names. This would prevent users > from getting into the whole situation. That's the version with an even more complex resolution method C. :-) I actually think the current behaviour is more confusing than helpful. Without looking into the code or trying it out, I couldn't even tell whether ID or name takes precedence if there is a matching snapshot for both. Considering your proposal, it's probably the ID, but how should a user know that? (If against all expectations documentation exists, it doesn't count because nobody reads that.) Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: > > On 09.01.19 15:48, Kevin Wolf wrote: > > > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: > > >> On 09.01.19 15:21, Kevin Wolf wrote: > > >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: > > >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: > > >>>>> changes in v2: > > >>>>> - removed the "RFC" marker; > > >>>>> - added a new patch (patch 2) that removes > > >>>>> bdrv_snapshot_delete_by_id_or_name from the code; > > >>>>> - made changes in patch 1 as suggested by Murilo; > > >>>>> - previous patch set link: > > >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > >>>>> > > >>>>> > > >>>>> It is not uncommon to see bugs being opened by testers that attempt to > > >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > > >>>>> common snapshot names and they trigger a lot of bugs. I gave an example > > >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the > > >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > > >>>>> is documented as such, but this can lead to strange situations. > > >>>>> > > >>>>> Given that it is strange for an API to consider a parameter to be 2 fields > > >>>>> at the same time, and inadvently treating them as one or the other, and > > >>>>> that removing the ID field is too drastic, my idea here is to keep the > > >>>>> ID field for internal control, but do not let the user set it. > > >>>>> > > >>>>> I guess there's room for discussion about considering this change an API > > >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > > >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > >>>> > > >>>> (Yes, very late reply, I'm sorry...) > > >>>> > > >>>> I do think it affects users of HMP, because right now you can delete > > >>>> snapshots with their ID, and after this series you cannot. > > >>> > > >>> Can there be snapshots that can't be identified by a snapshot name, but > > >>> only by their ID? > > >> > > >> I don't know, but what I meant is that if you have scripts to do all > > >> this, you might have to adjust them with this change. > > > > > > That's what the H in HMP means. > > > > > >>>> I think we had a short discussion about just disallowing numeric > > >>>> snapshot names. How bad would that be? > > >>> > > >>> It would be incompatible with existing images and result in a more > > >>> complex snapshot identifier resolution. Why would it be any better? > > >> > > >> It wouldn't be incompatible with existing images if we'd just disallow > > >> creating such snapshots. I don't see how the identifier resolution > > >> would be more complex. > > >> > > >> I don't know if it'd be better. I'd just find it simpler (for us, that > > >> is -- for users, I'm not sure). > > > > > > Identifier resolution A: > > > - Find a snapshot that has the given identifier as a name > > > - If no such snapshot exists, it is an error > > > > > > Identifier resolution B: > > > - If the identifier is a number, find a snapshot that has the given > > > identifier as its ID > > > - If the identifier is not a number, find a snapshot that has the given > > > identifier as a name > > > - If no such snapshot exists, it is an error > > > > No, my idea was to keep the resolution the same as it is; just to forbid > > creating new snapshots with numeric names. This would prevent users > > from getting into the whole situation. > > That's the version with an even more complex resolution method C. :-) > > I actually think the current behaviour is more confusing than helpful. > Without looking into the code or trying it out, I couldn't even tell > whether ID or name takes precedence if there is a matching snapshot for > both. Considering your proposal, it's probably the ID, but how should a > user know that? (If against all expectations documentation exists, it > doesn't count because nobody reads that.) Would adding a flag to the HMP commands to make it explicit help? Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Kevin Wolf <kwolf@redhat.com> writes: > Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: >> On 09.01.19 15:48, Kevin Wolf wrote: >> > Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: >> >> On 09.01.19 15:21, Kevin Wolf wrote: >> >>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >> >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >> >>>>> changes in v2: >> >>>>> - removed the "RFC" marker; >> >>>>> - added a new patch (patch 2) that removes >> >>>>> bdrv_snapshot_delete_by_id_or_name from the code; >> >>>>> - made changes in patch 1 as suggested by Murilo; >> >>>>> - previous patch set link: >> >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >> >>>>> >> >>>>> >> >>>>> It is not uncommon to see bugs being opened by testers that attempt to >> >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >> >>>>> common snapshot names and they trigger a lot of bugs. I gave an example >> >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the >> >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >> >>>>> is documented as such, but this can lead to strange situations. >> >>>>> >> >>>>> Given that it is strange for an API to consider a parameter to be 2 fields >> >>>>> at the same time, and inadvently treating them as one or the other, and >> >>>>> that removing the ID field is too drastic, my idea here is to keep the >> >>>>> ID field for internal control, but do not let the user set it. >> >>>>> >> >>>>> I guess there's room for discussion about considering this change an API >> >>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >> >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >> >>>> >> >>>> (Yes, very late reply, I'm sorry...) >> >>>> >> >>>> I do think it affects users of HMP, because right now you can delete >> >>>> snapshots with their ID, and after this series you cannot. >> >>> >> >>> Can there be snapshots that can't be identified by a snapshot name, but >> >>> only by their ID? >> >> >> >> I don't know, but what I meant is that if you have scripts to do all >> >> this, you might have to adjust them with this change. >> > >> > That's what the H in HMP means. >> > >> >>>> I think we had a short discussion about just disallowing numeric >> >>>> snapshot names. How bad would that be? >> >>> >> >>> It would be incompatible with existing images and result in a more >> >>> complex snapshot identifier resolution. Why would it be any better? >> >> >> >> It wouldn't be incompatible with existing images if we'd just disallow >> >> creating such snapshots. I don't see how the identifier resolution >> >> would be more complex. >> >> >> >> I don't know if it'd be better. I'd just find it simpler (for us, that >> >> is -- for users, I'm not sure). >> > >> > Identifier resolution A: >> > - Find a snapshot that has the given identifier as a name >> > - If no such snapshot exists, it is an error >> > >> > Identifier resolution B: >> > - If the identifier is a number, find a snapshot that has the given >> > identifier as its ID >> > - If the identifier is not a number, find a snapshot that has the given >> > identifier as a name >> > - If no such snapshot exists, it is an error >> >> No, my idea was to keep the resolution the same as it is; just to forbid >> creating new snapshots with numeric names. This would prevent users >> from getting into the whole situation. > > That's the version with an even more complex resolution method C. :-) > > I actually think the current behaviour is more confusing than helpful. > Without looking into the code or trying it out, I couldn't even tell > whether ID or name takes precedence if there is a matching snapshot for > both. Been there, done that, more than once. > Considering your proposal, it's probably the ID, but how should a > user know that? (If against all expectations documentation exists, it > doesn't count because nobody reads that.) In this case, probably for the better --- I'd expect documentation of this mess (if any) to be rather losely related to the code.
On 09.01.19 16:13, Kevin Wolf wrote: > Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: >> On 09.01.19 15:48, Kevin Wolf wrote: >>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: >>>> On 09.01.19 15:21, Kevin Wolf wrote: >>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>>>>>> changes in v2: >>>>>>> - removed the "RFC" marker; >>>>>>> - added a new patch (patch 2) that removes >>>>>>> bdrv_snapshot_delete_by_id_or_name from the code; >>>>>>> - made changes in patch 1 as suggested by Murilo; >>>>>>> - previous patch set link: >>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>>>>>> >>>>>>> >>>>>>> It is not uncommon to see bugs being opened by testers that attempt to >>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example >>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the >>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>>>>>> is documented as such, but this can lead to strange situations. >>>>>>> >>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields >>>>>>> at the same time, and inadvently treating them as one or the other, and >>>>>>> that removing the ID field is too drastic, my idea here is to keep the >>>>>>> ID field for internal control, but do not let the user set it. >>>>>>> >>>>>>> I guess there's room for discussion about considering this change an API >>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >>>>>> >>>>>> (Yes, very late reply, I'm sorry...) >>>>>> >>>>>> I do think it affects users of HMP, because right now you can delete >>>>>> snapshots with their ID, and after this series you cannot. >>>>> >>>>> Can there be snapshots that can't be identified by a snapshot name, but >>>>> only by their ID? >>>> >>>> I don't know, but what I meant is that if you have scripts to do all >>>> this, you might have to adjust them with this change. >>> >>> That's what the H in HMP means. >>> >>>>>> I think we had a short discussion about just disallowing numeric >>>>>> snapshot names. How bad would that be? >>>>> >>>>> It would be incompatible with existing images and result in a more >>>>> complex snapshot identifier resolution. Why would it be any better? >>>> >>>> It wouldn't be incompatible with existing images if we'd just disallow >>>> creating such snapshots. I don't see how the identifier resolution >>>> would be more complex. >>>> >>>> I don't know if it'd be better. I'd just find it simpler (for us, that >>>> is -- for users, I'm not sure). >>> >>> Identifier resolution A: >>> - Find a snapshot that has the given identifier as a name >>> - If no such snapshot exists, it is an error >>> >>> Identifier resolution B: >>> - If the identifier is a number, find a snapshot that has the given >>> identifier as its ID >>> - If the identifier is not a number, find a snapshot that has the given >>> identifier as a name >>> - If no such snapshot exists, it is an error >> >> No, my idea was to keep the resolution the same as it is; just to forbid >> creating new snapshots with numeric names. This would prevent users >> from getting into the whole situation. > > That's the version with an even more complex resolution method C. :-) How so if the resolution method stays the same? Because it already is too complex? If so, yes, that is an argument. I was arguing for the simplest patch instead of the simplest code, true. > I actually think the current behaviour is more confusing than helpful. > Without looking into the code or trying it out, I couldn't even tell > whether ID or name takes precedence if there is a matching snapshot for > both. Considering your proposal, it's probably the ID, but how should a > user know that? (If against all expectations documentation exists, it > doesn't count because nobody reads that.) It isn't more confusing than it is right now. With my proposal, all current images are simply as confusing as they are right now (I think ID takes precedence, yes), but if you create new snapshots, it's clear, since you simply cannot create names that could be IDs. Max
Am 09.01.2019 um 17:27 hat Max Reitz geschrieben: > On 09.01.19 16:13, Kevin Wolf wrote: > > Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: > >> On 09.01.19 15:48, Kevin Wolf wrote: > >>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: > >>>> On 09.01.19 15:21, Kevin Wolf wrote: > >>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: > >>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: > >>>>>>> changes in v2: > >>>>>>> - removed the "RFC" marker; > >>>>>>> - added a new patch (patch 2) that removes > >>>>>>> bdrv_snapshot_delete_by_id_or_name from the code; > >>>>>>> - made changes in patch 1 as suggested by Murilo; > >>>>>>> - previous patch set link: > >>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > >>>>>>> > >>>>>>> > >>>>>>> It is not uncommon to see bugs being opened by testers that attempt to > >>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > >>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example > >>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the > >>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > >>>>>>> is documented as such, but this can lead to strange situations. > >>>>>>> > >>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields > >>>>>>> at the same time, and inadvently treating them as one or the other, and > >>>>>>> that removing the ID field is too drastic, my idea here is to keep the > >>>>>>> ID field for internal control, but do not let the user set it. > >>>>>>> > >>>>>>> I guess there's room for discussion about considering this change an API > >>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > >>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > >>>>>> > >>>>>> (Yes, very late reply, I'm sorry...) > >>>>>> > >>>>>> I do think it affects users of HMP, because right now you can delete > >>>>>> snapshots with their ID, and after this series you cannot. > >>>>> > >>>>> Can there be snapshots that can't be identified by a snapshot name, but > >>>>> only by their ID? > >>>> > >>>> I don't know, but what I meant is that if you have scripts to do all > >>>> this, you might have to adjust them with this change. > >>> > >>> That's what the H in HMP means. > >>> > >>>>>> I think we had a short discussion about just disallowing numeric > >>>>>> snapshot names. How bad would that be? > >>>>> > >>>>> It would be incompatible with existing images and result in a more > >>>>> complex snapshot identifier resolution. Why would it be any better? > >>>> > >>>> It wouldn't be incompatible with existing images if we'd just disallow > >>>> creating such snapshots. I don't see how the identifier resolution > >>>> would be more complex. > >>>> > >>>> I don't know if it'd be better. I'd just find it simpler (for us, that > >>>> is -- for users, I'm not sure). > >>> > >>> Identifier resolution A: > >>> - Find a snapshot that has the given identifier as a name > >>> - If no such snapshot exists, it is an error > >>> > >>> Identifier resolution B: > >>> - If the identifier is a number, find a snapshot that has the given > >>> identifier as its ID > >>> - If the identifier is not a number, find a snapshot that has the given > >>> identifier as a name > >>> - If no such snapshot exists, it is an error > >> > >> No, my idea was to keep the resolution the same as it is; just to forbid > >> creating new snapshots with numeric names. This would prevent users > >> from getting into the whole situation. > > > > That's the version with an even more complex resolution method C. :-) > > How so if the resolution method stays the same? Because it already is > too complex? > > If so, yes, that is an argument. I was arguing for the simplest patch > instead of the simplest code, true. Yes, because it already is too complex. Not even necessarily the code (even though that's true as well), but most importantly the interface. > > I actually think the current behaviour is more confusing than helpful. > > Without looking into the code or trying it out, I couldn't even tell > > whether ID or name takes precedence if there is a matching snapshot for > > both. Considering your proposal, it's probably the ID, but how should a > > user know that? (If against all expectations documentation exists, it > > doesn't count because nobody reads that.) > > It isn't more confusing than it is right now. With my proposal, all > current images are simply as confusing as they are right now (I think ID > takes precedence, yes), but if you create new snapshots, it's clear, > since you simply cannot create names that could be IDs. I agree. But wasn't the goal of the patch to make it less confusing than it is right now? Kevin
On 1/9/19 12:10 PM, Max Reitz wrote: > On 06.09.18 13:11, Daniel Henrique Barboza wrote: >> changes in v2: >> - removed the "RFC" marker; >> - added a new patch (patch 2) that removes >> bdrv_snapshot_delete_by_id_or_name from the code; >> - made changes in patch 1 as suggested by Murilo; >> - previous patch set link: >> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >> >> >> It is not uncommon to see bugs being opened by testers that attempt to >> create VM snapshots using HMP. It turns out that "0" and "1" are quite >> common snapshot names and they trigger a lot of bugs. I gave an example >> in the commit message of patch 1, but to sum up here: QEMU treats the >> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >> is documented as such, but this can lead to strange situations. >> >> Given that it is strange for an API to consider a parameter to be 2 fields >> at the same time, and inadvently treating them as one or the other, and >> that removing the ID field is too drastic, my idea here is to keep the >> ID field for internal control, but do not let the user set it. >> >> I guess there's room for discussion about considering this change an API >> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > (Yes, very late reply, I'm sorry...) > > I do think it affects users of HMP, because right now you can delete > snapshots with their ID, and after this series you cannot. That's true. My idea here was simple: the user can't reliably exclude via snapshot ID today because we're hiding the ID field in info snapshots: (qemu) savevm 0 (qemu) info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- 0 741M 2018-07-31 13:39:56 00:41:25.313 Thus, what will end up happening is that the user will be forced to use the TAG of the snapshot since this is the only available information. > > I think we had a short discussion about just disallowing numeric > snapshot names. How bad would that be? This was my first idea when evaluating what to do in this case. I gave it up because I found it to be too extreme. People would start complaining "I was able to do savevm 0 and now I can't". > > Max >
On 09.01.19 17:45, Kevin Wolf wrote: > Am 09.01.2019 um 17:27 hat Max Reitz geschrieben: >> On 09.01.19 16:13, Kevin Wolf wrote: >>> Am 09.01.2019 um 15:54 hat Max Reitz geschrieben: >>>> On 09.01.19 15:48, Kevin Wolf wrote: >>>>> Am 09.01.2019 um 15:27 hat Max Reitz geschrieben: >>>>>> On 09.01.19 15:21, Kevin Wolf wrote: >>>>>>> Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >>>>>>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>>>>>>>> changes in v2: >>>>>>>>> - removed the "RFC" marker; >>>>>>>>> - added a new patch (patch 2) that removes >>>>>>>>> bdrv_snapshot_delete_by_id_or_name from the code; >>>>>>>>> - made changes in patch 1 as suggested by Murilo; >>>>>>>>> - previous patch set link: >>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>>>>>>>> >>>>>>>>> >>>>>>>>> It is not uncommon to see bugs being opened by testers that attempt to >>>>>>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>>>>>>>> common snapshot names and they trigger a lot of bugs. I gave an example >>>>>>>>> in the commit message of patch 1, but to sum up here: QEMU treats the >>>>>>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>>>>>>>> is documented as such, but this can lead to strange situations. >>>>>>>>> >>>>>>>>> Given that it is strange for an API to consider a parameter to be 2 fields >>>>>>>>> at the same time, and inadvently treating them as one or the other, and >>>>>>>>> that removing the ID field is too drastic, my idea here is to keep the >>>>>>>>> ID field for internal control, but do not let the user set it. >>>>>>>>> >>>>>>>>> I guess there's room for discussion about considering this change an API >>>>>>>>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>>>>>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >>>>>>>> >>>>>>>> (Yes, very late reply, I'm sorry...) >>>>>>>> >>>>>>>> I do think it affects users of HMP, because right now you can delete >>>>>>>> snapshots with their ID, and after this series you cannot. >>>>>>> >>>>>>> Can there be snapshots that can't be identified by a snapshot name, but >>>>>>> only by their ID? >>>>>> >>>>>> I don't know, but what I meant is that if you have scripts to do all >>>>>> this, you might have to adjust them with this change. >>>>> >>>>> That's what the H in HMP means. >>>>> >>>>>>>> I think we had a short discussion about just disallowing numeric >>>>>>>> snapshot names. How bad would that be? >>>>>>> >>>>>>> It would be incompatible with existing images and result in a more >>>>>>> complex snapshot identifier resolution. Why would it be any better? >>>>>> >>>>>> It wouldn't be incompatible with existing images if we'd just disallow >>>>>> creating such snapshots. I don't see how the identifier resolution >>>>>> would be more complex. >>>>>> >>>>>> I don't know if it'd be better. I'd just find it simpler (for us, that >>>>>> is -- for users, I'm not sure). >>>>> >>>>> Identifier resolution A: >>>>> - Find a snapshot that has the given identifier as a name >>>>> - If no such snapshot exists, it is an error >>>>> >>>>> Identifier resolution B: >>>>> - If the identifier is a number, find a snapshot that has the given >>>>> identifier as its ID >>>>> - If the identifier is not a number, find a snapshot that has the given >>>>> identifier as a name >>>>> - If no such snapshot exists, it is an error >>>> >>>> No, my idea was to keep the resolution the same as it is; just to forbid >>>> creating new snapshots with numeric names. This would prevent users >>>> from getting into the whole situation. >>> >>> That's the version with an even more complex resolution method C. :-) >> >> How so if the resolution method stays the same? Because it already is >> too complex? >> >> If so, yes, that is an argument. I was arguing for the simplest patch >> instead of the simplest code, true. > > Yes, because it already is too complex. Not even necessarily the code > (even though that's true as well), but most importantly the interface. > >>> I actually think the current behaviour is more confusing than helpful. >>> Without looking into the code or trying it out, I couldn't even tell >>> whether ID or name takes precedence if there is a matching snapshot for >>> both. Considering your proposal, it's probably the ID, but how should a >>> user know that? (If against all expectations documentation exists, it >>> doesn't count because nobody reads that.) >> >> It isn't more confusing than it is right now. With my proposal, all >> current images are simply as confusing as they are right now (I think ID >> takes precedence, yes), but if you create new snapshots, it's clear, >> since you simply cannot create names that could be IDs. > > I agree. But wasn't the goal of the patch to make it less confusing than > it is right now? My approach would make it less confusing, too, just not for old images. But if you have images lying around with numeric snapshot names, chances are you must have realized at some point that the ID takes precedence. But of course my approach would have to have benefits over this series, as the latter exists already. Hm. The interface and code would remain more complex, so that's a bad point. A good point is that I think it's the less dangerous incompatible change. What's the worst that can happen if we disallow giving numeric names? You get an error with something that used to work. Ouch, but not horrible. What's the worst that could happen with this series? You used to give all of your snapshots numeric names, but somehow you were able to deal with it by always using the IDs when specifying existing snapshots.[1] Now the behavior is changed and the name takes precedence -- so maybe you delete the wrong snapshot or revert to the wrong one. That's not so good. [1] I admit that it's more reasonable to assume you were just lucky it worked, because for every snapshot, its name just happened to match its ID. If so, changing behavior doesn't bite you. So I'm unsure. I agree that this series is probably what we would have wanted from the start, but I don't know if the change isn't a bit adventurous. Max
On 09.01.19 17:57, Daniel Henrique Barboza wrote: > > > On 1/9/19 12:10 PM, Max Reitz wrote: >> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>> changes in v2: >>> - removed the "RFC" marker; >>> - added a new patch (patch 2) that removes >>> bdrv_snapshot_delete_by_id_or_name from the code; >>> - made changes in patch 1 as suggested by Murilo; >>> - previous patch set link: >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>> >>> >>> It is not uncommon to see bugs being opened by testers that attempt to >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>> common snapshot names and they trigger a lot of bugs. I gave an example >>> in the commit message of patch 1, but to sum up here: QEMU treats the >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>> is documented as such, but this can lead to strange situations. >>> >>> Given that it is strange for an API to consider a parameter to be 2 >>> fields >>> at the same time, and inadvently treating them as one or the other, and >>> that removing the ID field is too drastic, my idea here is to keep the >>> ID field for internal control, but do not let the user set it. >>> >>> I guess there's room for discussion about considering this change an API >>> change or not. It doesn't affect users of HMP and it doesn't affect >>> Libvirt, >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >> (Yes, very late reply, I'm sorry...) >> >> I do think it affects users of HMP, because right now you can delete >> snapshots with their ID, and after this series you cannot. > > That's true. My idea here was simple: the user can't reliably exclude > via snapshot ID today > because we're hiding the ID field in info snapshots: > > > (qemu) savevm 0 > (qemu) info snapshots > List of snapshots present on all disks: > ID TAG VM SIZE DATE VM CLOCK > -- 0 741M 2018-07-31 13:39:56 00:41:25.313 > > > Thus, what will end up happening is that the user will be forced to use the > TAG of the snapshot since this is the only available information. But you can get it through e.g. qemu-img info. Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 0 1.6M 2019-01-09 18:01:21 00:00:02.657 So it's not impossible to get. >> I think we had a short discussion about just disallowing numeric >> snapshot names. How bad would that be? > > > This was my first idea when evaluating what to do in this case. I gave > it up because > I found it to be too extreme. People would start complaining "I was able > to do > savevm 0 and now I can't". True. But it wouldn't be impossible to do, we'd need to deprecate numeric names, print a warning for two releases, and then we can make it an error. Hm... If we had a proper deprecation warning in this series, I suppose it wouldn't be dangerous anymore. Can we just print a warning whenever the user specified an ID? (i.e. if some snapshot's ID matches the string given by the user and the snapshot's name does not) Max
On 1/9/19 10:57 AM, Daniel Henrique Barboza wrote: >> I think we had a short discussion about just disallowing numeric >> snapshot names. How bad would that be? > > > This was my first idea when evaluating what to do in this case. I gave > it up because > I found it to be too extreme. People would start complaining "I was able > to do > savevm 0 and now I can't". I could live with that, especially if it serves to prevent even more bugs of "I did this and now my image is broken" (I, for one, have known about the confusion of breaking an image by using a number as a snapshot name as far back as 2011). What I can't live with is: "I have an old image where I did savevm 0, and now want to modernize the image, but the new tools refuse to even let me read from name 0". I also agree that the code is already so hairy, and the reason we have punted on solving the issue (for 8 years now!) is because there are so many corner cases to consider. It also means that I'm reluctant to review the patch, because it will require a significant chunk of time and mental effort to ensure that whatever patch is proposed does not break something important. But in general, I'm glad that you are trying to get the issue fixed, even if the conversation on HOW to fix it is still undergoing a choice of bikeshed paint colors.
Am 09.01.2019 um 18:05 hat Max Reitz geschrieben: > On 09.01.19 17:57, Daniel Henrique Barboza wrote: > > > > > > On 1/9/19 12:10 PM, Max Reitz wrote: > >> On 06.09.18 13:11, Daniel Henrique Barboza wrote: > >>> changes in v2: > >>> - removed the "RFC" marker; > >>> - added a new patch (patch 2) that removes > >>> bdrv_snapshot_delete_by_id_or_name from the code; > >>> - made changes in patch 1 as suggested by Murilo; > >>> - previous patch set link: > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > >>> > >>> > >>> It is not uncommon to see bugs being opened by testers that attempt to > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > >>> common snapshot names and they trigger a lot of bugs. I gave an example > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > >>> is documented as such, but this can lead to strange situations. > >>> > >>> Given that it is strange for an API to consider a parameter to be 2 > >>> fields > >>> at the same time, and inadvently treating them as one or the other, and > >>> that removing the ID field is too drastic, my idea here is to keep the > >>> ID field for internal control, but do not let the user set it. > >>> > >>> I guess there's room for discussion about considering this change an API > >>> change or not. It doesn't affect users of HMP and it doesn't affect > >>> Libvirt, > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > >> (Yes, very late reply, I'm sorry...) > >> > >> I do think it affects users of HMP, because right now you can delete > >> snapshots with their ID, and after this series you cannot. > > > > That's true. My idea here was simple: the user can't reliably exclude > > via snapshot ID today > > because we're hiding the ID field in info snapshots: > > > > > > (qemu) savevm 0 > > (qemu) info snapshots > > List of snapshots present on all disks: > > ID TAG VM SIZE DATE VM CLOCK > > -- 0 741M 2018-07-31 13:39:56 00:41:25.313 > > > > > > Thus, what will end up happening is that the user will be forced to use the > > TAG of the snapshot since this is the only available information. > > But you can get it through e.g. qemu-img info. > > Snapshot list: > ID TAG VM SIZE DATE VM CLOCK > 1 0 1.6M 2019-01-09 18:01:21 00:00:02.657 > > So it's not impossible to get. Is there a reason why we should display the ID at all when you can't use it any more to identify snapshots? > >> I think we had a short discussion about just disallowing numeric > >> snapshot names. How bad would that be? > > > > > > This was my first idea when evaluating what to do in this case. I gave > > it up because > > I found it to be too extreme. People would start complaining "I was able > > to do > > savevm 0 and now I can't". > > True. But it wouldn't be impossible to do, we'd need to deprecate > numeric names, print a warning for two releases, and then we can make it > an error. This. Is. HMP. Not a stable ABI, no deprecation period of two releases. > Hm... If we had a proper deprecation warning in this series, I suppose > it wouldn't be dangerous anymore. Can we just print a warning whenever > the user specified an ID? (i.e. if some snapshot's ID matches the > string given by the user and the snapshot's name does not) How is it less of a problem when a user gets QEMU from the distro and skips five releases? They will never see the warning. This is different from QMP where things are fixed in libvirt, which is tested with every single QEMU release. Kevin
On 1/9/19 3:05 PM, Max Reitz wrote: > On 09.01.19 17:57, Daniel Henrique Barboza wrote: >> >> On 1/9/19 12:10 PM, Max Reitz wrote: >>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>>> changes in v2: >>>> - removed the "RFC" marker; >>>> - added a new patch (patch 2) that removes >>>> bdrv_snapshot_delete_by_id_or_name from the code; >>>> - made changes in patch 1 as suggested by Murilo; >>>> - previous patch set link: >>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>>> >>>> >>>> It is not uncommon to see bugs being opened by testers that attempt to >>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>>> common snapshot names and they trigger a lot of bugs. I gave an example >>>> in the commit message of patch 1, but to sum up here: QEMU treats the >>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>>> is documented as such, but this can lead to strange situations. >>>> >>>> Given that it is strange for an API to consider a parameter to be 2 >>>> fields >>>> at the same time, and inadvently treating them as one or the other, and >>>> that removing the ID field is too drastic, my idea here is to keep the >>>> ID field for internal control, but do not let the user set it. >>>> >>>> I guess there's room for discussion about considering this change an API >>>> change or not. It doesn't affect users of HMP and it doesn't affect >>>> Libvirt, >>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >>> (Yes, very late reply, I'm sorry...) >>> >>> I do think it affects users of HMP, because right now you can delete >>> snapshots with their ID, and after this series you cannot. >> That's true. My idea here was simple: the user can't reliably exclude >> via snapshot ID today >> because we're hiding the ID field in info snapshots: >> >> >> (qemu) savevm 0 >> (qemu) info snapshots >> List of snapshots present on all disks: >> ID TAG VM SIZE DATE VM CLOCK >> -- 0 741M 2018-07-31 13:39:56 00:41:25.313 >> >> >> Thus, what will end up happening is that the user will be forced to use the >> TAG of the snapshot since this is the only available information. > But you can get it through e.g. qemu-img info. > > Snapshot list: > ID TAG VM SIZE DATE VM CLOCK > 1 0 1.6M 2019-01-09 18:01:21 00:00:02.657 > > So it's not impossible to get. Hmpf .... should we obscure the ID in this case as well? I mean, why we have the ID information here but not in "info snapshots"? >>> I think we had a short discussion about just disallowing numeric >>> snapshot names. How bad would that be? >> >> This was my first idea when evaluating what to do in this case. I gave >> it up because >> I found it to be too extreme. People would start complaining "I was able >> to do >> savevm 0 and now I can't". > True. But it wouldn't be impossible to do, we'd need to deprecate > numeric names, print a warning for two releases, and then we can make it > an error. > > Hm... If we had a proper deprecation warning in this series, I suppose > it wouldn't be dangerous anymore. Can we just print a warning whenever > the user specified an ID? (i.e. if some snapshot's ID matches the > string given by the user and the snapshot's name does not) I can live with that. > > Max >
On 09.01.19 18:20, Kevin Wolf wrote: > Am 09.01.2019 um 18:05 hat Max Reitz geschrieben: >> On 09.01.19 17:57, Daniel Henrique Barboza wrote: >>> >>> >>> On 1/9/19 12:10 PM, Max Reitz wrote: >>>> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>>>> changes in v2: >>>>> - removed the "RFC" marker; >>>>> - added a new patch (patch 2) that removes >>>>> bdrv_snapshot_delete_by_id_or_name from the code; >>>>> - made changes in patch 1 as suggested by Murilo; >>>>> - previous patch set link: >>>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>>>> >>>>> >>>>> It is not uncommon to see bugs being opened by testers that attempt to >>>>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>>>> common snapshot names and they trigger a lot of bugs. I gave an example >>>>> in the commit message of patch 1, but to sum up here: QEMU treats the >>>>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>>>> is documented as such, but this can lead to strange situations. >>>>> >>>>> Given that it is strange for an API to consider a parameter to be 2 >>>>> fields >>>>> at the same time, and inadvently treating them as one or the other, and >>>>> that removing the ID field is too drastic, my idea here is to keep the >>>>> ID field for internal control, but do not let the user set it. >>>>> >>>>> I guess there's room for discussion about considering this change an API >>>>> change or not. It doesn't affect users of HMP and it doesn't affect >>>>> Libvirt, >>>>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >>>> (Yes, very late reply, I'm sorry...) >>>> >>>> I do think it affects users of HMP, because right now you can delete >>>> snapshots with their ID, and after this series you cannot. >>> >>> That's true. My idea here was simple: the user can't reliably exclude >>> via snapshot ID today >>> because we're hiding the ID field in info snapshots: >>> >>> >>> (qemu) savevm 0 >>> (qemu) info snapshots >>> List of snapshots present on all disks: >>> ID TAG VM SIZE DATE VM CLOCK >>> -- 0 741M 2018-07-31 13:39:56 00:41:25.313 >>> >>> >>> Thus, what will end up happening is that the user will be forced to use the >>> TAG of the snapshot since this is the only available information. >> >> But you can get it through e.g. qemu-img info. >> >> Snapshot list: >> ID TAG VM SIZE DATE VM CLOCK >> 1 0 1.6M 2019-01-09 18:01:21 00:00:02.657 >> >> So it's not impossible to get. > > Is there a reason why we should display the ID at all when you can't use > it any more to identify snapshots? I thought the long-term goal of this series really was to remove all mentions of the ID, yes. >>>> I think we had a short discussion about just disallowing numeric >>>> snapshot names. How bad would that be? >>> >>> >>> This was my first idea when evaluating what to do in this case. I gave >>> it up because >>> I found it to be too extreme. People would start complaining "I was able >>> to do >>> savevm 0 and now I can't". >> >> True. But it wouldn't be impossible to do, we'd need to deprecate >> numeric names, print a warning for two releases, and then we can make it >> an error. > > This. Is. HMP. > > Not a stable ABI, no deprecation period of two releases. Well, if you want to do it. This may be HMP, but this is also the only interface to savevm, so it's not like users have a choice to use a more stable interface. I know that was a conscious decision, more or less, but I don't see why we need to be so nasty when the hardest thing about doing a nice deprecation would be to remember to make it an error in half a year. >> Hm... If we had a proper deprecation warning in this series, I suppose >> it wouldn't be dangerous anymore. Can we just print a warning whenever >> the user specified an ID? (i.e. if some snapshot's ID matches the >> string given by the user and the snapshot's name does not) > > How is it less of a problem when a user gets QEMU from the distro and > skips five releases? They will never see the warning. This is different > from QMP where things are fixed in libvirt, which is tested with every > single QEMU release. Well, then allowing users to accidentally specify the wrong snapshot remains adventurous. The thing is that this really is the only interface to savevm. So while it may be HMP, it probably won't be used only by plain human end users who just don't want to deal with QMP. I can imagine it is used with scripts and by people who regularly update their qemu because they have something important running on top of it. <general whining> Actually, to me what you're saying sounds more like "Our deprecation policy is useless" to which I wholeheartedly agree. I think we should only remove things in major releases, and only if it was deprecated in the previous major release already. (So if you deprecate something in 4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can remove it only in 6.0.) From a user standpoint I really think we deprecate stuff too irregularly. </general whining> OK, anyway, you're talking about stable distros. But there are other users, too, so yes, this is less of a problem because it is a problem only for users of stable distros; so it's a problem for less users. But anyway, if you think deprecation is futile here, then I maintain that this series bears a bit of a risk. Max
On 1/9/19 11:38 AM, Max Reitz wrote: > > <general whining> > Actually, to me what you're saying sounds more like "Our deprecation > policy is useless" to which I wholeheartedly agree. I think we should > only remove things in major releases, and only if it was deprecated in > the previous major release already. (So if you deprecate something in > 4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can > remove it only in 6.0.) From a user standpoint I really think we > deprecate stuff too irregularly. > </general whining> That's actually incorrect. Our current version numbering scheme is that the major version number is NOT synonymous with major releases: we just bump the major version number once per year, and ALL releases are on equal footing with no one release being more major than others. Thus, a policy that (at least) 2 releases is needed for a deprecation is consistent, where one that requires waiting for a bump in the major version number (which is as short as one release and as long as 3, given that we bump every year with about 3 releases per year) is the one that is less predictable and less meaningful (why is waiting for January better than waiting for 2 releases?).
On 1/9/19 11:38 AM, Max Reitz wrote: >>>>> I do think it affects users of HMP, because right now you can delete >>>>> snapshots with their ID, and after this series you cannot. >>>> >> This. Is. HMP. >> >> Not a stable ABI, no deprecation period of two releases. > > Well, if you want to do it. > > This may be HMP, but this is also the only interface to savevm, so it's > not like users have a choice to use a more stable interface. I know > that was a conscious decision, more or less, but I don't see why we need > to be so nasty when the hardest thing about doing a nice deprecation > would be to remember to make it an error in half a year. Indeed, and libvirt IS using 'savevm' via HMP via QMP's human-monitor-command, since there is no QMP counterpart for internal snapshot. Even though lately we consistently tell people that internal snapshots are underdeveloped and you should use external snapshots, it does not get away from the fact that libvirt has been using 'savevm' to drive internal snapshots for years now, and that we MUST consider back-compat and/or add an introspectible QMP interface before making changes that would break libvirt.
On 1/9/19 12:21 PM, Kevin Wolf wrote: > Am 09.01.2019 um 15:10 hat Max Reitz geschrieben: >> On 06.09.18 13:11, Daniel Henrique Barboza wrote: >>> changes in v2: >>> - removed the "RFC" marker; >>> - added a new patch (patch 2) that removes >>> bdrv_snapshot_delete_by_id_or_name from the code; >>> - made changes in patch 1 as suggested by Murilo; >>> - previous patch set link: >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html >>> >>> >>> It is not uncommon to see bugs being opened by testers that attempt to >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite >>> common snapshot names and they trigger a lot of bugs. I gave an example >>> in the commit message of patch 1, but to sum up here: QEMU treats the >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It >>> is documented as such, but this can lead to strange situations. >>> >>> Given that it is strange for an API to consider a parameter to be 2 fields >>> at the same time, and inadvently treating them as one or the other, and >>> that removing the ID field is too drastic, my idea here is to keep the >>> ID field for internal control, but do not let the user set it. >>> >>> I guess there's room for discussion about considering this change an API >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. >> (Yes, very late reply, I'm sorry...) >> >> I do think it affects users of HMP, because right now you can delete >> snapshots with their ID, and after this series you cannot. > Can there be snapshots that can't be identified by a snapshot name, but > only by their ID? Hmmm I would need to read my notes back when I was working on this series. In a quick look at the code and playing around with HMP, what I can tell is that: - there is no way for a snapshot to have an empty TAG - an auto-generated name is generated and used as TAG. - if you create a snapshot using an existing TAG, the older snapshot is overwritten without warning (would be nice to fix it to at least provide a warning message - I can do it in this series if a respin is needed) I do not have the expertise of the whole block layer to see further implications of the changes made in this series, but as far as HMP goes, I don't think there's too much to it other than what I've said in the cover letter. The goal here is to remove the ambiguity of save/load/del vm commands that would interpret the argument either as tag or ID, leading to situations in which snapshots ended up getting overwritten because the tag matched an existing ID. I am not against providing a warning message if the user tries to create a snapshot using a numerical tag. In the end, although I said otherwise in the cover letter, changing the meaning of an input is still an API change, and I'd rather error on the cautious side and warn the callers about it. Even if this has no/little impact (please feel free to prove me wrong) on them. Daniel > >> I think we had a short discussion about just disallowing numeric >> snapshot names. How bad would that be? > It would be incompatible with existing images and result in a more > complex snapshot identifier resolution. Why would it be any better? > > Kevin
Am 09.01.2019 um 18:55 hat Eric Blake geschrieben: > On 1/9/19 11:38 AM, Max Reitz wrote: > > >>>>> I do think it affects users of HMP, because right now you can delete > >>>>> snapshots with their ID, and after this series you cannot. > >>>> > > >> This. Is. HMP. > >> > >> Not a stable ABI, no deprecation period of two releases. > > > > Well, if you want to do it. > > > > This may be HMP, but this is also the only interface to savevm, so it's > > not like users have a choice to use a more stable interface. I know > > that was a conscious decision, more or less, but I don't see why we need > > to be so nasty when the hardest thing about doing a nice deprecation > > would be to remember to make it an error in half a year. > > Indeed, and libvirt IS using 'savevm' via HMP via QMP's > human-monitor-command, since there is no QMP counterpart for internal > snapshot. Even though lately we consistently tell people that internal > snapshots are underdeveloped and you should use external snapshots, it > does not get away from the fact that libvirt has been using 'savevm' to > drive internal snapshots for years now, and that we MUST consider > back-compat and/or add an introspectible QMP interface before making > changes that would break libvirt. Okay, so what does libvirt do when you request a snapshot with a numerical name? Without having looked at the code, the best case I would expect that it forbids them, and more realistically I suspect that we may actually fix a bug for libvirt by changing the semantics. Or does libvirt really use snapshot IDs rather than names? Kevin
Am 09.01.2019 um 18:52 hat Eric Blake geschrieben: > On 1/9/19 11:38 AM, Max Reitz wrote: > > > > > <general whining> > > Actually, to me what you're saying sounds more like "Our deprecation > > policy is useless" to which I wholeheartedly agree. If you restrict it to "Our deprecation policy is useless for user-facing changes", I might agree. I think the policy is fine for stuff like QMP where only the management layer needs to be aware of the change. We still have problems there, but these are not problems of the policy. > > I think we should only remove things in major releases, and only if > > it was deprecated in the previous major release already. (So if you > > deprecate something in 4.0, you can remove it in 5.0; but if you > > deprecate it in 4.1, you can remove it only in 6.0.) From a user > > standpoint I really think we deprecate stuff too irregularly. > > </general whining> > > That's actually incorrect. Our current version numbering scheme is that > the major version number is NOT synonymous with major releases: we just > bump the major version number once per year, and ALL releases are on > equal footing with no one release being more major than others. Thus, a > policy that (at least) 2 releases is needed for a deprecation is > consistent, where one that requires waiting for a bump in the major > version number (which is as short as one release and as long as 3, given > that we bump every year with about 3 releases per year) is the one that > is less predictable and less meaningful (why is waiting for January > better than waiting for 2 releases?). As someone who usually skips distro versions because he wants to have the change and necessary adaption only once instead of two or three times in the same timeframe, I can see some value for users in breaking stable APIs only in defined versions. At the same time, I can also see the value in being able to break stable APIs without waiting for almost two years with Max' policy and bad timing. For software I develop I prefer the latter, for software that I use I prefer the former. ;-) Kevin
On 1/9/19 12:51 PM, Kevin Wolf wrote: >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's >> human-monitor-command, since there is no QMP counterpart for internal >> snapshot. Even though lately we consistently tell people that internal >> snapshots are underdeveloped and you should use external snapshots, it >> does not get away from the fact that libvirt has been using 'savevm' to >> drive internal snapshots for years now, and that we MUST consider >> back-compat and/or add an introspectible QMP interface before making >> changes that would break libvirt. > > Okay, so what does libvirt do when you request a snapshot with a > numerical name? Without having looked at the code, the best case I would > expect that it forbids them, and more realistically I suspect that we > may actually fix a bug for libvirt by changing the semantics. > > Or does libvirt really use snapshot IDs rather than names? At the moment, libvirt does not place any restriction on internal snapshot names, but passes the user's name through without thought of whether it is an id or a name. So yes, arguably tightening the semantics fixes a libvirt bug for libvirt having allowed internal snapshots to get into an inconsistent state. But again, it falls in the category of "properly fixing this requires a lot of auditing, time, mental power, and unit tests", which makes it a rather daunting task to state for certain.
Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: > On 1/9/19 12:51 PM, Kevin Wolf wrote: > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's > >> human-monitor-command, since there is no QMP counterpart for internal > >> snapshot. Even though lately we consistently tell people that internal > >> snapshots are underdeveloped and you should use external snapshots, it > >> does not get away from the fact that libvirt has been using 'savevm' to > >> drive internal snapshots for years now, and that we MUST consider > >> back-compat and/or add an introspectible QMP interface before making > >> changes that would break libvirt. > > > > Okay, so what does libvirt do when you request a snapshot with a > > numerical name? Without having looked at the code, the best case I would > > expect that it forbids them, and more realistically I suspect that we > > may actually fix a bug for libvirt by changing the semantics. > > > > Or does libvirt really use snapshot IDs rather than names? > > At the moment, libvirt does not place any restriction on internal > snapshot names, but passes the user's name through without thought of > whether it is an id or a name. > > So yes, arguably tightening the semantics fixes a libvirt bug for > libvirt having allowed internal snapshots to get into an inconsistent > state. So there are two scenarios to consider with respect to breaking backwards compatibility: 1. There may be code out there that relies on numeric arguments being interpreted as IDs. This code will break if we make this change and numeric snapshot names exist. That such code exists is speculation (even though plausible) and we don't know how widespread it is. 2. There may be code out there that blindly assumes that whatever it passes is interpreted as a name. Nobody considered that with a numeric snapshot name, it maybe misinterpreted as an ID. We know that this is true for libvirt, the single most used management tool for QEMU. More code like this may (and probably does) exist. Essentially the same two categories exist for human users: those who somehow found out that QEMU accepts IDs as monitor command arguments and are using those (mitigated by not displaying IDs any more), and those who are trapped because they wanted to access a numeric name, but surprisingly got it interpreted as an ID. Both are speculation to some degree, but my guess is that the latter group is larger. Given all this, this is a no-brainer for me. We simplify the interface, we simplify the code, we make things less confusing for manual users and we fix the management tool that everybody uses. How could we not make this change? > But again, it falls in the category of "properly fixing this > requires a lot of auditing, time, mental power, and unit tests", which > makes it a rather daunting task to state for certain. Fix the world before we can fix anything? Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: > > On 1/9/19 12:51 PM, Kevin Wolf wrote: > > > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's > > >> human-monitor-command, since there is no QMP counterpart for internal > > >> snapshot. Even though lately we consistently tell people that internal > > >> snapshots are underdeveloped and you should use external snapshots, it > > >> does not get away from the fact that libvirt has been using 'savevm' to > > >> drive internal snapshots for years now, and that we MUST consider > > >> back-compat and/or add an introspectible QMP interface before making > > >> changes that would break libvirt. > > > > > > Okay, so what does libvirt do when you request a snapshot with a > > > numerical name? Without having looked at the code, the best case I would > > > expect that it forbids them, and more realistically I suspect that we > > > may actually fix a bug for libvirt by changing the semantics. > > > > > > Or does libvirt really use snapshot IDs rather than names? > > > > At the moment, libvirt does not place any restriction on internal > > snapshot names, but passes the user's name through without thought of > > whether it is an id or a name. > > > > So yes, arguably tightening the semantics fixes a libvirt bug for > > libvirt having allowed internal snapshots to get into an inconsistent > > state. > > So there are two scenarios to consider with respect to breaking > backwards compatibility: > > 1. There may be code out there that relies on numeric arguments being > interpreted as IDs. This code will break if we make this change and > numeric snapshot names exist. That such code exists is speculation > (even though plausible) and we don't know how widespread it is. > > 2. There may be code out there that blindly assumes that whatever it > passes is interpreted as a name. Nobody considered that with a > numeric snapshot name, it maybe misinterpreted as an ID. We know that > this is true for libvirt, the single most used management tool for > QEMU. More code like this may (and probably does) exist. > > Essentially the same two categories exist for human users: those who > somehow found out that QEMU accepts IDs as monitor command arguments and > are using those (mitigated by not displaying IDs any more), and those > who are trapped because they wanted to access a numeric name, but > surprisingly got it interpreted as an ID. Both are speculation to some > degree, but my guess is that the latter group is larger. > > Given all this, this is a no-brainer for me. We simplify the interface, > we simplify the code, we make things less confusing for manual users and > we fix the management tool that everybody uses. How could we not make > this change? > > > But again, it falls in the category of "properly fixing this > > requires a lot of auditing, time, mental power, and unit tests", which > > makes it a rather daunting task to state for certain. > > Fix the world before we can fix anything? Can't we break this loop for the savevm command fairly easily; it's currently documnted as: savevm [tag|id] If we made that: savevm [-t] [-i] [tag|id] then: a) with neither -t or -i it would behave in the same roulette way as it does in the moment, and it might be a tag or id b) with -t we'd explicitly treat the parameter as a tag and it would error if it wasn't found c) With -i we'd explicitly treat the parameter as an id and it would error if it wasn't found Since we still allow (a) it doesn't break any existing code. Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 1/10/19 9:41 AM, Dr. David Alan Gilbert wrote: > * Kevin Wolf (kwolf@redhat.com) wrote: >> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: >>> On 1/9/19 12:51 PM, Kevin Wolf wrote: >>> >>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's >>>>> human-monitor-command, since there is no QMP counterpart for internal >>>>> snapshot. Even though lately we consistently tell people that internal >>>>> snapshots are underdeveloped and you should use external snapshots, it >>>>> does not get away from the fact that libvirt has been using 'savevm' to >>>>> drive internal snapshots for years now, and that we MUST consider >>>>> back-compat and/or add an introspectible QMP interface before making >>>>> changes that would break libvirt. >>>> Okay, so what does libvirt do when you request a snapshot with a >>>> numerical name? Without having looked at the code, the best case I would >>>> expect that it forbids them, and more realistically I suspect that we >>>> may actually fix a bug for libvirt by changing the semantics. >>>> >>>> Or does libvirt really use snapshot IDs rather than names? >>> At the moment, libvirt does not place any restriction on internal >>> snapshot names, but passes the user's name through without thought of >>> whether it is an id or a name. >>> >>> So yes, arguably tightening the semantics fixes a libvirt bug for >>> libvirt having allowed internal snapshots to get into an inconsistent >>> state. >> So there are two scenarios to consider with respect to breaking >> backwards compatibility: >> >> 1. There may be code out there that relies on numeric arguments being >> interpreted as IDs. This code will break if we make this change and >> numeric snapshot names exist. That such code exists is speculation >> (even though plausible) and we don't know how widespread it is. >> >> 2. There may be code out there that blindly assumes that whatever it >> passes is interpreted as a name. Nobody considered that with a >> numeric snapshot name, it maybe misinterpreted as an ID. We know that >> this is true for libvirt, the single most used management tool for >> QEMU. More code like this may (and probably does) exist. >> >> Essentially the same two categories exist for human users: those who >> somehow found out that QEMU accepts IDs as monitor command arguments and >> are using those (mitigated by not displaying IDs any more), and those >> who are trapped because they wanted to access a numeric name, but >> surprisingly got it interpreted as an ID. Both are speculation to some >> degree, but my guess is that the latter group is larger. >> >> Given all this, this is a no-brainer for me. We simplify the interface, >> we simplify the code, we make things less confusing for manual users and >> we fix the management tool that everybody uses. How could we not make >> this change? >> >>> But again, it falls in the category of "properly fixing this >>> requires a lot of auditing, time, mental power, and unit tests", which >>> makes it a rather daunting task to state for certain. >> Fix the world before we can fix anything? > Can't we break this loop for the savevm command fairly easily; it's > currently documnted as: > > savevm [tag|id] > > If we made that: > > savevm [-t] [-i] [tag|id] > > then: > a) with neither -t or -i it would behave in the same roulette way > as it does in the moment, and it might be a tag or id The roulette we have today is too confusing IMO. For starters, we're not creating snapshot using ID at this moment. The ID is being auto-generated by doing max_ID_found + 1. Quick example: savevm 4 isn't creating a snapshot with ID=4 and an empty tag - it is creating a snapshot with ID=ID_MAX + 1 and tag = 4. Let's say that ID_MAX was 9, so this new snapshot would be ID=10, tag=4. Now, the user wants to delete this snapshot. "delvm 4" will first match ID, then tag. If there is a snapshot with ID=4, it will be mistakenly erased. Note that this isn't a bug - even with tweaks here and there, savevm/delvm (number) will always be ambiguous because the API is allowing the user to interpret an input anyway he/she wants. This is an API design issue. The only way I see to reliably make this id|tag API to work is to always interpret numerical arguments as ID. Then we can safely assume what the user wants to do. We wouldn't even need the other 2 options you mentioned (-t and -i). But then, as I've said somewhere in this thread, users would complain that "savevm 4" isn't doing what it was doing before. "savevm 4 is creating snapshots with auto-generated tag instead of tag=4". I can see the bugzillas already. DHB > > b) with -t we'd explicitly treat the parameter as a tag and it > would error if it wasn't found > > c) With -i we'd explicitly treat the parameter as an id and > it would error if it wasn't found > > Since we still allow (a) it doesn't break any existing code. > > Dave > > >> Kevin > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Am 10.01.2019 um 12:41 hat Dr. David Alan Gilbert geschrieben: > * Kevin Wolf (kwolf@redhat.com) wrote: > > Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: > > > On 1/9/19 12:51 PM, Kevin Wolf wrote: > > > > > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's > > > >> human-monitor-command, since there is no QMP counterpart for internal > > > >> snapshot. Even though lately we consistently tell people that internal > > > >> snapshots are underdeveloped and you should use external snapshots, it > > > >> does not get away from the fact that libvirt has been using 'savevm' to > > > >> drive internal snapshots for years now, and that we MUST consider > > > >> back-compat and/or add an introspectible QMP interface before making > > > >> changes that would break libvirt. > > > > > > > > Okay, so what does libvirt do when you request a snapshot with a > > > > numerical name? Without having looked at the code, the best case I would > > > > expect that it forbids them, and more realistically I suspect that we > > > > may actually fix a bug for libvirt by changing the semantics. > > > > > > > > Or does libvirt really use snapshot IDs rather than names? > > > > > > At the moment, libvirt does not place any restriction on internal > > > snapshot names, but passes the user's name through without thought of > > > whether it is an id or a name. > > > > > > So yes, arguably tightening the semantics fixes a libvirt bug for > > > libvirt having allowed internal snapshots to get into an inconsistent > > > state. > > > > So there are two scenarios to consider with respect to breaking > > backwards compatibility: > > > > 1. There may be code out there that relies on numeric arguments being > > interpreted as IDs. This code will break if we make this change and > > numeric snapshot names exist. That such code exists is speculation > > (even though plausible) and we don't know how widespread it is. > > > > 2. There may be code out there that blindly assumes that whatever it > > passes is interpreted as a name. Nobody considered that with a > > numeric snapshot name, it maybe misinterpreted as an ID. We know that > > this is true for libvirt, the single most used management tool for > > QEMU. More code like this may (and probably does) exist. > > > > Essentially the same two categories exist for human users: those who > > somehow found out that QEMU accepts IDs as monitor command arguments and > > are using those (mitigated by not displaying IDs any more), and those > > who are trapped because they wanted to access a numeric name, but > > surprisingly got it interpreted as an ID. Both are speculation to some > > degree, but my guess is that the latter group is larger. > > > > Given all this, this is a no-brainer for me. We simplify the interface, > > we simplify the code, we make things less confusing for manual users and > > we fix the management tool that everybody uses. How could we not make > > this change? > > > > > But again, it falls in the category of "properly fixing this > > > requires a lot of auditing, time, mental power, and unit tests", which > > > makes it a rather daunting task to state for certain. > > > > Fix the world before we can fix anything? > > Can't we break this loop for the savevm command fairly easily; it's > currently documnted as: > > savevm [tag|id] > > If we made that: > > savevm [-t] [-i] [tag|id] > > then: > a) with neither -t or -i it would behave in the same roulette way > as it does in the moment, and it might be a tag or id > > b) with -t we'd explicitly treat the parameter as a tag and it > would error if it wasn't found > > c) With -i we'd explicitly treat the parameter as an id and > it would error if it wasn't found > > Since we still allow (a) it doesn't break any existing code. If you can explain why we need both tag and id? And by keeping the current behaviour, we might not break hypothetically existing correct code, but we leave currently actually existing broken code like libvirt broken. Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 10.01.2019 um 12:41 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: > > > > On 1/9/19 12:51 PM, Kevin Wolf wrote: > > > > > > > > >> Indeed, and libvirt IS using 'savevm' via HMP via QMP's > > > > >> human-monitor-command, since there is no QMP counterpart for internal > > > > >> snapshot. Even though lately we consistently tell people that internal > > > > >> snapshots are underdeveloped and you should use external snapshots, it > > > > >> does not get away from the fact that libvirt has been using 'savevm' to > > > > >> drive internal snapshots for years now, and that we MUST consider > > > > >> back-compat and/or add an introspectible QMP interface before making > > > > >> changes that would break libvirt. > > > > > > > > > > Okay, so what does libvirt do when you request a snapshot with a > > > > > numerical name? Without having looked at the code, the best case I would > > > > > expect that it forbids them, and more realistically I suspect that we > > > > > may actually fix a bug for libvirt by changing the semantics. > > > > > > > > > > Or does libvirt really use snapshot IDs rather than names? > > > > > > > > At the moment, libvirt does not place any restriction on internal > > > > snapshot names, but passes the user's name through without thought of > > > > whether it is an id or a name. > > > > > > > > So yes, arguably tightening the semantics fixes a libvirt bug for > > > > libvirt having allowed internal snapshots to get into an inconsistent > > > > state. > > > > > > So there are two scenarios to consider with respect to breaking > > > backwards compatibility: > > > > > > 1. There may be code out there that relies on numeric arguments being > > > interpreted as IDs. This code will break if we make this change and > > > numeric snapshot names exist. That such code exists is speculation > > > (even though plausible) and we don't know how widespread it is. > > > > > > 2. There may be code out there that blindly assumes that whatever it > > > passes is interpreted as a name. Nobody considered that with a > > > numeric snapshot name, it maybe misinterpreted as an ID. We know that > > > this is true for libvirt, the single most used management tool for > > > QEMU. More code like this may (and probably does) exist. > > > > > > Essentially the same two categories exist for human users: those who > > > somehow found out that QEMU accepts IDs as monitor command arguments and > > > are using those (mitigated by not displaying IDs any more), and those > > > who are trapped because they wanted to access a numeric name, but > > > surprisingly got it interpreted as an ID. Both are speculation to some > > > degree, but my guess is that the latter group is larger. > > > > > > Given all this, this is a no-brainer for me. We simplify the interface, > > > we simplify the code, we make things less confusing for manual users and > > > we fix the management tool that everybody uses. How could we not make > > > this change? > > > > > > > But again, it falls in the category of "properly fixing this > > > > requires a lot of auditing, time, mental power, and unit tests", which > > > > makes it a rather daunting task to state for certain. > > > > > > Fix the world before we can fix anything? > > > > Can't we break this loop for the savevm command fairly easily; it's > > currently documnted as: > > > > savevm [tag|id] > > > > If we made that: > > > > savevm [-t] [-i] [tag|id] > > > > then: > > a) with neither -t or -i it would behave in the same roulette way > > as it does in the moment, and it might be a tag or id > > > > b) with -t we'd explicitly treat the parameter as a tag and it > > would error if it wasn't found > > > > c) With -i we'd explicitly treat the parameter as an id and > > it would error if it wasn't found > > > > Since we still allow (a) it doesn't break any existing code. > > If you can explain why we need both tag and id? > > And by keeping the current behaviour, we might not break hypothetically > existing correct code, but we leave currently actually existing broken > code like libvirt broken. My only reason for leaving both tag & id was for the hypothetical existing current code; my assumption adding the above would be that we would then fix libvirt never to use (a), probably always (b). Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 1/10/19 11:06 AM, Dr. David Alan Gilbert wrote: >>> savevm [-t] [-i] [tag|id] >>> >>> then: >>> a) with neither -t or -i it would behave in the same roulette way >>> as it does in the moment, and it might be a tag or id >>> >>> b) with -t we'd explicitly treat the parameter as a tag and it >>> would error if it wasn't found >>> >>> c) With -i we'd explicitly treat the parameter as an id and >>> it would error if it wasn't found >>> >>> Since we still allow (a) it doesn't break any existing code. >> >> If you can explain why we need both tag and id? >> >> And by keeping the current behaviour, we might not break hypothetically >> existing correct code, but we leave currently actually existing broken >> code like libvirt broken. > > My only reason for leaving both tag & id was for the hypothetical > existing current code; my assumption adding the above would be that we > would then fix libvirt never to use (a), probably always (b). How? HMP is not introspectible, so libvirt can't know if 'savevm -t' works without trying it.
Am 10.01.2019 um 19:22 hat Eric Blake geschrieben: > On 1/10/19 11:06 AM, Dr. David Alan Gilbert wrote: > > >>> savevm [-t] [-i] [tag|id] > >>> > >>> then: > >>> a) with neither -t or -i it would behave in the same roulette way > >>> as it does in the moment, and it might be a tag or id > >>> > >>> b) with -t we'd explicitly treat the parameter as a tag and it > >>> would error if it wasn't found > >>> > >>> c) With -i we'd explicitly treat the parameter as an id and > >>> it would error if it wasn't found > >>> > >>> Since we still allow (a) it doesn't break any existing code. > >> > >> If you can explain why we need both tag and id? > >> > >> And by keeping the current behaviour, we might not break hypothetically > >> existing correct code, but we leave currently actually existing broken > >> code like libvirt broken. > > > > My only reason for leaving both tag & id was for the hypothetical > > existing current code; my assumption adding the above would be that we > > would then fix libvirt never to use (a), probably always (b). > > How? HMP is not introspectible, so libvirt can't know if 'savevm -t' > works without trying it. It is introspectible in a way, 'help savevm'. Just that the result of that isn't machine readable and not meant to be parsed by programs. But that's just what happens when you use HMP. I'm still against the proposal because it makes the interface more complicated rather than simpler. Kevin
On 09.01.19 18:52, Eric Blake wrote: > On 1/9/19 11:38 AM, Max Reitz wrote: > >> >> <general whining> >> Actually, to me what you're saying sounds more like "Our deprecation >> policy is useless" to which I wholeheartedly agree. I think we should >> only remove things in major releases, and only if it was deprecated in >> the previous major release already. (So if you deprecate something in >> 4.0, you can remove it in 5.0; but if you deprecate it in 4.1, you can >> remove it only in 6.0.) From a user standpoint I really think we >> deprecate stuff too irregularly. >> </general whining> > > That's actually incorrect. Our current version numbering scheme is that > the major version number is NOT synonymous with major releases: we just > bump the major version number once per year, and ALL releases are on > equal footing with no one release being more major than others. Thus, a > policy that (at least) 2 releases is needed for a deprecation is > consistent, where one that requires waiting for a bump in the major > version number (which is as short as one release and as long as 3, given > that we bump every year with about 3 releases per year) is the one that > is less predictable and less meaningful (why is waiting for January > better than waiting for 2 releases?). This sounds to me like we just don't have major releases because our deprecation policy doesn't make use of it. If we only broke APIs when bumping the major release number, then by definition that would be major releases, no? Also, if you %s/major release/x.0 release/g in my whining, the argument remains the same. And, yes, I dislike our versioning policy, too. My whining may have indicated that I like semver. Max
On 10.01.19 12:41, Dr. David Alan Gilbert wrote: > * Kevin Wolf (kwolf@redhat.com) wrote: >> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: >>> On 1/9/19 12:51 PM, Kevin Wolf wrote: >>> >>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's >>>>> human-monitor-command, since there is no QMP counterpart for internal >>>>> snapshot. Even though lately we consistently tell people that internal >>>>> snapshots are underdeveloped and you should use external snapshots, it >>>>> does not get away from the fact that libvirt has been using 'savevm' to >>>>> drive internal snapshots for years now, and that we MUST consider >>>>> back-compat and/or add an introspectible QMP interface before making >>>>> changes that would break libvirt. >>>> >>>> Okay, so what does libvirt do when you request a snapshot with a >>>> numerical name? Without having looked at the code, the best case I would >>>> expect that it forbids them, and more realistically I suspect that we >>>> may actually fix a bug for libvirt by changing the semantics. >>>> >>>> Or does libvirt really use snapshot IDs rather than names? >>> >>> At the moment, libvirt does not place any restriction on internal >>> snapshot names, but passes the user's name through without thought of >>> whether it is an id or a name. >>> >>> So yes, arguably tightening the semantics fixes a libvirt bug for >>> libvirt having allowed internal snapshots to get into an inconsistent >>> state. >> >> So there are two scenarios to consider with respect to breaking >> backwards compatibility: >> >> 1. There may be code out there that relies on numeric arguments being >> interpreted as IDs. This code will break if we make this change and >> numeric snapshot names exist. That such code exists is speculation >> (even though plausible) and we don't know how widespread it is. >> >> 2. There may be code out there that blindly assumes that whatever it >> passes is interpreted as a name. Nobody considered that with a >> numeric snapshot name, it maybe misinterpreted as an ID. We know that >> this is true for libvirt, the single most used management tool for >> QEMU. More code like this may (and probably does) exist. >> >> Essentially the same two categories exist for human users: those who >> somehow found out that QEMU accepts IDs as monitor command arguments and >> are using those (mitigated by not displaying IDs any more), and those >> who are trapped because they wanted to access a numeric name, but >> surprisingly got it interpreted as an ID. Both are speculation to some >> degree, but my guess is that the latter group is larger. >> >> Given all this, this is a no-brainer for me. We simplify the interface, >> we simplify the code, we make things less confusing for manual users and >> we fix the management tool that everybody uses. How could we not make >> this change? So you're trying to make a bug fix out of this now to get around deprecation? To me, changing behavior in qemu to fix a bug in libvirt doesn't equate to fixing a bug in qemu. So let's try to find a real bug in qemu. >>> But again, it falls in the category of "properly fixing this >>> requires a lot of auditing, time, mental power, and unit tests", which >>> makes it a rather daunting task to state for certain. >> >> Fix the world before we can fix anything? > > Can't we break this loop for the savevm command fairly easily; it's > currently documnted as: > > savevm [tag|id] The bug starts here. The code doesn't match the interface description, because this to me as a user suggests that "tag" has precedence over "id" (same for loadvm and delvm). But let's go over the strange behavior for each: savevm deletes all snapshots where either tag or ID match.[1] I think that's buggy because it should replace only one snapshot, not accidentally remove something totally different. It then... Tries to find that snapshot? and overwrite it with the new one. Which makes no sense because it has just deleted that snapshot, if it existed.[2] So savevm will then just create a new snapshot with the given tag. So it seems clear that savevm does not accept IDs, but really just tags. It follows that it should not try to delete all snapshots where the ID matches the given tag. loadvm uses bdrv_all_find_snapshot() -> bdrv_snapshot_find() where the ID takes precedence over the tag. This does not seem to match the interface description, as I've said above. So the first thing to do clearly is to switch the comparison order in bdrv_snapshot_find(). Or, well, apply this series because it removes the ID comparison altogether. But that wouldn't be needed for a bug fix. delvm directly invokes bdrv_all_delete_snapshot() and thus shares the same woes as savevm (see [1]). [1] Actually, this is not quite true. bdrv_all_delete_snapshot() invokes bdrv_snapshot_delete_by_id_or_name() if bdrv_snapshot_find() returned true. So far, so good. But bdrv_snapshot_delete_by_id_or_name() will first try to delete a snapshot with matching ID, and only if that failed it will delete a snapshot with matching tag. So if you have two snapshots { "0": "foo", "1": "0" } and then bdrv_all_delete_snapshot("0"), only "foo" will be deleted. So this function doesn't really delete all snapshots that match... [2] It follows from [1] that this is not quite true as well. Take the above example and savevm "0". It will then delete snapshot "foo" and indeed overwrite snapshot "0" (ID "1"). But that's just stupid. It looks like we have specialized code for after we have deleted an unrelated snapshot for no reason whatsoever. So, yes, there are bugs in qemu, but to fix them, we would need to switch the comparison order in bdrv_snapshot_find() and fix bdrv_snapshot_delete_by_id_or_name() to always invoke bdrv_snapshot_delete() for both cases (ID and tag). And we should probably remove the special code path in savevm for overwriting an existing snapshot. This would also alleviate the bug in libvirt (because the tag then always takes precedence, like the interface description suggests). Sure, this series would fix the bugs as well, but it does more than that. It really isn't just a bug fix. > If we made that: > > savevm [-t] [-i] [tag|id] > > then: > a) with neither -t or -i it would behave in the same roulette way > as it does in the moment, and it might be a tag or id > > b) with -t we'd explicitly treat the parameter as a tag and it > would error if it wasn't found > > c) With -i we'd explicitly treat the parameter as an id and > it would error if it wasn't found > > Since we still allow (a) it doesn't break any existing code. I agree with the others that while this might solve issues on the libvirt side of things, I think we also want to confuse end users less. Max
Am 11.01.2019 um 14:22 hat Max Reitz geschrieben: > On 10.01.19 12:41, Dr. David Alan Gilbert wrote: > > * Kevin Wolf (kwolf@redhat.com) wrote: > >> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: > >>> On 1/9/19 12:51 PM, Kevin Wolf wrote: > >>> > >>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's > >>>>> human-monitor-command, since there is no QMP counterpart for internal > >>>>> snapshot. Even though lately we consistently tell people that internal > >>>>> snapshots are underdeveloped and you should use external snapshots, it > >>>>> does not get away from the fact that libvirt has been using 'savevm' to > >>>>> drive internal snapshots for years now, and that we MUST consider > >>>>> back-compat and/or add an introspectible QMP interface before making > >>>>> changes that would break libvirt. > >>>> > >>>> Okay, so what does libvirt do when you request a snapshot with a > >>>> numerical name? Without having looked at the code, the best case I would > >>>> expect that it forbids them, and more realistically I suspect that we > >>>> may actually fix a bug for libvirt by changing the semantics. > >>>> > >>>> Or does libvirt really use snapshot IDs rather than names? > >>> > >>> At the moment, libvirt does not place any restriction on internal > >>> snapshot names, but passes the user's name through without thought of > >>> whether it is an id or a name. > >>> > >>> So yes, arguably tightening the semantics fixes a libvirt bug for > >>> libvirt having allowed internal snapshots to get into an inconsistent > >>> state. > >> > >> So there are two scenarios to consider with respect to breaking > >> backwards compatibility: > >> > >> 1. There may be code out there that relies on numeric arguments being > >> interpreted as IDs. This code will break if we make this change and > >> numeric snapshot names exist. That such code exists is speculation > >> (even though plausible) and we don't know how widespread it is. > >> > >> 2. There may be code out there that blindly assumes that whatever it > >> passes is interpreted as a name. Nobody considered that with a > >> numeric snapshot name, it maybe misinterpreted as an ID. We know that > >> this is true for libvirt, the single most used management tool for > >> QEMU. More code like this may (and probably does) exist. > >> > >> Essentially the same two categories exist for human users: those who > >> somehow found out that QEMU accepts IDs as monitor command arguments and > >> are using those (mitigated by not displaying IDs any more), and those > >> who are trapped because they wanted to access a numeric name, but > >> surprisingly got it interpreted as an ID. Both are speculation to some > >> degree, but my guess is that the latter group is larger. > >> > >> Given all this, this is a no-brainer for me. We simplify the interface, > >> we simplify the code, we make things less confusing for manual users and > >> we fix the management tool that everybody uses. How could we not make > >> this change? > > So you're trying to make a bug fix out of this now to get around > deprecation? To me, changing behavior in qemu to fix a bug in libvirt > doesn't equate to fixing a bug in qemu. So let's try to find a real bug > in qemu. I'm not making this a bug fix, but a interface cleanup that fixes a bug as a side effect, which makes it only more appealing. I'm also not sure what exactly you mean by "get around deprecation". If you mean the formal two releases deprecation period that is required by our deprecation policy, it doesn't apply because this is HMP and not a stable interface. If you mean deprecation not because we must but because we're considering actual users, then I have described how making the change fixes things for more users than it could potentially break things. In addition, I pointed out how a deprecation period is useless for user-facing changes. I'm concluding that a "voluntary" deprecation for two releases isn't helpful at all. > >>> But again, it falls in the category of "properly fixing this > >>> requires a lot of auditing, time, mental power, and unit tests", which > >>> makes it a rather daunting task to state for certain. > >> > >> Fix the world before we can fix anything? > > > > Can't we break this loop for the savevm command fairly easily; it's > > currently documnted as: > > > > savevm [tag|id] > > The bug starts here. > [... snipped long description of how horrible everything is ...] Yes, it's horrible. Let's radically simplify the interface (and with it the code) by throwing away the useless ID stuff. Which happens to be exactly what this series was getting at. There is no point in spending time for changing the current interface to make a little more sense while keeping its fundamental insanity unaddressed, then deprecate it, and then change it again to _actually_ make sense. Let's convert it directly to something that makes sense. > So, yes, there are bugs in qemu, but to fix them, we would need to > switch the comparison order in bdrv_snapshot_find() and fix > bdrv_snapshot_delete_by_id_or_name() to always invoke > bdrv_snapshot_delete() for both cases (ID and tag). And we should > probably remove the special code path in savevm for overwriting an > existing snapshot. > > This would also alleviate the bug in libvirt (because the tag then > always takes precedence, like the interface description suggests). Yes, at least tools which don't use IDs wouldn't be broken any more. But you removed none of the insanity. The same way addressing by tag was broken before, addressing by ID is broken now (previously, you need to check first that no ID exists while you want to address a tag; now, you need to check first that no tag exists while you want to address an ID). In fact, if you're willing to go there, you break exactly the same users that would be broken if we dropped IDs from the interface completely. So why not do the full thing instead of stopping halfway? You're taking the potential problems without the advantages you could get. > Sure, this series would fix the bugs as well, but it does more than > that. It really isn't just a bug fix. It was never meant to be a bug fix (unless, of course, you consider insane interfaces a bug). But that it fixes a bug as a side effect contributes to the fact that breaking compatibility has not only downsides and that a deprecation period would be useless or even counterproductive. Kevin
On 11.01.19 15:33, Kevin Wolf wrote: > Am 11.01.2019 um 14:22 hat Max Reitz geschrieben: >> On 10.01.19 12:41, Dr. David Alan Gilbert wrote: >>> * Kevin Wolf (kwolf@redhat.com) wrote: >>>> Am 09.01.2019 um 20:02 hat Eric Blake geschrieben: >>>>> On 1/9/19 12:51 PM, Kevin Wolf wrote: >>>>> >>>>>>> Indeed, and libvirt IS using 'savevm' via HMP via QMP's >>>>>>> human-monitor-command, since there is no QMP counterpart for internal >>>>>>> snapshot. Even though lately we consistently tell people that internal >>>>>>> snapshots are underdeveloped and you should use external snapshots, it >>>>>>> does not get away from the fact that libvirt has been using 'savevm' to >>>>>>> drive internal snapshots for years now, and that we MUST consider >>>>>>> back-compat and/or add an introspectible QMP interface before making >>>>>>> changes that would break libvirt. >>>>>> >>>>>> Okay, so what does libvirt do when you request a snapshot with a >>>>>> numerical name? Without having looked at the code, the best case I would >>>>>> expect that it forbids them, and more realistically I suspect that we >>>>>> may actually fix a bug for libvirt by changing the semantics. >>>>>> >>>>>> Or does libvirt really use snapshot IDs rather than names? >>>>> >>>>> At the moment, libvirt does not place any restriction on internal >>>>> snapshot names, but passes the user's name through without thought of >>>>> whether it is an id or a name. >>>>> >>>>> So yes, arguably tightening the semantics fixes a libvirt bug for >>>>> libvirt having allowed internal snapshots to get into an inconsistent >>>>> state. >>>> >>>> So there are two scenarios to consider with respect to breaking >>>> backwards compatibility: >>>> >>>> 1. There may be code out there that relies on numeric arguments being >>>> interpreted as IDs. This code will break if we make this change and >>>> numeric snapshot names exist. That such code exists is speculation >>>> (even though plausible) and we don't know how widespread it is. >>>> >>>> 2. There may be code out there that blindly assumes that whatever it >>>> passes is interpreted as a name. Nobody considered that with a >>>> numeric snapshot name, it maybe misinterpreted as an ID. We know that >>>> this is true for libvirt, the single most used management tool for >>>> QEMU. More code like this may (and probably does) exist. >>>> >>>> Essentially the same two categories exist for human users: those who >>>> somehow found out that QEMU accepts IDs as monitor command arguments and >>>> are using those (mitigated by not displaying IDs any more), and those >>>> who are trapped because they wanted to access a numeric name, but >>>> surprisingly got it interpreted as an ID. Both are speculation to some >>>> degree, but my guess is that the latter group is larger. >>>> >>>> Given all this, this is a no-brainer for me. We simplify the interface, >>>> we simplify the code, we make things less confusing for manual users and >>>> we fix the management tool that everybody uses. How could we not make >>>> this change? >> >> So you're trying to make a bug fix out of this now to get around >> deprecation? To me, changing behavior in qemu to fix a bug in libvirt >> doesn't equate to fixing a bug in qemu. So let's try to find a real bug >> in qemu. > > I'm not making this a bug fix, but a interface cleanup that fixes a bug which is not in qemu (at least the one you're describing) > as a side effect, which makes it only more appealing. I'm also not sure > what exactly you mean by "get around deprecation". > > If you mean the formal two releases deprecation period that is required > by our deprecation policy, it doesn't apply because this is HMP and not > a stable interface. Of course you're right, formally. But that's just not a very good argument to me, personally. If deprecation is easy and helpful, there is no reason to be nasty and skip it just because we are formally allowed to do so. Sure, the question is whether it is helpful. > If you mean deprecation not because we must but because we're > considering actual users, then I have described how making the change > fixes things for more users than it could potentially break things OK. > In > addition, I pointed out how a deprecation period is useless for > user-facing changes. You pointed that out for users of specific distros. I myself use Fedora (every release) and Arch, so as a user I do see all qemu versions. > I'm concluding that a "voluntary" deprecation for > two releases isn't helpful at all. Yes, you can conclude that from your reasoning. But I disagree with your reasoning because you are only considering users such as yourself who skip releases on purpose or users who use long-term distros. Most importantly we probably have professional end users that do update qemu every release because they know the deprecation policy, who do not use libvirt. I agree that adding a warning has little impact. But I also claim that doing it is little effort. I mean, whatever. I'm really not that much against taking the series as-is. >>>>> But again, it falls in the category of "properly fixing this >>>>> requires a lot of auditing, time, mental power, and unit tests", which >>>>> makes it a rather daunting task to state for certain. >>>> >>>> Fix the world before we can fix anything? >>> >>> Can't we break this loop for the savevm command fairly easily; it's >>> currently documnted as: >>> >>> savevm [tag|id] >> >> The bug starts here. >> [... snipped long description of how horrible everything is ...] > > Yes, it's horrible. Let's radically simplify the interface (and with it > the code) by throwing away the useless ID stuff. Which happens to be > exactly what this series was getting at. > > There is no point in spending time for changing the current interface to > make a little more sense while keeping its fundamental insanity > unaddressed, then deprecate it, and then change it again to _actually_ > make sense. Let's convert it directly to something that makes sense. Of course we'd do the deprecation while fixing the bug. :-P >> So, yes, there are bugs in qemu, but to fix them, we would need to >> switch the comparison order in bdrv_snapshot_find() and fix >> bdrv_snapshot_delete_by_id_or_name() to always invoke >> bdrv_snapshot_delete() for both cases (ID and tag). And we should >> probably remove the special code path in savevm for overwriting an >> existing snapshot. >> >> This would also alleviate the bug in libvirt (because the tag then >> always takes precedence, like the interface description suggests). > > Yes, at least tools which don't use IDs wouldn't be broken any more. But > you removed none of the insanity. The same way addressing by tag was > broken before, addressing by ID is broken now (previously, you need to > check first that no ID exists while you want to address a tag; now, you > need to check first that no tag exists while you want to address an ID). > > In fact, if you're willing to go there, you break exactly the same users > that would be broken if we dropped IDs from the interface completely. So > why not do the full thing instead of stopping halfway? You're taking the > potential problems without the advantages you could get. Not quite true. What I proposed here would still allow addressing snapshots by ID. This series completely removes that. I only break things for users where the tag conflicts with the ID. >> Sure, this series would fix the bugs as well, but it does more than >> that. It really isn't just a bug fix. > > It was never meant to be a bug fix (unless, of course, you consider > insane interfaces a bug). But that it fixes a bug as a side effect > contributes to the fact that breaking compatibility has not only > downsides and that a deprecation period would be useless or even > counterproductive. Sure, for this case, probably. Max