| Message ID | 918524f7-26cf-3fce-d9e3-7316ca69285b@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On Fri, 17 Feb 2017 13:50:40 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> CC Dave
>
> On 02/17/17 11:43, Igor Mammedov wrote:
> > On Thu, 16 Feb 2017 15:15:36 -0800
> > ben@skyportsystems.com wrote:
> >
> >> From: Ben Warren <ben@skyportsystems.com>
> >>
> >> This implements the VM Generation ID feature by passing a 128-bit
> >> GUID to the guest via a fw_cfg blob.
> >> Any time the GUID changes, an ACPI notify event is sent to the guest.
> >>
> >> The user interface is a simple device with one parameter:
> >>  - guid (string, must be "auto" or in UUID format
> >>    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
> > I've given it some testing with WS2012R2 and the v4 patches for SeaBIOS.
> >
> > Windows is able to read the initial GUID allocation, and writeback
> > seems to work somehow:
> >
> > (qemu) info vm-generation-id
> > c109c09b-0e8b-42d5-9b33-8409c9dcd16c
> >
> > The vmgenid client in Windows reads it as two consecutive 64-bit integers:
> > 42d50e8bc109c09b:6cd1dcc90984339b
> >
> > However, the update path (restore from snapshot) doesn't work;
> > here is how I've tested it:
> >
> > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio
> > (qemu) info vm-generation-id
> > c109c09b-0e8b-42d5-9b33-8409c9dcd16c
> > (qemu) stop
> > (qemu) migrate "exec:gzip -c > STATEFILE.gz"
> > (qemu) quit
> >
> > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio -incoming "exec: gzip -c -d STATEFILE.gz"
> > (qemu) info vm-generation-id
> > 28b587fa-991b-4267-80d7-9cf28b746fe9
> >
> > The guest:
> > 1. doesn't get the GPE notification that it must receive
> > 2. the vmgenid client in Windows reads the same value,
> >    42d50e8bc109c09b:6cd1dcc90984339b
>
> Hmmm, I wonder if we need something like this, in vmgenid_post_load():
>
>   commit 90c647db8d59e47c9000affc0d81754eb346e939
>   Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>   Date:   Fri Apr 15 12:41:30 2016 +0100
>
>       Fix pflash migration
>
> with the idea being that in a single device's post_load callback, we
> shouldn't perform machine-wide actions (post_load is likely for fixing
> up the device itself). If machine-wide actions are necessary, we should
> temporarily register a "vm change state handler", and do the thing once
> that handler is called (when the machine has been loaded fully and is
> about to continue execution).
>
> Can you please try the attached patch on top? (Build tested only.)

it doesn't help

> Thanks!
> Laszlo
On 02/17/17 14:05, Igor Mammedov wrote:
> On Fri, 17 Feb 2017 13:50:40 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
>
>> [...]
>>
>> Can you please try the attached patch on top? (Build tested only.)
> it doesn't help

Thanks for trying! And, well, sh*t. :(

I guess it's time to resurrect the monitor command (temporarily, for
testing) so we can inject the SCI at will, without migration. I don't
want to burden you unreasonably, so I'll make an effort to try that
myself.

Thanks!
Laszlo
* Laszlo Ersek (lersek@redhat.com) wrote:
> CC Dave

This isn't an area I really understand; but if I'm reading this right,
then:
  vmgenid is stored in fw_cfg?
  fw_cfg isn't migrated

So why should any changes to it get migrated, except if it's already
been read by the guest (and if the guest reads it again afterwards,
what's it expected to read?)

Dave

> On 02/17/17 11:43, Igor Mammedov wrote:
> [...]
>
> Hmmm, I wonder if we need something like this, in vmgenid_post_load():
>
>   commit 90c647db8d59e47c9000affc0d81754eb346e939
>   Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>   Date:   Fri Apr 15 12:41:30 2016 +0100
>
>       Fix pflash migration
>
> with the idea being that in a single device's post_load callback, we
> shouldn't perform machine-wide actions (post_load is likely for fixing
> up the device itself). If machine-wide actions are necessary, we should
> temporarily register a "vm change state handler", and do the thing once
> that handler is called (when the machine has been loaded fully and is
> about to continue execution).
>
> Can you please try the attached patch on top? (Build tested only.)
>
> Thanks!
> Laszlo

> diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h
> index db7fa0e63303..a2ae450b1f56 100644
> --- a/include/hw/acpi/vmgenid.h
> +++ b/include/hw/acpi/vmgenid.h
> @@ -4,6 +4,7 @@
>  #include "hw/acpi/bios-linker-loader.h"
>  #include "hw/qdev.h"
>  #include "qemu/uuid.h"
> +#include "sysemu/sysemu.h"
>
>  #define VMGENID_DEVICE "vmgenid"
>  #define VMGENID_GUID "guid"
> @@ -21,6 +22,7 @@ typedef struct VmGenIdState {
>      DeviceClass parent_obj;
>      QemuUUID guid;                /* The 128-bit GUID seen by the guest */
>      uint8_t vmgenid_addr_le[8];   /* Address of the GUID (little-endian) */
> +    VMChangeStateEntry *vmstate;
>  } VmGenIdState;
>
>  static inline Object *find_vmgenid_dev(void)
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index 9f97b722761b..0ae1d56ff297 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -177,10 +177,20 @@ static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
>  /* After restoring an image, we need to update the guest memory and notify
>   * it of a potential change to VM Generation ID
>   */
> +static void postload_update_guest_cb(void *opaque, int running, RunState state)
> +{
> +    VmGenIdState *vms = opaque;
> +
> +    qemu_del_vm_change_state_handler(vms->vmstate);
> +    vms->vmstate = NULL;
> +    vmgenid_update_guest(vms);
> +}
> +
>  static int vmgenid_post_load(void *opaque, int version_id)
>  {
>      VmGenIdState *vms = opaque;
> -    vmgenid_update_guest(vms);
> +    vms->vmstate = qemu_add_vm_change_state_handler(postload_update_guest_cb,
> +                                                    vms);
>      return 0;
>  }

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/17 11:23, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
>> CC Dave
>
> This isn't an area I really understand; but if I'm reading this right,
> then:
>   vmgenid is stored in fw_cfg?
>   fw_cfg isn't migrated
>
> So why should any changes to it get migrated, except if it's already
> been read by the guest (and if the guest reads it again afterwards,
> what's it expected to read?)

This is what we have here:
- QEMU formats a read-only fw_cfg blob with the GUID
- the guest downloads the blob and places it in guest RAM
- the guest tells QEMU the guest-side address of the blob
- during migration, guest RAM is transferred
- after migration, in the device's post_load callback, QEMU overwrites
  the GUID in guest RAM with a different value, and injects an SCI

I CC'd you for the following reason: Igor reported that he didn't see
either the fresh GUID or the SCI in the guest, on the target host,
after migration. I figured that perhaps there was an ordering issue
between RAM loading and post_load execution on the target host, and so
I proposed to delay the RAM overwrite + SCI injection a bit more,
following the pattern seen in your commit 90c647db8d59.

However, since then, both Ben and myself have tested the code with
migration (using "virsh save" (Ben) and "virsh managedsave" (myself)),
with Windows and Linux guests, and it works for us; there seems to be
no ordering issue with the current code (= overwrite RAM + inject SCI
in the post_load callback).

For now we don't understand why it doesn't work for Igor (Igor used
exec/gzip migration to/from a local file using direct QEMU monitor
commands / options, no libvirt). And copying the pattern seen in your
commit 90c647db8d59 didn't help in his case (while it wasn't even
necessary for success in Ben's and my testing).

So it seems that delaying the deed with
qemu_add_vm_change_state_handler() is neither needed nor effective in
this case; but then we still don't know why it doesn't work for Igor.

Thanks
Laszlo

> Dave
>
> [...]
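The update path described in the list above is small. Here is a minimal
sketch of it in C, assuming QEMU-internal helpers of that era
(le64_to_cpu(), cpu_physical_memory_write(), object_resolve_path_type(),
acpi_send_event()) and the field names from the patch quoted earlier;
this is illustrative, not the literal code from the series:

```c
/* Sketch: overwrite the GUID in guest RAM and inject an SCI, as described
 * above. Assumes vmgenid_addr_le[] holds the guest-physical address that
 * the firmware wrote back through the writeable fw_cfg entry. */
static void vmgenid_update_guest_sketch(VmGenIdState *vms)
{
    Object *acpi;
    uint64_t addr;

    memcpy(&addr, vms->vmgenid_addr_le, sizeof(addr));
    addr = le64_to_cpu(addr);
    if (!addr) {
        return; /* the firmware hasn't executed WRITE_POINTER yet */
    }

    /* Rewrite the GUID that the guest placed in its own RAM... */
    cpu_physical_memory_write(addr, vms->guid.data, sizeof(vms->guid.data));

    /* ...then notify the guest of the (potential) change via an SCI,
     * routed through the machine's ACPI device. */
    acpi = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
    if (acpi) {
        acpi_send_event(DEVICE(acpi), ACPI_VMGENID_CHANGE_STATUS);
    }
}
```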
* Laszlo Ersek (lersek@redhat.com) wrote:
> On 02/20/17 11:23, Dr. David Alan Gilbert wrote:
> [...]
>
> This is what we have here:
> - QEMU formats a read-only fw_cfg blob with the GUID
> - the guest downloads the blob and places it in guest RAM
> - the guest tells QEMU the guest-side address of the blob
> - during migration, guest RAM is transferred
> - after migration, in the device's post_load callback, QEMU overwrites
>   the GUID in guest RAM with a different value, and injects an SCI
>
> [...]
>
> For now we don't understand why it doesn't work for Igor (Igor used
> exec/gzip migration to/from a local file using direct QEMU monitor
> commands / options, no libvirt). And copying the pattern seen in your
> commit 90c647db8d59 didn't help in his case (while it wasn't even
> necessary for success in Ben's and my testing).

One thing I noticed in Igor's test was that he did a 'stop' on the
source before the migrate, and so it's probably still paused on the
destination after the migration is loaded, so anything the guest needs
to do might not have happened until it's started.

You say:
  'guest tells QEMU the guest-side address of the blob'
How is that stored/migrated/etc.?

> So it seems that delaying the deed with
> qemu_add_vm_change_state_handler() is neither needed nor effective in
> this case; but then we still don't know why it doesn't work for Igor.

Nod.

Dave

> Thanks
> Laszlo

[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
> [...]
>
> One thing I noticed in Igor's test was that he did a 'stop' on the
> source before the migrate, and so it's probably still paused on the
> destination after the migration is loaded, so anything the guest needs
> to do might not have happened until it's started.

Interesting! I hope Igor can double-check this!

In the virsh docs, before doing my tests, I read that "managedsave"
optionally took --running or --paused:

    Normally, starting a managed save will decide between running or
    paused based on the state the domain was in when the save was done;
    passing either the --running or --paused flag will allow overriding
    which state the start should use.

I didn't pass any such flag ultimately, and I didn't stop the guests
before the managedsave. Indeed they continued execution right after
being loaded with "virsh start".

(Side point: managedsave is awesome. :))

> You say:
>   'guest tells QEMU the guest-side address of the blob'
> How is that stored/migrated/etc.?

It is a uint8_t[8] array (little-endian representation), linked into
another (writeable) fw_cfg entry, and it's migrated explicitly (it has
a descriptor in the device's vmstate descriptor). The post_load
callback relies on this array being restored before the migration
infrastructure calls post_load.

Thanks
Laszlo
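"Migrated explicitly, with a descriptor in the device's vmstate
descriptor" maps onto a standard QEMU VMStateDescription. A minimal
sketch, using the field and callback names from the patch quoted earlier
and QEMU's stock vmstate macros; the exact descriptor in the series may
differ in detail:

```c
/* Sketch: the address array is an ordinary vmstate field, so the migration
 * infrastructure restores it before invoking .post_load. */
static const VMStateDescription vmstate_vmgenid = {
    .name = "vmgenid",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = vmgenid_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT8_ARRAY(vmgenid_addr_le, VmGenIdState, sizeof(uint64_t)),
        VMSTATE_END_OF_LIST()
    },
};
```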
* Laszlo Ersek (lersek@redhat.com) wrote:
> On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> [...]
>
> I didn't pass any such flag ultimately, and I didn't stop the guests
> before the managedsave. Indeed they continued execution right after
> being loaded with "virsh start".
>
> (Side point: managedsave is awesome. :))

If I've followed the bread crumbs correctly, I think managedsave is
just using a migrate to fd anyway, so the same code.

>> You say:
>>   'guest tells QEMU the guest-side address of the blob'
>> How is that stored/migrated/etc.?
>
> It is a uint8_t[8] array (little-endian representation), linked into
> another (writeable) fw_cfg entry, and it's migrated explicitly (it has
> a descriptor in the device's vmstate descriptor). The post_load
> callback relies on this array being restored before the migration
> infrastructure calls post_load.

RAM normally comes back before other devices, so you should be OK;
although we frequently have problems with devices reading from RAM
during device init before migration has started, or writing to it
after migration has finished on the source.

Dave

> Thanks
> Laszlo

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, 20 Feb 2017 12:38:06 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> [...]
> > One thing I noticed in Igor's test was that he did a 'stop' on the
> > source before the migrate, and so it's probably still paused on the
> > destination after the migration is loaded, so anything the guest
> > needs to do might not have happened until it's started.
>
> Interesting! I hope Igor can double-check this!

I've retested v7, and it reliably fails (vmgenid_wait doesn't see the
change). Then I tested v8 (QEMU) + SeaBIOS v5/v4 with the same steps as
before, and it appears to work as expected, i.e. vmgenid_wait reports
the GUID change after executing the 'continue' monitor command. So
something has been fixed in v8.

> [...]
On 02/20/17 14:13, Igor Mammedov wrote:
> On Mon, 20 Feb 2017 12:38:06 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> [...]
>> Interesting! I hope Igor can double-check this!
> I've retested v7, and it reliably fails (vmgenid_wait doesn't see the
> change). Then I tested v8 (QEMU) + SeaBIOS v5/v4 with the same steps
> as before, and it appears to work as expected, i.e. vmgenid_wait
> reports the GUID change after executing the 'continue' monitor
> command. So something has been fixed in v8.

Yes, I know what. Please see item (2) in this reply of mine, for v7 1/8:

  msgid: <9e222b4c-c05d-8fd0-6c55-4b2e52cab7b0@redhat.com>
  URL:   https://www.mail-archive.com/qemu-devel@nongnu.org/msg430440.html

With that copy/paste bug in the code, the "src_offset" field of
WRITE_POINTER was not populated correctly. The BIOS would carry that
out faithfully, of course, but then later QEMU would write the fresh
GUID to an incorrect offset in the firmware-allocated area in the
guest -- the offset wouldn't match the AML code (the ADDR method), so
the guest OS wouldn't see the change.

If you scroll to the end of my message linked above, I wrote -- again,
for v7 --:

    I also tested this series (with the assignment under (2) fixed up,
    of course), as documented earlier in
    <https://www.mail-archive.com/qemu-devel@nongnu.org/msg430075.html>
    (msgid <678c203f-3768-7e65-6e48-6729473b6...@redhat.com>).

    Hence, with (1) and (2) fixed, you can also add

    Tested-by: Laszlo Ersek <ler...@redhat.com>

In other words, my positive testing for v7 was conditional on my
*local* (but reported, suggested) fix for bug (2) in v7 1/8. And that
issue has been fixed in v8.

... So, I guess we're all OK now. Can you confirm please?

Thanks!
Laszlo

> [...]
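For context, the WRITE_POINTER command at issue is emitted by QEMU as a
linker/loader instruction for the firmware. A hedged sketch of the call,
assuming the bios-linker-loader API from this series
(bios_linker_loader_write_pointer()) and the series' fw_cfg file-name
constants; the GUID-offset constant and the wrapper function are
illustrative:

```c
/* Sketch: have the firmware ALLOCATE the GUID blob, then WRITE_POINTER the
 * blob's address (plus src_offset) into the writeable address file. */
static void vmgenid_add_write_pointer_sketch(BIOSLinker *linker)
{
    bios_linker_loader_write_pointer(linker,
        VMGENID_ADDR_FW_CFG_FILE,  /* dest_file: "etc/vmgenid_addr" */
        0,                         /* dst_patched_offset within dest_file */
        sizeof(uint64_t),          /* dst_patched_size: an 8-byte pointer */
        VMGENID_GUID_FW_CFG_FILE,  /* src_file: "etc/vmgenid_guid" */
        VMGENID_GUID_OFFSET);      /* src_offset: where the GUID sits in
                                    * the blob -- the field the v7
                                    * copy/paste bug got wrong */
}
```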
On Mon, 20 Feb 2017 14:28:11 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 02/20/17 14:13, Igor Mammedov wrote:
> [...]
> ... So, I guess we're all OK now. Can you confirm please?

Confirmed

> Thanks!
> Laszlo

[...]
So Igor has now confirmed he's fine with v8 (thanks!), but I still
wanted to respond here:

On 02/20/17 13:32, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
> [...]
>> (Side point: managedsave is awesome. :))
>
> If I've followed the bread crumbs correctly, I think managedsave is
> just using a migrate to fd anyway, so the same code.

Yes, I agree. My enthusiasm for "managedsave" is due to "virsh start"'s
awareness as to whether it should boot the guest from zero, or
in-migrate it from the "managed" saved state. Plain "save" is much more
risky for the admin to mess up (because it needs specialized guest
startup too).

Of course, I also find QEMU's migration feature awesome in the first
place. :)

>> It is a uint8_t[8] array (little-endian representation), linked into
>> another (writeable) fw_cfg entry, and it's migrated explicitly (it
>> has a descriptor in the device's vmstate descriptor). The post_load
>> callback relies on this array being restored before the migration
>> infrastructure calls post_load.
>
> RAM normally comes back before other devices, so you should be OK;
> although we frequently have problems with devices reading from RAM
> during device init before migration has started, or writing to it
> after migration has finished on the source.

Thanks; we should be fine then. (We only write to RAM in post_load.)

Laszlo
On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
>> CC Dave
>
> This isn't an area I really understand; but if I'm reading this right,
> then:
>   vmgenid is stored in fw_cfg?
>   fw_cfg isn't migrated
>
> So why should any changes to it get migrated, except if it's already
> been read by the guest (and if the guest reads it again afterwards,
> what's it expected to read?)

Why are we expecting it to change on migration? You want a new value
when you load state from disk (you don't know how many times the same
state has been loaded previously, so each load is effectively forking
the VM and you want a different value), but for a single live
migration, you aren't forking the VM and don't need a new generation
ID.

I guess it all boils down to what command line you're using: if libvirt
is driving a live migration, it will request the same UUID in the
command line of the destination as what is on the source; while if
libvirt is loading from a [managed]save to restore state from a file,
it will either request a new UUID directly or request auto to let qemu
generate the new id.
* Eric Blake (eblake@redhat.com) wrote:
> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote:
> [...]
>
> Why are we expecting it to change on migration? You want a new value

I'm not; I was asking why a change made prior to migration would be
preserved across migration.

> when you load state from disk (you don't know how many times the same
> state has been loaded previously, so each load is effectively forking
> the VM and you want a different value), but for a single live
> migration, you aren't forking the VM and don't need a new generation
> ID.
>
> I guess it all boils down to what command line you're using: if
> libvirt is driving a live migration, it will request the same UUID in
> the command line of the destination as what is on the source; while if
> libvirt is loading from a [managed]save to restore state from a file,
> it will either request a new UUID directly or request auto to let qemu
> generate the new id.

Hmm, now I've lost it a bit; I thought we would preserve the value
transmitted from the source, not the value on the command line of the
destination.

Dave

> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> [...]
>> Why are we expecting it to change on migration? You want a new value
>
> I'm not; I was asking why a change made prior to migration would be
> preserved across migration.

Okay, so you're asking what happens if the source requests the vmgenid
device, and sets an id, but the destination of the migration does not
request anything - how does the guest on the destination see the same
id as was in place on the source at the time migration started.

> [...]
>
> Hmm, now I've lost it a bit; I thought we would preserve the value
> transmitted from the source, not the value on the command line of the
> destination.

I guess I'm trying to figure out whether libvirt MUST read the current
id and explicitly tell the destination of migration to reuse that id,
or if libvirt can omit the id on migration and everything just works
because the id was migrated from the source.
On 02/20/17 21:19, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> [...]
>
> I'm not; I was asking why a change made prior to migration would be
> preserved across migration.
>
> [...]
>
> Hmm, now I've lost it a bit; I thought we would preserve the value
> transmitted from the source, not the value on the command line of the
> destination.

There are two relevant pieces of data here:

(a) the GUID in guest RAM

(b) the guest-phys address of the GUID, written back by the guest fw to
    a guest-writeable fw_cfg file, to be dereferenced by QEMU, for
    updating the GUID in guest RAM

For both live migration and restoring saved state from disk, (b)
doesn't change. It is also not exposed on the QEMU command line. (It is
configured by the guest firmware during initial boot.)

(a) is taken from the QEMU command line. It can be "auto" (and then
QEMU generates a random GUID), or a specific GUID string. This GUID is
always written to guest RAM (assuming (b) has been configured) in the
vmgenid device's post_load callback.

However, whether the new GUID should be different from the one already
present in guest RAM is a separate question.

- For restoring state from disk, a different GUID (either generated by
  libvirt, or by QEMU due to "auto") makes sense.

- For live migration, it makes sense for libvirt to pass in the same
  GUID on the target host as was used on the source host. The guest RAM
  update, and the ACPI interrupt (SCI), will occur on the target host,
  but the GUID won't change effectively. (The VMGENID spec explicitly
  permits spurious notifications, i.e., an SCI with no change to the
  GUID in RAM.)

Thanks
Laszlo
On 02/20/17 21:45, Eric Blake wrote:
> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote:
> [...]
>
> Okay, so you're asking what happens if the source requests the vmgenid
> device, and sets an id, but the destination of the migration does not
> request anything

This should never happen, as it means different QEMU command lines on
source vs. target hosts. (Different as in "incorrectly different".)

Dave writes, "a change made prior to migration". Change made to what?

- The GUID cannot be changed via the monitor once QEMU has been
  started. We dropped the monitor command for that, due to lack of a
  good use case, and due to lifecycle complexities. We have figured out
  a way to make it safe, but until there's a really convincing use
  case, we shouldn't add that complexity.

- The address of the GUID is changed (the firmware programs it from
  "zero" to an actual address, in a writeable fw_cfg file), and that
  piece of info is explicitly migrated, as part of the vmgenid device's
  vmsd.

Thanks
Laszlo

> - how does the guest on the destination see the same id as was in
>   place on the source at the time migration started.

[...]
On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote:
> On 02/20/17 21:45, Eric Blake wrote:
> [...]
>
> Dave writes, "a change made prior to migration". Change made to what?
>
> - The GUID cannot be changed via the monitor once QEMU has been
>   started. We dropped the monitor command for that, due to lack of a
>   good use case, and due to lifecycle complexities. We have figured
>   out a way to make it safe, but until there's a really convincing use
>   case, we shouldn't add that complexity.

True, but we might in the future, and it seems prudent to make the
migration stream future-proof for that.

> - The address of the GUID is changed (the firmware programs it from
>   "zero" to an actual address, in a writeable fw_cfg file), and that
>   piece of info is explicitly migrated, as part of the vmgenid
>   device's vmsd.
>
> Thanks
> Laszlo

[...]
On 02/21/17 02:43, Michael S. Tsirkin wrote: > On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >> On 02/20/17 21:45, Eric Blake wrote: >>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>> * Eric Blake (eblake@redhat.com) wrote: >>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>> CC Dave >>>>>> >>>>>> This isn't an area I really understand; but if I'm >>>>>> reading this right then >>>>>> vmgenid is stored in fw_cfg? >>>>>> fw_cfg isn't migrated >>>>>> >>>>>> So why should any changes to it get migrated, except if it's already >>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>> it expected to read?) >>>>> >>>>> Why are we expecting it to change on migration? You want a new value >>>> >>>> I'm not; I was asking why a change made prior to migration would be >>>> preserved across migration. >>> >>> Okay, so you're asking what happens if the source requests the vmgenid >>> device, and sets an id, but the destination of the migration does not >>> request anything >> >> This should never happen, as it means different QEMU command lines on >> source vs. target hosts. (Different as in "incorrectly different".) >> >> Dave writes, "a change made prior to migration". Change made to what? >> >> - the GUID cannot be changed via the monitor once QEMU has been started. >> We dropped the monitor command for that, due to lack of a good use case, >> and due to lifecycle complexities. We have figured out a way to make it >> safe, but until there's a really convincing use case, we shouldn't add >> that complexity. > > True but we might in the future, and it seems prudent to make > migration stream future-proof for that. It is already. The monitor command, if we add it, can be implemented incrementally. I described it as "approach (iii)" elsewhere in the thread. This is a more detailed recap: - introduce a new device property (internal only), such as "x-enable-set-vmgenid". Make it reflect whether a given machine type supports the monitor command. - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one with a selection callback - add a new boolean latch to the vmgenid device, called "guid_blob_selected" or something similar - the reset handler sets the latch to FALSE (NB: the reset handler already sets /etc/vmgenid_addr to zero) - the select callback for /etc/vmgenid_guid sets the latch to TRUE - the latch is added to the migration stream as a subsection *if* x-enable-set-vmgenid is TRUE - the set-vmgenid monitor command checks all three of: x-enable-set-vmgenid, the latch, and the contents of /etc/vmgenid_addr: - if x-enable-set-vmgenid is FALSE, the monitor command returns QERR_UNSUPPORTED (this is a generic error class, with an "unsupported" error message). Otherwise, - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the guest firmware has executed (or started executing) ALLOCATE for /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. In this case updating the VMGENID from the monitor is unsafe (we cannot guarantee informing the guest successfully), so in this case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. The caller should simply try a bit later. (By which time the firmware will likely have programmed /etc/vmgenid_addr.) Libvirt can recognize this error specifically, because it is not the generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for "EAGAIN", practically, in this case. 
- Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is nonzero (that is, the guest has either not run ALLOCATE since reset, *or* it has, but it has also run WRITE_POINTER): - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid in-place -- the guest will see this whenever it runs ALLOCATE for /etc/vmgenid_guid, *AND* - if /etc/vmgenid_addr is not zero, then update the guest (that is, RAM write + SCI) Thanks Laszlo > >> - the address of the GUID is changed (the firmware programs it from >> "zero" to an actual address, in a writeable fw_cfg file), and that piece >> of info is explicitly migrated, as part of the vmgenid device's vmsd. >> >> Thanks >> Laszlo >> >> >>> - how does the guest on the destination see the same id >>> as was in place on the source at the time migration started. >>> >>>> >>>> >>>>> when you load state from disk (you don't know how many times the same >>>>> state has been loaded previously, so each load is effectively forking >>>>> the VM and you want a different value), but for a single live migration, >>>>> you aren't forking the VM and don't need a new generation ID. >>>>> >>>>> I guess it all boils down to what command line you're using: if libvirt >>>>> is driving a live migration, it will request the same UUID in the >>>>> command line of the destination as what is on the source; while if >>>>> libvirt is loading from a [managed]save to restore state from a file, it >>>>> will either request a new UUID directly or request auto to let qemu >>>>> generate the new id. >>>> >>>> Hmm now I've lost it a bit; I thought we would preserve the value >>>> transmitted from the source, not the value on the command line of the destination. >>> >>> I guess I'm trying to figure out whether libvirt MUST read the current >>> id and explicitly tell the destination of migration to reuse that id, or >>> if libvirt can omit the id on migration and everything just works >>> because the id was migrated from the source. >>>
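[For concreteness, a minimal sketch of the checks listed above. This is not code from the series: the x_enable_set_vmgenid and guid_blob_selected fields, vmgenid_refresh_blob(), and the command name (which would also need a QAPI schema entry) are hypothetical, following the naming proposed above; find_vmgenid_dev(), the vmgenid_addr_le field and vmgenid_update_guest() are taken from the patch at the end of this thread, and the error calls are existing QEMU API.]

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/bswap.h"
#include "hw/acpi/vmgenid.h"

void qmp_set_vm_generation_id(const char *guid, Error **errp)
{
    VmGenIdState *vms = VMGENID(find_vmgenid_dev());

    if (!vms->x_enable_set_vmgenid) {
        /* this machine type predates the monitor command */
        error_setg(errp, QERR_UNSUPPORTED);
        return;
    }
    if (vms->guid_blob_selected && ldq_le_p(vms->vmgenid_addr_le) == 0) {
        /* guest is between ALLOCATE and WRITE_POINTER: practically "EAGAIN" */
        error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                  "vmgenid address not programmed yet, retry later");
        return;
    }
    if (qemu_uuid_parse(guid, &vms->guid) < 0) {
        error_setg(errp, "'%s' is not a valid UUID", guid);
        return;
    }
    vmgenid_refresh_blob(vms);        /* a later ALLOCATE sees the new GUID */
    if (ldq_le_p(vms->vmgenid_addr_le) != 0) {
        vmgenid_update_guest(vms);    /* guest RAM write + SCI */
    }
}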
On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: > On 02/21/17 02:43, Michael S. Tsirkin wrote: > > On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: > >> On 02/20/17 21:45, Eric Blake wrote: > >>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: > >>>> * Eric Blake (eblake@redhat.com) wrote: > >>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: > >>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: > >>>>>>> CC Dave > >>>>>> > >>>>>> This isn't an area I really understand; but if I'm > >>>>>> reading this right then > >>>>>> vmgenid is stored in fw_cfg? > >>>>>> fw_cfg isn't migrated > >>>>>> > >>>>>> So why should any changes to it get migrated, except if it's already > >>>>>> been read by the guest (and if the guest reads it again aftwards what's > >>>>>> it expected to read?) > >>>>> > >>>>> Why are we expecting it to change on migration? You want a new value > >>>> > >>>> I'm not; I was asking why a change made prior to migration would be > >>>> preserved across migration. > >>> > >>> Okay, so you're asking what happens if the source requests the vmgenid > >>> device, and sets an id, but the destination of the migration does not > >>> request anything > >> > >> This should never happen, as it means different QEMU command lines on > >> source vs. target hosts. (Different as in "incorrectly different".) > >> > >> Dave writes, "a change made prior to migration". Change made to what? > >> > >> - the GUID cannot be changed via the monitor once QEMU has been started. > >> We dropped the monitor command for that, due to lack of a good use case, > >> and due to lifecycle complexities. We have figured out a way to make it > >> safe, but until there's a really convincing use case, we shouldn't add > >> that complexity. > > > > True but we might in the future, and it seems prudent to make > > migration stream future-proof for that. > > It is already. > > The monitor command, if we add it, can be implemented incrementally. I > described it as "approach (iii)" elsewhere in the thread. This is a more > detailed recap: > > - introduce a new device property (internal only), such as > "x-enable-set-vmgenid". Make it reflect whether a given machine type > supports the monitor command. This is the part we can avoid at no real cost just by making sure the guid is migrated. > - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one > with a selection callback > > - add a new boolean latch to the vmgenid device, called > "guid_blob_selected" or something similar > > - the reset handler sets the latch to FALSE > (NB: the reset handler already sets /etc/vmgenid_addr to zero) > > - the select callback for /etc/vmgenid_guid sets the latch to TRUE > > - the latch is added to the migration stream as a subsection *if* > x-enable-set-vmgenid is TRUE > > - the set-vmgenid monitor command checks all three of: > x-enable-set-vmgenid, the latch, and the contents of > /etc/vmgenid_addr: > > - if x-enable-set-vmgenid is FALSE, the monitor command returns > QERR_UNSUPPORTED (this is a generic error class, with an > "unsupported" error message). Otherwise, > > - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the > guest firmware has executed (or started executing) ALLOCATE for > /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. > In this case updating the VMGENID from the monitor is unsafe > (we cannot guarantee informing the guest successfully), so in this > case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. 
> The caller should simply try a bit later. (By which time the > firmware will likely have programmed /etc/vmgenid_addr.) This makes no sense to me. Just update it in qemu memory and write when guest asks for it. > Libvirt can recognize this error specifically, because it is not the > generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for > "EAGAIN", practically, in this case. > > - Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is > nonzero, that is, the guest has either not run ALLOCATE since > reset, *or* it has, but it has also run WRITE_POINTER): > > - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid > in-place -- the guest will see this whenever it runs ALLOCATE for > /etc/vmgenid_guid, *AND* > > - if /etc/vmgenid_addr is not zero, then update the guest (that is, > RAM write + SCI) > > Thanks > Laszlo Seems way more painful than it has to be. Just migrate the guid and then management can write it at any time. > > > >> - the address of the GUID is changed (the firmware programs it from > >> "zero" to an actual address, in a writeable fw_cfg file), and that piece > >> of info is explicitly migrated, as part of the vmgenid device's vmsd. > >> > >> Thanks > >> Laszlo > >> > >> > >>> - how does the guest on the destination see the same id > >>> as was in place on the source at the time migration started. > >>> > >>>> > >>>> > >>>>> when you load state from disk (you don't know how many times the same > >>>>> state has been loaded previously, so each load is effectively forking > >>>>> the VM and you want a different value), but for a single live migration, > >>>>> you aren't forking the VM and don't need a new generation ID. > >>>>> > >>>>> I guess it all boils down to what command line you're using: if libvirt > >>>>> is driving a live migration, it will request the same UUID in the > >>>>> command line of the destination as what is on the source; while if > >>>>> libvirt is loading from a [managed]save to restore state from a file, it > >>>>> will either request a new UUID directly or request auto to let qemu > >>>>> generate the new id. > >>>> > >>>> Hmm now I've lost it a bit; I thought we would preserve the value > >>>> transmitted from the source, not the value on the command line of the destination. > >>> > >>> I guess I'm trying to figure out whether libvirt MUST read the current > >>> id and explicitly tell the destination of migration to reuse that id, or > >>> if libvirt can omit the id on migration and everything just works > >>> because the id was migrated from the source. > >>>
On 02/21/17 15:14, Michael S. Tsirkin wrote: > On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: >> On 02/21/17 02:43, Michael S. Tsirkin wrote: >>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >>>> On 02/20/17 21:45, Eric Blake wrote: >>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>>>> * Eric Blake (eblake@redhat.com) wrote: >>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>>>> CC Dave >>>>>>>> >>>>>>>> This isn't an area I really understand; but if I'm >>>>>>>> reading this right then >>>>>>>> vmgenid is stored in fw_cfg? >>>>>>>> fw_cfg isn't migrated >>>>>>>> >>>>>>>> So why should any changes to it get migrated, except if it's already >>>>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>>>> it expected to read?) >>>>>>> >>>>>>> Why are we expecting it to change on migration? You want a new value >>>>>> >>>>>> I'm not; I was asking why a change made prior to migration would be >>>>>> preserved across migration. >>>>> >>>>> Okay, so you're asking what happens if the source requests the vmgenid >>>>> device, and sets an id, but the destination of the migration does not >>>>> request anything >>>> >>>> This should never happen, as it means different QEMU command lines on >>>> source vs. target hosts. (Different as in "incorrectly different".) >>>> >>>> Dave writes, "a change made prior to migration". Change made to what? >>>> >>>> - the GUID cannot be changed via the monitor once QEMU has been started. >>>> We dropped the monitor command for that, due to lack of a good use case, >>>> and due to lifecycle complexities. We have figured out a way to make it >>>> safe, but until there's a really convincing use case, we shouldn't add >>>> that complexity. >>> >>> True but we might in the future, and it seems prudent to make >>> migration stream future-proof for that. >> >> It is already. >> >> The monitor command, if we add it, can be implemented incrementally. I >> described it as "approach (iii)" elsewhere in the thread. This is a more >> detailed recap: >> >> - introduce a new device property (internal only), such as >> "x-enable-set-vmgenid". Make it reflect whether a given machine type >> supports the monitor command. > > This is the part we can avoid at no real cost just > by making sure the guid is migrated. > > >> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one >> with a selection callback >> >> - add a new boolean latch to the vmgenid device, called >> "guid_blob_selected" or something similar >> >> - the reset handler sets the latch to FALSE >> (NB: the reset handler already sets /etc/vmgenid_addr to zero) >> >> - the select callback for /etc/vmgenid_guid sets the latch to TRUE >> >> - the latch is added to the migration stream as a subsection *if* >> x-enable-set-vmgenid is TRUE >> >> - the set-vmgenid monitor command checks all three of: >> x-enable-set-vmgenid, the latch, and the contents of >> /etc/vmgenid_addr: >> >> - if x-enable-set-vmgenid is FALSE, the monitor command returns >> QERR_UNSUPPORTED (this is a generic error class, with an >> "unsupported" error message). Otherwise, >> >> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the >> guest firmware has executed (or started executing) ALLOCATE for >> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. 
>> In this case updating the VMGENID from the monitor is unsafe >> (we cannot guarantee informing the guest successfully), so in this >> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. >> The caller should simply try a bit later. (By which time the >> firmware will likely have programmed /etc/vmgenid_addr.) > > This makes no sense to me. Just update it in qemu memory > and write when guest asks for it. I designed the above (sorry if "designed" is a word too pompous for this) quite explicitly to address your concern as to what would happen if someone tried to massage the GUID via the monitor while the firmware was between ALLOCATE and WRITE_POINTER. Also, we don't know when the guest "asks" for the GUID (in guest RAM). It just evaluates ADDR (maybe always, maybe only once, at guest driver startup), and then it just looks at RAM whenever it wants to. This is why this idea seeks to track the guest's state -- if the guest is before ALLOCATE, it's okay to update the fw_cfg blob; if it is between ALLOCATE and WRITE_POINTER, reject the monitor command (temporarily); and if the guest is after WRITE_POINTER, update the RAM and inject the SCI. We cannot see *exactly* when the guest has just finished writing the address. We have only select callbacks for fw_cfg items, not write callbacks. And a select callback is no good for the address blob, because it would be invoked *before* the guest writes the address. We discussed these facts several days (weeks?) and several iterations ago. The longer term remedy we came up with was the above design. The shorter term remedy was to drop the "set" monitor command, because we couldn't figure out a management layer use case for that monitor command. If you now (at v8) insist on future-proofing the design for a potential "set" monitor command, that's exactly the same as if you were requiring Ben to implement the monitor command right now. Except this is worse, because we dropped the monitor command in v6 (from v5), and you didn't protest. > > >> Libvirt can recognize this error specifically, because it is not the >> generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for >> "EAGAIN", practically, in this case.
>>>>> >>>>>> >>>>>> >>>>>>> when you load state from disk (you don't know how many times the same >>>>>>> state has been loaded previously, so each load is effectively forking >>>>>>> the VM and you want a different value), but for a single live migration, >>>>>>> you aren't forking the VM and don't need a new generation ID. >>>>>>> >>>>>>> I guess it all boils down to what command line you're using: if libvirt >>>>>>> is driving a live migration, it will request the same UUID in the >>>>>>> command line of the destination as what is on the source; while if >>>>>>> libvirt is loading from a [managed]save to restore state from a file, it >>>>>>> will either request a new UUID directly or request auto to let qemu >>>>>>> generate the new id. >>>>>> >>>>>> Hmm now I've lost it a bit; I thought we would preserve the value >>>>>> transmitted from the source, not the value on the command line of the destination. >>>>> >>>>> I guess I'm trying to figure out whether libvirt MUST read the current >>>>> id and explicitly tell the destination of migration to reuse that id, or >>>>> if libvirt can omit the id on migration and everything just works >>>>> because the id was migrated from the source. >>>>>
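[A rough illustration of the latch arming Laszlo describes: the fw_cfg select callback fires when the guest selects the item's key, before any data is transferred, which is why it can only record that ALLOCATE has started. The guid_blob_selected field and the callback body are hypothetical, and the fw_cfg_add_file_callback() registration is approximated from the fw_cfg API of this era; this would sit in hw/acpi/vmgenid.c.]

static void vmgenid_guid_selected(void *opaque)
{
    VmGenIdState *vms = opaque;

    vms->guid_blob_selected = true;   /* guest has (re)started ALLOCATE */
}

static void vmgenid_add_guid_file(VmGenIdState *vms, FWCfgState *fw_cfg,
                                  GArray *guid_blob)
{
    /* register a select (read) callback so QEMU learns when the guest
     * begins ALLOCATE on "etc/vmgenid_guid"; a plain fw_cfg_add_file()
     * would register no callback at all */
    fw_cfg_add_file_callback(fw_cfg, "etc/vmgenid_guid",
                             vmgenid_guid_selected, vms,
                             guid_blob->data, guid_blob->len);
}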
On Tue, Feb 21, 2017 at 05:08:40PM +0100, Laszlo Ersek wrote: > On 02/21/17 15:14, Michael S. Tsirkin wrote: > > On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: > >> On 02/21/17 02:43, Michael S. Tsirkin wrote: > >>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: > >>>> On 02/20/17 21:45, Eric Blake wrote: > >>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: > >>>>>> * Eric Blake (eblake@redhat.com) wrote: > >>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: > >>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: > >>>>>>>>> CC Dave > >>>>>>>> > >>>>>>>> This isn't an area I really understand; but if I'm > >>>>>>>> reading this right then > >>>>>>>> vmgenid is stored in fw_cfg? > >>>>>>>> fw_cfg isn't migrated > >>>>>>>> > >>>>>>>> So why should any changes to it get migrated, except if it's already > >>>>>>>> been read by the guest (and if the guest reads it again aftwards what's > >>>>>>>> it expected to read?) > >>>>>>> > >>>>>>> Why are we expecting it to change on migration? You want a new value > >>>>>> > >>>>>> I'm not; I was asking why a change made prior to migration would be > >>>>>> preserved across migration. > >>>>> > >>>>> Okay, so you're asking what happens if the source requests the vmgenid > >>>>> device, and sets an id, but the destination of the migration does not > >>>>> request anything > >>>> > >>>> This should never happen, as it means different QEMU command lines on > >>>> source vs. target hosts. (Different as in "incorrectly different".) > >>>> > >>>> Dave writes, "a change made prior to migration". Change made to what? > >>>> > >>>> - the GUID cannot be changed via the monitor once QEMU has been started. > >>>> We dropped the monitor command for that, due to lack of a good use case, > >>>> and due to lifecycle complexities. We have figured out a way to make it > >>>> safe, but until there's a really convincing use case, we shouldn't add > >>>> that complexity. > >>> > >>> True but we might in the future, and it seems prudent to make > >>> migration stream future-proof for that. > >> > >> It is already. > >> > >> The monitor command, if we add it, can be implemented incrementally. I > >> described it as "approach (iii)" elsewhere in the thread. This is a more > >> detailed recap: > >> > >> - introduce a new device property (internal only), such as > >> "x-enable-set-vmgenid". Make it reflect whether a given machine type > >> supports the monitor command. > > > > This is the part we can avoid at no real cost just > > by making sure the guid is migrated. > > > > > >> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one > >> with a selection callback > >> > >> - add a new boolean latch to the vmgenid device, called > >> "guid_blob_selected" or something similar > >> > >> - the reset handler sets the latch to FALSE > >> (NB: the reset handler already sets /etc/vmgenid_addr to zero) > >> > >> - the select callback for /etc/vmgenid_guid sets the latch to TRUE > >> > >> - the latch is added to the migration stream as a subsection *if* > >> x-enable-set-vmgenid is TRUE > >> > >> - the set-vmgenid monitor command checks all three of: > >> x-enable-set-vmgenid, the latch, and the contents of > >> /etc/vmgenid_addr: > >> > >> - if x-enable-set-vmgenid is FALSE, the monitor command returns > >> QERR_UNSUPPORTED (this is a generic error class, with an > >> "unsupported" error message). 
Otherwise, > >> > >> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the > >> guest firmware has executed (or started executing) ALLOCATE for > >> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. > >> In this case updating the VMGENID from the monitor is unsafe > >> (we cannot guarantee informing the guest successfully), so in this > >> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. > >> The caller should simply try a bit later. (By which time the > >> firmware will likely have programmed /etc/vmgenid_addr.) > > > > This makes no sense to me. Just update it in qemu memory > > and write when guest asks for it. > > I designed the above (sorry if "designed" is a word too pompous for > this) quite explicitly to address your concern as to what would happen > if someone tried to massage the GUID via the monitor while the firmware > was between ALLOCATE and WRITE_POINTER. > > Also, we don't know when the guest "asks" for the GUID (in guest RAM). > It just evaluates ADDR (maybe always, maybe only once, at guest driver > startup), and then it just looks at RAM whenever it wants to. > > This is why this idea seeks to track the guest's state -- if the guest > is before ALLOCATE, it's okay to update the fw_cfg blob, if it is > between ALLOCATE and WRITE_POINTER, reject the monitor command > (temporarily), and if the guest is after WRITE_POINTER, update the RAM > and inject the SCI. > > We cannot see *exactly* when the guest has just finished writing the > address. We have only select callbacks for fw_cfg items, not write > callbacks. And a select callback is no good for the address blob, > because it would be invoked *before* the guest writes the address. > > We discussed these facts several days (weeks?) and several iterations > ago. The longer term remedy we came up was the above design. The shorter > term remedy was to drop the "set" monitor command, because we couldn't > figure out a management layer use case for that monitor command. > > If you now (at v8) insist to future proof the design for a potential > "set" monitor command, that's exactly the same as if you were requiring > Ben to implement the monitor command right now. Except this is worse, > because we dropped the monitor command in v6 (from v5), and you didn't > protest. I'm merging this as-is, but I think the concerns are overblown. We have many fields that devices DMA into guest memory, and changing them is easy. It should be a simple matter to update the guid copy in the fw_cfg blob and, *if we have the address*, DMA there and send an SCI. Yes, we don't know when the guest looks at the guid, but that is simply up to the guest. It needs to look at it at the right time. So the implementation is really easy, I think. The real problem is that we will have a migrated guid and a command-line guid, and which one wins if they conflict. And that is IMO something we need to figure out now, not later. > > > > > >> Libvirt can recognize this error specifically, because it is not the > >> generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for > >> "EAGAIN", practically, in this case.
> >> > >> - Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is > >> nonzero, that is, the guest has either not run ALLOCATE since > >> reset, *or* it has, but it has also run WRITE_POINTER): > >> > >> - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid > >> in-place -- the guest will see this whenever it runs ALLOCATE for > >> /etc/vmgenid_guid, *AND* > >> > >> - if /etc/vmgenid_addr is not zero, then update the guest (that is, > >> RAM write + SCI) > >> > >> Thanks > >> Laszlo > > > > Seems way more painful than it has to be. Just migrate the guid > > and then management can write it at any time. > > Yes, management would be able to do that, but we won't know when to > expose it to the guest. (Because, again, we don't exactly know when the > guest looks at the GUID in RAM, and if the address is not configured > yet, we cannot put the GUID anywhere in RAM.) > > Do you intend to block v8 over this? > > Thanks > Laszlo > > > > > > >>> > >>>> - the address of the GUID is changed (the firmware programs it from > >>>> "zero" to an actual address, in a writeable fw_cfg file), and that piece > >>>> of info is explicitly migrated, as part of the vmgenid device's vmsd. > >>>> > >>>> Thanks > >>>> Laszlo > >>>> > >>>> > >>>>> - how does the guest on the destination see the same id > >>>>> as was in place on the source at the time migration started. > >>>>> > >>>>>> > >>>>>> > >>>>>>> when you load state from disk (you don't know how many times the same > >>>>>>> state has been loaded previously, so each load is effectively forking > >>>>>>> the VM and you want a different value), but for a single live migration, > >>>>>>> you aren't forking the VM and don't need a new generation ID. > >>>>>>> > >>>>>>> I guess it all boils down to what command line you're using: if libvirt > >>>>>>> is driving a live migration, it will request the same UUID in the > >>>>>>> command line of the destination as what is on the source; while if > >>>>>>> libvirt is loading from a [managed]save to restore state from a file, it > >>>>>>> will either request a new UUID directly or request auto to let qemu > >>>>>>> generate the new id. > >>>>>> > >>>>>> Hmm now I've lost it a bit; I thought we would preserve the value > >>>>>> transmitted from the source, not the value on the command line of the destination. > >>>>> > >>>>> I guess I'm trying to figure out whether libvirt MUST read the current > >>>>> id and explicitly tell the destination of migration to reuse that id, or > >>>>> if libvirt can omit the id on migration and everything just works > >>>>> because the id was migrated from the source. > >>>>>
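[Sketched in code, the simpler path Michael argues for might look like this; vmgenid_update_guest() is from the series, while the wrapper function and the address check are illustrative, and this assumes it lives next to the other helpers in hw/acpi/vmgenid.c.]

static void vmgenid_set_guid_runtime(VmGenIdState *vms, const QemuUUID *guid)
{
    vms->guid = *guid;   /* a subsequent ALLOCATE picks up the new value */

    if (ldq_le_p(vms->vmgenid_addr_le) != 0) {
        /* the guest already ran WRITE_POINTER: DMA the new GUID into
         * guest RAM and raise the SCI */
        vmgenid_update_guest(vms);
    }
    /* otherwise do nothing; when the guest actually reads the GUID is,
     * on this view, the guest's own responsibility */
}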
On 02/21/17 17:17, Michael S. Tsirkin wrote: > On Tue, Feb 21, 2017 at 05:08:40PM +0100, Laszlo Ersek wrote: >> On 02/21/17 15:14, Michael S. Tsirkin wrote: >>> On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: >>>> On 02/21/17 02:43, Michael S. Tsirkin wrote: >>>>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >>>>>> On 02/20/17 21:45, Eric Blake wrote: >>>>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>>>>>> * Eric Blake (eblake@redhat.com) wrote: >>>>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>>>>>> CC Dave >>>>>>>>>> >>>>>>>>>> This isn't an area I really understand; but if I'm >>>>>>>>>> reading this right then >>>>>>>>>> vmgenid is stored in fw_cfg? >>>>>>>>>> fw_cfg isn't migrated >>>>>>>>>> >>>>>>>>>> So why should any changes to it get migrated, except if it's already >>>>>>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>>>>>> it expected to read?) >>>>>>>>> >>>>>>>>> Why are we expecting it to change on migration? You want a new value >>>>>>>> >>>>>>>> I'm not; I was asking why a change made prior to migration would be >>>>>>>> preserved across migration. >>>>>>> >>>>>>> Okay, so you're asking what happens if the source requests the vmgenid >>>>>>> device, and sets an id, but the destination of the migration does not >>>>>>> request anything >>>>>> >>>>>> This should never happen, as it means different QEMU command lines on >>>>>> source vs. target hosts. (Different as in "incorrectly different".) >>>>>> >>>>>> Dave writes, "a change made prior to migration". Change made to what? >>>>>> >>>>>> - the GUID cannot be changed via the monitor once QEMU has been started. >>>>>> We dropped the monitor command for that, due to lack of a good use case, >>>>>> and due to lifecycle complexities. We have figured out a way to make it >>>>>> safe, but until there's a really convincing use case, we shouldn't add >>>>>> that complexity. >>>>> >>>>> True but we might in the future, and it seems prudent to make >>>>> migration stream future-proof for that. >>>> >>>> It is already. >>>> >>>> The monitor command, if we add it, can be implemented incrementally. I >>>> described it as "approach (iii)" elsewhere in the thread. This is a more >>>> detailed recap: >>>> >>>> - introduce a new device property (internal only), such as >>>> "x-enable-set-vmgenid". Make it reflect whether a given machine type >>>> supports the monitor command. >>> >>> This is the part we can avoid at no real cost just >>> by making sure the guid is migrated. >>> >>> >>>> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one >>>> with a selection callback >>>> >>>> - add a new boolean latch to the vmgenid device, called >>>> "guid_blob_selected" or something similar >>>> >>>> - the reset handler sets the latch to FALSE >>>> (NB: the reset handler already sets /etc/vmgenid_addr to zero) >>>> >>>> - the select callback for /etc/vmgenid_guid sets the latch to TRUE >>>> >>>> - the latch is added to the migration stream as a subsection *if* >>>> x-enable-set-vmgenid is TRUE >>>> >>>> - the set-vmgenid monitor command checks all three of: >>>> x-enable-set-vmgenid, the latch, and the contents of >>>> /etc/vmgenid_addr: >>>> >>>> - if x-enable-set-vmgenid is FALSE, the monitor command returns >>>> QERR_UNSUPPORTED (this is a generic error class, with an >>>> "unsupported" error message). 
Otherwise, >>>> >>>> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the >>>> guest firmware has executed (or started executing) ALLOCATE for >>>> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. >>>> In this case updating the VMGENID from the monitor is unsafe >>>> (we cannot guarantee informing the guest successfully), so in this >>>> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. >>>> The caller should simply try a bit later. (By which time the >>>> firmware will likely have programmed /etc/vmgenid_addr.) >>> >>> This makes no sense to me. Just update it in qemu memory >>> and write when guest asks for it. >> >> I designed the above (sorry if "designed" is a word too pompous for >> this) quite explicitly to address your concern as to what would happen >> if someone tried to massage the GUID via the monitor while the firmware >> was between ALLOCATE and WRITE_POINTER. >> >> Also, we don't know when the guest "asks" for the GUID (in guest RAM). >> It just evaluates ADDR (maybe always, maybe only once, at guest driver >> startup), and then it just looks at RAM whenever it wants to. >> >> This is why this idea seeks to track the guest's state -- if the guest >> is before ALLOCATE, it's okay to update the fw_cfg blob, if it is >> between ALLOCATE and WRITE_POINTER, reject the monitor command >> (temporarily), and if the guest is after WRITE_POINTER, update the RAM >> and inject the SCI. >> >> We cannot see *exactly* when the guest has just finished writing the >> address. We have only select callbacks for fw_cfg items, not write >> callbacks. And a select callback is no good for the address blob, >> because it would be invoked *before* the guest writes the address. >> >> We discussed these facts several days (weeks?) and several iterations >> ago. The longer term remedy we came up was the above design. The shorter >> term remedy was to drop the "set" monitor command, because we couldn't >> figure out a management layer use case for that monitor command. >> >> If you now (at v8) insist to future proof the design for a potential >> "set" monitor command, that's exactly the same as if you were requiring >> Ben to implement the monitor command right now. Except this is worse, >> because we dropped the monitor command in v6 (from v5), and you didn't >> protest. > > I'm merging this as-is Thank you! > but I think the concerns are overblown. > We have many fields which devices DMA into guest memory > and changing them is easy. > > It should be a simple matter to update guid copy in > fw cfg blob, and *if we have the address*, DMA there > and send SCI. I think this was more or less what Ben's v5 did, and (again, as far as I recall) you were concerned about its safety: msgid: <20170206201249-mutt-send-email-mst@kernel.org> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg427927.html msgid: <20170206210237-mutt-send-email-mst@kernel.org> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg427935.html Again, at that point I "invented" the above elaborate design *only* to address your concern. If you are not concerned any longer (or, if you had never had this exact concern, I just misunderstood you), then I'm fine dropping all of the above -- I definitely don't strive to implement (or request) the above out of my own initiative. 
Please see item (5) in the following message: msgid: <14f224ed-08e2-cbad-9d1d-8f559cd399a6@redhat.com> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg428296.html The design above is just "approach (iii)" expanded with more details, from under said item (5). You didn't react there, and I thought you were okay with the idea. Then Ben went on to drop the "set" monitor command in v6, and you didn't comment on that either -- so I assumed you were okay with that too. > > Yes we don't know when does guest look at guid but that > is simply up to guest. It needs to look at it at the > right time. > > So the implementation is really easy I think. That's for the best! > > The real problem is that we will have migrated guid > and command line guid and which one wins if they conflict. > And that is IMO something we need to figure out now and > not later. Neither Ben nor I seem to know when the management layer would want to call the "set" monitor command, and your question is really hard to answer without that knowledge. (Under my proposal, the question does not really exist, because the GUID set last on the source host need not be migrated except as part of guest RAM, and it's always the command line GUID on the target host that takes precedence after migration and gets written into guest RAM.) In other words, it is for libvirt / users / etc. to say why they would want to set GUID-A with the monitor command on the source host, *and* then start up QEMU on the target host with GUID-B on the command line. ... Either way, I would let GUID-B take effect. Thanks Laszlo
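[For reference, the patch appended below shows the post_load fix discussed in this thread: instead of calling vmgenid_update_guest() directly from vmgenid_post_load(), it registers a VM change state handler that performs the update once, and unregisters itself, when the restored machine starts running again.]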
diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h index db7fa0e63303..a2ae450b1f56 100644 --- a/include/hw/acpi/vmgenid.h +++ b/include/hw/acpi/vmgenid.h @@ -4,6 +4,7 @@ #include "hw/acpi/bios-linker-loader.h" #include "hw/qdev.h" #include "qemu/uuid.h" +#include "sysemu/sysemu.h" #define VMGENID_DEVICE "vmgenid" #define VMGENID_GUID "guid" @@ -21,6 +22,7 @@ typedef struct VmGenIdState { DeviceClass parent_obj; QemuUUID guid; /* The 128-bit GUID seen by the guest */ uint8_t vmgenid_addr_le[8]; /* Address of the GUID (little-endian) */ + VMChangeStateEntry *vmstate; } VmGenIdState; static inline Object *find_vmgenid_dev(void) diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c index 9f97b722761b..0ae1d56ff297 100644 --- a/hw/acpi/vmgenid.c +++ b/hw/acpi/vmgenid.c @@ -177,10 +177,20 @@ static void vmgenid_set_guid(Object *obj, const char *value, Error **errp) /* After restoring an image, we need to update the guest memory and notify * it of a potential change to VM Generation ID */ +static void postload_update_guest_cb(void *opaque, int running, RunState state) +{ + VmGenIdState *vms = opaque; + + qemu_del_vm_change_state_handler(vms->vmstate); + vms->vmstate = NULL; + vmgenid_update_guest(vms); +} + static int vmgenid_post_load(void *opaque, int version_id) { VmGenIdState *vms = opaque; - vmgenid_update_guest(vms); + vms->vmstate = qemu_add_vm_change_state_handler(postload_update_guest_cb, + vms); return 0; }