| Message ID | 918524f7-26cf-3fce-d9e3-7316ca69285b@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On Fri, 17 Feb 2017 13:50:40 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> CC Dave
>
> On 02/17/17 11:43, Igor Mammedov wrote:
> > On Thu, 16 Feb 2017 15:15:36 -0800
> > ben@skyportsystems.com wrote:
> >
> >> From: Ben Warren <ben@skyportsystems.com>
> >>
> >> This implements the VM Generation ID feature by passing a 128-bit
> >> GUID to the guest via a fw_cfg blob.
> >> Any time the GUID changes, an ACPI notify event is sent to the guest.
> >>
> >> The user interface is a simple device with one parameter:
> >>  - guid (string, must be "auto" or in UUID format
> >>    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
> > I've given it some testing with WS2012R2 and the v4 patches for SeaBIOS.
> >
> > Windows is able to read the initial GUID allocation, and writeback
> > seems to work somehow:
> >
> > (qemu) info vm-generation-id
> > c109c09b-0e8b-42d5-9b33-8409c9dcd16c
> >
> > The vmgenid client in Windows reads it as two consecutive 64-bit integers:
> > 42d50e8bc109c09b:6cd1dcc90984339b
> >
> > However, the update path (restore from snapshot) doesn't work;
> > here is how I've tested it:
> >
> > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio
> > (qemu) info vm-generation-id
> > c109c09b-0e8b-42d5-9b33-8409c9dcd16c
> > (qemu) stop
> > (qemu) migrate "exec:gzip -c > STATEFILE.gz"
> > (qemu) quit
> >
> > qemu-system-x86_64 -device vmgenid,id=testvgid,guid=auto -monitor stdio -incoming "exec: gzip -c -d STATEFILE.gz"
> > (qemu) info vm-generation-id
> > 28b587fa-991b-4267-80d7-9cf28b746fe9
> >
> > The guest:
> > 1. doesn't get the GPE notification that it must receive
> > 2. the vmgenid client in Windows reads the same value,
> >    42d50e8bc109c09b:6cd1dcc90984339b
>
> Hmmm, I wonder if we need something like this, in vmgenid_post_load():
>
>   commit 90c647db8d59e47c9000affc0d81754eb346e939
>   Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>   Date:   Fri Apr 15 12:41:30 2016 +0100
>
>       Fix pflash migration
>
> with the idea being that in a single device's post_load callback, we
> shouldn't perform machine-wide actions (post_load is likely for fixing
> up the device itself). If machine-wide actions are necessary, we should
> temporarily register a "vm change state handler", and do the thing once
> that handler is called (when the machine has been loaded fully and is
> about to continue execution).
>
> Can you please try the attached patch on top? (Build tested only.)

it doesn't help

> Thanks!
> Laszlo
On 02/17/17 14:05, Igor Mammedov wrote:
> On Fri, 17 Feb 2017 13:50:40 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
>
>> [...]
>>
>> Can you please try the attached patch on top? (Build tested only.)
> it doesn't help

Thanks for trying! And, well, sh*t. :(

I guess it's time to resurrect the monitor command (temporarily, for
testing) so we can inject the SCI at will, without migration. I don't
want to burden you unreasonably, so I'll make an effort to try that
myself.

Thanks!
Laszlo
* Laszlo Ersek (lersek@redhat.com) wrote:
> CC Dave

This isn't an area I really understand; but if I'm reading this right,
then:
  vmgenid is stored in fw_cfg?
  fw_cfg isn't migrated

So why should any changes to it get migrated, except if it's already
been read by the guest (and if the guest reads it again afterwards,
what's it expected to read?)

Dave

> On 02/17/17 11:43, Igor Mammedov wrote:
> [...]
>
> Hmmm, I wonder if we need something like this, in vmgenid_post_load():
>
>   commit 90c647db8d59e47c9000affc0d81754eb346e939
>   Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
>   Date:   Fri Apr 15 12:41:30 2016 +0100
>
>       Fix pflash migration
>
> with the idea being that in a single device's post_load callback, we
> shouldn't perform machine-wide actions (post_load is likely for fixing
> up the device itself). If machine-wide actions are necessary, we should
> temporarily register a "vm change state handler", and do the thing once
> that handler is called (when the machine has been loaded fully and is
> about to continue execution).
>
> Can you please try the attached patch on top? (Build tested only.)
>
> Thanks!
> Laszlo

> diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h
> index db7fa0e63303..a2ae450b1f56 100644
> --- a/include/hw/acpi/vmgenid.h
> +++ b/include/hw/acpi/vmgenid.h
> @@ -4,6 +4,7 @@
>  #include "hw/acpi/bios-linker-loader.h"
>  #include "hw/qdev.h"
>  #include "qemu/uuid.h"
> +#include "sysemu/sysemu.h"
>
>  #define VMGENID_DEVICE "vmgenid"
>  #define VMGENID_GUID "guid"
> @@ -21,6 +22,7 @@ typedef struct VmGenIdState {
>      DeviceClass parent_obj;
>      QemuUUID guid;                /* The 128-bit GUID seen by the guest */
>      uint8_t vmgenid_addr_le[8];   /* Address of the GUID (little-endian) */
> +    VMChangeStateEntry *vmstate;
>  } VmGenIdState;
>
>  static inline Object *find_vmgenid_dev(void)
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index 9f97b722761b..0ae1d56ff297 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -177,10 +177,20 @@ static void vmgenid_set_guid(Object *obj, const char *value, Error **errp)
>  /* After restoring an image, we need to update the guest memory and notify
>   * it of a potential change to VM Generation ID
>   */
> +static void postload_update_guest_cb(void *opaque, int running, RunState state)
> +{
> +    VmGenIdState *vms = opaque;
> +
> +    qemu_del_vm_change_state_handler(vms->vmstate);
> +    vms->vmstate = NULL;
> +    vmgenid_update_guest(vms);
> +}
> +
>  static int vmgenid_post_load(void *opaque, int version_id)
>  {
>      VmGenIdState *vms = opaque;
> -    vmgenid_update_guest(vms);
> +    vms->vmstate = qemu_add_vm_change_state_handler(postload_update_guest_cb,
> +                                                    vms);
>      return 0;
>  }

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/17 11:23, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
>> CC Dave
>
> This isn't an area I really understand; but if I'm reading this right,
> then:
>   vmgenid is stored in fw_cfg?
>   fw_cfg isn't migrated
>
> So why should any changes to it get migrated, except if it's already
> been read by the guest (and if the guest reads it again afterwards,
> what's it expected to read?)

This is what we have here:
- QEMU formats a read-only fw_cfg blob with the GUID
- the guest downloads the blob and places it in guest RAM
- the guest tells QEMU the guest-side address of the blob
- during migration, guest RAM is transferred
- after migration, in the device's post_load callback, QEMU overwrites
  the GUID in guest RAM with a different value, and injects an SCI

I CC'd you for the following reason: Igor reported that he didn't see
either the fresh GUID or the SCI in the guest, on the target host,
after migration. I figured that perhaps there was an ordering issue
between RAM loading and post_load execution on the target host, and so
I proposed to delay the RAM overwrite + SCI injection a bit more,
following the pattern seen in your commit 90c647db8d59.

However, since then, both Ben and myself have tested the code with
migration (using "virsh save" (Ben) and "virsh managedsave" (myself)),
with Windows and Linux guests, and it works for us; there seems to be
no ordering issue with the current code (= overwrite RAM + inject SCI
in the post_load callback).

For now we don't understand why it doesn't work for Igor (Igor used
exec/gzip migration to/from a local file using direct QEMU monitor
commands / options, no libvirt). And copying the pattern seen in your
commit 90c647db8d59 didn't help in his case (while it wasn't even
necessary for success in Ben's and my testing).

So it seems that delaying the deed with
qemu_add_vm_change_state_handler() is neither needed nor effective in
this case; but then we still don't know why it doesn't work for Igor.

Thanks
Laszlo

> Dave
>
> [...]
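The update path described in the list above is small. Here is a minimal
sketch of it in C, assuming QEMU-internal helpers of that era
(le64_to_cpu(), cpu_physical_memory_write(), object_resolve_path_type(),
acpi_send_event()) and the field names from the patch quoted earlier;
this is illustrative, not the literal code from the series:

```c
/* Sketch: overwrite the GUID in guest RAM and inject an SCI, as described
 * above. Assumes vmgenid_addr_le[] holds the guest-physical address that
 * the firmware wrote back through the writeable fw_cfg entry. */
static void vmgenid_update_guest_sketch(VmGenIdState *vms)
{
    Object *acpi;
    uint64_t addr;

    memcpy(&addr, vms->vmgenid_addr_le, sizeof(addr));
    addr = le64_to_cpu(addr);
    if (!addr) {
        return; /* the firmware hasn't executed WRITE_POINTER yet */
    }

    /* Rewrite the GUID that the guest placed in its own RAM... */
    cpu_physical_memory_write(addr, vms->guid.data, sizeof(vms->guid.data));

    /* ...then notify the guest of the (potential) change via an SCI,
     * routed through the machine's ACPI device. */
    acpi = object_resolve_path_type("", TYPE_ACPI_DEVICE_IF, NULL);
    if (acpi) {
        acpi_send_event(DEVICE(acpi), ACPI_VMGENID_CHANGE_STATUS);
    }
}
```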
* Laszlo Ersek (lersek@redhat.com) wrote:
> On 02/20/17 11:23, Dr. David Alan Gilbert wrote:
> [...]
>
> This is what we have here:
> - QEMU formats a read-only fw_cfg blob with the GUID
> - the guest downloads the blob and places it in guest RAM
> - the guest tells QEMU the guest-side address of the blob
> - during migration, guest RAM is transferred
> - after migration, in the device's post_load callback, QEMU overwrites
>   the GUID in guest RAM with a different value, and injects an SCI
>
> [...]
>
> For now we don't understand why it doesn't work for Igor (Igor used
> exec/gzip migration to/from a local file using direct QEMU monitor
> commands / options, no libvirt). And copying the pattern seen in your
> commit 90c647db8d59 didn't help in his case (while it wasn't even
> necessary for success in Ben's and my testing).

One thing I noticed in Igor's test was that he did a 'stop' on the
source before the migrate, and so it's probably still paused on the
destination after the migration is loaded, so anything the guest needs
to do might not have happened until it's started.

You say:
  'guest tells QEMU the guest-side address of the blob'
How is that stored/migrated/etc.?

> So it seems that delaying the deed with
> qemu_add_vm_change_state_handler() is neither needed nor effective in
> this case; but then we still don't know why it doesn't work for Igor.

Nod.

Dave

> Thanks
> Laszlo

[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
> [...]
>
> One thing I noticed in Igor's test was that he did a 'stop' on the
> source before the migrate, and so it's probably still paused on the
> destination after the migration is loaded, so anything the guest needs
> to do might not have happened until it's started.

Interesting! I hope Igor can double-check this!

In the virsh docs, before doing my tests, I read that "managedsave"
optionally took --running or --paused:

    Normally, starting a managed save will decide between running or
    paused based on the state the domain was in when the save was done;
    passing either the --running or --paused flag will allow overriding
    which state the start should use.

I didn't pass any such flag ultimately, and I didn't stop the guests
before the managedsave. Indeed they continued execution right after
being loaded with "virsh start".

(Side point: managedsave is awesome. :))

> You say:
>   'guest tells QEMU the guest-side address of the blob'
> How is that stored/migrated/etc.?

It is a uint8_t[8] array (little-endian representation), linked into
another (writeable) fw_cfg entry, and it's migrated explicitly (it has
a descriptor in the device's vmstate descriptor). The post_load
callback relies on this array being restored before the migration
infrastructure calls post_load.

Thanks
Laszlo
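"Migrated explicitly, with a descriptor in the device's vmstate
descriptor" maps onto a standard QEMU VMStateDescription. A minimal
sketch, using the field and callback names from the patch quoted earlier
and QEMU's stock vmstate macros; the exact descriptor in the series may
differ in detail:

```c
/* Sketch: the address array is an ordinary vmstate field, so the migration
 * infrastructure restores it before invoking .post_load. */
static const VMStateDescription vmstate_vmgenid = {
    .name = "vmgenid",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = vmgenid_post_load,
    .fields = (VMStateField[]) {
        VMSTATE_UINT8_ARRAY(vmgenid_addr_le, VmGenIdState, sizeof(uint64_t)),
        VMSTATE_END_OF_LIST()
    },
};
```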
* Laszlo Ersek (lersek@redhat.com) wrote:
> On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> [...]
>
> I didn't pass any such flag ultimately, and I didn't stop the guests
> before the managedsave. Indeed they continued execution right after
> being loaded with "virsh start".
>
> (Side point: managedsave is awesome. :))

If I've followed the bread crumbs correctly, I think managedsave is
just using a migrate to fd anyway, so the same code.

>> You say:
>>   'guest tells QEMU the guest-side address of the blob'
>> How is that stored/migrated/etc.?
>
> It is a uint8_t[8] array (little-endian representation), linked into
> another (writeable) fw_cfg entry, and it's migrated explicitly (it has
> a descriptor in the device's vmstate descriptor). The post_load
> callback relies on this array being restored before the migration
> infrastructure calls post_load.

RAM normally comes back before other devices, so you should be OK;
although we frequently have problems with devices reading from RAM
during device init before migration has started, or writing to it
after migration has finished on the source.

Dave

> Thanks
> Laszlo

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, 20 Feb 2017 12:38:06 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 02/20/17 12:00, Dr. David Alan Gilbert wrote:
> [...]
> > One thing I noticed in Igor's test was that he did a 'stop' on the
> > source before the migrate, and so it's probably still paused on the
> > destination after the migration is loaded, so anything the guest
> > needs to do might not have happened until it's started.
>
> Interesting! I hope Igor can double-check this!

I've retested v7, and it reliably fails (vmgenid_wait doesn't see the
change). Then I tested v8 (QEMU) + SeaBIOS v5/v4 with the same steps as
before, and it appears to work as expected, i.e. vmgenid_wait reports
the GUID change after executing the 'continue' monitor command. So
something has been fixed in v8.

> [...]
On 02/20/17 14:13, Igor Mammedov wrote:
> On Mon, 20 Feb 2017 12:38:06 +0100
> Laszlo Ersek <lersek@redhat.com> wrote:
> [...]
>> Interesting! I hope Igor can double-check this!
> I've retested v7, and it reliably fails (vmgenid_wait doesn't see the
> change). Then I tested v8 (QEMU) + SeaBIOS v5/v4 with the same steps
> as before, and it appears to work as expected, i.e. vmgenid_wait
> reports the GUID change after executing the 'continue' monitor
> command. So something has been fixed in v8.

Yes, I know what. Please see item (2) in this reply of mine, for v7 1/8:

  msgid: <9e222b4c-c05d-8fd0-6c55-4b2e52cab7b0@redhat.com>
  URL:   https://www.mail-archive.com/qemu-devel@nongnu.org/msg430440.html

With that copy/paste bug in the code, the "src_offset" field of
WRITE_POINTER was not populated correctly. The BIOS would carry that
out faithfully, of course, but then later QEMU would write the fresh
GUID to an incorrect offset in the firmware-allocated area in the
guest -- the offset wouldn't match the AML code (the ADDR method), so
the guest OS wouldn't see the change.

If you scroll to the end of my message linked above, I wrote -- again,
for v7 --:

    I also tested this series (with the assignment under (2) fixed up,
    of course), as documented earlier in
    <https://www.mail-archive.com/qemu-devel@nongnu.org/msg430075.html>
    (msgid <678c203f-3768-7e65-6e48-6729473b6...@redhat.com>).

    Hence, with (1) and (2) fixed, you can also add

    Tested-by: Laszlo Ersek <ler...@redhat.com>

In other words, my positive testing for v7 was conditional on my
*local* (but reported, suggested) fix for bug (2) in v7 1/8. And that
issue has been fixed in v8.

... So, I guess we're all OK now. Can you confirm please?

Thanks!
Laszlo

> [...]
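For context, the WRITE_POINTER command at issue is emitted by QEMU as a
linker/loader instruction for the firmware. A hedged sketch of the call,
assuming the bios-linker-loader API from this series
(bios_linker_loader_write_pointer()) and the series' fw_cfg file-name
constants; the GUID-offset constant and the wrapper function are
illustrative:

```c
/* Sketch: have the firmware ALLOCATE the GUID blob, then WRITE_POINTER the
 * blob's address (plus src_offset) into the writeable address file. */
static void vmgenid_add_write_pointer_sketch(BIOSLinker *linker)
{
    bios_linker_loader_write_pointer(linker,
        VMGENID_ADDR_FW_CFG_FILE,  /* dest_file: "etc/vmgenid_addr" */
        0,                         /* dst_patched_offset within dest_file */
        sizeof(uint64_t),          /* dst_patched_size: an 8-byte pointer */
        VMGENID_GUID_FW_CFG_FILE,  /* src_file: "etc/vmgenid_guid" */
        VMGENID_GUID_OFFSET);      /* src_offset: where the GUID sits in
                                    * the blob -- the field the v7
                                    * copy/paste bug got wrong */
}
```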
On Mon, 20 Feb 2017 14:28:11 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 02/20/17 14:13, Igor Mammedov wrote:
> [...]
> ... So, I guess we're all OK now. Can you confirm please?

Confirmed

> Thanks!
> Laszlo

[...]
So Igor has now confirmed he's fine with v8 (thanks!), but I still
wanted to respond here:

On 02/20/17 13:32, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
> [...]
>> (Side point: managedsave is awesome. :))
>
> If I've followed the bread crumbs correctly, I think managedsave is
> just using a migrate to fd anyway, so the same code.

Yes, I agree. My enthusiasm for "managedsave" is due to "virsh start"'s
awareness as to whether it should boot the guest from zero, or
in-migrate it from the "managed" saved state. Plain "save" is much more
risky for the admin to mess up (because it needs specialized guest
startup too).

Of course, I also find QEMU's migration feature awesome in the first
place. :)

>> It is a uint8_t[8] array (little-endian representation), linked into
>> another (writeable) fw_cfg entry, and it's migrated explicitly (it
>> has a descriptor in the device's vmstate descriptor). The post_load
>> callback relies on this array being restored before the migration
>> infrastructure calls post_load.
>
> RAM normally comes back before other devices, so you should be OK;
> although we frequently have problems with devices reading from RAM
> during device init before migration has started, or writing to it
> after migration has finished on the source.

Thanks; we should be fine then. (We only write to RAM in post_load.)

Laszlo
On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote:
> * Laszlo Ersek (lersek@redhat.com) wrote:
>> CC Dave
>
> This isn't an area I really understand; but if I'm reading this right,
> then:
>   vmgenid is stored in fw_cfg?
>   fw_cfg isn't migrated
>
> So why should any changes to it get migrated, except if it's already
> been read by the guest (and if the guest reads it again afterwards,
> what's it expected to read?)

Why are we expecting it to change on migration? You want a new value
when you load state from disk (you don't know how many times the same
state has been loaded previously, so each load is effectively forking
the VM and you want a different value), but for a single live
migration, you aren't forking the VM and don't need a new generation
ID.

I guess it all boils down to what command line you're using: if libvirt
is driving a live migration, it will request the same UUID in the
command line of the destination as what is on the source; while if
libvirt is loading from a [managed]save to restore state from a file,
it will either request a new UUID directly or request auto to let qemu
generate the new id.
* Eric Blake (eblake@redhat.com) wrote:
> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote:
> [...]
>
> Why are we expecting it to change on migration? You want a new value

I'm not; I was asking why a change made prior to migration would be
preserved across migration.

> when you load state from disk (you don't know how many times the same
> state has been loaded previously, so each load is effectively forking
> the VM and you want a different value), but for a single live
> migration, you aren't forking the VM and don't need a new generation
> ID.
>
> I guess it all boils down to what command line you're using: if
> libvirt is driving a live migration, it will request the same UUID in
> the command line of the destination as what is on the source; while if
> libvirt is loading from a [managed]save to restore state from a file,
> it will either request a new UUID directly or request auto to let qemu
> generate the new id.

Hmm, now I've lost it a bit; I thought we would preserve the value
transmitted from the source, not the value on the command line of the
destination.

Dave

> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> [...]
>> Why are we expecting it to change on migration? You want a new value
>
> I'm not; I was asking why a change made prior to migration would be
> preserved across migration.

Okay, so you're asking what happens if the source requests the vmgenid
device, and sets an id, but the destination of the migration does not
request anything - how does the guest on the destination see the same
id as was in place on the source at the time migration started.

> [...]
>
> Hmm, now I've lost it a bit; I thought we would preserve the value
> transmitted from the source, not the value on the command line of the
> destination.

I guess I'm trying to figure out whether libvirt MUST read the current
id and explicitly tell the destination of migration to reuse that id,
or if libvirt can omit the id on migration and everything just works
because the id was migrated from the source.
On 02/20/17 21:19, Dr. David Alan Gilbert wrote:
> * Eric Blake (eblake@redhat.com) wrote:
> [...]
>
> I'm not; I was asking why a change made prior to migration would be
> preserved across migration.
>
> [...]
>
> Hmm, now I've lost it a bit; I thought we would preserve the value
> transmitted from the source, not the value on the command line of the
> destination.

There are two relevant pieces of data here:

(a) the GUID in guest RAM

(b) the guest-phys address of the GUID, written back by the guest fw to
    a guest-writeable fw_cfg file, to be dereferenced by QEMU, for
    updating the GUID in guest RAM

For both live migration and restoring saved state from disk, (b)
doesn't change. It is also not exposed on the QEMU command line. (It is
configured by the guest firmware during initial boot.)

(a) is taken from the QEMU command line. It can be "auto" (and then
QEMU generates a random GUID), or a specific GUID string. This GUID is
always written to guest RAM (assuming (b) has been configured) in the
vmgenid device's post_load callback.

However, whether the new GUID should be different from the one already
present in guest RAM is a separate question.

- For restoring state from disk, a different GUID (either generated by
  libvirt, or by QEMU due to "auto") makes sense.

- For live migration, it makes sense for libvirt to pass in the same
  GUID on the target host as was used on the source host. The guest RAM
  update, and the ACPI interrupt (SCI), will occur on the target host,
  but the GUID won't change effectively. (The VMGENID spec explicitly
  permits spurious notifications, i.e., an SCI with no change to the
  GUID in RAM.)

Thanks
Laszlo
On 02/20/17 21:45, Eric Blake wrote:
> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote:
> [...]
>
> Okay, so you're asking what happens if the source requests the vmgenid
> device, and sets an id, but the destination of the migration does not
> request anything

This should never happen, as it means different QEMU command lines on
source vs. target hosts. (Different as in "incorrectly different".)

Dave writes, "a change made prior to migration". Change made to what?

- The GUID cannot be changed via the monitor once QEMU has been
  started. We dropped the monitor command for that, due to lack of a
  good use case, and due to lifecycle complexities. We have figured out
  a way to make it safe, but until there's a really convincing use
  case, we shouldn't add that complexity.

- The address of the GUID is changed (the firmware programs it from
  "zero" to an actual address, in a writeable fw_cfg file), and that
  piece of info is explicitly migrated, as part of the vmgenid device's
  vmsd.

Thanks
Laszlo

> - how does the guest on the destination see the same id as was in
>   place on the source at the time migration started.

[...]
On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote:
> On 02/20/17 21:45, Eric Blake wrote:
> [...]
>
> Dave writes, "a change made prior to migration". Change made to what?
>
> - The GUID cannot be changed via the monitor once QEMU has been
>   started. We dropped the monitor command for that, due to lack of a
>   good use case, and due to lifecycle complexities. We have figured
>   out a way to make it safe, but until there's a really convincing use
>   case, we shouldn't add that complexity.

True, but we might in the future, and it seems prudent to make the
migration stream future-proof for that.

> - The address of the GUID is changed (the firmware programs it from
>   "zero" to an actual address, in a writeable fw_cfg file), and that
>   piece of info is explicitly migrated, as part of the vmgenid
>   device's vmsd.
>
> Thanks
> Laszlo

[...]
On 02/21/17 02:43, Michael S. Tsirkin wrote: > On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >> On 02/20/17 21:45, Eric Blake wrote: >>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>> * Eric Blake (eblake@redhat.com) wrote: >>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>> CC Dave >>>>>> >>>>>> This isn't an area I really understand; but if I'm >>>>>> reading this right then >>>>>> vmgenid is stored in fw_cfg? >>>>>> fw_cfg isn't migrated >>>>>> >>>>>> So why should any changes to it get migrated, except if it's already >>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>> it expected to read?) >>>>> >>>>> Why are we expecting it to change on migration? You want a new value >>>> >>>> I'm not; I was asking why a change made prior to migration would be >>>> preserved across migration. >>> >>> Okay, so you're asking what happens if the source requests the vmgenid >>> device, and sets an id, but the destination of the migration does not >>> request anything >> >> This should never happen, as it means different QEMU command lines on >> source vs. target hosts. (Different as in "incorrectly different".) >> >> Dave writes, "a change made prior to migration". Change made to what? >> >> - the GUID cannot be changed via the monitor once QEMU has been started. >> We dropped the monitor command for that, due to lack of a good use case, >> and due to lifecycle complexities. We have figured out a way to make it >> safe, but until there's a really convincing use case, we shouldn't add >> that complexity. > > True but we might in the future, and it seems prudent to make > migration stream future-proof for that. It is already. The monitor command, if we add it, can be implemented incrementally. I described it as "approach (iii)" elsewhere in the thread. This is a more detailed recap: - introduce a new device property (internal only), such as "x-enable-set-vmgenid". Make it reflect whether a given machine type supports the monitor command. - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one with a selection callback - add a new boolean latch to the vmgenid device, called "guid_blob_selected" or something similar - the reset handler sets the latch to FALSE (NB: the reset handler already sets /etc/vmgenid_addr to zero) - the select callback for /etc/vmgenid_guid sets the latch to TRUE - the latch is added to the migration stream as a subsection *if* x-enable-set-vmgenid is TRUE - the set-vmgenid monitor command checks all three of: x-enable-set-vmgenid, the latch, and the contents of /etc/vmgenid_addr: - if x-enable-set-vmgenid is FALSE, the monitor command returns QERR_UNSUPPORTED (this is a generic error class, with an "unsupported" error message). Otherwise, - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the guest firmware has executed (or started executing) ALLOCATE for /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. In this case updating the VMGENID from the monitor is unsafe (we cannot guarantee informing the guest successfully), so in this case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. The caller should simply try a bit later. (By which time the firmware will likely have programmed /etc/vmgenid_addr.) Libvirt can recognize this error specifically, because it is not the generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for "EAGAIN", practically, in this case. 
- Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is nonzero (that is, the guest has either not run ALLOCATE since reset, *or* it has, but it has also run WRITE_POINTER): - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid in-place -- the guest will see this whenever it runs ALLOCATE for /etc/vmgenid_guid, *AND* - if /etc/vmgenid_addr is not zero, then update the guest (that is, RAM write + SCI) Thanks Laszlo > >> - the address of the GUID is changed (the firmware programs it from >> "zero" to an actual address, in a writeable fw_cfg file), and that piece >> of info is explicitly migrated, as part of the vmgenid device's vmsd. >> >> Thanks >> Laszlo >> >> >>> - how does the guest on the destination see the same id >>> as was in place on the source at the time migration started. >>> >>>> >>>> >>>>> when you load state from disk (you don't know how many times the same >>>>> state has been loaded previously, so each load is effectively forking >>>>> the VM and you want a different value), but for a single live migration, >>>>> you aren't forking the VM and don't need a new generation ID. >>>>> >>>>> I guess it all boils down to what command line you're using: if libvirt >>>>> is driving a live migration, it will request the same UUID in the >>>>> command line of the destination as what is on the source; while if >>>>> libvirt is loading from a [managed]save to restore state from a file, it >>>>> will either request a new UUID directly or request auto to let qemu >>>>> generate the new id. >>>> >>>> Hmm now I've lost it a bit; I thought we would preserve the value >>>> transmitted from the source, not the value on the command line of the destination. >>> >>> I guess I'm trying to figure out whether libvirt MUST read the current >>> id and explicitly tell the destination of migration to reuse that id, or >>> if libvirt can omit the id on migration and everything just works >>> because the id was migrated from the source. >>>
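[For concreteness, a minimal sketch of the checks listed above. This is not code from the series: the x_enable_set_vmgenid and guid_blob_selected fields, vmgenid_refresh_blob(), and the command name (which would also need a QAPI schema entry) are hypothetical, following the naming proposed above; find_vmgenid_dev(), the vmgenid_addr_le field and vmgenid_update_guest() are taken from the patch at the end of this thread, and the error calls are existing QEMU API.]

#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/bswap.h"
#include "hw/acpi/vmgenid.h"

void qmp_set_vm_generation_id(const char *guid, Error **errp)
{
    VmGenIdState *vms = VMGENID(find_vmgenid_dev());

    if (!vms->x_enable_set_vmgenid) {
        /* this machine type predates the monitor command */
        error_setg(errp, QERR_UNSUPPORTED);
        return;
    }
    if (vms->guid_blob_selected && ldq_le_p(vms->vmgenid_addr_le) == 0) {
        /* guest is between ALLOCATE and WRITE_POINTER: practically "EAGAIN" */
        error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
                  "vmgenid address not programmed yet, retry later");
        return;
    }
    if (qemu_uuid_parse(guid, &vms->guid) < 0) {
        error_setg(errp, "'%s' is not a valid UUID", guid);
        return;
    }
    vmgenid_refresh_blob(vms);        /* a later ALLOCATE sees the new GUID */
    if (ldq_le_p(vms->vmgenid_addr_le) != 0) {
        vmgenid_update_guest(vms);    /* guest RAM write + SCI */
    }
}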
On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: > On 02/21/17 02:43, Michael S. Tsirkin wrote: > > On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: > >> On 02/20/17 21:45, Eric Blake wrote: > >>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: > >>>> * Eric Blake (eblake@redhat.com) wrote: > >>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: > >>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: > >>>>>>> CC Dave > >>>>>> > >>>>>> This isn't an area I really understand; but if I'm > >>>>>> reading this right then > >>>>>> vmgenid is stored in fw_cfg? > >>>>>> fw_cfg isn't migrated > >>>>>> > >>>>>> So why should any changes to it get migrated, except if it's already > >>>>>> been read by the guest (and if the guest reads it again aftwards what's > >>>>>> it expected to read?) > >>>>> > >>>>> Why are we expecting it to change on migration? You want a new value > >>>> > >>>> I'm not; I was asking why a change made prior to migration would be > >>>> preserved across migration. > >>> > >>> Okay, so you're asking what happens if the source requests the vmgenid > >>> device, and sets an id, but the destination of the migration does not > >>> request anything > >> > >> This should never happen, as it means different QEMU command lines on > >> source vs. target hosts. (Different as in "incorrectly different".) > >> > >> Dave writes, "a change made prior to migration". Change made to what? > >> > >> - the GUID cannot be changed via the monitor once QEMU has been started. > >> We dropped the monitor command for that, due to lack of a good use case, > >> and due to lifecycle complexities. We have figured out a way to make it > >> safe, but until there's a really convincing use case, we shouldn't add > >> that complexity. > > > > True but we might in the future, and it seems prudent to make > > migration stream future-proof for that. > > It is already. > > The monitor command, if we add it, can be implemented incrementally. I > described it as "approach (iii)" elsewhere in the thread. This is a more > detailed recap: > > - introduce a new device property (internal only), such as > "x-enable-set-vmgenid". Make it reflect whether a given machine type > supports the monitor command. This is the part we can avoid at no real cost just by making sure the guid is migrated. > - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one > with a selection callback > > - add a new boolean latch to the vmgenid device, called > "guid_blob_selected" or something similar > > - the reset handler sets the latch to FALSE > (NB: the reset handler already sets /etc/vmgenid_addr to zero) > > - the select callback for /etc/vmgenid_guid sets the latch to TRUE > > - the latch is added to the migration stream as a subsection *if* > x-enable-set-vmgenid is TRUE > > - the set-vmgenid monitor command checks all three of: > x-enable-set-vmgenid, the latch, and the contents of > /etc/vmgenid_addr: > > - if x-enable-set-vmgenid is FALSE, the monitor command returns > QERR_UNSUPPORTED (this is a generic error class, with an > "unsupported" error message). Otherwise, > > - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the > guest firmware has executed (or started executing) ALLOCATE for > /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. > In this case updating the VMGENID from the monitor is unsafe > (we cannot guarantee informing the guest successfully), so in this > case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. 
> The caller should simply try a bit later. (By which time the > firmware will likely have programmed /etc/vmgenid_addr.) This makes no sense to me. Just update it in qemu memory and write when guest asks for it. > Libvirt can recognize this error specifically, because it is not the > generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for > "EAGAIN", practically, in this case. > > - Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is > nonzero, that is, the guest has either not run ALLOCATE since > reset, *or* it has, but it has also run WRITE_POINTER): > > - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid > in-place -- the guest will see this whenever it runs ALLOCATE for > /etc/vmgenid_guid, *AND* > > - if /etc/vmgenid_addr is not zero, then update the guest (that is, > RAM write + SCI) > > Thanks > Laszlo Seems way more painful than it has to be. Just migrate the guid and then management can write it at any time. > > > >> - the address of the GUID is changed (the firmware programs it from > >> "zero" to an actual address, in a writeable fw_cfg file), and that piece > >> of info is explicitly migrated, as part of the vmgenid device's vmsd. > >> > >> Thanks > >> Laszlo > >> > >> > >>> - how does the guest on the destination see the same id > >>> as was in place on the source at the time migration started. > >>> > >>>> > >>>> > >>>>> when you load state from disk (you don't know how many times the same > >>>>> state has been loaded previously, so each load is effectively forking > >>>>> the VM and you want a different value), but for a single live migration, > >>>>> you aren't forking the VM and don't need a new generation ID. > >>>>> > >>>>> I guess it all boils down to what command line you're using: if libvirt > >>>>> is driving a live migration, it will request the same UUID in the > >>>>> command line of the destination as what is on the source; while if > >>>>> libvirt is loading from a [managed]save to restore state from a file, it > >>>>> will either request a new UUID directly or request auto to let qemu > >>>>> generate the new id. > >>>> > >>>> Hmm now I've lost it a bit; I thought we would preserve the value > >>>> transmitted from the source, not the value on the command line of the destination. > >>> > >>> I guess I'm trying to figure out whether libvirt MUST read the current > >>> id and explicitly tell the destination of migration to reuse that id, or > >>> if libvirt can omit the id on migration and everything just works > >>> because the id was migrated from the source. > >>>
On 02/21/17 15:14, Michael S. Tsirkin wrote: > On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: >> On 02/21/17 02:43, Michael S. Tsirkin wrote: >>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >>>> On 02/20/17 21:45, Eric Blake wrote: >>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>>>> * Eric Blake (eblake@redhat.com) wrote: >>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>>>> CC Dave >>>>>>>> >>>>>>>> This isn't an area I really understand; but if I'm >>>>>>>> reading this right then >>>>>>>> vmgenid is stored in fw_cfg? >>>>>>>> fw_cfg isn't migrated >>>>>>>> >>>>>>>> So why should any changes to it get migrated, except if it's already >>>>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>>>> it expected to read?) >>>>>>> >>>>>>> Why are we expecting it to change on migration? You want a new value >>>>>> >>>>>> I'm not; I was asking why a change made prior to migration would be >>>>>> preserved across migration. >>>>> >>>>> Okay, so you're asking what happens if the source requests the vmgenid >>>>> device, and sets an id, but the destination of the migration does not >>>>> request anything >>>> >>>> This should never happen, as it means different QEMU command lines on >>>> source vs. target hosts. (Different as in "incorrectly different".) >>>> >>>> Dave writes, "a change made prior to migration". Change made to what? >>>> >>>> - the GUID cannot be changed via the monitor once QEMU has been started. >>>> We dropped the monitor command for that, due to lack of a good use case, >>>> and due to lifecycle complexities. We have figured out a way to make it >>>> safe, but until there's a really convincing use case, we shouldn't add >>>> that complexity. >>> >>> True but we might in the future, and it seems prudent to make >>> migration stream future-proof for that. >> >> It is already. >> >> The monitor command, if we add it, can be implemented incrementally. I >> described it as "approach (iii)" elsewhere in the thread. This is a more >> detailed recap: >> >> - introduce a new device property (internal only), such as >> "x-enable-set-vmgenid". Make it reflect whether a given machine type >> supports the monitor command. > > This is the part we can avoid at no real cost just > by making sure the guid is migrated. > > >> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one >> with a selection callback >> >> - add a new boolean latch to the vmgenid device, called >> "guid_blob_selected" or something similar >> >> - the reset handler sets the latch to FALSE >> (NB: the reset handler already sets /etc/vmgenid_addr to zero) >> >> - the select callback for /etc/vmgenid_guid sets the latch to TRUE >> >> - the latch is added to the migration stream as a subsection *if* >> x-enable-set-vmgenid is TRUE >> >> - the set-vmgenid monitor command checks all three of: >> x-enable-set-vmgenid, the latch, and the contents of >> /etc/vmgenid_addr: >> >> - if x-enable-set-vmgenid is FALSE, the monitor command returns >> QERR_UNSUPPORTED (this is a generic error class, with an >> "unsupported" error message). Otherwise, >> >> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the >> guest firmware has executed (or started executing) ALLOCATE for >> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. 
>> In this case updating the VMGENID from the monitor is unsafe >> (we cannot guarantee informing the guest successfully), so in this >> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. >> The caller should simply try a bit later. (By which time the >> firmware will likely have programmed /etc/vmgenid_addr.) > > This makes no sense to me. Just update it in qemu memory > and write when guest asks for it. I designed the above (sorry if "designed" is a word too pompous for this) quite explicitly to address your concern as to what would happen if someone tried to massage the GUID via the monitor while the firmware was between ALLOCATE and WRITE_POINTER. Also, we don't know when the guest "asks" for the GUID (in guest RAM). It just evaluates ADDR (maybe always, maybe only once, at guest driver startup), and then it just looks at RAM whenever it wants to. This is why this idea seeks to track the guest's state -- if the guest is before ALLOCATE, it's okay to update the fw_cfg blob; if it is between ALLOCATE and WRITE_POINTER, reject the monitor command (temporarily); and if the guest is after WRITE_POINTER, update the RAM and inject the SCI. We cannot see *exactly* when the guest has just finished writing the address. We have only select callbacks for fw_cfg items, not write callbacks. And a select callback is no good for the address blob, because it would be invoked *before* the guest writes the address. We discussed these facts several days (weeks?) and several iterations ago. The longer term remedy we came up with was the above design. The shorter term remedy was to drop the "set" monitor command, because we couldn't figure out a management layer use case for that monitor command. If you now (at v8) insist on future-proofing the design for a potential "set" monitor command, that's exactly the same as if you were requiring Ben to implement the monitor command right now. Except this is worse, because we dropped the monitor command in v6 (from v5), and you didn't protest. > > >> Libvirt can recognize this error specifically, because it is not the >> generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for >> "EAGAIN", practically, in this case.
>>>>> >>>>>> >>>>>> >>>>>>> when you load state from disk (you don't know how many times the same >>>>>>> state has been loaded previously, so each load is effectively forking >>>>>>> the VM and you want a different value), but for a single live migration, >>>>>>> you aren't forking the VM and don't need a new generation ID. >>>>>>> >>>>>>> I guess it all boils down to what command line you're using: if libvirt >>>>>>> is driving a live migration, it will request the same UUID in the >>>>>>> command line of the destination as what is on the source; while if >>>>>>> libvirt is loading from a [managed]save to restore state from a file, it >>>>>>> will either request a new UUID directly or request auto to let qemu >>>>>>> generate the new id. >>>>>> >>>>>> Hmm now I've lost it a bit; I thought we would preserve the value >>>>>> transmitted from the source, not the value on the command line of the destination. >>>>> >>>>> I guess I'm trying to figure out whether libvirt MUST read the current >>>>> id and explicitly tell the destination of migration to reuse that id, or >>>>> if libvirt can omit the id on migration and everything just works >>>>> because the id was migrated from the source. >>>>>
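[A rough illustration of the latch arming Laszlo describes: the fw_cfg select callback fires when the guest selects the item's key, before any data is transferred, which is why it can only record that ALLOCATE has started. The guid_blob_selected field and the callback body are hypothetical, and the fw_cfg_add_file_callback() registration is approximated from the fw_cfg API of this era; this would sit in hw/acpi/vmgenid.c.]

static void vmgenid_guid_selected(void *opaque)
{
    VmGenIdState *vms = opaque;

    vms->guid_blob_selected = true;   /* guest has (re)started ALLOCATE */
}

static void vmgenid_add_guid_file(VmGenIdState *vms, FWCfgState *fw_cfg,
                                  GArray *guid_blob)
{
    /* register a select (read) callback so QEMU learns when the guest
     * begins ALLOCATE on "etc/vmgenid_guid"; a plain fw_cfg_add_file()
     * would register no callback at all */
    fw_cfg_add_file_callback(fw_cfg, "etc/vmgenid_guid",
                             vmgenid_guid_selected, vms,
                             guid_blob->data, guid_blob->len);
}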
On Tue, Feb 21, 2017 at 05:08:40PM +0100, Laszlo Ersek wrote: > On 02/21/17 15:14, Michael S. Tsirkin wrote: > > On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: > >> On 02/21/17 02:43, Michael S. Tsirkin wrote: > >>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: > >>>> On 02/20/17 21:45, Eric Blake wrote: > >>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: > >>>>>> * Eric Blake (eblake@redhat.com) wrote: > >>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: > >>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: > >>>>>>>>> CC Dave > >>>>>>>> > >>>>>>>> This isn't an area I really understand; but if I'm > >>>>>>>> reading this right then > >>>>>>>> vmgenid is stored in fw_cfg? > >>>>>>>> fw_cfg isn't migrated > >>>>>>>> > >>>>>>>> So why should any changes to it get migrated, except if it's already > >>>>>>>> been read by the guest (and if the guest reads it again aftwards what's > >>>>>>>> it expected to read?) > >>>>>>> > >>>>>>> Why are we expecting it to change on migration? You want a new value > >>>>>> > >>>>>> I'm not; I was asking why a change made prior to migration would be > >>>>>> preserved across migration. > >>>>> > >>>>> Okay, so you're asking what happens if the source requests the vmgenid > >>>>> device, and sets an id, but the destination of the migration does not > >>>>> request anything > >>>> > >>>> This should never happen, as it means different QEMU command lines on > >>>> source vs. target hosts. (Different as in "incorrectly different".) > >>>> > >>>> Dave writes, "a change made prior to migration". Change made to what? > >>>> > >>>> - the GUID cannot be changed via the monitor once QEMU has been started. > >>>> We dropped the monitor command for that, due to lack of a good use case, > >>>> and due to lifecycle complexities. We have figured out a way to make it > >>>> safe, but until there's a really convincing use case, we shouldn't add > >>>> that complexity. > >>> > >>> True but we might in the future, and it seems prudent to make > >>> migration stream future-proof for that. > >> > >> It is already. > >> > >> The monitor command, if we add it, can be implemented incrementally. I > >> described it as "approach (iii)" elsewhere in the thread. This is a more > >> detailed recap: > >> > >> - introduce a new device property (internal only), such as > >> "x-enable-set-vmgenid". Make it reflect whether a given machine type > >> supports the monitor command. > > > > This is the part we can avoid at no real cost just > > by making sure the guid is migrated. > > > > > >> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one > >> with a selection callback > >> > >> - add a new boolean latch to the vmgenid device, called > >> "guid_blob_selected" or something similar > >> > >> - the reset handler sets the latch to FALSE > >> (NB: the reset handler already sets /etc/vmgenid_addr to zero) > >> > >> - the select callback for /etc/vmgenid_guid sets the latch to TRUE > >> > >> - the latch is added to the migration stream as a subsection *if* > >> x-enable-set-vmgenid is TRUE > >> > >> - the set-vmgenid monitor command checks all three of: > >> x-enable-set-vmgenid, the latch, and the contents of > >> /etc/vmgenid_addr: > >> > >> - if x-enable-set-vmgenid is FALSE, the monitor command returns > >> QERR_UNSUPPORTED (this is a generic error class, with an > >> "unsupported" error message). 
Otherwise, > >> > >> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the > >> guest firmware has executed (or started executing) ALLOCATE for > >> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. > >> In this case updating the VMGENID from the monitor is unsafe > >> (we cannot guarantee informing the guest successfully), so in this > >> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. > >> The caller should simply try a bit later. (By which time the > >> firmware will likely have programmed /etc/vmgenid_addr.) > > > > This makes no sense to me. Just update it in qemu memory > > and write when guest asks for it. > > I designed the above (sorry if "designed" is a word too pompous for > this) quite explicitly to address your concern as to what would happen > if someone tried to massage the GUID via the monitor while the firmware > was between ALLOCATE and WRITE_POINTER. > > Also, we don't know when the guest "asks" for the GUID (in guest RAM). > It just evaluates ADDR (maybe always, maybe only once, at guest driver > startup), and then it just looks at RAM whenever it wants to. > > This is why this idea seeks to track the guest's state -- if the guest > is before ALLOCATE, it's okay to update the fw_cfg blob, if it is > between ALLOCATE and WRITE_POINTER, reject the monitor command > (temporarily), and if the guest is after WRITE_POINTER, update the RAM > and inject the SCI. > > We cannot see *exactly* when the guest has just finished writing the > address. We have only select callbacks for fw_cfg items, not write > callbacks. And a select callback is no good for the address blob, > because it would be invoked *before* the guest writes the address. > > We discussed these facts several days (weeks?) and several iterations > ago. The longer term remedy we came up was the above design. The shorter > term remedy was to drop the "set" monitor command, because we couldn't > figure out a management layer use case for that monitor command. > > If you now (at v8) insist to future proof the design for a potential > "set" monitor command, that's exactly the same as if you were requiring > Ben to implement the monitor command right now. Except this is worse, > because we dropped the monitor command in v6 (from v5), and you didn't > protest. I'm merging this as-is, but I think the concerns are overblown. We have many fields that devices DMA into guest memory, and changing them is easy. It should be a simple matter to update the guid copy in the fw_cfg blob and, *if we have the address*, DMA there and send an SCI. Yes, we don't know when the guest looks at the guid, but that is simply up to the guest. It needs to look at it at the right time. So the implementation is really easy, I think. The real problem is that we will have a migrated guid and a command-line guid, and which one wins if they conflict. And that is IMO something we need to figure out now, not later. > > > > > >> Libvirt can recognize this error specifically, because it is not the > >> generic error class. ERROR_CLASS_DEVICE_NOT_ACTIVE stands for > >> "EAGAIN", practically, in this case.
> >> > >> - Otherwise -- meaning latch is FALSE *or* /etc/vmgenid_addr is > >> nonzero, that is, the guest has either not run ALLOCATE since > >> reset, *or* it has, but it has also run WRITE_POINTER): > >> > >> - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid > >> in-place -- the guest will see this whenever it runs ALLOCATE for > >> /etc/vmgenid_guid, *AND* > >> > >> - if /etc/vmgenid_addr is not zero, then update the guest (that is, > >> RAM write + SCI) > >> > >> Thanks > >> Laszlo > > > > Seems way more painful than it has to be. Just migrate the guid > > and then management can write it at any time. > > Yes, management would be able to do that, but we won't know when to > expose it to the guest. (Because, again, we don't exactly know when the > guest looks at the GUID in RAM, and if the address is not configured > yet, we cannot put the GUID anywhere in RAM.) > > Do you intend to block v8 over this? > > Thanks > Laszlo > > > > > > >>> > >>>> - the address of the GUID is changed (the firmware programs it from > >>>> "zero" to an actual address, in a writeable fw_cfg file), and that piece > >>>> of info is explicitly migrated, as part of the vmgenid device's vmsd. > >>>> > >>>> Thanks > >>>> Laszlo > >>>> > >>>> > >>>>> - how does the guest on the destination see the same id > >>>>> as was in place on the source at the time migration started. > >>>>> > >>>>>> > >>>>>> > >>>>>>> when you load state from disk (you don't know how many times the same > >>>>>>> state has been loaded previously, so each load is effectively forking > >>>>>>> the VM and you want a different value), but for a single live migration, > >>>>>>> you aren't forking the VM and don't need a new generation ID. > >>>>>>> > >>>>>>> I guess it all boils down to what command line you're using: if libvirt > >>>>>>> is driving a live migration, it will request the same UUID in the > >>>>>>> command line of the destination as what is on the source; while if > >>>>>>> libvirt is loading from a [managed]save to restore state from a file, it > >>>>>>> will either request a new UUID directly or request auto to let qemu > >>>>>>> generate the new id. > >>>>>> > >>>>>> Hmm now I've lost it a bit; I thought we would preserve the value > >>>>>> transmitted from the source, not the value on the command line of the destination. > >>>>> > >>>>> I guess I'm trying to figure out whether libvirt MUST read the current > >>>>> id and explicitly tell the destination of migration to reuse that id, or > >>>>> if libvirt can omit the id on migration and everything just works > >>>>> because the id was migrated from the source. > >>>>>
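[Sketched in code, the simpler path Michael argues for might look like this; vmgenid_update_guest() is from the series, while the wrapper function and the address check are illustrative, and this assumes it lives next to the other helpers in hw/acpi/vmgenid.c.]

static void vmgenid_set_guid_runtime(VmGenIdState *vms, const QemuUUID *guid)
{
    vms->guid = *guid;   /* a subsequent ALLOCATE picks up the new value */

    if (ldq_le_p(vms->vmgenid_addr_le) != 0) {
        /* the guest already ran WRITE_POINTER: DMA the new GUID into
         * guest RAM and raise the SCI */
        vmgenid_update_guest(vms);
    }
    /* otherwise do nothing; when the guest actually reads the GUID is,
     * on this view, the guest's own responsibility */
}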
On 02/21/17 17:17, Michael S. Tsirkin wrote: > On Tue, Feb 21, 2017 at 05:08:40PM +0100, Laszlo Ersek wrote: >> On 02/21/17 15:14, Michael S. Tsirkin wrote: >>> On Tue, Feb 21, 2017 at 10:58:05AM +0100, Laszlo Ersek wrote: >>>> On 02/21/17 02:43, Michael S. Tsirkin wrote: >>>>> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote: >>>>>> On 02/20/17 21:45, Eric Blake wrote: >>>>>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote: >>>>>>>> * Eric Blake (eblake@redhat.com) wrote: >>>>>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote: >>>>>>>>>> * Laszlo Ersek (lersek@redhat.com) wrote: >>>>>>>>>>> CC Dave >>>>>>>>>> >>>>>>>>>> This isn't an area I really understand; but if I'm >>>>>>>>>> reading this right then >>>>>>>>>> vmgenid is stored in fw_cfg? >>>>>>>>>> fw_cfg isn't migrated >>>>>>>>>> >>>>>>>>>> So why should any changes to it get migrated, except if it's already >>>>>>>>>> been read by the guest (and if the guest reads it again aftwards what's >>>>>>>>>> it expected to read?) >>>>>>>>> >>>>>>>>> Why are we expecting it to change on migration? You want a new value >>>>>>>> >>>>>>>> I'm not; I was asking why a change made prior to migration would be >>>>>>>> preserved across migration. >>>>>>> >>>>>>> Okay, so you're asking what happens if the source requests the vmgenid >>>>>>> device, and sets an id, but the destination of the migration does not >>>>>>> request anything >>>>>> >>>>>> This should never happen, as it means different QEMU command lines on >>>>>> source vs. target hosts. (Different as in "incorrectly different".) >>>>>> >>>>>> Dave writes, "a change made prior to migration". Change made to what? >>>>>> >>>>>> - the GUID cannot be changed via the monitor once QEMU has been started. >>>>>> We dropped the monitor command for that, due to lack of a good use case, >>>>>> and due to lifecycle complexities. We have figured out a way to make it >>>>>> safe, but until there's a really convincing use case, we shouldn't add >>>>>> that complexity. >>>>> >>>>> True but we might in the future, and it seems prudent to make >>>>> migration stream future-proof for that. >>>> >>>> It is already. >>>> >>>> The monitor command, if we add it, can be implemented incrementally. I >>>> described it as "approach (iii)" elsewhere in the thread. This is a more >>>> detailed recap: >>>> >>>> - introduce a new device property (internal only), such as >>>> "x-enable-set-vmgenid". Make it reflect whether a given machine type >>>> supports the monitor command. >>> >>> This is the part we can avoid at no real cost just >>> by making sure the guid is migrated. >>> >>> >>>> - change the /etc/vmgenid_guid fw_cfg blob from callback-less to one >>>> with a selection callback >>>> >>>> - add a new boolean latch to the vmgenid device, called >>>> "guid_blob_selected" or something similar >>>> >>>> - the reset handler sets the latch to FALSE >>>> (NB: the reset handler already sets /etc/vmgenid_addr to zero) >>>> >>>> - the select callback for /etc/vmgenid_guid sets the latch to TRUE >>>> >>>> - the latch is added to the migration stream as a subsection *if* >>>> x-enable-set-vmgenid is TRUE >>>> >>>> - the set-vmgenid monitor command checks all three of: >>>> x-enable-set-vmgenid, the latch, and the contents of >>>> /etc/vmgenid_addr: >>>> >>>> - if x-enable-set-vmgenid is FALSE, the monitor command returns >>>> QERR_UNSUPPORTED (this is a generic error class, with an >>>> "unsupported" error message). 
Otherwise, >>>> >>>> - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the >>>> guest firmware has executed (or started executing) ALLOCATE for >>>> /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet. >>>> In this case updating the VMGENID from the monitor is unsafe >>>> (we cannot guarantee informing the guest successfully), so in this >>>> case the monitor command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. >>>> The caller should simply try a bit later. (By which time the >>>> firmware will likely have programmed /etc/vmgenid_addr.) >>> >>> This makes no sense to me. Just update it in qemu memory >>> and write when guest asks for it. >> >> I designed the above (sorry if "designed" is a word too pompous for >> this) quite explicitly to address your concern as to what would happen >> if someone tried to massage the GUID via the monitor while the firmware >> was between ALLOCATE and WRITE_POINTER. >> >> Also, we don't know when the guest "asks" for the GUID (in guest RAM). >> It just evaluates ADDR (maybe always, maybe only once, at guest driver >> startup), and then it just looks at RAM whenever it wants to. >> >> This is why this idea seeks to track the guest's state -- if the guest >> is before ALLOCATE, it's okay to update the fw_cfg blob, if it is >> between ALLOCATE and WRITE_POINTER, reject the monitor command >> (temporarily), and if the guest is after WRITE_POINTER, update the RAM >> and inject the SCI. >> >> We cannot see *exactly* when the guest has just finished writing the >> address. We have only select callbacks for fw_cfg items, not write >> callbacks. And a select callback is no good for the address blob, >> because it would be invoked *before* the guest writes the address. >> >> We discussed these facts several days (weeks?) and several iterations >> ago. The longer term remedy we came up was the above design. The shorter >> term remedy was to drop the "set" monitor command, because we couldn't >> figure out a management layer use case for that monitor command. >> >> If you now (at v8) insist to future proof the design for a potential >> "set" monitor command, that's exactly the same as if you were requiring >> Ben to implement the monitor command right now. Except this is worse, >> because we dropped the monitor command in v6 (from v5), and you didn't >> protest. > > I'm merging this as-is Thank you! > but I think the concerns are overblown. > We have many fields which devices DMA into guest memory > and changing them is easy. > > It should be a simple matter to update guid copy in > fw cfg blob, and *if we have the address*, DMA there > and send SCI. I think this was more or less what Ben's v5 did, and (again, as far as I recall) you were concerned about its safety: msgid: <20170206201249-mutt-send-email-mst@kernel.org> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg427927.html msgid: <20170206210237-mutt-send-email-mst@kernel.org> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg427935.html Again, at that point I "invented" the above elaborate design *only* to address your concern. If you are not concerned any longer (or, if you had never had this exact concern, I just misunderstood you), then I'm fine dropping all of the above -- I definitely don't strive to implement (or request) the above out of my own initiative. 
Please see item (5) in the following message: msgid: <14f224ed-08e2-cbad-9d1d-8f559cd399a6@redhat.com> URL: https://www.mail-archive.com/qemu-devel@nongnu.org/msg428296.html The design above is just "approach (iii)" expanded with more details, from under said item (5). You didn't react there, and I thought you were okay with the idea. Then Ben went on to drop the "set" monitor command in v6, and you didn't comment on that either -- so I assumed you were okay with that too. > > Yes we don't know when does guest look at guid but that > is simply up to guest. It needs to look at it at the > right time. > > So the implementation is really easy I think. That's for the best! > > The real problem is that we will have migrated guid > and command line guid and which one wins if they conflict. > And that is IMO something we need to figure out now and > not later. Neither Ben nor I seem to know when the management layer would want to call the "set" monitor command, and your question is really hard to answer without that knowledge. (Under my proposal, the question does not really exist, because the GUID set last on the source host need not be migrated except as part of guest RAM, and it's always the command line GUID on the target host that takes precedence after migration and gets written into guest RAM.) In other words, it is for libvirt / users / etc. to say why they would want to set GUID-A with the monitor command on the source host, *and* then start up QEMU on the target host with GUID-B on the command line. ... Either way, I would let GUID-B take effect. Thanks Laszlo
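[For reference, the patch appended below shows the post_load fix discussed in this thread: instead of calling vmgenid_update_guest() directly from vmgenid_post_load(), it registers a VM change state handler that performs the update once, and unregisters itself, when the restored machine starts running again.]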
diff --git a/include/hw/acpi/vmgenid.h b/include/hw/acpi/vmgenid.h index db7fa0e63303..a2ae450b1f56 100644 --- a/include/hw/acpi/vmgenid.h +++ b/include/hw/acpi/vmgenid.h @@ -4,6 +4,7 @@ #include "hw/acpi/bios-linker-loader.h" #include "hw/qdev.h" #include "qemu/uuid.h" +#include "sysemu/sysemu.h" #define VMGENID_DEVICE "vmgenid" #define VMGENID_GUID "guid" @@ -21,6 +22,7 @@ typedef struct VmGenIdState { DeviceClass parent_obj; QemuUUID guid; /* The 128-bit GUID seen by the guest */ uint8_t vmgenid_addr_le[8]; /* Address of the GUID (little-endian) */ + VMChangeStateEntry *vmstate; } VmGenIdState; static inline Object *find_vmgenid_dev(void) diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c index 9f97b722761b..0ae1d56ff297 100644 --- a/hw/acpi/vmgenid.c +++ b/hw/acpi/vmgenid.c @@ -177,10 +177,20 @@ static void vmgenid_set_guid(Object *obj, const char *value, Error **errp) /* After restoring an image, we need to update the guest memory and notify * it of a potential change to VM Generation ID */ +static void postload_update_guest_cb(void *opaque, int running, RunState state) +{ + VmGenIdState *vms = opaque; + + qemu_del_vm_change_state_handler(vms->vmstate); + vms->vmstate = NULL; + vmgenid_update_guest(vms); +} + static int vmgenid_post_load(void *opaque, int version_id) { VmGenIdState *vms = opaque; - vmgenid_update_guest(vms); + vms->vmstate = qemu_add_vm_change_state_handler(postload_update_guest_cb, + vms); return 0; }