Message ID: 20220224185817.2207228-1-farosas@linux.ibm.com
Series: ppc: nested TCG migration (KVM-on-TCG)
On 24/02/2022 18:58, Fabiano Rosas wrote:

> This series implements the migration for a TCG pseries guest running a
> nested KVM guest. This is just like migrating a pseries TCG guest, but
> with some extra state to allow a nested guest to continue to run on
> the destination.
>
> Unfortunately the regular TCG migration scenario (not nested) is not
> fully working so I cannot be entirely sure the nested migration is
> correct. I have included a couple of patches for the general migration
> case that (I think?) improve the situation a bit, but I'm still seeing
> hard lockups and other issues with more than 1 vcpu.
>
> This is more of an early RFC to see if anyone spots something right
> away. I haven't made much progress in debugging the general TCG
> migration case so if anyone has any input there as well I'd appreciate
> it.
>
> Thanks
>
> Fabiano Rosas (4):
>   target/ppc: TCG: Migrate tb_offset and decr
>   spapr: TCG: Migrate spapr_cpu->prod
>   hw/ppc: Take nested guest into account when saving timebase
>   spapr: Add KVM-on-TCG migration support
>
>  hw/ppc/ppc.c                    | 17 +++++++-
>  hw/ppc/spapr.c                  | 19 ++++++++
>  hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_cpu_core.h |  2 +-
>  target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
>  5 files changed, 174 insertions(+), 2 deletions(-)

FWIW I noticed there were some issues with migrating the decrementer on Mac
machines a while ago which caused a hang on the destination with TCG (for
MacOS on an x86 host in my case). Have a look at the following threads for
reference:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html

IIRC there is code that assumes any migration on PPC is being done live, and
so adjusts the timebase on the destination to reflect wall clock time by
recalculating tb_offset. I haven't looked at the code for a while, but I
think the outcome was that there need to be two phases in migration: the
first is to migrate the timebase as-is for guests that are paused during
migration, whilst the second is to notify hypervisor-aware guest OSs such as
Linux to make the timebase adjustment, if required, when the guest is
running.

ATB,

Mark.
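To make the tb_offset recalculation Mark describes concrete, here is a
minimal C sketch of what a destination-side "live" adjustment amounts to.
All names and the exact arithmetic are illustrative, not QEMU's actual code
(IIRC the real logic is around timebase_post_load() in hw/ppc/ppc.c): the
guest timebase saved on the source is advanced by the wall clock downtime
and then re-expressed as an offset from the destination host's timebase.

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* a * b / c without intermediate overflow (QEMU has a muldiv64() helper) */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
    return (uint64_t)(((unsigned __int128)a * b) / c);
}

/*
 * Hypothetical helper, not QEMU's API:
 *
 * guest_tb_at_save: guest timebase captured on the source
 * tod_at_save_ns:   source wall clock time at capture (ns)
 * tod_now_ns:       destination wall clock time now (ns); assumed to be
 *                   later than tod_at_save_ns
 * host_tb_now:      destination host timebase (mftb) now
 * tb_freq:          timebase frequency in Hz
 */
int64_t recalc_tb_offset(uint64_t guest_tb_at_save,
                         int64_t tod_at_save_ns, int64_t tod_now_ns,
                         uint64_t host_tb_now, uint32_t tb_freq)
{
    /* Advance the saved guest timebase by the migration downtime... */
    uint64_t downtime_ns = (uint64_t)(tod_now_ns - tod_at_save_ns);
    uint64_t guest_tb_now = guest_tb_at_save +
        muldiv64(downtime_ns, tb_freq, NSEC_PER_SEC);

    /* ...and re-express it as an offset from the destination host's TB,
     * so that host_tb + tb_offset yields the guest's timebase. */
    return (int64_t)(guest_tb_now - host_tb_now);
}

The subtle part is that the decrementer is derived from the timebase, so an
adjustment here also moves interrupt deadlines, which is plausibly related
to the hangs in the threads above.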
On Thu, Feb 24, 2022 at 09:00:24PM +0000, Mark Cave-Ayland wrote:
> On 24/02/2022 18:58, Fabiano Rosas wrote:
>
> > This series implements the migration for a TCG pseries guest running a
> > nested KVM guest. This is just like migrating a pseries TCG guest, but
> > with some extra state to allow a nested guest to continue to run on
> > the destination.
> >
> > Unfortunately the regular TCG migration scenario (not nested) is not
> > fully working so I cannot be entirely sure the nested migration is
> > correct. I have included a couple of patches for the general migration
> > case that (I think?) improve the situation a bit, but I'm still seeing
> > hard lockups and other issues with more than 1 vcpu.
> >
> > This is more of an early RFC to see if anyone spots something right
> > away. I haven't made much progress in debugging the general TCG
> > migration case so if anyone has any input there as well I'd appreciate
> > it.
> >
> > Thanks
> >
> > Fabiano Rosas (4):
> >   target/ppc: TCG: Migrate tb_offset and decr
> >   spapr: TCG: Migrate spapr_cpu->prod
> >   hw/ppc: Take nested guest into account when saving timebase
> >   spapr: Add KVM-on-TCG migration support
> >
> >  hw/ppc/ppc.c                    | 17 +++++++-
> >  hw/ppc/spapr.c                  | 19 ++++++++
> >  hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr_cpu_core.h |  2 +-
> >  target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
> >  5 files changed, 174 insertions(+), 2 deletions(-)
>
> FWIW I noticed there were some issues with migrating the decrementer on
> Mac machines a while ago which caused a hang on the destination with TCG
> (for MacOS on an x86 host in my case). Have a look at the following
> threads for reference:
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html
>
> IIRC there is code that assumes any migration on PPC is being done live,
> and so adjusts the timebase on the destination to reflect wall clock time
> by recalculating tb_offset. I haven't looked at the code for a while, but
> I think the outcome was that there need to be two phases in migration:
> the first is to migrate the timebase as-is for guests that are paused
> during migration, whilst the second is to notify hypervisor-aware guest
> OSs such as Linux to make the timebase adjustment, if required, when the
> guest is running.

Whether the timebase is adjusted for the migration downtime depends on
whether the guest clock is pinned to wall clock time or not. Usually it
should be (because you don't want your clocks to go wrong on migration of a
production system).

However, in neither case should the guest be involved. There may be guest
side code related to this in Linux, but that's probably for migration under
pHyp, which is a guest-aware migration system. That's essentially unrelated
to migration under qemu/kvm, which is a guest-unaware system.

Guest-aware migration has some nice-sounding advantages; in particular it
can allow migrations across a heterogeneous cluster with differences
between hosts that the hypervisor can't hide, or can't efficiently hide.
However, it is, IMO, a deeply broken approach, because it can allow an
un-cooperative guest to indefinitely block migration, and for it to be
reliably correct it requires *much* more pinning down of exactly what host
system changes the guest can and can't be expected to cope with than PAPR
has ever bothered to do.
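As a minimal sketch of that host-side policy split, reusing the
hypothetical recalc_tb_offset() from the sketch above (again, illustrative
names only, not QEMU's actual API):

#include <stdbool.h>
#include <stdint.h>

/* hypothetical helper from the earlier sketch */
int64_t recalc_tb_offset(uint64_t guest_tb_at_save, int64_t tod_at_save_ns,
                         int64_t tod_now_ns, uint64_t host_tb_now,
                         uint32_t tb_freq);

static int64_t tb_offset_post_load(bool pin_to_wall_clock,
                                   uint64_t guest_tb_at_save,
                                   int64_t tod_at_save_ns,
                                   int64_t tod_now_ns,
                                   uint64_t host_tb_now,
                                   uint32_t tb_freq)
{
    if (pin_to_wall_clock) {
        /*
         * Production default: account for the migration downtime so the
         * guest's clock stays in step with wall clock time. The guest
         * sees the timebase jump forward, but its notion of the time of
         * day stays correct.
         */
        return recalc_tb_offset(guest_tb_at_save, tod_at_save_ns,
                                tod_now_ns, host_tb_now, tb_freq);
    }
    /*
     * Otherwise resume the guest timebase exactly where it stopped;
     * guest time then lags wall clock time by the downtime.
     */
    return (int64_t)(guest_tb_at_save - host_tb_now);
}

Either way the decision is host configuration; nothing in the
guest-visible interface differs between the two branches, which is the
point: the guest never participates.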
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 24/02/2022 18:58, Fabiano Rosas wrote:
>
>> This series implements the migration for a TCG pseries guest running a
>> nested KVM guest. This is just like migrating a pseries TCG guest, but
>> with some extra state to allow a nested guest to continue to run on
>> the destination.
>>
>> Unfortunately the regular TCG migration scenario (not nested) is not
>> fully working so I cannot be entirely sure the nested migration is
>> correct. I have included a couple of patches for the general migration
>> case that (I think?) improve the situation a bit, but I'm still seeing
>> hard lockups and other issues with more than 1 vcpu.
>>
>> This is more of an early RFC to see if anyone spots something right
>> away. I haven't made much progress in debugging the general TCG
>> migration case so if anyone has any input there as well I'd appreciate
>> it.
>>
>> Thanks
>>
>> Fabiano Rosas (4):
>>   target/ppc: TCG: Migrate tb_offset and decr
>>   spapr: TCG: Migrate spapr_cpu->prod
>>   hw/ppc: Take nested guest into account when saving timebase
>>   spapr: Add KVM-on-TCG migration support
>>
>>  hw/ppc/ppc.c                    | 17 +++++++-
>>  hw/ppc/spapr.c                  | 19 ++++++++
>>  hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
>>  include/hw/ppc/spapr_cpu_core.h |  2 +-
>>  target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
>>  5 files changed, 174 insertions(+), 2 deletions(-)
>
> FWIW I noticed there were some issues with migrating the decrementer on
> Mac machines a while ago which caused a hang on the destination with TCG
> (for MacOS on an x86 host in my case). Have a look at the following
> threads for reference:
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html

Thanks, Mark! There's a lot of helpful information in these threads.

> IIRC there is code that assumes any migration on PPC is being done live,
> and so adjusts the timebase on the destination to reflect wall clock time
> by recalculating tb_offset. I haven't looked at the code for a while, but
> I think the outcome was that there need to be two phases in migration:
> the first is to migrate the timebase as-is for guests that are paused
> during migration, whilst the second is to notify hypervisor-aware guest
> OSs such as Linux to make the timebase adjustment, if required, when the
> guest is running.
>
> ATB,
>
> Mark.