
[RFC,0/4] ppc: nested TCG migration (KVM-on-TCG)

Message ID 20220224185817.2207228-1-farosas@linux.ibm.com
Series ppc: nested TCG migration (KVM-on-TCG)

Message

Fabiano Rosas Feb. 24, 2022, 6:58 p.m. UTC
This series implements the migration for a TCG pseries guest running a
nested KVM guest. This is just like migrating a pseries TCG guest, but
with some extra state to allow a nested guest to continue to run on
the destination.

Unfortunately the regular TCG migration scenario (not nested) is not
fully working so I cannot be entirely sure the nested migration is
correct. I have included a couple of patches for the general migration
case that (I think?) improve the situation a bit, but I'm still seeing
hard lockups and other issues with more than 1 vcpu.

This is more of an early RFC to see if anyone spots something right
away. I haven't made much progress in debugging the general TCG
migration case so if anyone has any input there as well I'd appreciate
it.

Thanks

Fabiano Rosas (4):
  target/ppc: TCG: Migrate tb_offset and decr
  spapr: TCG: Migrate spapr_cpu->prod
  hw/ppc: Take nested guest into account when saving timebase
  spapr: Add KVM-on-TCG migration support

 hw/ppc/ppc.c                    | 17 +++++++-
 hw/ppc/spapr.c                  | 19 ++++++++
 hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_cpu_core.h |  2 +-
 target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
 5 files changed, 174 insertions(+), 2 deletions(-)
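
To give an idea of the shape of the extra state involved, here is an illustrative sketch of a migration subsection for it. This is only a sketch: the struct, callback and field names below are made up for the example and are not necessarily the ones used in the patches.

```c
/* Illustrative only: NestedTcgState, nested_tcg_needed and the field names
 * are hypothetical stand-ins, not the identifiers used in these patches. */
#include "qemu/osdep.h"
#include "migration/vmstate.h"

typedef struct NestedTcgState {
    int64_t tb_offset;   /* guest timebase offset vs. host timebase */
    uint64_t decr;       /* decrementer value captured at save time */
    bool prod;           /* pending H_PROD wakeup (spapr_cpu->prod analogue) */
} NestedTcgState;

static bool nested_tcg_needed(void *opaque)
{
    NestedTcgState *s = opaque;

    /* Only send the subsection when there is nested state to carry over. */
    return s->tb_offset || s->decr || s->prod;
}

/* Would be hooked up as a subsection of the per-CPU vmstate. */
static const VMStateDescription vmstate_nested_tcg = {
    .name = "cpu/nested-tcg",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = nested_tcg_needed,
    .fields = (VMStateField[]) {
        VMSTATE_INT64(tb_offset, NestedTcgState),
        VMSTATE_UINT64(decr, NestedTcgState),
        VMSTATE_BOOL(prod, NestedTcgState),
        VMSTATE_END_OF_LIST()
    },
};
```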

Comments

Mark Cave-Ayland Feb. 24, 2022, 9 p.m. UTC | #1
On 24/02/2022 18:58, Fabiano Rosas wrote:

> This series implements the migration for a TCG pseries guest running a
> nested KVM guest. This is just like migrating a pseries TCG guest, but
> with some extra state to allow a nested guest to continue to run on
> the destination.
> 
> Unfortunately the regular TCG migration scenario (not nested) is not
> fully working so I cannot be entirely sure the nested migration is
> correct. I have included a couple of patches for the general migration
> case that (I think?) improve the situation a bit, but I'm still seeing
> hard lockups and other issues with more than 1 vcpu.
> 
> This is more of an early RFC to see if anyone spots something right
> away. I haven't made much progress in debugging the general TCG
> migration case so if anyone has any input there as well I'd appreciate
> it.
> 
> Thanks
> 
> Fabiano Rosas (4):
>    target/ppc: TCG: Migrate tb_offset and decr
>    spapr: TCG: Migrate spapr_cpu->prod
>    hw/ppc: Take nested guest into account when saving timebase
>    spapr: Add KVM-on-TCG migration support
> 
>   hw/ppc/ppc.c                    | 17 +++++++-
>   hw/ppc/spapr.c                  | 19 ++++++++
>   hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
>   include/hw/ppc/spapr_cpu_core.h |  2 +-
>   target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
>   5 files changed, 174 insertions(+), 2 deletions(-)

FWIW I noticed a while ago that there were some issues with migrating the decrementer
on Mac machines which caused a hang on the destination with TCG (for MacOS on an x86
host in my case). Have a look at the following threads for reference:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html

IIRC there is code that assumes any migration on PPC is being done live, and so
adjusts the timebase on the destination to reflect wall clock time by recalculating
tb_offset. I haven't looked at the code for a while, but I think the outcome was that
there need to be 2 phases in migration: the first is to migrate the timebase as-is
for guests that are paused during migration, whilst the second is to notify
hypervisor-aware guest OSs such as Linux to make the timebase adjustment, if
required, while the guest is running.
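
Roughly, the calculation in question looks like the following. This is a simplified sketch of the "live migration" assumption, not the actual hw/ppc/ppc.c code; the names are illustrative and overflow handling is omitted for brevity.

```c
/* Sketch only: illustrates the wall-clock assumption described above. */
#include <stdint.h>

#define NS_PER_SEC 1000000000LL

/* The guest timebase is derived as: guest_tb = host_tb + tb_offset.
 * On load, the destination recomputes tb_offset so that the guest
 * timebase also advances by the wall-clock time spent in migration
 * downtime.  A guest that was deliberately paused would instead want
 * the saved timebase restored unmodified (no downtime adjustment). */
static int64_t recompute_tb_offset(uint64_t saved_guest_tb,
                                   int64_t saved_host_ns,   /* source wall clock at save */
                                   int64_t now_host_ns,     /* destination wall clock at load */
                                   uint64_t now_host_tb,    /* destination host timebase now */
                                   uint64_t tb_freq)        /* timebase ticks per second */
{
    int64_t downtime_ns = now_host_ns - saved_host_ns;
    uint64_t downtime_ticks = (uint64_t)downtime_ns * tb_freq / NS_PER_SEC;
    uint64_t expected_guest_tb = saved_guest_tb + downtime_ticks;

    return (int64_t)(expected_guest_tb - now_host_tb);
}
```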


ATB,

Mark.
David Gibson Feb. 25, 2022, 3:54 a.m. UTC | #2
On Thu, Feb 24, 2022 at 09:00:24PM +0000, Mark Cave-Ayland wrote:
> On 24/02/2022 18:58, Fabiano Rosas wrote:
> 
> > This series implements the migration for a TCG pseries guest running a
> > nested KVM guest. This is just like migrating a pseries TCG guest, but
> > with some extra state to allow a nested guest to continue to run on
> > the destination.
> > 
> > Unfortunately the regular TCG migration scenario (not nested) is not
> > fully working so I cannot be entirely sure the nested migration is
> > correct. I have included a couple of patches for the general migration
> > case that (I think?) improve the situation a bit, but I'm still seeing
> > hard lockups and other issues with more than 1 vcpu.
> > 
> > This is more of an early RFC to see if anyone spots something right
> > away. I haven't made much progress in debugging the general TCG
> > migration case so if anyone has any input there as well I'd appreciate
> > it.
> > 
> > Thanks
> > 
> > Fabiano Rosas (4):
> >    target/ppc: TCG: Migrate tb_offset and decr
> >    spapr: TCG: Migrate spapr_cpu->prod
> >    hw/ppc: Take nested guest into account when saving timebase
> >    spapr: Add KVM-on-TCG migration support
> > 
> >   hw/ppc/ppc.c                    | 17 +++++++-
> >   hw/ppc/spapr.c                  | 19 ++++++++
> >   hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
> >   include/hw/ppc/spapr_cpu_core.h |  2 +-
> >   target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
> >   5 files changed, 174 insertions(+), 2 deletions(-)
> 
> FWIW I noticed there were some issues with migrating the decrementer on Mac
> machines a while ago which causes a hang on the destination with TCG (for
> MacOS on a x86 host in my case). Have a look at the following threads for
> reference:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html
> 
> IIRC there is code that assumes any migration in PPC is being done live, and
> so adjusts the timebase on the destination to reflect wall clock time by
> recalculating tb_offset. I haven't looked at the code for a while but I
> think the outcome was that there needs to be 2 phases in migration: the
> first is to migrate the timebase as-is for guests that are paused during
> migration, whilst the second is to notify hypervisor-aware guest OSs such as
> Linux to make the timebase adjustment if required if the guest is running.

Whether the timebase is adjusted for the migration downtime depends on
whether the guest clock is pinned to wall clock time or not.  Usually
it should be (because you don't want your clocks to go wrong on
migration of a production system).  However, in neither case should
the guest be involved.
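
As a rough sketch of the two host-side choices (the "pin_to_wall_clock" knob below is hypothetical, not an existing QEMU option; the guest plays no part in either branch):

```c
/* Illustrative sketch of the host-side policy choice. */
#include <stdbool.h>
#include <stdint.h>

static uint64_t guest_tb_after_migration(uint64_t saved_guest_tb,
                                         uint64_t downtime_ticks,
                                         bool pin_to_wall_clock)
{
    if (pin_to_wall_clock) {
        /* Guest clock tracks wall clock: account for the downtime, so
         * time-of-day in the guest stays correct after migration. */
        return saved_guest_tb + downtime_ticks;
    }
    /* Guest clock is not pinned: restore the timebase exactly as saved;
     * the guest simply does not observe the downtime. */
    return saved_guest_tb;
}
```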

There may be guest side code related to this in Linux, but that's
probably for migration under pHyp, which is a guest aware migration
system.  That's essentially unrelated to migration under qemu/kvm,
which is a guest unaware system.

Guest aware migration has some nice-sounding advantages; in particular
it can allow migrations across a heterogeneous cluster with differences
between hosts that the hypervisor can't hide, or can't efficiently
hide.  However, it is, IMO, a deeply broken approach, because it can
allow an un-cooperative guest to indefinitely block migration, and for
it to be reliably correct it requires *much* more pinning down of
exactly what host system changes the guest can and can't be expected
to cope with than PAPR has ever bothered to do.
Fabiano Rosas Feb. 25, 2022, 4:11 p.m. UTC | #3
Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 24/02/2022 18:58, Fabiano Rosas wrote:
>
>> This series implements the migration for a TCG pseries guest running a
>> nested KVM guest. This is just like migrating a pseries TCG guest, but
>> with some extra state to allow a nested guest to continue to run on
>> the destination.
>> 
>> Unfortunately the regular TCG migration scenario (not nested) is not
>> fully working so I cannot be entirely sure the nested migration is
>> correct. I have included a couple of patches for the general migration
>> case that (I think?) improve the situation a bit, but I'm still seeing
>> hard lockups and other issues with more than 1 vcpu.
>> 
>> This is more of an early RFC to see if anyone spots something right
>> away. I haven't made much progress in debugging the general TCG
>> migration case so if anyone has any input there as well I'd appreciate
>> it.
>> 
>> Thanks
>> 
>> Fabiano Rosas (4):
>>    target/ppc: TCG: Migrate tb_offset and decr
>>    spapr: TCG: Migrate spapr_cpu->prod
>>    hw/ppc: Take nested guest into account when saving timebase
>>    spapr: Add KVM-on-TCG migration support
>> 
>>   hw/ppc/ppc.c                    | 17 +++++++-
>>   hw/ppc/spapr.c                  | 19 ++++++++
>>   hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr_cpu_core.h |  2 +-
>>   target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
>>   5 files changed, 174 insertions(+), 2 deletions(-)
>
> FWIW I noticed there were some issues with migrating the decrementer on Mac machines 
> a while ago which causes a hang on the destination with TCG (for MacOS on a x86 host 
> in my case). Have a look at the following threads for reference:
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html

Thanks, Mark! There's a lot of helpful information in these threads.

> IIRC there is code that assumes any migration in PPC is being done
> live, and so adjusts the timebase on the destination to reflect wall
> clock time by recalculating tb_offset. I haven't looked at the code for
> a while but I think the outcome was that there needs to be 2 phases in
> migration: the first is to migrate the timebase as-is for guests that
> are paused during migration, whilst the second is to notify
> hypervisor-aware guest OSs such as Linux to make the timebase
> adjustment if required if the guest is running.


>
>
> ATB,
>
> Mark.