mbox series

[RFC,0/6] KVM: arm64: Errata management for VM Live migration

Message ID 20241011075053.80540-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)
Headers show
Series KVM: arm64: Errata management for VM Live migration | expand

Message

Shameerali Kolothum Thodi Oct. 11, 2024, 7:50 a.m. UTC
Hi,

On ARM64 platforms most of the errata workarounds are based on CPU
MIDR/REVIDR values and a number of these workarounds need to be
implemented by the Guest kernel as well. This creates a problem when
Guest needs to be migrated to a platform that differs in these
MIDR/REVIDR values even if the VMM can come up with a common minimum
feature list for the Guest using the recently introduced "Writable
ID registers" support.

(This is roughly based on a discussion I had with Marc and Oliver
at KVM forum. Marc outlined his idea for a solution and this is an
attempt to implement it. Thanks to both and I take all the blame
if this is nowhere near what is intended/required)

This RFC proposes a solution to handle the above issue by introducing
the following,

1. A new VM IOCTL,
   KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct kvm_arm_migrn_cpus)
   This can be used by the userspace(VMM) to set the target CPUs the
   Guest will run in its lifetime. See patch #2
2. Add hypercall support for Guest kernel to retrieve any migration
   errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
   The above will return the bitmaps in R0-R3 registers. See patch #4
3. The "capability" field in struct arm64_cpu_capabilities is a generated
   one at present and may get renumbered or reordered. Hence, we can't use
   this directly for migration errata bitmaps. Instead, introduced
   "migartion_safe_cap", which has to be set statically for any
   erratum that needs to be enabled and is safe for migration
   purposes. See patches 3 & 6.
4. Rest of the patches includes the plumbing required to populate the
   errata bitmap based on the target CPUs set by the VMM and update the
   system_cap based on it.

ToDos:-
  -We still need a way to  handle the error in setting the invariant
   registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
   handle it in userspace?
-  Possibly we could do better to avoid the additional "migartion_safe_cap" use.
   Suggestions welcome.
  -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
   How to handle those?
  -Check for locking requirements if any.

This is lightly tested on a HiSilicon ARM64 platform.

Please take a look and let me know your thoughts.

Thanks,
Shameer

Shameer Kolothum (6):
  arm64: Modify callback matches() fn to take a target info
  KVM: arm64: Add support for VMM to set migration target
  KVM: arm64: Introduce a helper to retrieve errata
  KVM: arm64: Add hypercall support for retrieving migration errata
    bitmap
  arm64: Use hypercall to check for any migration related errata
  arm64: errata: Set migration_safe_cap for MIDR based errata

 arch/arm64/include/asm/cpu_migrn_errata.h |  42 +++++++
 arch/arm64/include/asm/cpufeature.h       |  15 ++-
 arch/arm64/include/asm/kvm_host.h         |   3 +
 arch/arm64/include/asm/paravirt.h         |   4 +
 arch/arm64/include/asm/spectre.h          |   8 +-
 arch/arm64/include/uapi/asm/kvm.h         |  13 ++
 arch/arm64/kernel/cpu_errata.c            | 113 ++++++++++++++----
 arch/arm64/kernel/cpufeature.c            | 138 +++++++++++++++-------
 arch/arm64/kernel/paravirt.c              |  18 +++
 arch/arm64/kernel/proton-pack.c           |  13 +-
 arch/arm64/kvm/arm.c                      |  38 ++++++
 arch/arm64/kvm/hypercalls.c               |  20 ++++
 include/linux/arm-smccc.h                 |   8 +-
 include/uapi/linux/kvm.h                  |   2 +
 14 files changed, 361 insertions(+), 74 deletions(-)
 create mode 100644 arch/arm64/include/asm/cpu_migrn_errata.h

Comments

Marc Zyngier Oct. 11, 2024, 10:37 a.m. UTC | #1
Hi Shameer,

Thanks for getting the ball rolling on this one, much appreciated.

On Fri, 11 Oct 2024 08:50:47 +0100,
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> 
> Hi,
> 
> On ARM64 platforms most of the errata workarounds are based on CPU
> MIDR/REVIDR values and a number of these workarounds need to be
> implemented by the Guest kernel as well. This creates a problem when
> Guest needs to be migrated to a platform that differs in these
> MIDR/REVIDR values even if the VMM can come up with a common minimum
> feature list for the Guest using the recently introduced "Writable
> ID registers" support.
> 
> (This is roughly based on a discussion I had with Marc and Oliver
> at KVM forum. Marc outlined his idea for a solution and this is an
> attempt to implement it. Thanks to both and I take all the blame
> if this is nowhere near what is intended/required)
> 
> This RFC proposes a solution to handle the above issue by introducing
> the following,
> 
> 1. A new VM IOCTL,
>    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct kvm_arm_migrn_cpus)
>    This can be used by the userspace(VMM) to set the target CPUs the
>    Guest will run in its lifetime. See patch #2
> 2. Add hypercall support for Guest kernel to retrieve any migration
>    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
>    The above will return the bitmaps in R0-R3 registers. See patch #4
> 3. The "capability" field in struct arm64_cpu_capabilities is a generated
>    one at present and may get renumbered or reordered. Hence, we can't use
>    this directly for migration errata bitmaps. Instead, introduced
>    "migartion_safe_cap", which has to be set statically for any
>    erratum that needs to be enabled and is safe for migration
>    purposes. See patches 3 & 6.
> 4. Rest of the patches includes the plumbing required to populate the
>    errata bitmap based on the target CPUs set by the VMM and update the
>    system_cap based on it.
> 
> ToDos:-
>   -We still need a way to  handle the error in setting the invariant
>    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
>    handle it in userspace?
> -  Possibly we could do better to avoid the additional "migartion_safe_cap" use.
>    Suggestions welcome.
>   -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
>    How to handle those?
>   -Check for locking requirements if any.
> 
> This is lightly tested on a HiSilicon ARM64 platform.
> 
> Please take a look and let me know your thoughts.

Having eyeballed this very superficially, I think we can do something
simpler, and maybe more future-proof:

- I don't think KVM should be concerned about the description of the
  target CPUs. The hypercall you defined is the right thing to do,
  but the VMM should completely handle it. That's an implementation
  detail, but it would make things much simpler.

- I don't think the "errata bitmap" works. That's a construct that is
  specific to Linux, and that cannot be supported for other OSs. It
  also limits the described issues to those the host knows, instead of
  the guest. The host doesn't have a clue what the guest really wants.
  Really, the guest should have enough information to decide what to
  do based on its own view of the ID registers and the list of CPUs it
  runs on.

- To answer your question about CTR_EL0: KVM should (and does)
  sanitise that register by trapping it. This should be the default
  behaviour for things that need to be mitigated outside of
  MIDR/REVIDR.

Thanks,

	M.
Shameerali Kolothum Thodi Oct. 11, 2024, 10:57 a.m. UTC | #2
> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Friday, October 11, 2024 11:37 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: kvmarm@lists.linux.dev; oliver.upton@linux.dev;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
> <yuzenghui@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Anthony Jebson
> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
> migration
> 
> Hi Shameer,
> 
> Thanks for getting the ball rolling on this one, much appreciated.
> 
> On Fri, 11 Oct 2024 08:50:47 +0100,
> Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> >
> > Hi,
> >
> > On ARM64 platforms most of the errata workarounds are based on CPU
> > MIDR/REVIDR values and a number of these workarounds need to be
> > implemented by the Guest kernel as well. This creates a problem when
> > Guest needs to be migrated to a platform that differs in these
> > MIDR/REVIDR values even if the VMM can come up with a common
> minimum
> > feature list for the Guest using the recently introduced "Writable
> > ID registers" support.
> >
> > (This is roughly based on a discussion I had with Marc and Oliver
> > at KVM forum. Marc outlined his idea for a solution and this is an
> > attempt to implement it. Thanks to both and I take all the blame
> > if this is nowhere near what is intended/required)
> >
> > This RFC proposes a solution to handle the above issue by introducing
> > the following,
> >
> > 1. A new VM IOCTL,
> >    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct
> kvm_arm_migrn_cpus)
> >    This can be used by the userspace(VMM) to set the target CPUs the
> >    Guest will run in its lifetime. See patch #2
> > 2. Add hypercall support for Guest kernel to retrieve any migration
> >    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
> >    The above will return the bitmaps in R0-R3 registers. See patch #4
> > 3. The "capability" field in struct arm64_cpu_capabilities is a generated
> >    one at present and may get renumbered or reordered. Hence, we can't
> use
> >    this directly for migration errata bitmaps. Instead, introduced
> >    "migartion_safe_cap", which has to be set statically for any
> >    erratum that needs to be enabled and is safe for migration
> >    purposes. See patches 3 & 6.
> > 4. Rest of the patches includes the plumbing required to populate the
> >    errata bitmap based on the target CPUs set by the VMM and update the
> >    system_cap based on it.
> >
> > ToDos:-
> >   -We still need a way to  handle the error in setting the invariant
> >    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
> >    handle it in userspace?
> > -  Possibly we could do better to avoid the additional
> "migartion_safe_cap" use.
> >    Suggestions welcome.
> >   -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
> >    How to handle those?
> >   -Check for locking requirements if any.
> >
> > This is lightly tested on a HiSilicon ARM64 platform.
> >
> > Please take a look and let me know your thoughts.
> 
> Having eyeballed this very superficially, I think we can do something
> simpler, and maybe more future-proof:

Thanks Marc for taking a look and the quick feedback.

> 
> - I don't think KVM should be concerned about the description of the
>   target CPUs. The hypercall you defined is the right thing to do,
>   but the VMM should completely handle it. That's an implementation
>   detail, but it would make things much simpler.

Ok. So does that mean the hypercall will use some sort of shared memory
to retrieve the list of target CPUs from VMM?
 
> - I don't think the "errata bitmap" works. That's a construct that is
>   specific to Linux, and that cannot be supported for other OSs. It
>   also limits the described issues to those the host knows, instead of
>   the guest. The host doesn't have a clue what the guest really wants.
>   Really, the guest should have enough information to decide what to
>   do based on its own view of the ID registers and the list of CPUs it
>   runs on.

Yes. "errata bitmap" is specific to Linux. So if we go with the above 
hypercall-->VMM path and get the target CPU list, Guest can directly
use that.

> 
> - To answer your question about CTR_EL0: KVM should (and does)
>   sanitise that register by trapping it. This should be the default
>   behaviour for things that need to be mitigated outside of
>   MIDR/REVIDR.

Ok. Make sense and simplifies things.

Please let me know whether my understanding on hypercall<->VMM is 
correct or not. I can take a look at that route.

Thanks,
Shameer
Marc Zyngier Oct. 11, 2024, 11:43 a.m. UTC | #3
On Fri, 11 Oct 2024 11:57:10 +0100,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> > >
> > > Please take a look and let me know your thoughts.
> > 
> > Having eyeballed this very superficially, I think we can do something
> > simpler, and maybe more future-proof:
> 
> Thanks Marc for taking a look and the quick feedback.

No worries, that's the least I could do given that you put the effort
implementing my silly ideas! ;-)

> > - I don't think KVM should be concerned about the description of the
> >   target CPUs. The hypercall you defined is the right thing to do,
> >   but the VMM should completely handle it. That's an implementation
> >   detail, but it would make things much simpler.
> 
> Ok. So does that mean the hypercall will use some sort of shared memory
> to retrieve the list of target CPUs from VMM?

Two possibilities:

- either shared memory, in which case the hypercall would require the
  guest to give an IPA and size for the VMM to write its stuff into
  the guest memory,

- or more simply return the data as an MIDR/REVIDR pair in registers,
  the guest requesting an index, and getting an error when out of
  range, leaving it with the freedom to organise the storage.

The second option is a bit slower, but way simpler, and it only
happens once per guest boot, so it would probably be my preferred
option unless this is proved to be impractical.

>  
> > - I don't think the "errata bitmap" works. That's a construct that is
> >   specific to Linux, and that cannot be supported for other OSs. It
> >   also limits the described issues to those the host knows, instead of
> >   the guest. The host doesn't have a clue what the guest really wants.
> >   Really, the guest should have enough information to decide what to
> >   do based on its own view of the ID registers and the list of CPUs it
> >   runs on.
> 
> Yes. "errata bitmap" is specific to Linux. So if we go with the above 
> hypercall-->VMM path and get the target CPU list, Guest can directly
> use that.

Indeed, and this is something that is much easier to standardise upon,
as I really want this to be a universal mechanism.

> > - To answer your question about CTR_EL0: KVM should (and does)
> >   sanitise that register by trapping it. This should be the default
> >   behaviour for things that need to be mitigated outside of
> >   MIDR/REVIDR.
> 
> Ok. Make sense and simplifies things.
> 
> Please let me know whether my understanding on hypercall<->VMM is 
> correct or not. I can take a look at that route.

Yes, I think we are aligned on it.

Thanks again,

	M.
Cornelia Huck Oct. 11, 2024, 1:17 p.m. UTC | #4
On Fri, Oct 11 2024, Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:

> Hi,
>
> On ARM64 platforms most of the errata workarounds are based on CPU
> MIDR/REVIDR values and a number of these workarounds need to be
> implemented by the Guest kernel as well. This creates a problem when
> Guest needs to be migrated to a platform that differs in these
> MIDR/REVIDR values even if the VMM can come up with a common minimum
> feature list for the Guest using the recently introduced "Writable
> ID registers" support.
>
> (This is roughly based on a discussion I had with Marc and Oliver
> at KVM forum. Marc outlined his idea for a solution and this is an
> attempt to implement it. Thanks to both and I take all the blame
> if this is nowhere near what is intended/required)
>
> This RFC proposes a solution to handle the above issue by introducing
> the following,
>
> 1. A new VM IOCTL,
>    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct kvm_arm_migrn_cpus)
>    This can be used by the userspace(VMM) to set the target CPUs the
>    Guest will run in its lifetime. See patch #2
> 2. Add hypercall support for Guest kernel to retrieve any migration
>    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
>    The above will return the bitmaps in R0-R3 registers. See patch #4
> 3. The "capability" field in struct arm64_cpu_capabilities is a generated
>    one at present and may get renumbered or reordered. Hence, we can't use
>    this directly for migration errata bitmaps. Instead, introduced
>    "migartion_safe_cap", which has to be set statically for any
>    erratum that needs to be enabled and is safe for migration
>    purposes. See patches 3 & 6.
> 4. Rest of the patches includes the plumbing required to populate the
>    errata bitmap based on the target CPUs set by the VMM and update the
>    system_cap based on it.
>
> ToDos:-
>   -We still need a way to  handle the error in setting the invariant
>    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
>    handle it in userspace?
> -  Possibly we could do better to avoid the additional "migartion_safe_cap" use.
>    Suggestions welcome.
>   -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
>    How to handle those?
>   -Check for locking requirements if any.
>
> This is lightly tested on a HiSilicon ARM64 platform.
>
> Please take a look and let me know your thoughts.

So, I've only taken a very quick look at it, but IIUC, the idea is for
the VMM to do the following:
- figure out where we want to possibly run
- figure out the least common denominator for writable features
- tell KVM about the possible target cpu resp. the errata it wants
- build a frankencpu via the writable id reg infrastructure, fiddling
  with invariant handling as needed?

(Sorry, it's late in the week :)
Shameerali Kolothum Thodi Oct. 11, 2024, 1:24 p.m. UTC | #5
> -----Original Message-----
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: Friday, October 11, 2024 2:18 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
> maz@kernel.org; oliver.upton@linux.dev
> Cc: catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
> eric.auger@redhat.com; yuzenghui <yuzenghui@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Anthony Jebson
> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
> migration
> 
> On Fri, Oct 11 2024, Shameer Kolothum
> <shameerali.kolothum.thodi@huawei.com> wrote:
> 
> > Hi,
> >
> > On ARM64 platforms most of the errata workarounds are based on CPU
> > MIDR/REVIDR values and a number of these workarounds need to be
> > implemented by the Guest kernel as well. This creates a problem when
> > Guest needs to be migrated to a platform that differs in these
> > MIDR/REVIDR values even if the VMM can come up with a common
> minimum
> > feature list for the Guest using the recently introduced "Writable
> > ID registers" support.
> >
> > (This is roughly based on a discussion I had with Marc and Oliver
> > at KVM forum. Marc outlined his idea for a solution and this is an
> > attempt to implement it. Thanks to both and I take all the blame
> > if this is nowhere near what is intended/required)
> >
> > This RFC proposes a solution to handle the above issue by introducing
> > the following,
> >
> > 1. A new VM IOCTL,
> >    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct
> kvm_arm_migrn_cpus)
> >    This can be used by the userspace(VMM) to set the target CPUs the
> >    Guest will run in its lifetime. See patch #2
> > 2. Add hypercall support for Guest kernel to retrieve any migration
> >    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
> >    The above will return the bitmaps in R0-R3 registers. See patch #4
> > 3. The "capability" field in struct arm64_cpu_capabilities is a generated
> >    one at present and may get renumbered or reordered. Hence, we can't
> use
> >    this directly for migration errata bitmaps. Instead, introduced
> >    "migartion_safe_cap", which has to be set statically for any
> >    erratum that needs to be enabled and is safe for migration
> >    purposes. See patches 3 & 6.
> > 4. Rest of the patches includes the plumbing required to populate the
> >    errata bitmap based on the target CPUs set by the VMM and update the
> >    system_cap based on it.
> >
> > ToDos:-
> >   -We still need a way to  handle the error in setting the invariant
> >    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
> >    handle it in userspace?
> > -  Possibly we could do better to avoid the additional
> "migartion_safe_cap" use.
> >    Suggestions welcome.
> >   -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
> >    How to handle those?
> >   -Check for locking requirements if any.
> >
> > This is lightly tested on a HiSilicon ARM64 platform.
> >
> > Please take a look and let me know your thoughts.
> 
> So, I've only taken a very quick look at it, but IIUC, the idea is for
> the VMM to do the following:
> - figure out where we want to possibly run


Yes.

> - figure out the least common denominator for writable features

Yes

> - tell KVM about the possible target cpu resp. the errata it wants

This might change as per Marc's comments. VMM has to handle hypercall
directly and provide the target CPU list to Guest.

> - build a frankencpu via the writable id reg infrastructure, fiddling
>   with invariant handling as needed?

That is my idea of handling the invariant registers. If the VMM has provided a list of 
target CPUs for the Guest,  make sure the incoming CPU is part of the list 
during migration and ignore invariant(MIDR/REVIDR) reg SET errors.

Thanks,
Shameer
Cornelia Huck Oct. 11, 2024, 2:14 p.m. UTC | #6
On Fri, Oct 11 2024, Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

>> -----Original Message-----
>> From: Cornelia Huck <cohuck@redhat.com>
>> Sent: Friday, October 11, 2024 2:18 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
>> maz@kernel.org; oliver.upton@linux.dev
>> Cc: catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
>> eric.auger@redhat.com; yuzenghui <yuzenghui@huawei.com>; Wangzhou
>> (B) <wangzhou1@hisilicon.com>; jiangkunkun
>> <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; Anthony Jebson
>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
>> Linuxarm <linuxarm@huawei.com>
>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
>> migration
>> 
>> On Fri, Oct 11 2024, Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com> wrote:
>> 
>> > Hi,
>> >
>> > On ARM64 platforms most of the errata workarounds are based on CPU
>> > MIDR/REVIDR values and a number of these workarounds need to be
>> > implemented by the Guest kernel as well. This creates a problem when
>> > Guest needs to be migrated to a platform that differs in these
>> > MIDR/REVIDR values even if the VMM can come up with a common
>> minimum
>> > feature list for the Guest using the recently introduced "Writable
>> > ID registers" support.
>> >
>> > (This is roughly based on a discussion I had with Marc and Oliver
>> > at KVM forum. Marc outlined his idea for a solution and this is an
>> > attempt to implement it. Thanks to both and I take all the blame
>> > if this is nowhere near what is intended/required)
>> >
>> > This RFC proposes a solution to handle the above issue by introducing
>> > the following,
>> >
>> > 1. A new VM IOCTL,
>> >    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct
>> kvm_arm_migrn_cpus)
>> >    This can be used by the userspace(VMM) to set the target CPUs the
>> >    Guest will run in its lifetime. See patch #2
>> > 2. Add hypercall support for Guest kernel to retrieve any migration
>> >    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
>> >    The above will return the bitmaps in R0-R3 registers. See patch #4
>> > 3. The "capability" field in struct arm64_cpu_capabilities is a generated
>> >    one at present and may get renumbered or reordered. Hence, we can't
>> use
>> >    this directly for migration errata bitmaps. Instead, introduced
>> >    "migartion_safe_cap", which has to be set statically for any
>> >    erratum that needs to be enabled and is safe for migration
>> >    purposes. See patches 3 & 6.
>> > 4. Rest of the patches includes the plumbing required to populate the
>> >    errata bitmap based on the target CPUs set by the VMM and update the
>> >    system_cap based on it.
>> >
>> > ToDos:-
>> >   -We still need a way to  handle the error in setting the invariant
>> >    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
>> >    handle it in userspace?
>> > -  Possibly we could do better to avoid the additional
>> "migartion_safe_cap" use.
>> >    Suggestions welcome.
>> >   -There are errata that require more than MIDR/REVIDR, eg: CTR_EL0.
>> >    How to handle those?
>> >   -Check for locking requirements if any.
>> >
>> > This is lightly tested on a HiSilicon ARM64 platform.
>> >
>> > Please take a look and let me know your thoughts.
>> 
>> So, I've only taken a very quick look at it, but IIUC, the idea is for
>> the VMM to do the following:
>> - figure out where we want to possibly run
>
>
> Yes.
>
>> - figure out the least common denominator for writable features
>
> Yes
>
>> - tell KVM about the possible target cpu resp. the errata it wants
>
> This might change as per Marc's comments. VMM has to handle hypercall
> directly and provide the target CPU list to Guest.
>
>> - build a frankencpu via the writable id reg infrastructure, fiddling
>>   with invariant handling as needed?
>
> That is my idea of handling the invariant registers. If the VMM has provided a list of 
> target CPUs for the Guest,  make sure the incoming CPU is part of the list 
> during migration and ignore invariant(MIDR/REVIDR) reg SET errors.

Thanks!

I hope I'll be able to spend some time on this next week.
Oliver Upton Oct. 11, 2024, 3:11 p.m. UTC | #7
On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
> On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> > > >
> > > > Please take a look and let me know your thoughts.
> > > 
> > > Having eyeballed this very superficially, I think we can do something
> > > simpler, and maybe more future-proof:
> > 
> > Thanks Marc for taking a look and the quick feedback.
> 
> No worries, that's the least I could do given that you put the effort
> implementing my silly ideas! ;-)
> 
> > > - I don't think KVM should be concerned about the description of the
> > >   target CPUs. The hypercall you defined is the right thing to do,
> > >   but the VMM should completely handle it. That's an implementation
> > >   detail, but it would make things much simpler.
> > 
> > Ok. So does that mean the hypercall will use some sort of shared memory
> > to retrieve the list of target CPUs from VMM?
> 
> Two possibilities:
> 
> - either shared memory, in which case the hypercall would require the
>   guest to give an IPA and size for the VMM to write its stuff into
>   the guest memory,
> 
> - or more simply return the data as an MIDR/REVIDR pair in registers,
>   the guest requesting an index, and getting an error when out of
>   range, leaving it with the freedom to organise the storage.
> 
> The second option is a bit slower, but way simpler, and it only
> happens once per guest boot, so it would probably be my preferred
> option unless this is proved to be impractical.

Also worth noting there's existing UAPI [*] for allowing userspace to
register range(s) of hypercalls that it services directly. It's a bit
weird that we'd allow userspace to do stuff in KVM's own hypercall
range, but I don't think it really matters at this point since this is
all prototyping.

[*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-arm-vm-smccc-filter-w-o
Shameerali Kolothum Thodi Oct. 11, 2024, 3:51 p.m. UTC | #8
> -----Original Message-----
> From: Oliver Upton <oliver.upton@linux.dev>
> Sent: Friday, October 11, 2024 4:11 PM
> To: Marc Zyngier <maz@kernel.org>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
> <yuzenghui@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Anthony Jebson
> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
> migration
> 
> On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
> > On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com> wrote:
> > > > >
> > > > > Please take a look and let me know your thoughts.
> > > >
> > > > Having eyeballed this very superficially, I think we can do something
> > > > simpler, and maybe more future-proof:
> > >
> > > Thanks Marc for taking a look and the quick feedback.
> >
> > No worries, that's the least I could do given that you put the effort
> > implementing my silly ideas! ;-)
> >
> > > > - I don't think KVM should be concerned about the description of the
> > > >   target CPUs. The hypercall you defined is the right thing to do,
> > > >   but the VMM should completely handle it. That's an implementation
> > > >   detail, but it would make things much simpler.
> > >
> > > Ok. So does that mean the hypercall will use some sort of shared
> memory
> > > to retrieve the list of target CPUs from VMM?
> >
> > Two possibilities:
> >
> > - either shared memory, in which case the hypercall would require the
> >   guest to give an IPA and size for the VMM to write its stuff into
> >   the guest memory,
> >
> > - or more simply return the data as an MIDR/REVIDR pair in registers,
> >   the guest requesting an index, and getting an error when out of
> >   range, leaving it with the freedom to organise the storage.
> >
> > The second option is a bit slower, but way simpler, and it only
> > happens once per guest boot, so it would probably be my preferred
> > option unless this is proved to be impractical.
> 
> Also worth noting there's existing UAPI [*] for allowing userspace to
> register range(s) of hypercalls that it services directly. It's a bit
> weird that we'd allow userspace to do stuff in KVM's own hypercall
> range, but I don't think it really matters at this point since this is
> all prototyping.
> 
> [*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-arm-
> vm-smccc-filter-w-o

Thanks. Yes and there are attempts to add that handling in Qemu[*] in the context
of vCPU hotplug support(PSCI related ones though).  Will take a look.

Thanks,
Shameer
[1] https://lore.kernel.org/qemu-devel/20241009033704.250287-1-salil.mehta@huawei.com/
Oliver Upton Oct. 11, 2024, 5:07 p.m. UTC | #9
Hey Shameer,

Thank you for posting this series. This'll help might cross-host
migrations on arm64 a little less sketch :)

On Fri, Oct 11, 2024 at 08:50:47AM +0100, Shameer Kolothum wrote:
> Hi,
> 
> On ARM64 platforms most of the errata workarounds are based on CPU
> MIDR/REVIDR values and a number of these workarounds need to be
> implemented by the Guest kernel as well. This creates a problem when
> Guest needs to be migrated to a platform that differs in these
> MIDR/REVIDR values even if the VMM can come up with a common minimum
> feature list for the Guest using the recently introduced "Writable
> ID registers" support.
> 
> (This is roughly based on a discussion I had with Marc and Oliver
> at KVM forum. Marc outlined his idea for a solution and this is an
> attempt to implement it. Thanks to both and I take all the blame
> if this is nowhere near what is intended/required)
> 
> This RFC proposes a solution to handle the above issue by introducing
> the following,
> 
> 1. A new VM IOCTL,
>    KVM_ARM_SET_MIGRN_TARGET_CPUS  _IOW(KVMIO,  0xb7, struct kvm_arm_migrn_cpus)
>    This can be used by the userspace(VMM) to set the target CPUs the
>    Guest will run in its lifetime. See patch #2
> 2. Add hypercall support for Guest kernel to retrieve any migration
>    errata bitmap(ARM_SMCCC_VENDOR_HYP_KVM_MIGRN_ERRATA)
>    The above will return the bitmaps in R0-R3 registers. See patch #4
> 3. The "capability" field in struct arm64_cpu_capabilities is a generated
>    one at present and may get renumbered or reordered. Hence, we can't use
>    this directly for migration errata bitmaps. Instead, introduced
>    "migartion_safe_cap", which has to be set statically for any
>    erratum that needs to be enabled and is safe for migration
>    purposes. See patches 3 & 6.
> 4. Rest of the patches includes the plumbing required to populate the
>    errata bitmap based on the target CPUs set by the VMM and update the
>    system_cap based on it.
> 
> ToDos:-
>   -We still need a way to  handle the error in setting the invariant
>    registers(MIDR/REVIDR/AIDR) during Guest migration. Perhaps we can
>    handle it in userspace?

This seems entirely reasonable.

There's nothing KVM can do to unilaterally prevent userspace from
migrating VMs between different implementations already, and the
invariance of MIDR/REVIDR is just a speed bump and not an outright
blocker.

Ultimately this is now a contract between the VMM and guest, so we may
as well let userspace advertise whatever implementation it wants in the
ID registers.


> Please take a look and let me know your thoughts.

Looking at the guest side of things, it'd be nice if we hide the entire
MIDR abstraction behind is_midr_in_range() et al and stop letting
callers provide the MIDR. Internally that can find the MIDR of the
current CPU or walk the array of MIDRs supplied by the hypervisor.

Beyond that, we probably need a way to clue in userspace on the game in
case it has some behaviors keyed off MIDR too.
Cornelia Huck Oct. 17, 2024, 3:49 p.m. UTC | #10
On Fri, Oct 11 2024, Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

>> -----Original Message-----
>> From: Oliver Upton <oliver.upton@linux.dev>
>> Sent: Friday, October 11, 2024 4:11 PM
>> To: Marc Zyngier <maz@kernel.org>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
>> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
>> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
>> <yuzenghui@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; Anthony Jebson
>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
>> Linuxarm <linuxarm@huawei.com>
>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
>> migration
>> 
>> On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
>> > On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com> wrote:
>> > > > >
>> > > > > Please take a look and let me know your thoughts.
>> > > >
>> > > > Having eyeballed this very superficially, I think we can do something
>> > > > simpler, and maybe more future-proof:
>> > >
>> > > Thanks Marc for taking a look and the quick feedback.
>> >
>> > No worries, that's the least I could do given that you put the effort
>> > implementing my silly ideas! ;-)
>> >
>> > > > - I don't think KVM should be concerned about the description of the
>> > > >   target CPUs. The hypercall you defined is the right thing to do,
>> > > >   but the VMM should completely handle it. That's an implementation
>> > > >   detail, but it would make things much simpler.
>> > >
>> > > Ok. So does that mean the hypercall will use some sort of shared
>> memory
>> > > to retrieve the list of target CPUs from VMM?
>> >
>> > Two possibilities:
>> >
>> > - either shared memory, in which case the hypercall would require the
>> >   guest to give an IPA and size for the VMM to write its stuff into
>> >   the guest memory,
>> >
>> > - or more simply return the data as an MIDR/REVIDR pair in registers,
>> >   the guest requesting an index, and getting an error when out of
>> >   range, leaving it with the freedom to organise the storage.
>> >
>> > The second option is a bit slower, but way simpler, and it only
>> > happens once per guest boot, so it would probably be my preferred
>> > option unless this is proved to be impractical.
>> 
>> Also worth noting there's existing UAPI [*] for allowing userspace to
>> register range(s) of hypercalls that it services directly. It's a bit
>> weird that we'd allow userspace to do stuff in KVM's own hypercall
>> range, but I don't think it really matters at this point since this is
>> all prototyping.
>> 
>> [*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-arm-
>> vm-smccc-filter-w-o
>
> Thanks. Yes and there are attempts to add that handling in Qemu[*] in the context
> of vCPU hotplug support(PSCI related ones though).  Will take a look.
>
> Thanks,
> Shameer
> [1] https://lore.kernel.org/qemu-devel/20241009033704.250287-1-salil.mehta@huawei.com/

Speaking of QEMU: Do you maybe already have some prototype code that
tries to do something with the setup here? (I don't think QEMU currently
mucks around with MIDR and friends when running with KVM; I wonder what
it should provide to the guest and if it should care to set something as
a base level that gives guests not using the hypercall a chance to work
properly.)
Eric Auger Oct. 17, 2024, 5:16 p.m. UTC | #11
Hi Shameer,

On 10/17/24 17:49, Cornelia Huck wrote:
> On Fri, Oct 11 2024, Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
>
>>> -----Original Message-----
>>> From: Oliver Upton <oliver.upton@linux.dev>
>>> Sent: Friday, October 11, 2024 4:11 PM
>>> To: Marc Zyngier <maz@kernel.org>
>>> Cc: Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
>>> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
>>> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
>>> <yuzenghui@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>> <jonathan.cameron@huawei.com>; Anthony Jebson
>>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
>>> Linuxarm <linuxarm@huawei.com>
>>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
>>> migration
>>>
>>> On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
>>>> On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com> wrote:
>>>>>>> Please take a look and let me know your thoughts.
>>>>>> Having eyeballed this very superficially, I think we can do something
>>>>>> simpler, and maybe more future-proof:
>>>>> Thanks Marc for taking a look and the quick feedback.
>>>> No worries, that's the least I could do given that you put the effort
>>>> implementing my silly ideas! ;-)
>>>>
>>>>>> - I don't think KVM should be concerned about the description of the
>>>>>>   target CPUs. The hypercall you defined is the right thing to do,
>>>>>>   but the VMM should completely handle it. That's an implementation
>>>>>>   detail, but it would make things much simpler.
>>>>> Ok. So does that mean the hypercall will use some sort of shared
>>> memory
>>>>> to retrieve the list of target CPUs from VMM?
>>>> Two possibilities:
>>>>
>>>> - either shared memory, in which case the hypercall would require the
>>>>   guest to give an IPA and size for the VMM to write its stuff into
>>>>   the guest memory,
>>>>
>>>> - or more simply return the data as an MIDR/REVIDR pair in registers,
>>>>   the guest requesting an index, and getting an error when out of
>>>>   range, leaving it with the freedom to organise the storage.
>>>>
>>>> The second option is a bit slower, but way simpler, and it only
>>>> happens once per guest boot, so it would probably be my preferred
>>>> option unless this is proved to be impractical.
>>> Also worth noting there's existing UAPI [*] for allowing userspace to
>>> register range(s) of hypercalls that it services directly. It's a bit
>>> weird that we'd allow userspace to do stuff in KVM's own hypercall
>>> range, but I don't think it really matters at this point since this is
>>> all prototyping.
>>>
>>> [*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-arm-
>>> vm-smccc-filter-w-o
>> Thanks. Yes and there are attempts to add that handling in Qemu[*] in the context
>> of vCPU hotplug support(PSCI related ones though).  Will take a look.
>>
>> Thanks,
>> Shameer
>> [1] https://lore.kernel.org/qemu-devel/20241009033704.250287-1-salil.mehta@huawei.com/
> Speaking of QEMU: Do you maybe already have some prototype code that
> tries to do something with the setup here? (I don't think QEMU currently
> mucks around with MIDR and friends when running with KVM; I wonder what
> it should provide to the guest and if it should care to set something as
> a base level that gives guests not using the hypercall a chance to work
> properly.)
>
As discussed during the KVM forum we are working on a qemu integration
for writable ID regs. The first goal is to be able to specialize the
host passthrough model (custom host model). Maybe this will trigger more
discussions on named models too. This is complementary to the
MIDR/REVIDR problematic and I hope we will be able to consolidate our
works at some point.

Thanks

Eric
Shameerali Kolothum Thodi Oct. 18, 2024, 8:33 a.m. UTC | #12
> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, October 17, 2024 6:16 PM
> To: Cornelia Huck <cohuck@redhat.com>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Oliver Upton
> <oliver.upton@linux.dev>; Marc Zyngier <maz@kernel.org>
> Cc: kvmarm@lists.linux.dev; catalin.marinas@arm.com; will@kernel.org;
> mark.rutland@arm.com; yuzenghui <yuzenghui@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Anthony Jebson
> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Linuxarm <linuxarm@huawei.com>
> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
> migration
> 
> Hi Shameer,
> 
> On 10/17/24 17:49, Cornelia Huck wrote:
> > On Fri, Oct 11 2024, Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com> wrote:
> >
> >>> -----Original Message-----
> >>> From: Oliver Upton <oliver.upton@linux.dev>
> >>> Sent: Friday, October 11, 2024 4:11 PM
> >>> To: Marc Zyngier <maz@kernel.org>
> >>> Cc: Shameerali Kolothum Thodi
> >>> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
> >>> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
> >>> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
> >>> <yuzenghui@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>;
> >>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> >>> <jonathan.cameron@huawei.com>; Anthony Jebson
> >>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
> >>> Linuxarm <linuxarm@huawei.com>
> >>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM
> Live
> >>> migration
> >>>
> >>> On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
> >>>> On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi
> >>> <shameerali.kolothum.thodi@huawei.com> wrote:
> >>>>>>> Please take a look and let me know your thoughts.
> >>>>>> Having eyeballed this very superficially, I think we can do
> something
> >>>>>> simpler, and maybe more future-proof:
> >>>>> Thanks Marc for taking a look and the quick feedback.
> >>>> No worries, that's the least I could do given that you put the effort
> >>>> implementing my silly ideas! ;-)
> >>>>
> >>>>>> - I don't think KVM should be concerned about the description of
> the
> >>>>>>   target CPUs. The hypercall you defined is the right thing to do,
> >>>>>>   but the VMM should completely handle it. That's an
> implementation
> >>>>>>   detail, but it would make things much simpler.
> >>>>> Ok. So does that mean the hypercall will use some sort of shared
> >>> memory
> >>>>> to retrieve the list of target CPUs from VMM?
> >>>> Two possibilities:
> >>>>
> >>>> - either shared memory, in which case the hypercall would require the
> >>>>   guest to give an IPA and size for the VMM to write its stuff into
> >>>>   the guest memory,
> >>>>
> >>>> - or more simply return the data as an MIDR/REVIDR pair in registers,
> >>>>   the guest requesting an index, and getting an error when out of
> >>>>   range, leaving it with the freedom to organise the storage.
> >>>>
> >>>> The second option is a bit slower, but way simpler, and it only
> >>>> happens once per guest boot, so it would probably be my preferred
> >>>> option unless this is proved to be impractical.
> >>> Also worth noting there's existing UAPI [*] for allowing userspace to
> >>> register range(s) of hypercalls that it services directly. It's a bit
> >>> weird that we'd allow userspace to do stuff in KVM's own hypercall
> >>> range, but I don't think it really matters at this point since this is
> >>> all prototyping.
> >>>
> >>> [*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-
> arm-
> >>> vm-smccc-filter-w-o
> >> Thanks. Yes and there are attempts to add that handling in Qemu[*] in
> the context
> >> of vCPU hotplug support(PSCI related ones though).  Will take a look.
> >>
> >> Thanks,
> >> Shameer
> >> [1] https://lore.kernel.org/qemu-devel/20241009033704.250287-1-
> salil.mehta@huawei.com/
> > Speaking of QEMU: Do you maybe already have some prototype code that
> > tries to do something with the setup here? (I don't think QEMU currently
> > mucks around with MIDR and friends when running with KVM; I wonder
> what
> > it should provide to the guest and if it should care to set something as
> > a base level that gives guests not using the hypercall a chance to work
> > properly.)

I had a hacked Qemu to test this RFC. Nothing fancy but just to verify the  RFC.
It basically adds a named CPU to virt as below,

1. Checks the host MIDR and REVIDR and if it is one of the supported platforms,
   use the writable register interface to update the ID registers with a common
   minimum    feature list.
2. Use the IOCTL in this RFC to update the target CPU list info.
3. Ignore the invariant register SET errors.

> >
> As discussed during the KVM forum we are working on a qemu integration
> for writable ID regs. The first goal is to be able to specialize the
> host passthrough model (custom host model). Maybe this will trigger more
> discussions on named models too. This is complementary to the
> MIDR/REVIDR problematic and I hope we will be able to consolidate our
> works at some point.

Cool!. Happy to know that it is making progress. Once I rework this RFC based on the
suggestions, I will rework the Qemu to have a hacked prototype to test it as well and
probably can share that.

Please keep me in loop if you have plans to share your Qemu work soon. That will
be very helpful.

Thanks,
Shameer
Cornelia Huck Oct. 18, 2024, 12:28 p.m. UTC | #13
On Fri, Oct 18 2024, Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:

>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Thursday, October 17, 2024 6:16 PM
>> To: Cornelia Huck <cohuck@redhat.com>; Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Oliver Upton
>> <oliver.upton@linux.dev>; Marc Zyngier <maz@kernel.org>
>> Cc: kvmarm@lists.linux.dev; catalin.marinas@arm.com; will@kernel.org;
>> mark.rutland@arm.com; yuzenghui <yuzenghui@huawei.com>; Wangzhou
>> (B) <wangzhou1@hisilicon.com>; jiangkunkun
>> <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; Anthony Jebson
>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
>> Linuxarm <linuxarm@huawei.com>
>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM Live
>> migration
>> 
>> Hi Shameer,
>> 
>> On 10/17/24 17:49, Cornelia Huck wrote:
>> > On Fri, Oct 11 2024, Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com> wrote:
>> >
>> >>> -----Original Message-----
>> >>> From: Oliver Upton <oliver.upton@linux.dev>
>> >>> Sent: Friday, October 11, 2024 4:11 PM
>> >>> To: Marc Zyngier <maz@kernel.org>
>> >>> Cc: Shameerali Kolothum Thodi
>> >>> <shameerali.kolothum.thodi@huawei.com>; kvmarm@lists.linux.dev;
>> >>> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com;
>> >>> cohuck@redhat.com; eric.auger@redhat.com; yuzenghui
>> >>> <yuzenghui@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>;
>> >>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> >>> <jonathan.cameron@huawei.com>; Anthony Jebson
>> >>> <anthony.jebson@huawei.com>; linux-arm-kernel@lists.infradead.org;
>> >>> Linuxarm <linuxarm@huawei.com>
>> >>> Subject: Re: [RFC PATCH 0/6] KVM: arm64: Errata management for VM
>> Live
>> >>> migration
>> >>>
>> >>> On Fri, Oct 11, 2024 at 12:43:28PM +0100, Marc Zyngier wrote:
>> >>>> On Fri, 11 Oct 2024 11:57:10 +0100, Shameerali Kolothum Thodi
>> >>> <shameerali.kolothum.thodi@huawei.com> wrote:
>> >>>>>>> Please take a look and let me know your thoughts.
>> >>>>>> Having eyeballed this very superficially, I think we can do
>> something
>> >>>>>> simpler, and maybe more future-proof:
>> >>>>> Thanks Marc for taking a look and the quick feedback.
>> >>>> No worries, that's the least I could do given that you put the effort
>> >>>> implementing my silly ideas! ;-)
>> >>>>
>> >>>>>> - I don't think KVM should be concerned about the description of
>> the
>> >>>>>>   target CPUs. The hypercall you defined is the right thing to do,
>> >>>>>>   but the VMM should completely handle it. That's an
>> implementation
>> >>>>>>   detail, but it would make things much simpler.
>> >>>>> Ok. So does that mean the hypercall will use some sort of shared
>> >>> memory
>> >>>>> to retrieve the list of target CPUs from VMM?
>> >>>> Two possibilities:
>> >>>>
>> >>>> - either shared memory, in which case the hypercall would require the
>> >>>>   guest to give an IPA and size for the VMM to write its stuff into
>> >>>>   the guest memory,
>> >>>>
>> >>>> - or more simply return the data as an MIDR/REVIDR pair in registers,
>> >>>>   the guest requesting an index, and getting an error when out of
>> >>>>   range, leaving it with the freedom to organise the storage.
>> >>>>
>> >>>> The second option is a bit slower, but way simpler, and it only
>> >>>> happens once per guest boot, so it would probably be my preferred
>> >>>> option unless this is proved to be impractical.
>> >>> Also worth noting there's existing UAPI [*] for allowing userspace to
>> >>> register range(s) of hypercalls that it services directly. It's a bit
>> >>> weird that we'd allow userspace to do stuff in KVM's own hypercall
>> >>> range, but I don't think it really matters at this point since this is
>> >>> all prototyping.
>> >>>
>> >>> [*]: https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-
>> arm-
>> >>> vm-smccc-filter-w-o
>> >> Thanks. Yes and there are attempts to add that handling in Qemu[*] in
>> the context
>> >> of vCPU hotplug support(PSCI related ones though).  Will take a look.
>> >>
>> >> Thanks,
>> >> Shameer
>> >> [1] https://lore.kernel.org/qemu-devel/20241009033704.250287-1-
>> salil.mehta@huawei.com/
>> > Speaking of QEMU: Do you maybe already have some prototype code that
>> > tries to do something with the setup here? (I don't think QEMU currently
>> > mucks around with MIDR and friends when running with KVM; I wonder
>> what
>> > it should provide to the guest and if it should care to set something as
>> > a base level that gives guests not using the hypercall a chance to work
>> > properly.)
>
> I had a hacked Qemu to test this RFC. Nothing fancy but just to verify the  RFC.
> It basically adds a named CPU to virt as below,
>
> 1. Checks the host MIDR and REVIDR and if it is one of the supported platforms,
>    use the writable register interface to update the ID registers with a common
>    minimum    feature list.
> 2. Use the IOCTL in this RFC to update the target CPU list info.
> 3. Ignore the invariant register SET errors.

Thanks for sharing this information (no need to share the code, I
usually do not want to share my hacked-together-for-testing code, either
:)

>
>> >
>> As discussed during the KVM forum we are working on a qemu integration
>> for writable ID regs. The first goal is to be able to specialize the
>> host passthrough model (custom host model). Maybe this will trigger more
>> discussions on named models too. This is complementary to the
>> MIDR/REVIDR problematic and I hope we will be able to consolidate our
>> works at some point.
>
> Cool!. Happy to know that it is making progress. Once I rework this RFC based on the
> suggestions, I will rework the Qemu to have a hacked prototype to test it as well and
> probably can share that.

The reason I asked was mostly to get a feel about how you wanted to end
up using the interface. Did I understand correctly that you derive your
custom set of features from whatever comes as common features with the
MIDR/REVIDR, and do not treat "frankencpu with features common across
all possible targets" and "list of MIDR/REVIDR for all possible targets"
separately and tack them together in the end? It would also be
interesting to discuss what actual named models could look like (models
we could use for baselining etc.) But that's probably something that
needs to be discussed on qemu-devel anyway, and I certainly don't want
to derail the discussions about the KVM changes. (Agreeing on QEMU
interfaces is going to take a lot of work anyway :)

>
> Please keep me in loop if you have plans to share your Qemu work soon. That will
> be very helpful.
>
> Thanks,
> Shameer
Marc Zyngier Oct. 18, 2024, 1:17 p.m. UTC | #14
On Thu, 17 Oct 2024 18:16:04 +0100,
Eric Auger <eric.auger@redhat.com> wrote:
> 
> Hi Shameer,
> 
> On 10/17/24 17:49, Cornelia Huck wrote:

> > Speaking of QEMU: Do you maybe already have some prototype code that
> > tries to do something with the setup here? (I don't think QEMU currently
> > mucks around with MIDR and friends when running with KVM; I wonder what
> > it should provide to the guest and if it should care to set something as
> > a base level that gives guests not using the hypercall a chance to work
> > properly.)
> >
> As discussed during the KVM forum we are working on a qemu integration
> for writable ID regs. The first goal is to be able to specialize the
> host passthrough model (custom host model). Maybe this will trigger more
> discussions on named models too. This is complementary to the
> MIDR/REVIDR problematic and I hope we will be able to consolidate our
> works at some point.

Complementary to the MIDR/REVIDR work, I would also like to make
MIDR/REVIDR writable when this scheme is available. Ideally reporting
a synthetic CPU description (with MIDR_EL1.Implementer returning 0,
and the rest being VMM-specific, but with a clear definition for the
IMPDEF fields so that we can version the ABI).

Thoughts?

	M.
Oliver Upton Oct. 18, 2024, 6:52 p.m. UTC | #15
On Fri, Oct 18, 2024 at 02:17:17PM +0100, Marc Zyngier wrote:
> On Thu, 17 Oct 2024 18:16:04 +0100,
> Eric Auger <eric.auger@redhat.com> wrote:
> > 
> > Hi Shameer,
> > 
> > On 10/17/24 17:49, Cornelia Huck wrote:
> 
> > > Speaking of QEMU: Do you maybe already have some prototype code that
> > > tries to do something with the setup here? (I don't think QEMU currently
> > > mucks around with MIDR and friends when running with KVM; I wonder what
> > > it should provide to the guest and if it should care to set something as
> > > a base level that gives guests not using the hypercall a chance to work
> > > properly.)
> > >
> > As discussed during the KVM forum we are working on a qemu integration
> > for writable ID regs. The first goal is to be able to specialize the
> > host passthrough model (custom host model). Maybe this will trigger more
> > discussions on named models too. This is complementary to the
> > MIDR/REVIDR problematic and I hope we will be able to consolidate our
> > works at some point.
> 
> Complementary to the MIDR/REVIDR work, I would also like to make
> MIDR/REVIDR writable when this scheme is available. Ideally reporting
> a synthetic CPU description (with MIDR_EL1.Implementer returning 0,
> and the rest being VMM-specific, but with a clear definition for the
> IMPDEF fields so that we can version the ABI).

When the VMM is using the hypercall mechanism to describe
implementations, 100% agree.

> Thoughts?

I think we should go a step further and allow userspace full control of
known fields in these registers, even for a nonzero Implementer code.

We're already affording userspace full control of what implementation(s)
the guest sees anyway via hypercalls, so there isn't much left for KVM
to enforce.

Ignoring errata, it'd let folks spoof an old implementation on newer
hardware for testing, especially when dealing w/ older software.
Cornelia Huck Oct. 22, 2024, 10:58 a.m. UTC | #16
On Fri, Oct 18 2024, Oliver Upton <oliver.upton@linux.dev> wrote:

> On Fri, Oct 18, 2024 at 02:17:17PM +0100, Marc Zyngier wrote:
>> On Thu, 17 Oct 2024 18:16:04 +0100,
>> Eric Auger <eric.auger@redhat.com> wrote:
>> > 
>> > Hi Shameer,
>> > 
>> > On 10/17/24 17:49, Cornelia Huck wrote:
>> 
>> > > Speaking of QEMU: Do you maybe already have some prototype code that
>> > > tries to do something with the setup here? (I don't think QEMU currently
>> > > mucks around with MIDR and friends when running with KVM; I wonder what
>> > > it should provide to the guest and if it should care to set something as
>> > > a base level that gives guests not using the hypercall a chance to work
>> > > properly.)
>> > >
>> > As discussed during the KVM forum we are working on a qemu integration
>> > for writable ID regs. The first goal is to be able to specialize the
>> > host passthrough model (custom host model). Maybe this will trigger more
>> > discussions on named models too. This is complementary to the
>> > MIDR/REVIDR problematic and I hope we will be able to consolidate our
>> > works at some point.
>> 
>> Complementary to the MIDR/REVIDR work, I would also like to make
>> MIDR/REVIDR writable when this scheme is available. Ideally reporting
>> a synthetic CPU description (with MIDR_EL1.Implementer returning 0,
>> and the rest being VMM-specific, but with a clear definition for the
>> IMPDEF fields so that we can version the ABI).
>
> When the VMM is using the hypercall mechanism to describe
> implementations, 100% agree.
>
>> Thoughts?
>
> I think we should go a step further and allow userspace full control of
> known fields in these registers, even for a nonzero Implementer code.
>
> We're already affording userspace full control of what implementation(s)
> the guest sees anyway via hypercalls, so there isn't much left for KVM
> to enforce.
>
> Ignoring errata, it'd let folks spoof an old implementation on newer
> hardware for testing, especially when dealing w/ older software.

So basically, the VMM would have to make a choice which guests it wants
to support? Modern guests, which are aware through the hypercall
interface what is going on and are able to parse any information in our
synthetic MIDR correctly, or older guests, which should be presented
with a MIDR that makes them hopefully act in the way we want it to act?

I guess that would mean that for full migration support between
different hosts, we'd want enlightned guests, and for (some) other
cases, we could generate a configuration that will hopefully work?

(Spoofing for testing things sounds useful regardless.)
Marc Zyngier Oct. 22, 2024, 12:52 p.m. UTC | #17
On Tue, 22 Oct 2024 11:58:02 +0100,
Cornelia Huck <cohuck@redhat.com> wrote:
> 
> On Fri, Oct 18 2024, Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> > On Fri, Oct 18, 2024 at 02:17:17PM +0100, Marc Zyngier wrote:
> >> On Thu, 17 Oct 2024 18:16:04 +0100,
> >> Eric Auger <eric.auger@redhat.com> wrote:
> >> > 
> >> > Hi Shameer,
> >> > 
> >> > On 10/17/24 17:49, Cornelia Huck wrote:
> >> 
> >> > > Speaking of QEMU: Do you maybe already have some prototype code that
> >> > > tries to do something with the setup here? (I don't think QEMU currently
> >> > > mucks around with MIDR and friends when running with KVM; I wonder what
> >> > > it should provide to the guest and if it should care to set something as
> >> > > a base level that gives guests not using the hypercall a chance to work
> >> > > properly.)
> >> > >
> >> > As discussed during the KVM forum we are working on a qemu integration
> >> > for writable ID regs. The first goal is to be able to specialize the
> >> > host passthrough model (custom host model). Maybe this will trigger more
> >> > discussions on named models too. This is complementary to the
> >> > MIDR/REVIDR problematic and I hope we will be able to consolidate our
> >> > works at some point.
> >> 
> >> Complementary to the MIDR/REVIDR work, I would also like to make
> >> MIDR/REVIDR writable when this scheme is available. Ideally reporting
> >> a synthetic CPU description (with MIDR_EL1.Implementer returning 0,
> >> and the rest being VMM-specific, but with a clear definition for the
> >> IMPDEF fields so that we can version the ABI).
> >
> > When the VMM is using the hypercall mechanism to describe
> > implementations, 100% agree.
> >
> >> Thoughts?
> >
> > I think we should go a step further and allow userspace full control of
> > known fields in these registers, even for a nonzero Implementer code.
> >
> > We're already affording userspace full control of what implementation(s)
> > the guest sees anyway via hypercalls, so there isn't much left for KVM
> > to enforce.
> >
> > Ignoring errata, it'd let folks spoof an old implementation on newer
> > hardware for testing, especially when dealing w/ older software.
> 
> So basically, the VMM would have to make a choice which guests it wants
> to support? Modern guests, which are aware through the hypercall
> interface what is going on and are able to parse any information in our
> synthetic MIDR correctly, or older guests, which should be presented
> with a MIDR that makes them hopefully act in the way we want it to act?
>
> I guess that would mean that for full migration support between
> different hosts, we'd want enlightned guests, and for (some) other
> cases, we could generate a configuration that will hopefully work?

That's sums it up, I think. The way I think of it is that, at VM
creation, you decide what you want to support:

- homogeneous migration: that's basically what we have today, with
  some limited flexibility.

- heterogeneous migration: synthetic MIDR, discovery hypercall, the
  works.

> (Spoofing for testing things sounds useful regardless.)

My problem with that is that there is no difference in ABI between "I
want this for testing" and "I expect this to fully work". The latter
is obviously not achievable...

Thanks,

	M.