
[0/3] KVM: arm64: nv: Fixes for Nested Virtualization issues

Message ID 20220824060304.21128-1-gankulkarni@os.amperecomputing.com (mailing list archive)
Series KVM: arm64: nv: Fixes for Nested Virtualization issues

Message

Ganapatrao Kulkarni Aug. 24, 2022, 6:03 a.m. UTC
This series contains 3 fixes which were found while testing the
ARM64 Nested Virtualization patch series.

The first patch avoids restarting the hrtimer when the timer interrupt is
fired/forwarded to the Guest Hypervisor.

The second patch fixes the vtimer interrupt drop from the Guest Hypervisor.

The third patch fixes a NestedVM boot hang seen when the Guest Hypervisor
is configured with a 64K page size whereas the Host Hypervisor uses 4K
(a rough sketch of the idea follows the diffstat below).

These patches are rebased on the Nested Virtualization V6 patchset [1].

[1] https://www.spinics.net/lists/kvm/msg265656.html

D Scott Phillips (1):
  KVM: arm64: nv: only emulate timers that have not yet fired

Ganapatrao Kulkarni (2):
  KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
  KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
    block size.

 arch/arm64/kvm/arch_timer.c | 8 +++++++-
 arch/arm64/kvm/mmu.c        | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)
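
As a rough illustration of the idea behind the third patch (this is a
sketch only, not the actual mmu.c change; max_map_size and vma_pagesize
stand for the values already computed in the host's stage-2 fault path):
never install a block mapping larger than what the Guest Hypervisor's
own stage-2 permits.

	/*
	 * Sketch only: with a 64K guest hypervisor on a 4K host, the largest
	 * mapping the shadow stage-2 may use (max_map_size) can be smaller
	 * than the block size the host would otherwise pick, so clamp it.
	 */
	vma_pagesize = min_t(long, vma_pagesize, max_map_size);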

Comments

Ganapatrao Kulkarni Oct. 10, 2022, 5:56 a.m. UTC | #1
Hi Marc,

Any review comments on this series?

On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> This series contains 3 fixes which were found while testing
> ARM64 Nested Virtualization patch series.
> 
> First patch avoids the restart of hrtimer when timer interrupt is
> fired/forwarded to Guest-Hypervisor.
> 
> Second patch fixes the vtimer interrupt drop from the Guest-Hypervisor.
> 
> Third patch fixes the NestedVM boot hang seen when Guest Hypervisor is
> configured with 64K pagesize whereas Host Hypervisor with 4K.
> 
> These patches are rebased on Nested Virtualization V6 patchset[1].
> 
> [1] https://www.spinics.net/lists/kvm/msg265656.html
> 
> D Scott Phillips (1):
>    KVM: arm64: nv: only emulate timers that have not yet fired
> 
> Ganapatrao Kulkarni (2):
>    KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
>    KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
>      block size.
> 
>   arch/arm64/kvm/arch_timer.c | 8 +++++++-
>   arch/arm64/kvm/mmu.c        | 2 +-
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 

Thanks,
Ganapat
Marc Zyngier Oct. 19, 2022, 7:59 a.m. UTC | #2
On Mon, 10 Oct 2022 06:56:31 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> Hi Marc,
> 
> Any review comments on this series?

Not yet. So far, the NV stuff is put on ice until I can source some
actual HW to make the development less painful.

	M.
Ganapatrao Kulkarni Jan. 10, 2023, 12:17 p.m. UTC | #3
Hi Marc,

On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> This series contains 3 fixes which were found while testing
> ARM64 Nested Virtualization patch series.
> 
> First patch avoids the restart of hrtimer when timer interrupt is
> fired/forwarded to Guest-Hypervisor.
> 
> Second patch fixes the vtimer interrupt drop from the Guest-Hypervisor.
> 
> Third patch fixes the NestedVM boot hang seen when Guest Hypervisor is
> configured with 64K pagesize whereas Host Hypervisor with 4K.
> 
> These patches are rebased on Nested Virtualization V6 patchset[1].

If I boot a Guest Hypervisor with a large number of cores and then boot a
NestedVM with an equal number of cores, or boot multiple NestedVMs
(simultaneously) with fewer cores, booting becomes very slow and sometimes
results in an RCU soft-lockup in the NestedVM. I have debugged this and it
turns out to be due to many SGIs being asserted to all vCPUs of the
Guest Hypervisor when the Guest Hypervisor's KVM code prepares the NestedVM
for WFI wakeup/return.

When the Guest Hypervisor prepares the NestedVM while returning/resuming
from WFI, it loads the guest context, the vGIC and timer contexts, etc.
The function gic_poke_irq (called from irq_set_irqchip_state with a
spinlock held) in the Guest Hypervisor's KVM code writes to the register
GICD_ISACTIVER, which results in a mem-abort trap to the Host Hypervisor.
As part of handling the guest mem abort, the Host Hypervisor calls
io_mem_abort, which in turn calls vgic_mmio_write_sactive; this prepares
every vCPU of the Guest Hypervisor by sending an SGI to each of them. The
number of SGI/IPI calls grows dramatically as more and more cores are used
to boot the Guest Hypervisor.

Code trace:
At Guest-Hypervisor:
kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
irq_set_irqchip_state->gic_poke_irq

At Host-Hypervisor: io_mem_abort-> 
kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
vgic_mmio_write_sactive->vgic_access_active_prepare->
kvm_kick_many_cpus->smp_call_function_many
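
In other words, vgic_mmio_write_sactive roughly does the following (a
simplified sketch of its shape, not a literal excerpt), so each trapped
GICD_ISACTIVER write from the Guest Hypervisor stops and then resumes
every one of its vCPUs, one IPI per vCPU each way, for every WFI wakeup:

	/* Simplified sketch, not the actual source */
	static void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu, gpa_t addr,
					    unsigned int len, unsigned long val)
	{
		u32 intid = VGIC_ADDR_TO_INTID(addr, 1);

		vgic_access_active_prepare(vcpu, intid);	/* kvm_arm_halt_guest(): kick every vCPU */
		__vgic_mmio_write_sactive(vcpu, addr, len, val);/* update the active state */
		vgic_access_active_finish(vcpu, intid);		/* kvm_arm_resume_guest(): wake them all */
	}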

I am currently working around this by passing the "nohlt" kernel parameter
to the NestedVM. Any suggestions on how to handle/fix this issue and avoid
the slow boot of the NestedVM with more cores?

Note: the Guest Hypervisor and the NestedVM are running the default kernel
installed from the Fedora 36 ISO.

> 
> [1] https://www.spinics.net/lists/kvm/msg265656.html
> 
> D Scott Phillips (1):
>    KVM: arm64: nv: only emulate timers that have not yet fired
> 
> Ganapatrao Kulkarni (2):
>    KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
>    KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
>      block size.
> 
>   arch/arm64/kvm/arch_timer.c | 8 +++++++-
>   arch/arm64/kvm/mmu.c        | 2 +-
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 

Thanks,
Ganapat
Marc Zyngier Jan. 10, 2023, 2:05 p.m. UTC | #4
Hi Ganapatrao,

On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> > This series contains 3 fixes which were found while testing
> > ARM64 Nested Virtualization patch series.
> > 
> > First patch avoids the restart of hrtimer when timer interrupt is
> > fired/forwarded to Guest-Hypervisor.
> > 
> > Second patch fixes the vtimer interrupt drop from the Guest-Hypervisor.
> > 
> > Third patch fixes the NestedVM boot hang seen when Guest Hypervisor is
> > configured with 64K pagesize whereas Host Hypervisor with 4K.
> > 
> > These patches are rebased on Nested Virtualization V6 patchset[1].
> 
> If I boot a Guest Hypervisor with more cores and then booting of a
> NestedVM with equal number of cores or booting multiple
> NestedVMs(simultaneously) with lower number of cores is resulting in
> very slow booting and some time RCU soft-lockup of a NestedVM. This I
> have debugged and turned out to be due to many SGI are getting
> asserted to all vCPUs of a Guest-Hypervisor when Guest-Hypervisor KVM
> code prepares NestedVM for WFI wakeup/return.
> 
> When Guest Hypervisor prepares NestedVM while returning/resuming from
> WFI, it is loading guest-context,  vGIC and timer contexts etc.
> The function gic_poke_irq (called from irq_set_irqchip_state with
> spinlock held) writes to register GICD_ISACTIVER in Guest-Hypervisor's
> KVM code resulting in mem-abort trap to Host Hypervisor. Host
> Hypervisor as part of handling the guest mem abort, function
> io_mem_abort is called  in turn vgic_mmio_write_sactive, which
> prepares every vCPU of Guest Hypervisor by calling SGI. The number of
> SGI/IPI calls goes exponentially high when more and more cores are
> used to boot Guest Hypervisor.

This really isn't surprising. NV combined with oversubscribing is
bound to be absolutely terrible. The write to GICD_ISACTIVER is
only symptomatic of the interrupt amplification problem that already
exists without NV (any IPI in a guest is likely to result in at least
one IPI in the host).

Short of having direct injection of interrupts for *all* interrupts,
you end up with this sort of emulation that relies on being able to
synchronise all CPUs. Is it bad? Yes. Very.

> 
> Code trace:
> At Guest-hypervisor:
> kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
> irq_set_irqchip_state->gic_poke_irq
> 
> At Host-Hypervisor: io_mem_abort->
> kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
> vgic_mmio_write_sactive->vgic_access_active_prepare->
> kvm_kick_many_cpus->smp_call_function_many
> 
> I am currently working around this with "nohlt" kernel param to
> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
> slowness of booting of NestedVM with more cores?

At the moment, I'm focussing on correctness rather than performance.

Maybe we can restrict the conditions in which we perform this
synchronisation, but that's pretty low on my radar at the moment. Once
things are in a state that can be merged and that works correctly, we
can look into it.

Thanks,

	M.
Marc Zyngier Jan. 10, 2023, 9:54 p.m. UTC | #5
On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> I am currently working around this with "nohlt" kernel param to
> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
> slowness of booting of NestedVM with more cores?
> 
> Note: Guest-Hypervisor and NestedVM are using default kernel installed
> using Fedora 36 iso.

Despite what I said earlier, I have a vague idea here, thanks to the
interesting call traces that you provided (this is really awesome work
BTW, given how hard it is to trace things across 3 different kernels).

We can slightly limit the impact of the prepare/finish sequence if the
guest hypervisor only accesses the active registers for SGIs/PPIs on
the vcpu that owns them, forbidding any cross-CPU-to-redistributor
access.

Something along these lines, which is only boot-tested. Let me know
how this fares for you.

Thanks,

	M.

diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index b32d434c1d4a..1cca45be5335 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
  * active state can be overwritten when the VCPU's state is synced coming back
  * from the guest.
  *
- * For shared interrupts as well as GICv3 private interrupts, we have to
- * stop all the VCPUs because interrupts can be migrated while we don't hold
- * the IRQ locks and we don't want to be chasing moving targets.
+ * For shared interrupts as well as GICv3 private interrupts accessed from the
+ * non-owning CPU, we have to stop all the VCPUs because interrupts can be
+ * migrated while we don't hold the IRQ locks and we don't want to be chasing
+ * moving targets.
  *
  * For GICv2 private interrupts we don't have to do anything because
  * userspace accesses to the VGIC state already require all VCPUs to be
@@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
  */
 static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu == kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_halt_guest(vcpu->kvm);
 }
@@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 /* See vgic_access_active_prepare */
 static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu == kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_resume_guest(vcpu->kvm);
 }
Marc Zyngier Jan. 11, 2023, 7:54 a.m. UTC | #6
On Tue, 10 Jan 2023 21:54:39 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Tue, 10 Jan 2023 12:17:20 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> > 
> > I am currently working around this with "nohlt" kernel param to
> > NestedVM. Any suggestions to handle/fix this case/issue and avoid the
> > slowness of booting of NestedVM with more cores?
> > 
> > Note: Guest-Hypervisor and NestedVM are using default kernel installed
> > using Fedora 36 iso.
> 
> Despite what I said earlier, I have a vague idea here, thanks to the
> interesting call traces that you provided (this is really awesome work
> BTW, given how hard it is to trace things across 3 different kernels).
> 
> We can slightly limit the impact of the prepare/finish sequence if the
> guest hypervisor only accesses the active registers for SGIs/PPIs on
> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
> access.
> 
> Something along these lines, which is only boot-tested. Let me know
> how this fares for you.
> 
> Thanks,
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> index b32d434c1d4a..1cca45be5335 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   * active state can be overwritten when the VCPU's state is synced coming back
>   * from the guest.
>   *
> - * For shared interrupts as well as GICv3 private interrupts, we have to
> - * stop all the VCPUs because interrupts can be migrated while we don't hold
> - * the IRQ locks and we don't want to be chasing moving targets.
> + * For shared interrupts as well as GICv3 private interrupts accessed from the
> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
> + * moving targets.
>   *
>   * For GICv2 private interrupts we don't have to do anything because
>   * userspace accesses to the VGIC state already require all VCPUs to be
> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   */
>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

This should obviously be

+	     vcpu != kvm_get_running_vcpu()) ||

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_halt_guest(vcpu->kvm);
>  }
> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  /* See vgic_access_active_prepare */
>  static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Same here.

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_resume_guest(vcpu->kvm);
>  }

Thanks,

	M.
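
Putting the diff and the sign fix together, the two helpers would end up
roughly as follows (a sketch of the combined result, not the final merged
patch):

	static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
	{
		/* Halt everyone only for cross-vCPU GICv3 private-IRQ accesses or SPIs */
		if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
		     vcpu != kvm_get_running_vcpu()) ||
		    intid >= VGIC_NR_PRIVATE_IRQS)
			kvm_arm_halt_guest(vcpu->kvm);
	}

	/* See vgic_access_active_prepare */
	static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
	{
		if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
		     vcpu != kvm_get_running_vcpu()) ||
		    intid >= VGIC_NR_PRIVATE_IRQS)
			kvm_arm_resume_guest(vcpu->kvm);
	}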
Ganapatrao Kulkarni Jan. 11, 2023, 8:46 a.m. UTC | #7
On 11-01-2023 03:24 am, Marc Zyngier wrote:
> On Tue, 10 Jan 2023 12:17:20 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>> I am currently working around this with "nohlt" kernel param to
>> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
>> slowness of booting of NestedVM with more cores?
>>
>> Note: Guest-Hypervisor and NestedVM are using default kernel installed
>> using Fedora 36 iso.
> 
> Despite what I said earlier, I have a vague idea here, thanks to the
> interesting call traces that you provided (this is really awesome work
> BTW, given how hard it is to trace things across 3 different kernels).
> 
> We can slightly limit the impact of the prepare/finish sequence if the
> guest hypervisor only accesses the active registers for SGIs/PPIs on
> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
> access.
> 
> Something along these lines, which is only boot-tested. Let me know
> how this fares for you.
> 
> Thanks,
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> index b32d434c1d4a..1cca45be5335 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>    * active state can be overwritten when the VCPU's state is synced coming back
>    * from the guest.
>    *
> - * For shared interrupts as well as GICv3 private interrupts, we have to
> - * stop all the VCPUs because interrupts can be migrated while we don't hold
> - * the IRQ locks and we don't want to be chasing moving targets.
> + * For shared interrupts as well as GICv3 private interrupts accessed from the
> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
> + * moving targets.
>    *
>    * For GICv2 private interrupts we don't have to do anything because
>    * userspace accesses to the VGIC state already require all VCPUs to be
> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>    */
>   static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>   {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Thanks Marc for the patch!

I think you mean not equal to?
+           vcpu != kvm_get_running_vcpu()) ||

With the change to not-equal, the issue is fixed and I can see that the 
NestedVM boots pretty fast with a higher number of cores as well.

>   	    intid >= VGIC_NR_PRIVATE_IRQS)
>   		kvm_arm_halt_guest(vcpu->kvm);
>   }
> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>   /* See vgic_access_active_prepare */
>   static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>   {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Same, not equal to.
>   	    intid >= VGIC_NR_PRIVATE_IRQS)
>   		kvm_arm_resume_guest(vcpu->kvm);
>   }
> 


Thanks,
Ganapat
Ganapatrao Kulkarni Jan. 11, 2023, 8:48 a.m. UTC | #8
On 11-01-2023 02:16 pm, Ganapatrao Kulkarni wrote:
> 
> 
> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>> On Tue, 10 Jan 2023 12:17:20 +0000,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>> I am currently working around this with "nohlt" kernel param to
>>> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
>>> slowness of booting of NestedVM with more cores?
>>>
>>> Note: Guest-Hypervisor and NestedVM are using default kernel installed
>>> using Fedora 36 iso.
>>
>> Despite what I said earlier, I have a vague idea here, thanks to the
>> interesting call traces that you provided (this is really awesome work
>> BTW, given how hard it is to trace things across 3 different kernels).
>>
>> We can slightly limit the impact of the prepare/finish sequence if the
>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>> access.
>>
>> Something along these lines, which is only boot-tested. Let me know
>> how this fares for you.
>>
>> Thanks,
>>
>>     M.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio.c
>> index b32d434c1d4a..1cca45be5335 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>> *vcpu,
>>    * active state can be overwritten when the VCPU's state is synced 
>> coming back
>>    * from the guest.
>>    *
>> - * For shared interrupts as well as GICv3 private interrupts, we have to
>> - * stop all the VCPUs because interrupts can be migrated while we 
>> don't hold
>> - * the IRQ locks and we don't want to be chasing moving targets.
>> + * For shared interrupts as well as GICv3 private interrupts accessed 
>> from the
>> + * non-owning CPU, we have to stop all the VCPUs because interrupts 
>> can be
>> + * migrated while we don't hold the IRQ locks and we don't want to be 
>> chasing
>> + * moving targets.
>>    *
>>    * For GICv2 private interrupts we don't have to do anything because
>>    * userspace accesses to the VGIC state already require all VCPUs to be
>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>> *vcpu,
>>    */
>>   static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 
>> intid)
>>   {
>> -    if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +    if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +         vcpu == kvm_get_running_vcpu()) ||
> 
> Thanks Marc for the patch!
> 
> I think, you mean not equal to?
Sorry, I did not see your follow-up email.

> +           vcpu != kvm_get_running_vcpu()) ||
> 
> With the change to not-equal, the issue is fixed and I could see the 
> NestedVM booting is pretty fast with higher number of cores as well.
> 
>>           intid >= VGIC_NR_PRIVATE_IRQS)
>>           kvm_arm_halt_guest(vcpu->kvm);
>>   }
>> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct 
>> kvm_vcpu *vcpu, u32 intid)
>>   /* See vgic_access_active_prepare */
>>   static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>>   {
>> -    if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +    if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +         vcpu == kvm_get_running_vcpu()) ||
> 
> Same, not equal to.
>>           intid >= VGIC_NR_PRIVATE_IRQS)
>>           kvm_arm_resume_guest(vcpu->kvm);
>>   }
>>
> 
> 
> Thanks,
> Ganapat
Marc Zyngier Jan. 11, 2023, 11:39 a.m. UTC | #9
On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>> On Tue, 10 Jan 2023 12:17:20 +0000,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>> 
>>> I am currently working around this with "nohlt" kernel param to
>>> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
>>> slowness of booting of NestedVM with more cores?
>>> 
>>> Note: Guest-Hypervisor and NestedVM are using default kernel 
>>> installed
>>> using Fedora 36 iso.
>> 
>> Despite what I said earlier, I have a vague idea here, thanks to the
>> interesting call traces that you provided (this is really awesome work
>> BTW, given how hard it is to trace things across 3 different kernels).
>> 
>> We can slightly limit the impact of the prepare/finish sequence if the
>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>> access.
>> 
>> Something along these lines, which is only boot-tested. Let me know
>> how this fares for you.
>> 
>> Thanks,
>> 
>> 	M.
>> 
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c 
>> b/arch/arm64/kvm/vgic/vgic-mmio.c
>> index b32d434c1d4a..1cca45be5335 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>> *vcpu,
>>    * active state can be overwritten when the VCPU's state is synced 
>> coming back
>>    * from the guest.
>>    *
>> - * For shared interrupts as well as GICv3 private interrupts, we have 
>> to
>> - * stop all the VCPUs because interrupts can be migrated while we 
>> don't hold
>> - * the IRQ locks and we don't want to be chasing moving targets.
>> + * For shared interrupts as well as GICv3 private interrupts accessed 
>> from the
>> + * non-owning CPU, we have to stop all the VCPUs because interrupts 
>> can be
>> + * migrated while we don't hold the IRQ locks and we don't want to be 
>> chasing
>> + * moving targets.
>>    *
>>    * For GICv2 private interrupts we don't have to do anything because
>>    * userspace accesses to the VGIC state already require all VCPUs to 
>> be
>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>> *vcpu,
>>    */
>>   static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 
>> intid)
>>   {
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +	     vcpu == kvm_get_running_vcpu()) ||
> 
> Thanks Marc for the patch!
> 
> I think, you mean not equal to?
> +           vcpu != kvm_get_running_vcpu()) ||

Yeah, exactly. I woke up this morning realising this patch was
*almost* right. Don't write patches like this after a long day
at work...

> With the change to not-equal, the issue is fixed and I could see the
> NestedVM booting is pretty fast with higher number of cores as well.

Good, thanks for testing it. I'll roll up an actual patch for that
and stick it in the monster queue.

Cheers,

        M.
Ganapatrao Kulkarni Jan. 11, 2023, 12:46 p.m. UTC | #10
On 11-01-2023 05:09 pm, Marc Zyngier wrote:
> On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
>> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>>> On Tue, 10 Jan 2023 12:17:20 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>> I am currently working around this with "nohlt" kernel param to
>>>> NestedVM. Any suggestions to handle/fix this case/issue and avoid the
>>>> slowness of booting of NestedVM with more cores?
>>>>
>>>> Note: Guest-Hypervisor and NestedVM are using default kernel installed
>>>> using Fedora 36 iso.
>>>
>>> Despite what I said earlier, I have a vague idea here, thanks to the
>>> interesting call traces that you provided (this is really awesome work
>>> BTW, given how hard it is to trace things across 3 different kernels).
>>>
>>> We can slightly limit the impact of the prepare/finish sequence if the
>>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>>> access.
>>>
>>> Something along these lines, which is only boot-tested. Let me know
>>> how this fares for you.
>>>
>>> Thanks,
>>>
>>>     M.
>>>
>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c 
>>> b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> index b32d434c1d4a..1cca45be5335 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>>> *vcpu,
>>>    * active state can be overwritten when the VCPU's state is synced 
>>> coming back
>>>    * from the guest.
>>>    *
>>> - * For shared interrupts as well as GICv3 private interrupts, we 
>>> have to
>>> - * stop all the VCPUs because interrupts can be migrated while we 
>>> don't hold
>>> - * the IRQ locks and we don't want to be chasing moving targets.
>>> + * For shared interrupts as well as GICv3 private interrupts 
>>> accessed from the
>>> + * non-owning CPU, we have to stop all the VCPUs because interrupts 
>>> can be
>>> + * migrated while we don't hold the IRQ locks and we don't want to 
>>> be chasing
>>> + * moving targets.
>>>    *
>>>    * For GICv2 private interrupts we don't have to do anything because
>>>    * userspace accesses to the VGIC state already require all VCPUs 
>>> to be
>>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>>> *vcpu,
>>>    */
>>>   static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 
>>> intid)
>>>   {
>>> -    if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>>> +    if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>>> +         vcpu == kvm_get_running_vcpu()) ||
>>
>> Thanks Marc for the patch!
>>
>> I think, you mean not equal to?
>> +           vcpu != kvm_get_running_vcpu()) ||
> 
> Yeah, exactly. I woke up this morning realising this patch was
> *almost* right. Don't write patches like this after a long day
> at work...
> 
>> With the change to not-equal, the issue is fixed and I could see the
>> NestedVM booting is pretty fast with higher number of cores as well.
> 
> Good, thanks for testing it. I'll roll up an actual patch for that
> and stick it in the monster queue.

Thanks. Please also pull patch 3/3 into the nv-6.2 tree along with this 
patch. I will move my setup to nv-6.2 once these patches are in.

> 
> Cheers,
> 
>         M.


Thanks,
Ganapat
Marc Zyngier Jan. 11, 2023, 1:36 p.m. UTC | #11
On 2023-01-11 12:46, Ganapatrao Kulkarni wrote:
> On 11-01-2023 05:09 pm, Marc Zyngier wrote:
>> On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
>>> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>>>> On Tue, 10 Jan 2023 12:17:20 +0000,
>>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>> 
>>>>> I am currently working around this with "nohlt" kernel param to
>>>>> NestedVM. Any suggestions to handle/fix this case/issue and avoid 
>>>>> the
>>>>> slowness of booting of NestedVM with more cores?
>>>>> 
>>>>> Note: Guest-Hypervisor and NestedVM are using default kernel 
>>>>> installed
>>>>> using Fedora 36 iso.
>>>> 
>>>> Despite what I said earlier, I have a vague idea here, thanks to the
>>>> interesting call traces that you provided (this is really awesome 
>>>> work
>>>> BTW, given how hard it is to trace things across 3 different 
>>>> kernels).
>>>> 
>>>> We can slightly limit the impact of the prepare/finish sequence if 
>>>> the
>>>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>>>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>>>> access.
>>>> 
>>>> Something along these lines, which is only boot-tested. Let me know
>>>> how this fares for you.
>>>> 
>>>> Thanks,
>>>> 
>>>>     M.
>>>> 
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c 
>>>> b/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> index b32d434c1d4a..1cca45be5335 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>>>> *vcpu,
>>>>    * active state can be overwritten when the VCPU's state is synced 
>>>> coming back
>>>>    * from the guest.
>>>>    *
>>>> - * For shared interrupts as well as GICv3 private interrupts, we 
>>>> have to
>>>> - * stop all the VCPUs because interrupts can be migrated while we 
>>>> don't hold
>>>> - * the IRQ locks and we don't want to be chasing moving targets.
>>>> + * For shared interrupts as well as GICv3 private interrupts 
>>>> accessed from the
>>>> + * non-owning CPU, we have to stop all the VCPUs because interrupts 
>>>> can be
>>>> + * migrated while we don't hold the IRQ locks and we don't want to 
>>>> be chasing
>>>> + * moving targets.
>>>>    *
>>>>    * For GICv2 private interrupts we don't have to do anything 
>>>> because
>>>>    * userspace accesses to the VGIC state already require all VCPUs 
>>>> to be
>>>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu 
>>>> *vcpu,
>>>>    */
>>>>   static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 
>>>> intid)
>>>>   {
>>>> -    if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 
>>>> ||
>>>> +    if ((vcpu->kvm->arch.vgic.vgic_model == 
>>>> KVM_DEV_TYPE_ARM_VGIC_V3 &&
>>>> +         vcpu == kvm_get_running_vcpu()) ||
>>> 
>>> Thanks Marc for the patch!
>>> 
>>> I think, you mean not equal to?
>>> +           vcpu != kvm_get_running_vcpu()) ||
>> 
>> Yeah, exactly. I woke up this morning realising this patch was
>> *almost* right. Don't write patches like this after a long day
>> at work...
>> 
>>> With the change to not-equal, the issue is fixed and I could see the
>>> NestedVM booting is pretty fast with higher number of cores as well.
>> 
>> Good, thanks for testing it. I'll roll up an actual patch for that
>> and stick it in the monster queue.
> 
> Thanks, Please pull patch 3/3 also to nv-6.2 tree along with this
> patch. I will move my setup to nv-6.2 once these patches are in.

3/3 should already be in the branch, merged with the shadow
S2 fault handling.

         M.