Message ID: 20220824060304.21128-1-gankulkarni@os.amperecomputing.com
Series: KVM: arm64: nv: Fixes for Nested Virtualization issues
Hi Marc,

Any review comments on this series?

On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> This series contains 3 fixes which were found while testing the
> ARM64 Nested Virtualization patch series.
>
> The first patch avoids restarting the hrtimer when the timer interrupt
> is fired/forwarded to the Guest-Hypervisor.
>
> The second patch fixes the vtimer interrupt drop from the
> Guest-Hypervisor.
>
> The third patch fixes the NestedVM boot hang seen when the Guest
> Hypervisor is configured with a 64K page size whereas the Host
> Hypervisor uses 4K.
>
> These patches are rebased on the Nested Virtualization V6 patchset [1].
>
> [1] https://www.spinics.net/lists/kvm/msg265656.html
>
> D Scott Phillips (1):
>   KVM: arm64: nv: only emulate timers that have not yet fired
>
> Ganapatrao Kulkarni (2):
>   KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
>   KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
>     block size.
>
>  arch/arm64/kvm/arch_timer.c | 8 +++++++-
>  arch/arm64/kvm/mmu.c        | 2 +-
>  2 files changed, 8 insertions(+), 2 deletions(-)

Thanks,
Ganapat
On Mon, 10 Oct 2022 06:56:31 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> Hi Marc,
>
> Any review comments on this series?

Not yet. So far, the NV stuff is put on ice until I can source some
actual HW to make the development less painful.

	M.
Hi Marc,

On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> This series contains 3 fixes which were found while testing the
> ARM64 Nested Virtualization patch series.
>
> The first patch avoids restarting the hrtimer when the timer interrupt
> is fired/forwarded to the Guest-Hypervisor.
>
> The second patch fixes the vtimer interrupt drop from the
> Guest-Hypervisor.
>
> The third patch fixes the NestedVM boot hang seen when the Guest
> Hypervisor is configured with a 64K page size whereas the Host
> Hypervisor uses 4K.
>
> These patches are rebased on the Nested Virtualization V6 patchset [1].

If I boot a Guest Hypervisor with a larger number of cores, then booting
a NestedVM with an equal number of cores, or booting multiple NestedVMs
simultaneously with fewer cores, results in very slow booting and
sometimes an RCU soft-lockup in the NestedVM. I have debugged this and it
turns out to be caused by many SGIs being asserted to all vCPUs of the
Guest-Hypervisor when the Guest-Hypervisor's KVM code prepares the
NestedVM for WFI wakeup/return.

When the Guest Hypervisor prepares the NestedVM while returning/resuming
from WFI, it loads the guest context, vGIC and timer contexts, etc. The
function gic_poke_irq (called from irq_set_irqchip_state with a spinlock
held) in the Guest-Hypervisor's KVM code writes to the GICD_ISACTIVER
register, which results in a mem-abort trap to the Host Hypervisor. As
part of handling the guest mem abort, the Host Hypervisor calls
io_mem_abort, which in turn calls vgic_mmio_write_sactive; this prepares
every vCPU of the Guest Hypervisor by sending an SGI. The number of
SGI/IPI calls grows exponentially as more and more cores are used to
boot the Guest Hypervisor.

Code trace:

At Guest-Hypervisor:
kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
irq_set_irqchip_state->gic_poke_irq

At Host-Hypervisor:
io_mem_abort->kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
vgic_mmio_write_sactive->vgic_access_active_prepare->
kvm_kick_many_cpus->smp_call_function_many

I am currently working around this by passing the "nohlt" kernel
parameter to the NestedVM. Any suggestions on how to handle/fix this
issue and avoid the slow booting of a NestedVM with more cores?

Note: the Guest-Hypervisor and NestedVM are using the default kernel
installed from the Fedora 36 ISO.

>
> [1] https://www.spinics.net/lists/kvm/msg265656.html
>
> D Scott Phillips (1):
>   KVM: arm64: nv: only emulate timers that have not yet fired
>
> Ganapatrao Kulkarni (2):
>   KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.
>   KVM: arm64: nv: Avoid block mapping if max_map_size is smaller than
>     block size.
>
>  arch/arm64/kvm/arch_timer.c | 8 +++++++-
>  arch/arm64/kvm/mmu.c        | 2 +-
>  2 files changed, 8 insertions(+), 2 deletions(-)

Thanks,
Ganapat
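For context, the guest-hypervisor step in the trace above that ends up
trapping is the point where KVM marks the timer interrupt as physically
active when a vCPU is loaded. A minimal sketch of that step follows; the
function and field names are taken from arch/arm64/kvm/arch_timer.c of
that era, but the snippet is illustrative only and may not match the
exact source being discussed in the thread.

/*
 * Illustrative sketch (not the exact upstream code): on vcpu_load, KVM
 * tells the GIC that the physical timer interrupt is active so that it
 * cannot fire again before the vCPU state is synced.  On bare metal this
 * is a redistributor access; when the caller is itself a guest
 * hypervisor, the GICD_ISACTIVER/ICACTIVER write traps to the host,
 * which handles it in vgic_mmio_write_sactive() and halts every vCPU in
 * vgic_access_active_prepare() -- the SGI storm described above.
 */
static void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
{
	int r;

	r = irq_set_irqchip_state(ctx->host_timer_irq,
				  IRQCHIP_STATE_ACTIVE, active);
	WARN_ON(r);
}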
Hi Ganapatrao,

On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> Hi Marc,
>
> On 24-08-2022 11:33 am, Ganapatrao Kulkarni wrote:
> > This series contains 3 fixes which were found while testing the
> > ARM64 Nested Virtualization patch series.
> >
> > The first patch avoids restarting the hrtimer when the timer interrupt
> > is fired/forwarded to the Guest-Hypervisor.
> >
> > The second patch fixes the vtimer interrupt drop from the
> > Guest-Hypervisor.
> >
> > The third patch fixes the NestedVM boot hang seen when the Guest
> > Hypervisor is configured with a 64K page size whereas the Host
> > Hypervisor uses 4K.
> >
> > These patches are rebased on the Nested Virtualization V6 patchset [1].
>
> If I boot a Guest Hypervisor with a larger number of cores, then booting
> a NestedVM with an equal number of cores, or booting multiple NestedVMs
> simultaneously with fewer cores, results in very slow booting and
> sometimes an RCU soft-lockup in the NestedVM. I have debugged this and
> it turns out to be caused by many SGIs being asserted to all vCPUs of
> the Guest-Hypervisor when the Guest-Hypervisor's KVM code prepares the
> NestedVM for WFI wakeup/return.
>
> When the Guest Hypervisor prepares the NestedVM while returning/resuming
> from WFI, it loads the guest context, vGIC and timer contexts, etc. The
> function gic_poke_irq (called from irq_set_irqchip_state with a spinlock
> held) in the Guest-Hypervisor's KVM code writes to the GICD_ISACTIVER
> register, which results in a mem-abort trap to the Host Hypervisor. As
> part of handling the guest mem abort, the Host Hypervisor calls
> io_mem_abort, which in turn calls vgic_mmio_write_sactive; this prepares
> every vCPU of the Guest Hypervisor by sending an SGI. The number of
> SGI/IPI calls grows exponentially as more and more cores are used to
> boot the Guest Hypervisor.

This really isn't surprising. NV combined with oversubscribing is bound
to be absolutely terrible. The write to GICD_ISACTIVER is only
symptomatic of the interrupt amplification problem that already exists
without NV (any IPI in a guest is likely to result in at least one IPI
in the host).

Short of having direct injection of interrupts for *all* interrupts,
you end up with this sort of emulation that relies on being able to
synchronise all CPUs. Is it bad? Yes. Very.

>
> Code trace:
>
> At Guest-Hypervisor:
> kvm_timer_vcpu_load->kvm_timer_vcpu_load_gic->set_timer_irq_phys_active->
> irq_set_irqchip_state->gic_poke_irq
>
> At Host-Hypervisor:
> io_mem_abort->kvm_io_bus_write->__kvm_io_bus_write->dispatch_mmio_write->
> vgic_mmio_write_sactive->vgic_access_active_prepare->
> kvm_kick_many_cpus->smp_call_function_many
>
> I am currently working around this by passing the "nohlt" kernel
> parameter to the NestedVM. Any suggestions on how to handle/fix this
> issue and avoid the slow booting of a NestedVM with more cores?

At the moment, I'm focussing on correctness rather than performance.
Maybe we can restrict the conditions in which we perform this
synchronisation, but that's pretty low on my radar at the moment.

Once things are in a state that can be merged and that works correctly,
we can look into it.

Thanks,

	M.
On Tue, 10 Jan 2023 12:17:20 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> I am currently working around this by passing the "nohlt" kernel
> parameter to the NestedVM. Any suggestions on how to handle/fix this
> issue and avoid the slow booting of a NestedVM with more cores?
>
> Note: the Guest-Hypervisor and NestedVM are using the default kernel
> installed from the Fedora 36 ISO.

Despite what I said earlier, I have a vague idea here, thanks to the
interesting call traces that you provided (this is really awesome work
BTW, given how hard it is to trace things across 3 different kernels).

We can slightly limit the impact of the prepare/finish sequence if the
guest hypervisor only accesses the active registers for SGIs/PPIs on
the vcpu that owns them, forbidding any cross-CPU-to-redistributor
access.

Something along these lines, which is only boot-tested. Let me know
how this fares for you.

Thanks,

	M.

diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index b32d434c1d4a..1cca45be5335 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
  * active state can be overwritten when the VCPU's state is synced coming back
  * from the guest.
  *
- * For shared interrupts as well as GICv3 private interrupts, we have to
- * stop all the VCPUs because interrupts can be migrated while we don't hold
- * the IRQ locks and we don't want to be chasing moving targets.
+ * For shared interrupts as well as GICv3 private interrupts accessed from the
+ * non-owning CPU, we have to stop all the VCPUs because interrupts can be
+ * migrated while we don't hold the IRQ locks and we don't want to be chasing
+ * moving targets.
  *
  * For GICv2 private interrupts we don't have to do anything because
  * userspace accesses to the VGIC state already require all VCPUs to be
@@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
  */
 static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu == kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_halt_guest(vcpu->kvm);
 }
@@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 /* See vgic_access_active_prepare */
 static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu == kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_resume_guest(vcpu->kvm);
 }
On Tue, 10 Jan 2023 21:54:39 +0000,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 10 Jan 2023 12:17:20 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >
> > I am currently working around this by passing the "nohlt" kernel
> > parameter to the NestedVM. Any suggestions on how to handle/fix this
> > issue and avoid the slow booting of a NestedVM with more cores?
> >
> > Note: the Guest-Hypervisor and NestedVM are using the default kernel
> > installed from the Fedora 36 ISO.
>
> Despite what I said earlier, I have a vague idea here, thanks to the
> interesting call traces that you provided (this is really awesome work
> BTW, given how hard it is to trace things across 3 different kernels).
>
> We can slightly limit the impact of the prepare/finish sequence if the
> guest hypervisor only accesses the active registers for SGIs/PPIs on
> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
> access.
>
> Something along these lines, which is only boot-tested. Let me know
> how this fares for you.
>
> Thanks,
>
> 	M.
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> index b32d434c1d4a..1cca45be5335 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   * active state can be overwritten when the VCPU's state is synced coming back
>   * from the guest.
>   *
> - * For shared interrupts as well as GICv3 private interrupts, we have to
> - * stop all the VCPUs because interrupts can be migrated while we don't hold
> - * the IRQ locks and we don't want to be chasing moving targets.
> + * For shared interrupts as well as GICv3 private interrupts accessed from the
> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
> + * moving targets.
>   *
>   * For GICv2 private interrupts we don't have to do anything because
>   * userspace accesses to the VGIC state already require all VCPUs to be
> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   */
>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

This should obviously be

+	     vcpu != kvm_get_running_vcpu()) ||

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_halt_guest(vcpu->kvm);
>  }
> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  /* See vgic_access_active_prepare */
>  static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Same here.

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_resume_guest(vcpu->kvm);
>  }

Thanks,

	M.
On 11-01-2023 03:24 am, Marc Zyngier wrote:
> On Tue, 10 Jan 2023 12:17:20 +0000,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>> I am currently working around this by passing the "nohlt" kernel
>> parameter to the NestedVM. Any suggestions on how to handle/fix this
>> issue and avoid the slow booting of a NestedVM with more cores?
>>
>> Note: the Guest-Hypervisor and NestedVM are using the default kernel
>> installed from the Fedora 36 ISO.
>
> Despite what I said earlier, I have a vague idea here, thanks to the
> interesting call traces that you provided (this is really awesome work
> BTW, given how hard it is to trace things across 3 different kernels).
>
> We can slightly limit the impact of the prepare/finish sequence if the
> guest hypervisor only accesses the active registers for SGIs/PPIs on
> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
> access.
>
> Something along these lines, which is only boot-tested. Let me know
> how this fares for you.
>
> Thanks,
>
> M.
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
> index b32d434c1d4a..1cca45be5335 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   * active state can be overwritten when the VCPU's state is synced coming back
>   * from the guest.
>   *
> - * For shared interrupts as well as GICv3 private interrupts, we have to
> - * stop all the VCPUs because interrupts can be migrated while we don't hold
> - * the IRQ locks and we don't want to be chasing moving targets.
> + * For shared interrupts as well as GICv3 private interrupts accessed from the
> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
> + * moving targets.
>   *
>   * For GICv2 private interrupts we don't have to do anything because
>   * userspace accesses to the VGIC state already require all VCPUs to be
> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>   */
>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Thanks Marc for the patch!

I think you mean not-equal?

+	     vcpu != kvm_get_running_vcpu()) ||

With the change to not-equal, the issue is fixed and I can see that the
NestedVM boots pretty fast with a higher number of cores as well.

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_halt_guest(vcpu->kvm);
>  }
> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  /* See vgic_access_active_prepare */
>  static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
> +	     vcpu == kvm_get_running_vcpu()) ||

Same, not-equal.

>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_resume_guest(vcpu->kvm);
>  }

Thanks,
Ganapat
On 11-01-2023 02:16 pm, Ganapatrao Kulkarni wrote:
>
> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>> On Tue, 10 Jan 2023 12:17:20 +0000,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>> I am currently working around this by passing the "nohlt" kernel
>>> parameter to the NestedVM. Any suggestions on how to handle/fix this
>>> issue and avoid the slow booting of a NestedVM with more cores?
>>>
>>> Note: the Guest-Hypervisor and NestedVM are using the default kernel
>>> installed from the Fedora 36 ISO.
>>
>> Despite what I said earlier, I have a vague idea here, thanks to the
>> interesting call traces that you provided (this is really awesome work
>> BTW, given how hard it is to trace things across 3 different kernels).
>>
>> We can slightly limit the impact of the prepare/finish sequence if the
>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>> access.
>>
>> Something along these lines, which is only boot-tested. Let me know
>> how this fares for you.
>>
>> Thanks,
>>
>> M.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
>> index b32d434c1d4a..1cca45be5335 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>   * active state can be overwritten when the VCPU's state is synced coming back
>>   * from the guest.
>>   *
>> - * For shared interrupts as well as GICv3 private interrupts, we have to
>> - * stop all the VCPUs because interrupts can be migrated while we don't hold
>> - * the IRQ locks and we don't want to be chasing moving targets.
>> + * For shared interrupts as well as GICv3 private interrupts accessed from the
>> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
>> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
>> + * moving targets.
>>   *
>>   * For GICv2 private interrupts we don't have to do anything because
>>   * userspace accesses to the VGIC state already require all VCPUs to be
>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>   */
>>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>>  {
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +	     vcpu == kvm_get_running_vcpu()) ||
>
> Thanks Marc for the patch!
>
> I think you mean not-equal?

Sorry, I did not see your follow-up email.

> +	     vcpu != kvm_get_running_vcpu()) ||
>
> With the change to not-equal, the issue is fixed and I can see that the
> NestedVM boots pretty fast with a higher number of cores as well.
>
>>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>>  		kvm_arm_halt_guest(vcpu->kvm);
>>  }
>> @@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>>  /* See vgic_access_active_prepare */
>>  static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>>  {
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +	     vcpu == kvm_get_running_vcpu()) ||
>
> Same, not-equal.
>
>>  	    intid >= VGIC_NR_PRIVATE_IRQS)
>>  		kvm_arm_resume_guest(vcpu->kvm);
>>  }
>
> Thanks,
> Ganapat
On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>> On Tue, 10 Jan 2023 12:17:20 +0000,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>> I am currently working around this by passing the "nohlt" kernel
>>> parameter to the NestedVM. Any suggestions on how to handle/fix this
>>> issue and avoid the slow booting of a NestedVM with more cores?
>>>
>>> Note: the Guest-Hypervisor and NestedVM are using the default kernel
>>> installed from the Fedora 36 ISO.
>>
>> Despite what I said earlier, I have a vague idea here, thanks to the
>> interesting call traces that you provided (this is really awesome work
>> BTW, given how hard it is to trace things across 3 different kernels).
>>
>> We can slightly limit the impact of the prepare/finish sequence if the
>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>> access.
>>
>> Something along these lines, which is only boot-tested. Let me know
>> how this fares for you.
>>
>> Thanks,
>>
>> M.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
>> index b32d434c1d4a..1cca45be5335 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>   * active state can be overwritten when the VCPU's state is synced coming back
>>   * from the guest.
>>   *
>> - * For shared interrupts as well as GICv3 private interrupts, we have to
>> - * stop all the VCPUs because interrupts can be migrated while we don't hold
>> - * the IRQ locks and we don't want to be chasing moving targets.
>> + * For shared interrupts as well as GICv3 private interrupts accessed from the
>> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
>> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
>> + * moving targets.
>>   *
>>   * For GICv2 private interrupts we don't have to do anything because
>>   * userspace accesses to the VGIC state already require all VCPUs to be
>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>   */
>>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>>  {
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>> +	     vcpu == kvm_get_running_vcpu()) ||
>
> Thanks Marc for the patch!
>
> I think you mean not-equal?
>
> +	     vcpu != kvm_get_running_vcpu()) ||

Yeah, exactly. I woke up this morning realising this patch was
*almost* right. Don't write patches like this after a long day
at work...

> With the change to not-equal, the issue is fixed and I can see that the
> NestedVM boots pretty fast with a higher number of cores as well.

Good, thanks for testing it. I'll roll up an actual patch for that
and stick it in the monster queue.

Cheers,

	M.
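For convenience, the corrected hunks as actually tested above (i.e.
Marc's draft with the comparison flipped to not-equal, so that all vCPUs
are only halted when a GICv3 private interrupt's active state is touched
from a CPU other than the one running the owning vCPU) would look roughly
like the following. This is only a consolidation of the thread's diff;
the patch Marc eventually queues may differ in its final form.

@@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	/* Halt all vCPUs only for cross-CPU private-IRQ accesses and SPIs */
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu != kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_halt_guest(vcpu->kvm);
 }

 /* See vgic_access_active_prepare */
 static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+	     vcpu != kvm_get_running_vcpu()) ||
 	    intid >= VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_resume_guest(vcpu->kvm);
 }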
On 11-01-2023 05:09 pm, Marc Zyngier wrote:
> On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
>> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>>> On Tue, 10 Jan 2023 12:17:20 +0000,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>> I am currently working around this by passing the "nohlt" kernel
>>>> parameter to the NestedVM. Any suggestions on how to handle/fix this
>>>> issue and avoid the slow booting of a NestedVM with more cores?
>>>>
>>>> Note: the Guest-Hypervisor and NestedVM are using the default kernel
>>>> installed from the Fedora 36 ISO.
>>>
>>> Despite what I said earlier, I have a vague idea here, thanks to the
>>> interesting call traces that you provided (this is really awesome work
>>> BTW, given how hard it is to trace things across 3 different kernels).
>>>
>>> We can slightly limit the impact of the prepare/finish sequence if the
>>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>>> access.
>>>
>>> Something along these lines, which is only boot-tested. Let me know
>>> how this fares for you.
>>>
>>> Thanks,
>>>
>>> M.
>>>
>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> index b32d434c1d4a..1cca45be5335 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>>   * active state can be overwritten when the VCPU's state is synced coming back
>>>   * from the guest.
>>>   *
>>> - * For shared interrupts as well as GICv3 private interrupts, we have to
>>> - * stop all the VCPUs because interrupts can be migrated while we don't hold
>>> - * the IRQ locks and we don't want to be chasing moving targets.
>>> + * For shared interrupts as well as GICv3 private interrupts accessed from the
>>> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
>>> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
>>> + * moving targets.
>>>   *
>>>   * For GICv2 private interrupts we don't have to do anything because
>>>   * userspace accesses to the VGIC state already require all VCPUs to be
>>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>>   */
>>>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>>>  {
>>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>>> +	     vcpu == kvm_get_running_vcpu()) ||
>>
>> Thanks Marc for the patch!
>>
>> I think you mean not-equal?
>> +	     vcpu != kvm_get_running_vcpu()) ||
>
> Yeah, exactly. I woke up this morning realising this patch was
> *almost* right. Don't write patches like this after a long day
> at work...
>
>> With the change to not-equal, the issue is fixed and I can see that the
>> NestedVM boots pretty fast with a higher number of cores as well.
>
> Good, thanks for testing it. I'll roll up an actual patch for that
> and stick it in the monster queue.

Thanks. Please also pull patch 3/3 into the nv-6.2 tree along with this
patch. I will move my setup to nv-6.2 once these patches are in.

>
> Cheers,
>
> 	M.

Thanks,
Ganapat
On 2023-01-11 12:46, Ganapatrao Kulkarni wrote:
> On 11-01-2023 05:09 pm, Marc Zyngier wrote:
>> On 2023-01-11 08:46, Ganapatrao Kulkarni wrote:
>>> On 11-01-2023 03:24 am, Marc Zyngier wrote:
>>>> On Tue, 10 Jan 2023 12:17:20 +0000,
>>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>>
>>>>> I am currently working around this by passing the "nohlt" kernel
>>>>> parameter to the NestedVM. Any suggestions on how to handle/fix this
>>>>> issue and avoid the slow booting of a NestedVM with more cores?
>>>>>
>>>>> Note: the Guest-Hypervisor and NestedVM are using the default kernel
>>>>> installed from the Fedora 36 ISO.
>>>>
>>>> Despite what I said earlier, I have a vague idea here, thanks to the
>>>> interesting call traces that you provided (this is really awesome work
>>>> BTW, given how hard it is to trace things across 3 different kernels).
>>>>
>>>> We can slightly limit the impact of the prepare/finish sequence if the
>>>> guest hypervisor only accesses the active registers for SGIs/PPIs on
>>>> the vcpu that owns them, forbidding any cross-CPU-to-redistributor
>>>> access.
>>>>
>>>> Something along these lines, which is only boot-tested. Let me know
>>>> how this fares for you.
>>>>
>>>> Thanks,
>>>>
>>>> M.
>>>>
>>>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> index b32d434c1d4a..1cca45be5335 100644
>>>> --- a/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> +++ b/arch/arm64/kvm/vgic/vgic-mmio.c
>>>> @@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>>>   * active state can be overwritten when the VCPU's state is synced coming back
>>>>   * from the guest.
>>>>   *
>>>> - * For shared interrupts as well as GICv3 private interrupts, we have to
>>>> - * stop all the VCPUs because interrupts can be migrated while we don't hold
>>>> - * the IRQ locks and we don't want to be chasing moving targets.
>>>> + * For shared interrupts as well as GICv3 private interrupts accessed from the
>>>> + * non-owning CPU, we have to stop all the VCPUs because interrupts can be
>>>> + * migrated while we don't hold the IRQ locks and we don't want to be chasing
>>>> + * moving targets.
>>>>   *
>>>>   * For GICv2 private interrupts we don't have to do anything because
>>>>   * userspace accesses to the VGIC state already require all VCPUs to be
>>>> @@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
>>>>   */
>>>>  static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>>>>  {
>>>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
>>>> +	if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
>>>> +	     vcpu == kvm_get_running_vcpu()) ||
>>>
>>> Thanks Marc for the patch!
>>>
>>> I think you mean not-equal?
>>> +	     vcpu != kvm_get_running_vcpu()) ||
>>
>> Yeah, exactly. I woke up this morning realising this patch was
>> *almost* right. Don't write patches like this after a long day
>> at work...
>>
>>> With the change to not-equal, the issue is fixed and I can see that the
>>> NestedVM boots pretty fast with a higher number of cores as well.
>>
>> Good, thanks for testing it. I'll roll up an actual patch for that
>> and stick it in the monster queue.
>
> Thanks. Please also pull patch 3/3 into the nv-6.2 tree along with this
> patch. I will move my setup to nv-6.2 once these patches are in.

3/3 should already be in the branch, merged with the shadow S2 fault
handling.

	M.