Message ID: 20250206152100.1107909-1-maz@kernel.org (mailing list archive)
Series: KVM: arm64: Assorted vgic fixes for 6.14
On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> all stemming from the fact that the vgic can be destroyed in parallel
> with the rest of the guest still being live.
>
> Yes, this is annoying.
>
> Fixing this is not going to happen overnight (though I have some
> ideas), but we can make what we have today a bit more robust.
>
> This is what patch #2 is doing. Patch #1 is just removing a loud
> WARN_ON() that serves little purpose, and patch #3 fixes the actual
> bug that Alex reported.
>
> Hopefully, none of that is controversial...

I'm a bit grumbly about slapping bandaids on the problem, but given the
fact that glider reported all of this a while ago and we still haven't
fixed it is enough to justify these patches. So:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
On Fri, 07 Feb 2025 18:03:55 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > all stemming from the fact that the vgic can be destroyed in parallel
> > with the rest of the guest still being live.
> >
> > Yes, this is annoying.
> >
> > Fixing this is not going to happen overnight (though I have some
> > ideas), but we can make what we have today a bit more robust.
> >
> > This is what patch #2 is doing. Patch #1 is just removing a loud
> > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > bug that Alex reported.
> >
> > Hopefully, none of that is controversial...
>
> I'm a bit grumbly about slapping bandaids on the problem, but given the
> fact that glider reported all of this a while ago and we still haven't
> fixed it is enough to justify these patches. So:

Yeah, same here. I'm starting to think that we need to either prevent
the vgic from being asynchronously destroyed, or start refcounting all
IRQs just like LPIs. Which is very annoying since we don't have a
global namespace for SGIs and PPIs.

But maybe simply refcounting the vgic itself would be enough.
Thoughts?

> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

Thanks,

	M.
On Fri, Feb 07, 2025 at 06:10:49PM +0000, Marc Zyngier wrote:
> On Fri, 07 Feb 2025 18:03:55 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > > all stemming from the fact that the vgic can be destroyed in parallel
> > > with the rest of the guest still being live.
> > >
> > > Yes, this is annoying.
> > >
> > > Fixing this is not going to happen overnight (though I have some
> > > ideas), but we can make what we have today a bit more robust.
> > >
> > > This is what patch #2 is doing. Patch #1 is just removing a loud
> > > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > > bug that Alex reported.
> > >
> > > Hopefully, none of that is controversial...
> >
> > I'm a bit grumbly about slapping bandaids on the problem, but given the
> > fact that glider reported all of this a while ago and we still haven't
> > fixed it is enough to justify these patches. So:
>
> Yeah, same here. I'm starting to think that we need to either prevent
> the vgic from being asynchronously destroyed, or start refcounting all
> IRQs just like LPIs. Which is very annoying since we don't have a
> global namespace for SGIs and PPIs.
>
> But maybe simply refcounting the vgic itself would be enough.
> Thoughts?

So would we refcount on the owning structure for a particular IRQ? i.e.
private IRQs are counted against the owning vCPU and SPIs against the
distributor?

Adding a vgic_put_vcpu_irq() could help disambiguate private IRQs too.
On Fri, 07 Feb 2025 18:50:32 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Feb 07, 2025 at 06:10:49PM +0000, Marc Zyngier wrote:
> > On Fri, 07 Feb 2025 18:03:55 +0000,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > > > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > > > all stemming from the fact that the vgic can be destroyed in parallel
> > > > with the rest of the guest still being live.
> > > >
> > > > Yes, this is annoying.
> > > >
> > > > Fixing this is not going to happen overnight (though I have some
> > > > ideas), but we can make what we have today a bit more robust.
> > > >
> > > > This is what patch #2 is doing. Patch #1 is just removing a loud
> > > > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > > > bug that Alex reported.
> > > >
> > > > Hopefully, none of that is controversial...
> > >
> > > I'm a bit grumbly about slapping bandaids on the problem, but given the
> > > fact that glider reported all of this a while ago and we still haven't
> > > fixed it is enough to justify these patches. So:
> >
> > Yeah, same here. I'm starting to think that we need to either prevent
> > the vgic from being asynchronously destroyed, or start refcounting all
> > IRQs just like LPIs. Which is very annoying since we don't have a
> > global namespace for SGIs and PPIs.
> >
> > But maybe simply refcounting the vgic itself would be enough.
> > Thoughts?
>
> So would we refcount on the owning structure for a particular IRQ? i.e.
> private IRQs are counted against the owning vCPU and SPIs against the
> distributor?
>
> Adding a vgic_put_vcpu_irq() could help disambiguate private IRQs too.

Having slept (yay!) on it, I don't think this is enough.

The observation is that a lot of the issues we have are triggered by
the private interrupt array being allocated on demand on VGIC
creation. We had way fewer problems when the private interrupts were
directly part of struct vcpu, which really was hiding the basic
problem.

We have an annoying discrepancy between knowing that a GIC is present
(VM-wide state) and the interrupts for a vcpu being allocated (vcpu
state). This is what Alexander's fuzzing is all about (create a vcpu,
init it, and while this is happening, create/destroy a vgic). So it
isn't just destruction, but also creation that is problematic, and I'm
not sure that refcounting is the correct solution anymore.

Now, for the issue that this series is trying to fix, I came up with
an alternative approach, based on the above epiphany.

We currently allocate private interrupts in two locations:

- at vcpu creation time *if* a vgic has already been created

- from vgic_init(), which is itself called on:
  - KVM_DEV_ARM_VGIC_CTRL_INIT for the modern stuff,
  - or lazily for anything old and specific to GICv2 (huh).

Crucially, there is a gaping window between the creation of a vgic
*after* a vcpu has been created, and its initialisation. Anything that
checks for irqchip_in_kernel() during that interval may assume that
private interrupts are allocated, because this is what it used to be.
Waiting for vgic_init() to mop-up the "early" vcpus is what creates
the issue.

Given that, the solution is obvious: hoist the early vcpu mop-up from
vgic_init() into kvm_vgic_create(). This guarantees that any vcpu
observing irqchip_in_kernel() being true will also see its private
interrupts.

The patch below fixes it for me (tested with kvmtool, qemu, and Alex's
tickling reproducer).

Thoughts?

	M.
From c150c2300d5b8918909ad46a769aae864a58025f Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Sat, 8 Feb 2025 14:58:43 +0000
Subject: [PATCH] KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to
 kvm_create_vgic()

If userspace creates vcpus, then a vgic, we end-up in a situation
where irqchip_in_kernel() will return true, but no private interrupt
has been allocated for these vcpus. This situation will continue
until userspace initialises the vgic, at which point we fix the
early vcpus. Should a vcpu run or be initialised in the interval,
bad things may happen.

An obvious solution is to move this fix-up phase to the point where
the vgic is created. This ensures that from that point onwards, all
vcpus have their private interrupts, as new vcpus will directly
allocate them.

With that, we have the invariant that when irqchip_in_kernel() is
true, all vcpus have their private interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-init.c | 74 ++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index c7a82bd0c276d..1f33e71c2a731 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -34,9 +34,9 @@
  *
  * CPU Interface:
  *
- * - kvm_vgic_vcpu_init(): initialization of static data that
- *   doesn't depend on any sizing information or emulation type. No
- *   allocation is allowed there.
+ * - kvm_vgic_vcpu_init(): initialization of static data that doesn't depend
+ *   on any sizing information. Private interrupts are allocated if not
+ *   already allocated at vgic-creation time.
  */

 /* EARLY INIT */
@@ -58,6 +58,8 @@ void kvm_vgic_early_init(struct kvm *kvm)

 /* CREATION */

+static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type);
+
 /**
  * kvm_vgic_create: triggered by the instantiation of the VGIC device by
  * user space, either through the legacy KVM_CREATE_IRQCHIP ioctl (v2 only)
@@ -112,6 +114,22 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		goto out_unlock;
 	}

+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = vgic_allocate_private_irqs_locked(vcpu, type);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+			kfree(vgic_cpu->private_irqs);
+			vgic_cpu->private_irqs = NULL;
+		}
+
+		goto out_unlock;
+	}
+
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;

@@ -201,7 +219,7 @@ int kvm_vgic_vcpu_nv_init(struct kvm_vcpu *vcpu)
 	return ret;
 }

-static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
+static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	int i;
@@ -239,17 +257,28 @@ static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
 			/* PPIs */
 			irq->config = VGIC_CONFIG_LEVEL;
 		}
+
+		switch (type) {
+		case KVM_DEV_TYPE_ARM_VGIC_V3:
+			irq->group = 1;
+			irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
+			break;
+		case KVM_DEV_TYPE_ARM_VGIC_V2:
+			irq->group = 0;
+			irq->targets = BIT(vcpu->vcpu_id);
+			break;
+		}
 	}

 	return 0;
 }

-static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu)
+static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu, u32 type)
 {
 	int ret;

 	mutex_lock(&vcpu->kvm->arch.config_lock);
-	ret = vgic_allocate_private_irqs_locked(vcpu);
+	ret = vgic_allocate_private_irqs_locked(vcpu, type);
 	mutex_unlock(&vcpu->kvm->arch.config_lock);

 	return ret;
@@ -279,7 +308,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return 0;

-	ret = vgic_allocate_private_irqs(vcpu);
+	ret = vgic_allocate_private_irqs(vcpu, dist->vgic_model);
 	if (ret)
 		return ret;

@@ -316,7 +345,7 @@ int vgic_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
-	int ret = 0, i;
+	int ret = 0;
 	unsigned long idx;

 	lockdep_assert_held(&kvm->arch.config_lock);
@@ -336,35 +365,6 @@ int vgic_init(struct kvm *kvm)
 	if (ret)
 		goto out;

-	/* Initialize groups on CPUs created before the VGIC type was known */
-	kvm_for_each_vcpu(idx, vcpu, kvm) {
-		ret = vgic_allocate_private_irqs_locked(vcpu);
-		if (ret)
-			goto out;
-
-		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
-			struct vgic_irq *irq = vgic_get_vcpu_irq(vcpu, i);
-
-			switch (dist->vgic_model) {
-			case KVM_DEV_TYPE_ARM_VGIC_V3:
-				irq->group = 1;
-				irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
-				break;
-			case KVM_DEV_TYPE_ARM_VGIC_V2:
-				irq->group = 0;
-				irq->targets = 1U << idx;
-				break;
-			default:
-				ret = -EINVAL;
-			}
-
-			vgic_put_irq(kvm, irq);
-
-			if (ret)
-				goto out;
-		}
-	}
-
 	/*
	 * If we have GICv4.1 enabled, unconditionally request enable the
	 * v4 support so that we get HW-accelerated vSGIs. Otherwise, only