Message ID: 20250206152100.1107909-1-maz@kernel.org (mailing list archive)
Series: KVM: arm64: Assorted vgic fixes for 6.14
On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> all stemming from the fact that the vgic can be destroyed in parallel
> with the rest of the guest still being live.
>
> Yes, this is annoying.
>
> Fixing this is not going to happen overnight (though I have some
> ideas), but we can make what we have today a bit more robust.
>
> This is what patch #2 is doing. Patch #1 is just removing a loud
> WARN_ON() that serves little purpose, and patch #3 fixes the actual
> bug that Alex reported.
>
> Hopefully, none of that is controversial...

I'm a bit grumbly about slapping bandaids on the problem, but given the
fact that glider reported all of this a while ago and we still haven't
fixed it is enough to justify these patches. So:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
On Fri, 07 Feb 2025 18:03:55 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > all stemming from the fact that the vgic can be destroyed in parallel
> > with the rest of the guest still being live.
> >
> > Yes, this is annoying.
> >
> > Fixing this is not going to happen overnight (though I have some
> > ideas), but we can make what we have today a bit more robust.
> >
> > This is what patch #2 is doing. Patch #1 is just removing a loud
> > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > bug that Alex reported.
> >
> > Hopefully, none of that is controversial...
>
> I'm a bit grumbly about slapping bandaids on the problem, but given the
> fact that glider reported all of this a while ago and we still haven't
> fixed it is enough to justify these patches. So:

Yeah, same here. I'm starting to think that we need to either prevent
the vgic from being asynchronously destroyed, or start refcounting all
IRQs just like LPIs. Which is very annoying since we don't have a
global namespace for SGIs and PPIs.

But maybe simply refcounting the vgic itself would be enough.
Thoughts?

> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

Thanks,

	M.
On Fri, Feb 07, 2025 at 06:10:49PM +0000, Marc Zyngier wrote:
> On Fri, 07 Feb 2025 18:03:55 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > > all stemming from the fact that the vgic can be destroyed in parallel
> > > with the rest of the guest still being live.
> > >
> > > Yes, this is annoying.
> > >
> > > Fixing this is not going to happen overnight (though I have some
> > > ideas), but we can make what we have today a bit more robust.
> > >
> > > This is what patch #2 is doing. Patch #1 is just removing a loud
> > > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > > bug that Alex reported.
> > >
> > > Hopefully, none of that is controversial...
> >
> > I'm a bit grumbly about slapping bandaids on the problem, but given the
> > fact that glider reported all of this a while ago and we still haven't
> > fixed it is enough to justify these patches. So:
>
> Yeah, same here. I'm starting to think that we need to either prevent
> the vgic from being asynchronously destroyed, or start refcounting all
> IRQs just like LPIs. Which is very annoying since we don't have a
> global namespace for SGIs and PPIs.
>
> But maybe simply refcounting the vgic itself would be enough.
> Thoughts?

So would we refcount on the owning structure for a particular IRQ? i.e.
private IRQs are counted against the owning vCPU and SPIs against the
distributor?

Adding a vgic_put_vcpu_irq() could help disambiguate private IRQs too.
On Fri, 07 Feb 2025 18:50:32 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Feb 07, 2025 at 06:10:49PM +0000, Marc Zyngier wrote:
> > On Fri, 07 Feb 2025 18:03:55 +0000,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Thu, Feb 06, 2025 at 03:20:57PM +0000, Marc Zyngier wrote:
> > > > Alexander, while fuzzing KVM/arm64, found an annoying set of problems,
> > > > all stemming from the fact that the vgic can be destroyed in parallel
> > > > with the rest of the guest still being live.
> > > >
> > > > Yes, this is annoying.
> > > >
> > > > Fixing this is not going to happen overnight (though I have some
> > > > ideas), but we can make what we have today a bit more robust.
> > > >
> > > > This is what patch #2 is doing. Patch #1 is just removing a loud
> > > > WARN_ON() that serves little purpose, and patch #3 fixes the actual
> > > > bug that Alex reported.
> > > >
> > > > Hopefully, none of that is controversial...
> > >
> > > I'm a bit grumbly about slapping bandaids on the problem, but given the
> > > fact that glider reported all of this a while ago and we still haven't
> > > fixed it is enough to justify these patches. So:
> >
> > Yeah, same here. I'm starting to think that we need to either prevent
> > the vgic from being asynchronously destroyed, or start refcounting all
> > IRQs just like LPIs. Which is very annoying since we don't have a
> > global namespace for SGIs and PPIs.
> >
> > But maybe simply refcounting the vgic itself would be enough.
> > Thoughts?
>
> So would we refcount on the owning structure for a particular IRQ? i.e.
> private IRQs are counted against the owning vCPU and SPIs against the
> distributor?
>
> Adding a vgic_put_vcpu_irq() could help disambiguate private IRQs too.

Having slept (yay!) on it, I don't think this is enough.

The observation is that a lot of the issues we have are triggered by
the private interrupt array being allocated on demand on VGIC
creation. We had way fewer problems when the private interrupts were
directly part of struct vcpu, which really was hiding the basic
problem.

We have an annoying discrepancy between knowing that a GIC is present
(VM-wide state) and the interrupts for a vcpu being allocated (vcpu
state). This is what Alexander's fuzzing is all about (create a vcpu,
init it, and while this is happening, create/destroy a vgic). So it
isn't just destruction, but also creation that is problematic, and I'm
not sure that refcounting is the correct solution anymore.

Now, for the issue that this series is trying to fix, I came up with
an alternative approach, based on the above epiphany.

We currently allocate private interrupts in two locations:

- at vcpu creation time *if* a vgic has already been created

- from vgic_init(), which is itself called on:
  - KVM_DEV_ARM_VGIC_CTRL_INIT for the modern stuff,
  - or lazily for anything old and specific to GICv2 (huh).

Crucially, there is a gaping window between the creation of a vgic
*after* a vcpu has been created, and its initialisation. Anything that
checks for irqchip_in_kernel() during that interval may assume that
private interrupts are allocated, because this is what it used to be.
Waiting for vgic_init() to mop-up the "early" vcpus is what creates
the issue.

Given that, the solution is obvious: hoist the early vcpu mop-up from
vgic_init() into kvm_vgic_create(). This guarantees that any vcpu
observing irqchip_in_kernel() being true will also see its private
interrupts.

The patch below fixes it for me (tested with kvmtool, qemu, and Alex's
tickling reproducer).

Thoughts?

	M.
From c150c2300d5b8918909ad46a769aae864a58025f Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Sat, 8 Feb 2025 14:58:43 +0000
Subject: [PATCH] KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to
 kvm_create_vgic()

If userspace creates vcpus, then a vgic, we end-up in a situation
where irqchip_in_kernel() will return true, but no private interrupt
has been allocated for these vcpus. This situation will continue
until userspace initialises the vgic, at which point we fix the
early vcpus. Should a vcpu run or be initialised in the interval,
bad things may happen.

An obvious solution is to move this fix-up phase to the point where
the vgic is created. This ensures that from that point onwards, all
vcpus have their private interrupts, as new vcpus will directly
allocate them.

With that, we have the invariant that when irqchip_in_kernel() is
true, all vcpus have their private interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-init.c | 74 ++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index c7a82bd0c276d..1f33e71c2a731 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -34,9 +34,9 @@
  *
  * CPU Interface:
  *
- * - kvm_vgic_vcpu_init(): initialization of static data that
- *   doesn't depend on any sizing information or emulation type. No
- *   allocation is allowed there.
+ * - kvm_vgic_vcpu_init(): initialization of static data that doesn't depend
+ *   on any sizing information. Private interrupts are allocated if not
+ *   already allocated at vgic-creation time.
  */

 /* EARLY INIT */
@@ -58,6 +58,8 @@ void kvm_vgic_early_init(struct kvm *kvm)

 /* CREATION */

+static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type);
+
 /**
  * kvm_vgic_create: triggered by the instantiation of the VGIC device by
  * user space, either through the legacy KVM_CREATE_IRQCHIP ioctl (v2 only)
@@ -112,6 +114,22 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		goto out_unlock;
 	}

+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = vgic_allocate_private_irqs_locked(vcpu, type);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+			kfree(vgic_cpu->private_irqs);
+			vgic_cpu->private_irqs = NULL;
+		}
+
+		goto out_unlock;
+	}
+
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vgic_model = type;

@@ -201,7 +219,7 @@ int kvm_vgic_vcpu_nv_init(struct kvm_vcpu *vcpu)
 	return ret;
 }

-static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
+static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	int i;
@@ -239,17 +257,28 @@ static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu)
 			/* PPIs */
 			irq->config = VGIC_CONFIG_LEVEL;
 		}
+
+		switch (type) {
+		case KVM_DEV_TYPE_ARM_VGIC_V3:
+			irq->group = 1;
+			irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
+			break;
+		case KVM_DEV_TYPE_ARM_VGIC_V2:
+			irq->group = 0;
+			irq->targets = BIT(vcpu->vcpu_id);
+			break;
+		}
 	}

 	return 0;
 }

-static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu)
+static int vgic_allocate_private_irqs(struct kvm_vcpu *vcpu, u32 type)
 {
 	int ret;

 	mutex_lock(&vcpu->kvm->arch.config_lock);
-	ret = vgic_allocate_private_irqs_locked(vcpu);
+	ret = vgic_allocate_private_irqs_locked(vcpu, type);
 	mutex_unlock(&vcpu->kvm->arch.config_lock);

 	return ret;
@@ -279,7 +308,7 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return 0;

-	ret = vgic_allocate_private_irqs(vcpu);
+	ret = vgic_allocate_private_irqs(vcpu, dist->vgic_model);
 	if (ret)
 		return ret;

@@ -316,7 +345,7 @@ int vgic_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
-	int ret = 0, i;
+	int ret = 0;
 	unsigned long idx;

 	lockdep_assert_held(&kvm->arch.config_lock);
@@ -336,35 +365,6 @@ int vgic_init(struct kvm *kvm)
 	if (ret)
 		goto out;

-	/* Initialize groups on CPUs created before the VGIC type was known */
-	kvm_for_each_vcpu(idx, vcpu, kvm) {
-		ret = vgic_allocate_private_irqs_locked(vcpu);
-		if (ret)
-			goto out;
-
-		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
-			struct vgic_irq *irq = vgic_get_vcpu_irq(vcpu, i);
-
-			switch (dist->vgic_model) {
-			case KVM_DEV_TYPE_ARM_VGIC_V3:
-				irq->group = 1;
-				irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
-				break;
-			case KVM_DEV_TYPE_ARM_VGIC_V2:
-				irq->group = 0;
-				irq->targets = 1U << idx;
-				break;
-			default:
-				ret = -EINVAL;
-			}
-
-			vgic_put_irq(kvm, irq);
-
-			if (ret)
-				goto out;
-		}
-	}
-
 	/*
	 * If we have GICv4.1 enabled, unconditionally request enable the
	 * v4 support so that we get HW-accelerated vSGIs. Otherwise, only