diff mbox

[v5,1/2] arm/arm64: KVM: Detect vGIC presence at runtime

Message ID f759b4ba37d4cacdc971d2a416cdc8ecf0c3c3d9.1448874023.git.p.fedin@samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Pavel Fedin Nov. 30, 2015, 9:40 a.m. UTC
Before commit 662d9715840aef44dcb573b0f9fab9e8319c868a
("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") is was possible to
compile the kernel without vGIC and vTimer support. Commit message says
about possibility to detect vGIC support in runtime, but this has never
been implemented.

This patch introdices runtime check, restoring the lost functionality.
It again allows to use KVM on hardware without vGIC. Interrupt
controller has to be emulated in userspace in this case.

-ENODEV return code from probe function means there's no GIC at all.
-ENXIO happens when, for example, there is GIC node in the device tree,
but it does not specify vGIC resources. Normally this means that vGIC
hardware is defunct. Any other error code is still treated as full stop
because it might mean some really serious problems.

This patch does not touch any virtual timer code, suggesting that timer
hardware is actually in place. Normally on boards in question it is true,
however since vGIC is missing, it is impossible to correctly utilize
interrupts from the virtual timer. Since virtual timer handling is in
active redevelopment now, handling in it userspace is out of scope at
the moment. The guest is currently suggested to use some memory-mapped
timer which can be emulated in userspace.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 arch/arm/kvm/arm.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Alexander Graf April 21, 2016, 9:41 p.m. UTC | #1
On 30.11.15 10:40, Pavel Fedin wrote:
> Before commit 662d9715840aef44dcb573b0f9fab9e8319c868a
> ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") is was possible to
> compile the kernel without vGIC and vTimer support. Commit message says
> about possibility to detect vGIC support in runtime, but this has never
> been implemented.
> 
> This patch introdices runtime check, restoring the lost functionality.
> It again allows to use KVM on hardware without vGIC. Interrupt
> controller has to be emulated in userspace in this case.
> 
> -ENODEV return code from probe function means there's no GIC at all.
> -ENXIO happens when, for example, there is GIC node in the device tree,
> but it does not specify vGIC resources. Normally this means that vGIC
> hardware is defunct. Any other error code is still treated as full stop
> because it might mean some really serious problems.
> 
> This patch does not touch any virtual timer code, suggesting that timer
> hardware is actually in place. Normally on boards in question it is true,
> however since vGIC is missing, it is impossible to correctly utilize
> interrupts from the virtual timer. Since virtual timer handling is in

So effectively all we'd need is to set CNTHCTL_EL2.EL1PCEN to 0 for
guests that have no in-kernel irqchip, no? We should then trap on all
timer accesses and be able to emulate them in user space where we can
inject IRQs into an emulated GIC again.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell April 21, 2016, 10:04 p.m. UTC | #2
On 21 April 2016 at 22:41, Alexander Graf <agraf@suse.de> wrote:
> So effectively all we'd need is to set CNTHCTL_EL2.EL1PCEN to 0 for
> guests that have no in-kernel irqchip, no? We should then trap on all
> timer accesses and be able to emulate them in user space where we can
> inject IRQs into an emulated GIC again.

You'd still need to define a new userspace ABI for "we stopped
for a system register trap" and the userspace code to emulate
the timers as part of KVM rather than as part of TCG, which seems
like a lot of effort for a mode that you really don't want to be
using...

For GICv3 it gets trickier again because the interface between
the interrupt controller and the CPU is no longer as simple
as "an FIQ line and an IRQ line". (In particular the interrupt
controller needs to know the CPU's current exception level and
security state to know if it should raise IRQ or FIQ, which means
that it needs to be told every time the CPU changes EL. I haven't
yet figured out if I should model this in the QEMU emulated GICv3
by having a backdoor callback function for this or by biting
the bullet and putting the GICv3 cpu interface really in the
CPU with a properly modelled equivalent of the stream protocol...)

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf April 21, 2016, 10:35 p.m. UTC | #3
On 22.04.16 00:04, Peter Maydell wrote:
> On 21 April 2016 at 22:41, Alexander Graf <agraf@suse.de> wrote:
>> So effectively all we'd need is to set CNTHCTL_EL2.EL1PCEN to 0 for
>> guests that have no in-kernel irqchip, no? We should then trap on all
>> timer accesses and be able to emulate them in user space where we can
>> inject IRQs into an emulated GIC again.
> 
> You'd still need to define a new userspace ABI for "we stopped
> for a system register trap" and the userspace code to emulate
> the timers as part of KVM rather than as part of TCG, which seems
> like a lot of effort for a mode that you really don't want to be
> using...

It might make sense to have a generic "system register couldn't get
handled" exit code anyway. If nothing else, at least to display
unhandled registers in the qemu context where they appear rather than in
the kernel log (which a guest could deliberately clutter as it stands
today).

With such an interface in place, the rest would only be a few lines of glue.

> For GICv3 it gets trickier again because the interface between
> the interrupt controller and the CPU is no longer as simple
> as "an FIQ line and an IRQ line". (In particular the interrupt
> controller needs to know the CPU's current exception level and
> security state to know if it should raise IRQ or FIQ, which means
> that it needs to be told every time the CPU changes EL. I haven't
> yet figured out if I should model this in the QEMU emulated GICv3
> by having a backdoor callback function for this or by biting
> the bullet and putting the GICv3 cpu interface really in the
> CPU with a properly modelled equivalent of the stream protocol...)

We moved the lapic into target-i386 as well, no? Given that it really is
very close to the cpu these days it might not be a bad idea.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell April 21, 2016, 10:41 p.m. UTC | #4
On 21 April 2016 at 23:35, Alexander Graf <agraf@suse.de> wrote:
> It might make sense to have a generic "system register couldn't get
> handled" exit code anyway. If nothing else, at least to display
> unhandled registers in the qemu context where they appear rather than in
> the kernel log (which a guest could deliberately clutter as it stands
> today).

Unhandled sysregs UNDEF so they get logged by the guest if anywhere,
I think.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier April 22, 2016, 7:50 a.m. UTC | #5
On 21/04/16 22:41, Alexander Graf wrote:
> 
> 
> On 30.11.15 10:40, Pavel Fedin wrote:
>> Before commit 662d9715840aef44dcb573b0f9fab9e8319c868a
>> ("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") is was possible to
>> compile the kernel without vGIC and vTimer support. Commit message says
>> about possibility to detect vGIC support in runtime, but this has never
>> been implemented.
>>
>> This patch introdices runtime check, restoring the lost functionality.
>> It again allows to use KVM on hardware without vGIC. Interrupt
>> controller has to be emulated in userspace in this case.
>>
>> -ENODEV return code from probe function means there's no GIC at all.
>> -ENXIO happens when, for example, there is GIC node in the device tree,
>> but it does not specify vGIC resources. Normally this means that vGIC
>> hardware is defunct. Any other error code is still treated as full stop
>> because it might mean some really serious problems.
>>
>> This patch does not touch any virtual timer code, suggesting that timer
>> hardware is actually in place. Normally on boards in question it is true,
>> however since vGIC is missing, it is impossible to correctly utilize
>> interrupts from the virtual timer. Since virtual timer handling is in
> 
> So effectively all we'd need is to set CNTHCTL_EL2.EL1PCEN to 0 for
> guests that have no in-kernel irqchip, no? We should then trap on all
> timer accesses and be able to emulate them in user space where we can
> inject IRQs into an emulated GIC again.

That's for the counter. The timer is already trapped.

That very nice, until you realize that Linux guests use the virtual
timer, not the physical one. Yes, you can hack that. And at that point,
you might as well have a different timer altogether, which solves the
problem entirely, without having to forward anything to userspace.

Thanks,

	M.
diff mbox

Patch

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index eab83b2..d581756 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -61,6 +61,8 @@  static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static bool vgic_present;
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
 	BUG_ON(preemptible());
@@ -132,7 +134,8 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.vmid_gen = 0;
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
-	kvm->arch.max_vcpus = kvm_vgic_get_max_vcpus();
+	kvm->arch.max_vcpus = vgic_present ?
+				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
 	return ret;
 out_free_stage2_pgd:
@@ -172,6 +175,8 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	int r;
 	switch (ext) {
 	case KVM_CAP_IRQCHIP:
+		r = vgic_present;
+		break;
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_USER_MEMORY:
@@ -918,6 +923,8 @@  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 
 	switch (dev_id) {
 	case KVM_ARM_DEVICE_VGIC_V2:
+		if (!vgic_present)
+			return -ENXIO;
 		return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
 	default:
 		return -ENODEV;
@@ -932,6 +939,8 @@  long kvm_arch_vm_ioctl(struct file *filp,
 
 	switch (ioctl) {
 	case KVM_CREATE_IRQCHIP: {
+		if (!vgic_present)
+			return -ENXIO;
 		return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
 	}
 	case KVM_ARM_SET_DEVICE_ADDR: {
@@ -1116,8 +1125,17 @@  static int init_hyp_mode(void)
 	 * Init HYP view of VGIC
 	 */
 	err = kvm_vgic_hyp_init();
-	if (err)
+	switch (err) {
+	case 0:
+		vgic_present = true;
+		break;
+	case -ENODEV:
+	case -ENXIO:
+		vgic_present = false;
+		break;
+	default:
 		goto out_free_context;
+	}
 
 	/*
 	 * Init HYP architected timer support