diff mbox

GET_SUPPORTED_CPUID and irqchip (was Re: kvm_pv_unhalt and kernel_irqchip=off)

Message ID 20160815175059.GA12532@potion (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář Aug. 15, 2016, 5:50 p.m. UTC
2016-08-15 10:03-0300, Eduardo Habkost:
> On Sat, Aug 13, 2016 at 02:43:03PM +0200, Radim Krčmář wrote:
>> I would like to return -EINVAL from KVM_SET_CPUID2 if userspace
>> requested a new CPUID feature that cannot work in given situation.
>> 
>> Another way would be to disable buggy features in KVM_SET_CPUID2, which
>> would require userspace to call KVM_GET_CPUID2 afterwards to learn what
>> the guest is actually using.
> 
> I'd prefer to get -EINVAL. QEMU needs to be 100% aware of the
> resulting CPUID bits, so if we end up trying to enable something
> that will never work, it's already too late to silently clear
> CPUID bits.
> 
> This shouldn't make any difference for existing QEMU code,
> because it already filters out all CPUID feature flags based
> on GET_SUPPORTED_CPUID (+ the extra fixups in
> QEMU's kvm_arch_get_supported_cpuid()).
> 
> But: note that returning -EINVAL might break very old QEMU
> versions. Maybe we really want them to break (because they have
> broken CPUID logic), but I am not sure.

Returning -EINVAL would break even current QEMU users, because they
might have '-cpu host' with a guest that doesn't use PV_UNHALT.

-EINVAL without any interface changes could be done only for future
features. :/

>> I have patches that implement the latter for X2APIC and PV_UNHALT, but
>> I'm not sure if it's better than leaving the bug unfixed, because QEMU
>> doesn't use KVM_GET_CPUID2 and migration to older KVM would change
>> CPUID, which is a very subtle bug.
>> What do you think?
> 
> Implementing those checks basically mean moving the existing
> kvm_arch_get_supported_cpuid() logic inside KVM. If we do that,
> can we get an interface that will allow QEMU to just query KVM
> instead of duplicating that logic?

The QEMU code for old kernels likely has to stay so only future features
wouldn't be duplicated.

An interface to check what KVM disabled already exists, KVM_GET_CPUID2.
We'd be hijacking it for "SET -> GET -> compare" where QEMU would exit()
if features got cleared (and/or added?).

It an extension of existing practice -- new features would be notified
with a capability and SET+GET would check if they really can be enabled.

We'd need a toggle (a way for QEMU to tell KVM that it will GET and
compare) to allow KVM safely clear already released features.
e.g. migration wouldn't enable this feature.

The toggle for SET+GET would be very simple in the kernel, but might
turn out to be complicated in QEMU ...

See the rough idea at the bottom.

> Making KVM_SET_CPUID2 clear those bits and/or return -EINVAL
> would probably be enough as an interface, as long as QEMU has a
> way to know in advance if it needs to clear CPUID bits based on
> the legacy kvm_arch_get_supported_cpuid() code (for older
> kernels), or if it can just give everything to KVM_SET_CPUID2,
> and check for unsupported/cleared flags later (for newer
> kernels).

A new IOCTL that returns valid-but-not-in-supported and
invalid-from-supported CPUIDs and forces KVM to -EINVAL if userspace
doesn't listen is in the game too.  It is more complicated, though.

---8<---
Rough (doesn't even define everything) idea of the clearing approach.

---
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eduardo Habkost Aug. 15, 2016, 6 p.m. UTC | #1
On Mon, Aug 15, 2016 at 07:50:59PM +0200, Radim Krčmář wrote:
[...]
> +static void kvm_clear_invalid_cpuid(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_cpuid_entry2 *best;
> +
> +	if (!kvm->clear_invalid_cpuid)
> +		return;
> +
> +	best = kvm_find_cpuid_entry(vcpu, 1, 0);
> +	if (best && !lapic_in_kernel(vcpu))
> +		best->ecx &= ~F(X2APIC);

Isn't it possible to implement x2apic support in userspace APIC
using the current KVM interfaces?
Radim Krčmář Aug. 15, 2016, 6:32 p.m. UTC | #2
2016-08-15 15:00-0300, Eduardo Habkost:
> On Mon, Aug 15, 2016 at 07:50:59PM +0200, Radim Krčmář wrote:
> [...]
>> +static void kvm_clear_invalid_cpuid(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_cpuid_entry2 *best;
>> +
>> +	if (!kvm->clear_invalid_cpuid)
>> +		return;
>> +
>> +	best = kvm_find_cpuid_entry(vcpu, 1, 0);
>> +	if (best && !lapic_in_kernel(vcpu))
>> +		best->ecx &= ~F(X2APIC);
> 
> Isn't it possible to implement x2apic support in userspace APIC
> using the current KVM interfaces?

No, MSR accesses never get to userspace.  We'll probably implement it in
the future as there already has been a proposal for unhandled MSRs.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 739db9ab16b2..8d085823c4af 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -545,6 +545,9 @@  struct kvm_cpuid {
 	struct kvm_cpuid_entry entries[0];
 };
 
+If KVM_CAP_CLEAR_INVALID_CPUID is enabled, then the ioctls can modify CPUID
+bits to prevent invalid configurations.
+
 
 4.21 KVM_SET_SIGNAL_MASK
 
@@ -3900,6 +3903,18 @@  to take care of that.
 This capability can be enabled dynamically even if VCPUs were already
 created and are running.
 
+7.9 KVM_CAP_CLEAR_INVALID_CPUID
+
+Architectures: x86
+Parameters: none
+
+After setting this capability, KVM_SET_CPUID2 can modify CPUID bits that are
+invalid in the given configuration, e.g. disable KVM_FEATURE_PV_UNHALT when the
+LAPIC is in the userspace.
+
+The userspace is expected to use KVM_GET_CPUID2 to retrieve the final CPUID.
+
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb596cd..afa05bb31cc3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -160,6 +160,21 @@  static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu)
 	}
 }
 
+static void kvm_clear_invalid_cpuid(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_cpuid_entry2 *best;
+
+	if (!kvm->clear_invalid_cpuid)
+		return;
+
+	best = kvm_find_cpuid_entry(vcpu, 1, 0);
+	if (best && !lapic_in_kernel(vcpu))
+		best->ecx &= ~F(X2APIC);
+
+	/* TODO: to disable KVM_FEATURE_PV_UNHALT */
+}
+
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
@@ -234,6 +249,8 @@  int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu *vcpu,
 			   cpuid->nent * sizeof(struct kvm_cpuid_entry2)))
 		goto out;
 	vcpu->arch.cpuid_nent = cpuid->nent;
+
+	kvm_clear_invalid_cpuid(vcpu);
 	kvm_apic_set_version(vcpu);
 	kvm_x86_ops->cpuid_update(vcpu);
 	r = kvm_update_cpuid(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19f9f9e05c2a..99028df615e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3812,6 +3812,13 @@  split_irqchip_unlock:
 
 		r = 0;
 		break;
+	case KVM_CAP_CLEAR_INVALID_CPUID:
+		r = -EEXIST;
+		if (kvm->created_vcpus)
+			break;
+		kvm->clear_invalid_cpuid = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;