[6/9] KVM: MMU: introduce the framework to check reserved bits on sptes

Message ID 1438685961-8107-7-git-send-email-guangrong.xiao@linux.intel.com
State New

Commit Message

Xiao Guangrong Aug. 4, 2015, 10:59 a.m. UTC
We have abstracted the data struct and functions which are used to check
reserved bits on guest page tables; now we extend the logic to check
reserved bits on shadow page tables.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c              | 51 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h              |  3 +++
 arch/x86/kvm/svm.c              |  1 +
 4 files changed, 56 insertions(+)

Comments

Paolo Bonzini Aug. 4, 2015, 12:14 p.m. UTC | #1
On 04/08/2015 12:59, Xiao Guangrong wrote:
> +/*
> + * the page table on host is the shadow page table for the page
> + * table in guest or amd nested guest, its mmu features completely
> + * follow the features in guest.
> + */
> +void
> +reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> +{
> +	__reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
> +				boot_cpu_data.x86_phys_bits,
> +				context->shadow_root_level, context->nx,

This should be cpu_has_nx, I think.

> +				guest_cpuid_has_gbpages(vcpu),

This should be cpu_has_gbpages.

> is_pse(vcpu));

This should be cpu_has_pse.

Paolo

> +}
> +EXPORT_SYMBOL_GPL(reset_shadow_rsvds_bits_mask);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong Aug. 4, 2015, 1:10 p.m. UTC | #2
On 08/04/2015 08:14 PM, Paolo Bonzini wrote:
>
>
> On 04/08/2015 12:59, Xiao Guangrong wrote:
>> +/*
>> + * the page table on host is the shadow page table for the page
>> + * table in guest or amd nested guest, its mmu features completely
>> + * follow the features in guest.
>> + */
>> +void
>> +reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>> +{
>> +	__reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
>> +				boot_cpu_data.x86_phys_bits,
>> +				context->shadow_root_level, context->nx,
>
> This should be cpu_has_nx, I think.

cpu_has_nx() checks the feature on the host CPU; however, this is the shadow
page table, which completely follows the guest's features.

E.g., if the guest does not execution-protect the physical page, then
KVM does not do it either.

>
>> +				guest_cpuid_has_gbpages(vcpu),
>
> This should be cpu_has_gbpages.

E.g., if the guest does not use the 1G page size, it is also not used in the
shadow page table.

>
>> is_pse(vcpu));
>
> This should be cpu_has_pse.

E.g., if the guest does not use the 4M page size, then KVM does not use it either.

BTW, PSE only affects 32-bit page tables, which are not used for shadow page
tables (32-bit PAE and 64-bit long mode are used for shadow pages).

Only TDP follows the host CPU's features; KVM does not use NX to protect
pages, so I always mark it as false in
reset_tdp_shadow_rsvds_bits_mask().
Paolo Bonzini Aug. 4, 2015, 1:23 p.m. UTC | #3
On 04/08/2015 15:10, Xiao Guangrong wrote:
>>
>> This should be cpu_has_nx, I think.
> 
> cpu_has_nx() checks the feature on the host CPU; however, this is the shadow
> page table, which completely follows the guest's features.
> 
> E.g., if the guest does not execution-protect the physical page, then
> KVM does not do it either.

That's just true for current code.  In principle you could add a memslot
flag for KVM_MEMSLOT_NO_EXECUTE, then NX would be true on an spte but
not on a PTE.

>>
>>> +                guest_cpuid_has_gbpages(vcpu),
>>
>> This should be cpu_has_gbpages.
> 
> E.g., if the guest does not use the 1G page size, it is also not used in the
> shadow page table.

However, bit 7 in the shadow PDPTE is not reserved.  If you're not
testing "is this bit reserved" but rather "should this bit be always
zero" in the SPTE, then checking guest_cpuid is okay.  But in that case
shadow_rsvd_check is really more like shadow_always_zero_check.

>>
>>> is_pse(vcpu));
>>
>> This should be cpu_has_pse.
> 
> E.g., if the guest does not use the 4M page size, then KVM does not use it either.

Right, it should always be true, not cpu_has_pse, because PAE and 64-bit
page tables always support huge (2M) pages.  Or as above, if you're
testing "should this bit be always zero" then it's a different story.

Paolo

> BTW, PSE only affects 32-bit page tables, which are not used for shadow page
> tables (32-bit PAE and 64-bit long mode are used for shadow pages).
> 
> Only TDP follows the host CPU's features; KVM does not use NX to
> protect pages, so I always mark it as false in
> reset_tdp_shadow_rsvds_bits_mask().
Xiao Guangrong Aug. 4, 2015, 1:34 p.m. UTC | #4
On 08/04/2015 09:23 PM, Paolo Bonzini wrote:
>
>
> On 04/08/2015 15:10, Xiao Guangrong wrote:
>>>
>>> This should be cpu_has_nx, I think.
>>
>> cpu_has_nx() checks the feature on the host CPU; however, this is the shadow
>> page table, which completely follows the guest's features.
>>
>> E.g., if the guest does not execution-protect the physical page, then
>> KVM does not do it either.
>
> That's just true for current code.  In principle you could add a memslot
> flag for KVM_MEMSLOT_NO_EXECUTE, then NX would be true on an spte but
> not on a PTE.

Yes, I agree. I would like to keep it as strict as possible to catch
potential bugs. We can relax it when KVM_MEMSLOT_NO_EXECUTE is
developed.

>
>>>
>>>> +                guest_cpuid_has_gbpages(vcpu),
>>>
>>> This should be cpu_has_gbpages.
>>
>> E.g., if the guest does not use the 1G page size, it is also not used in the
>> shadow page table.
>
> However, bit 7 in the shadow PDPTE is not reserved.  If you're not
> testing "is this bit reserved" but rather "should this bit be always
> zero" in the SPTE, then checking guest_cpuid is okay.  But in that case
> shadow_rsvd_check is really more like shadow_always_zero_check.

Yes, it is not reserved from the hardware's point of view. shadow_always_zero_check()
seems a more meaningful name; thanks for your suggestion. :)

>
>>>
>>>> is_pse(vcpu));
>>>
>>> This should be cpu_has_pse.
>>
>> E.g., if the guest does not use the 4M page size, then KVM does not use it either.
>
> Right, it should always be true, not cpu_has_pse, because PAE and 64-bit
> page tables always support huge (2M) pages.  Or as above, if you're
> testing "should this bit be always zero" then it's a different story.

Yeah, I will rename the function.


Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3e33c0d..8356259 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -294,6 +294,7 @@  struct kvm_mmu {
 
 	u64 *pae_root;
 	u64 *lm_root;
+	struct rsvd_bits_validate shadow_rsvd_check;
 	struct rsvd_bits_validate guest_rsvd_check;
 
 	/*
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d11d212..50fbc14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3699,6 +3699,54 @@  static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
 				    cpuid_maxphyaddr(vcpu), execonly);
 }
 
+/*
+ * the page table on host is the shadow page table for the page
+ * table in guest or amd nested guest, its mmu features completely
+ * follow the features in guest.
+ */
+void
+reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+	__reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
+				boot_cpu_data.x86_phys_bits,
+				context->shadow_root_level, context->nx,
+				guest_cpuid_has_gbpages(vcpu), is_pse(vcpu));
+}
+EXPORT_SYMBOL_GPL(reset_shadow_rsvds_bits_mask);
+
+/*
+ * the direct page table on host, use as much mmu features as
+ * possible, however, kvm currently does not do execution-protection.
+ */
+static void
+reset_tdp_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context)
+{
+	if (guest_cpuid_is_amd(vcpu))
+		__reset_rsvds_bits_mask(vcpu, &context->shadow_rsvd_check,
+					boot_cpu_data.x86_phys_bits,
+					context->shadow_root_level, false,
+					cpu_has_gbpages, true);
+	else
+		__reset_rsvds_bits_mask_ept(&context->shadow_rsvd_check,
+					    boot_cpu_data.x86_phys_bits,
+					    false);
+
+}
+
+/*
+ * as the comments in reset_shadow_rsvds_bits_mask() except it
+ * is the shadow page table for intel nested guest.
+ */
+static void
+reset_ept_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+				 struct kvm_mmu *context,
+				 bool execonly)
+{
+	__reset_rsvds_bits_mask_ept(&context->shadow_rsvd_check,
+				    boot_cpu_data.x86_phys_bits, execonly);
+}
+
 static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 				      struct kvm_mmu *mmu, bool ept)
 {
@@ -3877,6 +3925,7 @@  static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 
 	update_permission_bitmask(vcpu, context, false);
 	update_last_pte_bitmap(vcpu, context);
+	reset_tdp_shadow_rsvds_bits_mask(vcpu, context);
 }
 
 void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
@@ -3904,6 +3953,7 @@  void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
 	context->base_role.smap_andnot_wp
 		= smap && !is_write_protection(vcpu);
 	context->base_role.smm = is_smm(vcpu);
+	reset_shadow_rsvds_bits_mask(vcpu, context);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
@@ -3927,6 +3977,7 @@  void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly)
 
 	update_permission_bitmask(vcpu, context, true);
 	reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+	reset_ept_shadow_rsvds_bits_mask(vcpu, context, execonly);
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 398d21c..96563be 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -53,6 +53,9 @@  static inline u64 rsvd_bits(int s, int e)
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 
+void
+reset_shadow_rsvds_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+
 /*
  * Return values of handle_mmio_page_fault_common:
  * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 568cd0f..1de4685 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2107,6 +2107,7 @@  static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu.get_pdptr         = nested_svm_get_tdp_pdptr;
 	vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
 	vcpu->arch.mmu.shadow_root_level = get_npt_level();
+	reset_shadow_rsvds_bits_mask(vcpu, &vcpu->arch.mmu);
 	vcpu->arch.walk_mmu              = &vcpu->arch.nested_mmu;
 }