[RFC,03/13] kvm: Add XO memslot type

Message ID 20191003212400.31130-4-rick.p.edgecombe@intel.com
State New
Series
  • XOM for KVM guest userspace

Commit Message

Edgecombe, Rick P Oct. 3, 2019, 9:23 p.m. UTC
Add XO memslot type to create execute-only guest physical memory based on
the RO memslot. Like the RO memslot, disallow changing the memslot type
to/from XO.

In the EPT case ACC_USER_MASK represents the readable bit, so add the
ability for set_spte() to unset this.

This is based in part on a patch by Yu Zhang.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/kvm/mmu.c             |  9 ++++++++-
 include/uapi/linux/kvm.h       |  1 +
 tools/include/uapi/linux/kvm.h |  1 +
 virt/kvm/kvm_main.c            | 15 ++++++++++++++-
 4 files changed, 24 insertions(+), 2 deletions(-)
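For context, a hedged sketch (not part of the patch) of how userspace might request such a slot. The flag values match the uapi hunks below, and the struct layout mirrors `struct kvm_userspace_memory_region` from include/uapi/linux/kvm.h; in real code the filled struct would be passed to `ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region)`:

```c
#include <stdint.h>

/* Memslot flags as defined in the patch; KVM_MEM_EXECONLY is the new one. */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)
#define KVM_MEM_EXECONLY        (1UL << 2)

/* Mirrors struct kvm_userspace_memory_region from include/uapi/linux/kvm.h. */
struct kvm_userspace_memory_region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

/* Fill a region descriptor for an execute-only slot.  The memslot type
 * cannot later be changed to/from XO, per the commit message above. */
static struct kvm_userspace_memory_region make_xo_slot(uint32_t slot,
						       uint64_t gpa,
						       uint64_t size,
						       void *hva)
{
	struct kvm_userspace_memory_region r = {
		.slot = slot,
		.flags = KVM_MEM_EXECONLY,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = (uint64_t)(unsigned long)hva,
	};
	return r;
}
```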

Comments

Paolo Bonzini Oct. 4, 2019, 7:27 a.m. UTC | #1
On 03/10/19 23:23, Rick Edgecombe wrote:
> Add XO memslot type to create execute-only guest physical memory based on
> the RO memslot. Like the RO memslot, disallow changing the memslot type
> to/from XO.
> 
> In the EPT case ACC_USER_MASK represents the readable bit, so add the
> ability for set_spte() to unset this.
> 
> This is based in part on a patch by Yu Zhang.
> 
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Instead of this, why not check the exit qualification gpa and, if it has
the XO bit set, mask away both the XO bit and the R bit?  It can be done
unconditionally for all memslots.  This should require no change to
userspace.

Paolo
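Paolo's alternative could be sketched roughly as follows (the helper name is hypothetical and the exit-qualification plumbing is omitted): treat a high GPA bit as an XO alias, strip it before the memslot lookup, and drop the readable bit from the access mask, with no new memslot flag required.

```c
#include <stdint.h>

/* Access bits as used by the KVM MMU (arch/x86/kvm/mmu.h); in the EPT
 * case ACC_USER_MASK stands for the readable bit. */
#define ACC_EXEC_MASK  1
#define ACC_WRITE_MASK 2
#define ACC_USER_MASK  4
#define ACC_ALL        (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)

/* Hypothetical helper: if the faulting GPA has the XO alias bit
 * (here taken to be bit maxphyaddr - 1) set, mask it away so the fault
 * resolves against the normal memslot, and mask away the readable bit
 * from the access mask.  This works unconditionally for all memslots. */
static uint64_t decode_xo_gpa(uint64_t gpa, unsigned int maxphyaddr,
			      unsigned int *pte_access)
{
	uint64_t xo_bit = 1ULL << (maxphyaddr - 1);

	*pte_access = ACC_ALL;
	if (gpa & xo_bit) {
		gpa &= ~xo_bit;
		*pte_access &= ~ACC_USER_MASK;	/* the R bit */
	}
	return gpa;
}
```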
Edgecombe, Rick P Oct. 4, 2019, 7:06 p.m. UTC | #2
On Fri, 2019-10-04 at 09:27 +0200, Paolo Bonzini wrote:
> On 03/10/19 23:23, Rick Edgecombe wrote:
> > Add XO memslot type to create execute-only guest physical memory based on
> > the RO memslot. Like the RO memslot, disallow changing the memslot type
> > to/from XO.
> > 
> > In the EPT case ACC_USER_MASK represents the readable bit, so add the
> > ability for set_spte() to unset this.
> > 
> > This is based in part on a patch by Yu Zhang.
> > 
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 
> Instead of this, why not check the exit qualification gpa and, if it has
> the XO bit set, mask away both the XO bit and the R bit?  It can be done
> unconditionally for all memslots.  This should require no change to
> userspace.
> 
> Paolo
> 
The reasoning was that KVM seems to leave it to userspace to control the
physical address space layout, since userspace decides the supported physical
address bits and lays out memory in the physical address space. So duplicating
the RO memslot logic for XO memslots was an attempt to keep the logic around
that together.

I'll take another look at doing it this way though. I think userspace may still
need to adjust MAXPHYADDR and be aware that it can't lay out memory in the XO
range.

Thanks,

Rick
Paolo Bonzini Oct. 6, 2019, 4:15 p.m. UTC | #3
On 04/10/19 21:06, Edgecombe, Rick P wrote:
> The reasoning was that KVM seems to leave it to userspace to control the
> physical address space layout, since userspace decides the supported physical
> address bits and lays out memory in the physical address space. So duplicating
> the RO memslot logic for XO memslots was an attempt to keep the logic around
> that together.
> 
> I'll take another look at doing it this way though. I think userspace may still
> need to adjust MAXPHYADDR and be aware that it can't lay out memory in the XO
> range.

Right, you would have to use KVM_ENABLE_CAP passing the desired X bit
(which must be < MAXPHYADDR) as the argument.  Userspace needs to know
that it must then make MAXPHYADDR in the guest CPUID equal to the
argument.  When the MSR is written to 1, bit "MAXPHYADDR-1" in the page
table entries becomes an XO bit.

Paolo
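The userspace side of this scheme might look roughly like the sketch below. The capability number is invented purely for illustration (no such capability exists); `struct kvm_enable_cap` mirrors include/uapi/linux/kvm.h, and in real code the struct would be passed to `ioctl(vm_fd, KVM_ENABLE_CAP, &cap)`.

```c
#include <stdint.h>

/* Mirrors struct kvm_enable_cap from include/uapi/linux/kvm.h. */
struct kvm_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

#define KVM_CAP_PHYS_XO_BIT 9999	/* hypothetical, for illustration only */

/* Request that bit 'xo_bit' (which must be < host MAXPHYADDR) become the
 * XO alias bit.  Per Paolo's description, userspace must then also make
 * MAXPHYADDR in the guest CPUID equal to this same argument. */
static struct kvm_enable_cap make_xo_cap(uint64_t xo_bit)
{
	struct kvm_enable_cap c = {
		.cap = KVM_CAP_PHYS_XO_BIT,
		.args = { xo_bit },
	};
	return c;
}
```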

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e44a8053af78..338cc64cc821 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2981,6 +2981,8 @@  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	if (pte_access & ACC_USER_MASK)
 		spte |= shadow_user_mask;
+	else
+		spte &= ~shadow_user_mask;
 
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
@@ -3203,6 +3205,11 @@  static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 	int ret;
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	gfn_t base_gfn = gfn;
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	unsigned int pte_access = ACC_ALL;
+
+	if (slot && slot->flags & KVM_MEM_EXECONLY)
+		pte_access = ACC_EXEC_MASK;
 
 	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
 		return RET_PF_RETRY;
@@ -3222,7 +3229,7 @@  static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 		}
 	}
 
-	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
+	ret = mmu_set_spte(vcpu, it.sptep, pte_access,
 			   write, level, base_gfn, pfn, prefault,
 			   map_writable);
 	direct_pte_prefetch(vcpu, it.sptep);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..ede487b7b216 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -109,6 +109,7 @@  struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_EXECONLY	(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 5e3f12d5359e..ede487b7b216 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -109,6 +109,7 @@  struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
 #define KVM_MEM_READONLY	(1UL << 1)
+#define KVM_MEM_EXECONLY	(1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c6a91b044d8d..65087c1d67be 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -865,6 +865,8 @@  static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
 	valid_flags |= KVM_MEM_READONLY;
 #endif
 
+	valid_flags |= KVM_MEM_EXECONLY;
+
 	if (mem->flags & ~valid_flags)
 		return -EINVAL;
 
@@ -969,9 +971,12 @@  int __kvm_set_memory_region(struct kvm *kvm,
 		if (!old.npages)
 			change = KVM_MR_CREATE;
 		else { /* Modify an existing slot. */
+			const __u8 changeable = KVM_MEM_READONLY
+					       | KVM_MEM_EXECONLY;
+
 			if ((mem->userspace_addr != old.userspace_addr) ||
 			    (npages != old.npages) ||
-			    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
+			    ((new.flags ^ old.flags) & changeable))
 				goto out;
 
 			if (base_gfn != old.base_gfn)
@@ -1356,6 +1361,11 @@  static bool memslot_is_readonly(struct kvm_memory_slot *slot)
 	return slot->flags & KVM_MEM_READONLY;
 }
 
+static bool memslot_is_execonly(struct kvm_memory_slot *slot)
+{
+	return slot->flags & KVM_MEM_EXECONLY;
+}
+
 static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
@@ -1365,6 +1375,9 @@  static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 	if (memslot_is_readonly(slot) && write)
 		return KVM_HVA_ERR_RO_BAD;
 
+	if (memslot_is_execonly(slot) && write)
+		return KVM_HVA_ERR_RO_BAD;
+
 	if (nr_pages)
 		*nr_pages = slot->npages - (gfn - slot->base_gfn);