Message ID | 20180719213802.17161-4-ahmedsoliman0x666@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Jul 19, 2018 at 11:40 PM Ahmed Abd El Mawgood <ahmedsoliman0x666@gmail.com> wrote:
> This patch introduces a hypercall implemented for X86 that can assist
> against a subset of kernel rootkits. It works by placing read-only
> protection in the shadow PTEs. The resulting protection is also kept in a
> bitmap for each kvm_memory_slot and is used as a reference when updating
> SPTEs. The whole goal is to protect the guest kernel's static data from
> modification if an attacker is running from guest ring 0; for this reason
> there is no hypercall to revert the effect of the Memory ROE hypercall.
> This patch doesn't implement an integrity check on the guest TLB, so an
> obvious attack on the current implementation would involve guest virtual
> address -> guest physical address remapping, but there are plans to fix
> that.

Why are you implementing this in the kernel, instead of doing it in
host userspace?
On 20 July 2018 at 00:59, Jann Horn <jannh@google.com> wrote:
> On Thu, Jul 19, 2018 at 11:40 PM Ahmed Abd El Mawgood
> Why are you implementing this in the kernel, instead of doing it in
> host userspace?

I thought about implementing it completely in QEMU, but it won't be
possible for a few reasons:

- After talking to QEMU folks I came to the conclusion that when it
comes to managing memory allocated for a guest, it is always better to let
KVM handle everything, unless there is a good reason to play with that
memory chunk inside QEMU itself.

- But actually there is a good reason for implementing ROE in kernel space:
ROE is architecture dependent to a great extent. I should have
emphasized that the only currently supported architecture is X86. I am
not sure how deep the dependency on architecture goes. But as for now
the current set of patches does an SPTE enumeration as part of the process.
To the best of my knowledge, this isn't exposed outside arch/x86/kvm, let
alone having a host user space interface for it. Also, the way I am planning
to protect the TLB from malicious gva -> gpa mapping is by knowing that on
x86 it is possible to VMEXIT on page faults; I am not sure if it will be
safe to assume that all KVM-supported architectures will behave this way.

For these reasons I thought it would be better if the arch-dependent stuff
(the mechanism implementation) is kept in the arch/*/kvm folder, with
minimal modifications to virt/kvm/* after setting a Kconfig variable to
enable ROE. But I left room for the user space app using KVM to decide the
rightful policy for handling ROE violations. The way it works is by
returning a KVM_EXIT_MMIO error to user space, keeping all the architectural
details hidden away from user space.

A last note is that I didn't create this from scratch; instead I extended
the KVM_MEM_READONLY implementation to also allow R/O per page instead of
R/O per whole slot, which is already done in kernel space.
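For context, the guest-facing interface boils down to a single hypercall that takes a page-aligned guest virtual address. Below is a minimal sketch of how a guest kernel might invoke it, assuming the KVM_HC_HMROE number added by this patch and the standard x86 kvm_hypercall1() guest helper; the roe_protect_page() wrapper is hypothetical, not part of the patch.

```c
/*
 * Sketch only: guest-side use of the proposed hypercall. KVM_HC_HMROE (10)
 * comes from this patch; roe_protect_page() is a hypothetical helper.
 * Note that the protection is one-way: there is no hypercall to undo it.
 */
#include <linux/mm.h>
#include <asm/kvm_para.h>

#ifndef KVM_HC_HMROE
#define KVM_HC_HMROE 10
#endif

static long roe_protect_page(void *addr)
{
	/* The hypercall expects a page-aligned guest virtual address. */
	unsigned long gva = (unsigned long)addr & PAGE_MASK;

	/* Must be issued from guest ring 0; user-mode callers are rejected. */
	return kvm_hypercall1(KVM_HC_HMROE, gva);
}
```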
On 07/19/2018 02:38 PM, Ahmed Abd El Mawgood wrote:
> This patch introduces a hypercall implemented for X86 that can assist
> against a subset of kernel rootkits. It works by placing read-only
> protection in the shadow PTEs. The resulting protection is also kept in a
> bitmap for each kvm_memory_slot and is used as a reference when updating
> SPTEs. The whole goal is to protect the guest kernel's static data from
> modification if an attacker is running from guest ring 0; for this reason
> there is no hypercall to revert the effect of the Memory ROE hypercall.
> This patch doesn't implement an integrity check on the guest TLB, so an
> obvious attack on the current implementation would involve guest virtual
> address -> guest physical address remapping, but there are plans to fix
> that.
>
> Signed-off-by: Ahmed Abd El Mawgood <ahmedsoliman0x666@gmail.com>
> ---
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 92fd433c50b9..8ae822a8dc7a 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -96,6 +96,13 @@ config KVM_MMU_AUDIT
>  	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
>  	 auditing of KVM MMU events at runtime.
>
> +config KVM_MROE
> +	bool "Hypercall Memory Read-Only Enforcement"
> +	depends on KVM && X86
> +	help
> +	  This option add KVM_HC_HMROE hypercall to kvm which as hardening

	   adds to kvm as a hardening (???)

> +	  mechanism to protect memory pages from being edited.
> +
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
>  source drivers/vhost/Kconfig
On Fri, Jul 20, 2018 at 2:26 AM Ahmed Soliman
<ahmedsoliman0x666@gmail.com> wrote:
>
> On 20 July 2018 at 00:59, Jann Horn <jannh@google.com> wrote:
> > On Thu, Jul 19, 2018 at 11:40 PM Ahmed Abd El Mawgood
>
> > Why are you implementing this in the kernel, instead of doing it in
> > host userspace?
>
> I thought about implementing it completely in QEMU, but it won't be
> possible for a few reasons:
>
> - After talking to QEMU folks I came to the conclusion that when it
> comes to managing memory allocated for a guest, it is always better to let
> KVM handle everything, unless there is a good reason to play with that
> memory chunk inside QEMU itself.

Why? It seems to me like it'd be easier to add a way to mprotect()
guest pages to readonly via virtio or whatever in QEMU than to add
kernel code? And if you ever want to support VM snapshotting/resumption,
you'll need support for restoring the protection flags from QEMU anyway.

> - But actually there is a good reason for implementing ROE in kernel space:
> ROE is architecture dependent to a great extent.

How so? The host component just has to make pages in guest memory
readonly, right? As far as I can tell, from QEMU, it'd more or less be
a matter of calling mprotect() a few times? (Plus potentially some
hooks to prevent other virtio code from crashing by attempting to
access protected pages - but you'd need that anyway, no matter where
the protection for the guest is enforced.)

> I should have
> emphasized that the only currently supported architecture is X86. I am
> not sure how deep the dependency on architecture goes. But as for now
> the current set of patches does an SPTE enumeration as part of the process.
> To the best of my knowledge, this isn't exposed outside arch/x86/kvm, let
> alone having a host user space interface for it. Also, the way I am planning
> to protect the TLB from malicious gva -> gpa mapping is by knowing that on
> x86 it is possible to VMEXIT on page faults; I am not sure if it will be
> safe to assume that all KVM-supported architectures will behave this way.

You mean EPT faults, right? If so: I think all architectures have to
support that - there are already other reasons why random guest memory
accesses can fault. In particular, the host can page out guest memory.
I think that's the case on all architectures?

> For these reasons I thought it would be better if the arch-dependent stuff
> (the mechanism implementation) is kept in the arch/*/kvm folder, with
> minimal modifications to virt/kvm/* after setting a Kconfig variable to
> enable ROE. But I left room for the user space app using KVM to decide the
> rightful policy for handling ROE violations. The way it works is by
> returning a KVM_EXIT_MMIO error to user space, keeping all the architectural
> details hidden away from user space.
>
> A last note is that I didn't create this from scratch; instead I extended
> the KVM_MEM_READONLY implementation to also allow R/O per page instead of
> R/O per whole slot, which is already done in kernel space.

But then you still have to also do something about virtio code in QEMU
that might write to those pages, right?
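For comparison, the host-userspace route suggested above would boil down to something like the following sketch, assuming the VMM can map a guest physical address back to the host virtual address backing it; guest_pa_to_hva() and vmm_protect_guest_page() are hypothetical names, not QEMU API.

```c
/*
 * Sketch of the host-userspace alternative: make the host mapping that
 * backs a guest page read-only with mprotect(). Both helper names are
 * hypothetical and not QEMU API.
 */
#include <stdint.h>
#include <sys/mman.h>

extern void *guest_pa_to_hva(uint64_t gpa);	/* assumed VMM lookup */

static int vmm_protect_guest_page(uint64_t gpa, size_t page_size)
{
	void *hva = guest_pa_to_hva(gpa & ~((uint64_t)page_size - 1));

	/*
	 * Limitation discussed in the follow-up below: this also makes the
	 * page read-only for the VMM itself (e.g. virtio), not just for
	 * the guest.
	 */
	return mprotect(hva, page_size, PROT_READ);
}
```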
On 20 July 2018 at 03:28, Jann Horn <jannh@google.com> wrote:
> On Fri, Jul 20, 2018 at 2:26 AM Ahmed Soliman
> <ahmedsoliman0x666@gmail.com> wrote:
>>
>> On 20 July 2018 at 00:59, Jann Horn <jannh@google.com> wrote:
>> > On Thu, Jul 19, 2018 at 11:40 PM Ahmed Abd El Mawgood
>>
>> > Why are you implementing this in the kernel, instead of doing it in
>> > host userspace?
>>
>> I thought about implementing it completely in QEMU, but it won't be
>> possible for a few reasons:
>>
>> - After talking to QEMU folks I came to the conclusion that when it
>> comes to managing memory allocated for a guest, it is always better to let
>> KVM handle everything, unless there is a good reason to play with that
>> memory chunk inside QEMU itself.
>
> Why? It seems to me like it'd be easier to add a way to mprotect()
> guest pages to readonly via virtio or whatever in QEMU than to add
> kernel code?

I did an early prototype with mprotect(), but mprotect() didn't do
exactly what I wanted. The goal here is to prevent the guest from
writing to a protected page while still allowing the host to do so if
it ever needs to. mprotect() will either allow both host and guest, or
prevent both host and guest. Even though I cannot come up with a use
case where one might need to allow the host to read/write a page but
prevent the guest from writing to that page, I think it is a limitation
that would cost a complete redesign if this kind of behavior proves
undesirable.

Also, mprotect() is kind of inflexible: writing to mprotect()ed pages
would immediately trigger SIGSEGV, and then the userspace process would
have to handle that fault in order to control the situation. That
sounded to me more like a little hack than a solid design.

> And if you ever want to support VM snapshotting/resumption, you'll
> need support for restoring the protection flags from QEMU anyway.

I never thought about that, but thanks for letting me know. I will keep
that in my TODO list.

>> - But actually there is a good reason for implementing ROE in kernel space:
>> ROE is architecture dependent to a great extent.
>
> How so? The host component just has to make pages in guest memory
> readonly, right? As far as I can tell, from QEMU, it'd more or less be
> a matter of calling mprotect() a few times? (Plus potentially some
> hooks to prevent other virtio code from crashing by attempting to
> access protected pages - but you'd need that anyway, no matter where
> the protection for the guest is enforced.)

I don't think that virtio would crash that way, because the host should
be able to write to memory as it wants. But I see where this is going;
probably I can add hooks so that virtio can respect the read-only flags.

>> I should have
>> emphasized that the only currently supported architecture is X86. I am
>> not sure how deep the dependency on architecture goes. But as for now
>> the current set of patches does an SPTE enumeration as part of the process.
>> To the best of my knowledge, this isn't exposed outside arch/x86/kvm, let
>> alone having a host user space interface for it. Also, the way I am planning
>> to protect the TLB from malicious gva -> gpa mapping is by knowing that on
>> x86 it is possible to VMEXIT on page faults; I am not sure if it will be
>> safe to assume that all KVM-supported architectures will behave this way.
>
> You mean EPT faults, right? If so: I think all architectures have to
> support that - there are already other reasons why random guest memory
> accesses can fault. In particular, the host can page out guest memory.
> I think that's the case on all architectures?

Here my lack of full knowledge kicks in: I am not sure whether an EPT
fault or a guest page fault is what I want to capture and validate. I
think x86 can VM exit on both. Due to the nature of ROE, guest user
space code cannot have ROE because it is irreversible, so it will be
safe to assume that only pages that are not swappable are the ones I
would care about. Still, lots of the details are blurry for me. But what
I was trying to say is that there are always differences based on
architecture, and that is why it will be better to do things in a kernel
module if we decide not to use the mprotect() method.

>> For these reasons I thought it would be better if the arch-dependent stuff
>> (the mechanism implementation) is kept in the arch/*/kvm folder, with
>> minimal modifications to virt/kvm/* after setting a Kconfig variable to
>> enable ROE. But I left room for the user space app using KVM to decide the
>> rightful policy for handling ROE violations. The way it works is by
>> returning a KVM_EXIT_MMIO error to user space, keeping all the architectural
>> details hidden away from user space.
>>
>> A last note is that I didn't create this from scratch; instead I extended
>> the KVM_MEM_READONLY implementation to also allow R/O per page instead of
>> R/O per whole slot, which is already done in kernel space.
>
> But then you still have to also do something about virtio code in QEMU
> that might write to those pages, right?

Probably yes; I haven't fully planned that yet. But I was thinking about
whether I can make use of IOMMU protection for DMA and have something
similar for emulated devices backed by virtio.
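To make the "handle the fault in userspace" option above concrete, here is a minimal sketch of what the SIGSEGV-based approach would look like in the VMM; the handler body and the chosen policy are assumptions, not existing QEMU code.

```c
/*
 * Sketch of trapping writes to mprotect()ed guest pages in the VMM via
 * SIGSEGV, as mentioned above. Purely illustrative; the policy applied
 * in the handler is left open.
 */
#include <signal.h>
#include <stdlib.h>

static void roe_segv_handler(int sig, siginfo_t *info, void *ucontext)
{
	/* si_addr is the host virtual address the faulting write touched. */
	void *fault_hva = info->si_addr;

	(void)sig;
	(void)ucontext;
	(void)fault_hva;

	/* Policy decision goes here: log the violation, stop the guest, ... */
	abort();
}

static void install_roe_handler(void)
{
	struct sigaction sa = { 0 };

	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = roe_segv_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, NULL);
}
```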
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c13cd28d9d1b..128bcfa246a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -236,6 +236,15 @@ struct kvm_mmu_memory_cache {
 	void *objects[KVM_NR_MEM_OBJS];
 };
 
+/*
+ * This is internal structure used to be be able to access kvm memory slot and
+ * have track of the number of current PTE when doing shadow PTE walk
+ */
+struct kvm_write_access_data {
+	int i;
+	struct kvm_memory_slot *memslot;
+};
+
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
@@ -1130,7 +1139,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+void kvm_mmu_slot_apply_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
 				   const struct kvm_memory_slot *memslot);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 92fd433c50b9..8ae822a8dc7a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -96,6 +96,13 @@ config KVM_MMU_AUDIT
 	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
 	 auditing of KVM MMU events at runtime.
 
+config KVM_MROE
+	bool "Hypercall Memory Read-Only Enforcement"
+	depends on KVM && X86
+	help
+	  This option add KVM_HC_HMROE hypercall to kvm which as hardening
+	  mechanism to protect memory pages from being edited.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 77661530b2c4..4ce6a9a19a23 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1416,9 +1416,8 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect)
 	return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_write_protect(struct kvm *kvm,
-				 struct kvm_rmap_head *rmap_head,
-				 bool pt_protect, void *data)
+static bool __rmap_write_protection(struct kvm *kvm,
+		struct kvm_rmap_head *rmap_head, bool pt_protect)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1430,6 +1429,38 @@ static bool __rmap_write_protect(struct kvm *kvm,
 	return flush;
 }
 
+#ifdef CONFIG_KVM_MROE
+static bool __rmap_write_protect_mroe(struct kvm *kvm,
+				      struct kvm_rmap_head *rmap_head,
+				      bool pt_protect,
+				      struct kvm_write_access_data *d)
+{
+	u64 *sptep;
+	struct rmap_iterator iter;
+	bool prot;
+	bool flush = false;
+
+	for_each_rmap_spte(rmap_head, &iter, sptep) {
+		prot = !test_bit(d->i, d->memslot->mroe_bitmap) && pt_protect;
+		flush |= spte_write_protect(sptep, prot);
+		d->i++;
+	}
+	return flush;
+}
+#endif
+
+static bool __rmap_write_protect(struct kvm *kvm,
+				 struct kvm_rmap_head *rmap_head,
+				 bool pt_protect,
+				 struct kvm_write_access_data *d)
+{
+#ifdef CONFIG_KVM_MROE
+	if (d != NULL)
+		return __rmap_write_protect_mroe(kvm, rmap_head, pt_protect, d);
+#endif
+	return __rmap_write_protection(kvm, rmap_head, pt_protect);
+}
+
 static bool spte_clear_dirty(u64 *sptep)
 {
 	u64 spte = *sptep;
@@ -1517,7 +1548,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	while (mask) {
 		rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 					  PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_write_protect(kvm, rmap_head, false, NULL);
+		__rmap_write_protection(kvm, rmap_head, false);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1593,11 +1624,15 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 	struct kvm_rmap_head *rmap_head;
 	int i;
 	bool write_protected = false;
+	struct kvm_write_access_data data = {
+		.i = 0,
+		.memslot = slot,
+	};
 
 	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		rmap_head = __gfn_to_rmap(gfn, i, slot);
 		write_protected |= __rmap_write_protect(kvm, rmap_head, true,
-							NULL);
+							&data);
 	}
 
 	return write_protected;
@@ -5190,21 +5225,36 @@ static bool slot_rmap_write_protect(struct kvm *kvm,
 				    struct kvm_rmap_head *rmap_head,
 				    void *data)
 {
-	return __rmap_write_protect(kvm, rmap_head, false, data);
+	return __rmap_write_protect(kvm, rmap_head, false,
+			(struct kvm_write_access_data *)data);
 }
 
-void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
+static bool slot_rmap_apply_protection(struct kvm *kvm,
+				       struct kvm_rmap_head *rmap_head,
+				       void *data)
+{
+	struct kvm_write_access_data *d = (struct kvm_write_access_data *) data;
+	bool prot_mask = !(d->memslot->flags & KVM_MEM_READONLY);
+
+	return __rmap_write_protect(kvm, rmap_head, prot_mask, d);
+}
+
+void kvm_mmu_slot_apply_write_access(struct kvm *kvm,
 				      struct kvm_memory_slot *memslot)
 {
 	bool flush;
+	struct kvm_write_access_data data = {
+		.i = 0,
+		.memslot = memslot,
+	};
 
 	spin_lock(&kvm->mmu_lock);
-	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
-				      false, NULL);
+	flush = slot_handle_all_level(kvm, memslot, slot_rmap_apply_protection,
+				      false, &data);
 	spin_unlock(&kvm->mmu_lock);
 
 	/*
-	 * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
+	 * kvm_mmu_slot_apply_write_access() and kvm_vm_ioctl_get_dirty_log()
 	 * which do tlb flush out of mmu-lock should be serialized by
 	 * kvm->slots_lock otherwise tlb flush would be missed.
 	 */
@@ -5301,7 +5351,7 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
 					false, NULL);
 	spin_unlock(&kvm->mmu_lock);
 
-	/* see kvm_mmu_slot_remove_write_access */
+	/* see kvm_mmu_slot_apply_write_access*/
 	lockdep_assert_held(&kvm->slots_lock);
 
 	if (flush)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0046aa70205a..9addc46d75be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4177,7 +4177,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 
 	/*
 	 * All the TLBs can be flushed out of mmu lock, see the comments in
-	 * kvm_mmu_slot_remove_write_access().
+	 * kvm_mmu_slot_apply_write_access().
 	 */
 	lockdep_assert_held(&kvm->slots_lock);
 	if (is_dirty)
@@ -6670,7 +6670,76 @@ static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
 }
 #endif
 
-/*
+#ifdef CONFIG_KVM_MROE
+static int __roe_protect_frame(struct kvm *kvm, gpa_t gpa)
+{
+	struct kvm_memory_slot *slot;
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	if (!slot || gfn > slot->base_gfn + slot->npages)
+		return -EINVAL;
+	set_bit(gfn - slot->base_gfn, slot->mroe_bitmap);
+	kvm_mmu_slot_apply_write_access(kvm, slot);
+	kvm_arch_flush_shadow_memslot(kvm, slot);
+
+	return 0;
+}
+
+static int roe_protect_frame(struct kvm *kvm, gpa_t gpa)
+{
+	int r;
+
+	mutex_lock(&kvm->slots_lock);
+	r = __roe_protect_frame(kvm, gpa);
+	mutex_unlock(&kvm->slots_lock);
+	return r;
+}
+
+static bool kvm_mroe_userspace(struct kvm_vcpu *vcpu)
+{
+	u64 rflags;
+	u64 cr0 = kvm_read_cr0(vcpu);
+	u64 iopl;
+
+	// first checking we are not in protected mode
+	if ((cr0 & 1) == 0)
+		return false;
+	/*
+	 * we don't need to worry about comments in __get_regs
+	 * because we are sure that this function will only be
+	 * triggered at the end of a hypercall
+	 */
+	rflags = kvm_get_rflags(vcpu);
+	iopl = (rflags >> 12) & 3;
+	if (iopl != 3)
+		return false;
+	return true;
+}
+
+static int kvm_mroe(struct kvm_vcpu *vcpu, u64 gva)
+{
+	struct kvm *kvm = vcpu->kvm;
+	gpa_t gpa;
+	u64 hva;
+
+	/*
+	 * First we need to maek sure that we are running from something that
+	 * isn't usermode
+	 */
+	if (kvm_mroe_userspace(vcpu))
+		return -1;//I don't really know what to return
+	if (gva & ~PAGE_MASK)
+		return -EINVAL;
+	gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
+	hva = gfn_to_hva(kvm, gpa >> PAGE_SHIFT);
+	if (!access_ok(VERIFY_WRITE, hva, PAGE_SIZE))
+		return -EINVAL;
+	return roe_protect_frame(vcpu->kvm, gpa);
+}
+#endif
+
+ /*
  * kvm_pv_kick_cpu_op: Kick a vcpu.
  *
  * @apicid - apicid of vcpu to be kicked.
@@ -6737,6 +6806,11 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_CLOCK_PAIRING:
 		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
 		break;
+#endif
+#ifdef CONFIG_KVM_MROE
+	case KVM_HC_HMROE:
+		ret = kvm_mroe(vcpu, a0);
+		break;
 #endif
 	default:
 		ret = -KVM_ENOSYS;
@@ -8971,8 +9045,8 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 				     struct kvm_memory_slot *new)
 {
 	/* Still write protect RO slot */
+	kvm_mmu_slot_apply_write_access(kvm, new);
 	if (new->flags & KVM_MEM_READONLY) {
-		kvm_mmu_slot_remove_write_access(kvm, new);
 		return;
 	}
 
@@ -9010,7 +9084,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 		if (kvm_x86_ops->slot_enable_log_dirty)
 			kvm_x86_ops->slot_enable_log_dirty(kvm, new);
 		else
-			kvm_mmu_slot_remove_write_access(kvm, new);
+			kvm_mmu_slot_apply_write_access(kvm, new);
 	} else {
 		if (kvm_x86_ops->slot_disable_log_dirty)
 			kvm_x86_ops->slot_disable_log_dirty(kvm, new);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4ee7bc548a83..82c5780e11d9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -297,6 +297,9 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 struct kvm_memory_slot {
 	gfn_t base_gfn;
 	unsigned long npages;
+#ifdef CONFIG_KVM_MROE
+	unsigned long *mroe_bitmap;
+#endif
 	unsigned long *dirty_bitmap;
 	struct kvm_arch_memory_slot arch;
 	unsigned long userspace_addr;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index dcf629dd2889..4e2badc09b5b 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -26,6 +26,7 @@
 #define KVM_HC_MIPS_EXIT_VM		7
 #define KVM_HC_MIPS_CONSOLE_OUTPUT	8
 #define KVM_HC_CLOCK_PAIRING		9
+#define KVM_HC_HMROE			10
 
 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b47507faab5..0f7141e4d550 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -794,6 +794,17 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 	return 0;
 }
 
+static int kvm_init_mroe_bitmap(struct kvm_memory_slot *slot)
+{
+#ifdef CONFIG_KVM_MROE
+	slot->mroe_bitmap = kvzalloc(BITS_TO_LONGS(slot->npages) *
+				sizeof(unsigned long), GFP_KERNEL);
+	if (!slot->mroe_bitmap)
+		return -ENOMEM;
+#endif
+	return 0;
+}
+
 /*
  * Insert memslot and re-sort memslots based on their GFN,
  * so binary search could be used to lookup GFN.
@@ -1011,6 +1022,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		if (kvm_create_dirty_bitmap(&new) < 0)
 			goto out_free;
 	}
+	if (kvm_init_mroe_bitmap(&new) < 0)
+		goto out_free;
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
@@ -1264,13 +1277,23 @@ static bool memslot_is_readonly(struct kvm_memory_slot *slot)
 	return slot->flags & KVM_MEM_READONLY;
 }
 
+static bool gfn_is_readonly(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+#ifdef CONFIG_KVM_MROE
+	return test_bit(gfn - slot->base_gfn, slot->mroe_bitmap) ||
+	       memslot_is_readonly(slot);
+#else
+	return memslot_is_readonly(slot);
+#endif
+}
+
 static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
 				       gfn_t *nr_pages, bool write)
 {
 	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
 		return KVM_HVA_ERR_BAD;
 
-	if (memslot_is_readonly(slot) && write)
+	if (gfn_is_readonly(slot, gfn) && write)
 		return KVM_HVA_ERR_RO_BAD;
 
 	if (nr_pages)
@@ -1314,7 +1337,7 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot,
 	unsigned long hva = __gfn_to_hva_many(slot, gfn, NULL, false);
 
 	if (!kvm_is_error_hva(hva) && writable)
-		*writable = !memslot_is_readonly(slot);
+		*writable = !gfn_is_readonly(slot, gfn);
 
 	return hva;
 }
@@ -1554,7 +1577,7 @@ kvm_pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn,
 	}
 
 	/* Do not map writable pfn in the readonly memslot. */
-	if (writable && memslot_is_readonly(slot)) {
+	if (writable && gfn_is_readonly(slot, gfn)) {
 		*writable = false;
 		writable = NULL;
 	}
This patch introduces a hypercall implemented for X86 that can assist
against a subset of kernel rootkits. It works by placing read-only
protection in the shadow PTEs. The resulting protection is also kept in a
bitmap for each kvm_memory_slot and is used as a reference when updating
SPTEs. The whole goal is to protect the guest kernel's static data from
modification if an attacker is running from guest ring 0; for this reason
there is no hypercall to revert the effect of the Memory ROE hypercall.
This patch doesn't implement an integrity check on the guest TLB, so an
obvious attack on the current implementation would involve guest virtual
address -> guest physical address remapping, but there are plans to fix
that.

Signed-off-by: Ahmed Abd El Mawgood <ahmedsoliman0x666@gmail.com>
---
 arch/x86/include/asm/kvm_host.h | 11 +++++-
 arch/x86/kvm/Kconfig            |  7 ++++
 arch/x86/kvm/mmu.c              | 72 ++++++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c              | 82 +++++++++++++++++++++++++++++++++++++++--
 include/linux/kvm_host.h        |  3 ++
 include/uapi/linux/kvm_para.h   |  1 +
 virt/kvm/kvm_main.c             | 29 +++++++++++++--
 7 files changed, 186 insertions(+), 19 deletions(-)
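As discussed in the thread above, a guest write to a ROE-protected page reaches the user space VMM as a KVM_EXIT_MMIO, leaving the violation policy to user space. Below is a minimal sketch of such a policy hook in a VMM's vcpu run loop; handle_roe_violation() and the report-and-stop policy are assumptions, not part of this patch.

```c
/*
 * Sketch of a user space policy hook for ROE violations. run points at the
 * mmap()ed struct kvm_run of the vcpu; handle_roe_violation() and the
 * report-and-stop policy are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>
#include <linux/kvm.h>

static bool handle_roe_violation(struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_MMIO || !run->mmio.is_write)
		return false;	/* not a guest write bounced to user space */

	/*
	 * If phys_addr falls in a region this VMM asked to protect, apply
	 * whatever policy was chosen -- here, just report and stop.
	 */
	fprintf(stderr, "ROE violation: guest write to 0x%llx (%u bytes)\n",
		(unsigned long long)run->mmio.phys_addr, run->mmio.len);
	return true;	/* caller should stop or reset the vcpu */
}
```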