Message ID | 20200601082146.18969-1-pbonzini@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: check userspace_addr for all memslots | expand |
On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote: > The userspace_addr alignment and range checks are not performed for private > memory slots that are prepared by KVM itself. This is unnecessary and makes > it questionable to use __*_user functions to access memory later on. We also > rely on the userspace address being aligned since we have an entire family > of functions to map gfn to pfn. > > Fortunately skipping the check is completely unnecessary. Only x86 uses > private memslots and their userspace_addr is obtained from vm_mmap, > therefore it must be below PAGE_OFFSET. In fact, any attempt to pass > an address above PAGE_OFFSET would have failed because such an address > would return true for kvm_is_error_hva. > > Reported-by: Linus Torvalds <torvalds@linux-foundation.org> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> I bisected this patch to break a VM on my AMD system (3970X) The reason it happens, is because I have avic enabled (which uses a private KVM memslot), but it is permanently disabled for that VM, since I enabled nesting for that VM (+svm) and that triggers the code in __x86_set_memory_region to set userspace_addr of the disabled memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region after that patch, and that makes it silently not disable the memslot, which hangs the guest. The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl which discards the return value. I think that the fix for this would be to either make access_ok always return true for size==0, or __kvm_set_memory_region should treat size==0 specially and skip that check for it. Best regards, Maxim Levitsky
On 11/06/20 16:44, Maxim Levitsky wrote: > On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote: >> The userspace_addr alignment and range checks are not performed for private >> memory slots that are prepared by KVM itself. This is unnecessary and makes >> it questionable to use __*_user functions to access memory later on. We also >> rely on the userspace address being aligned since we have an entire family >> of functions to map gfn to pfn. >> >> Fortunately skipping the check is completely unnecessary. Only x86 uses >> private memslots and their userspace_addr is obtained from vm_mmap, >> therefore it must be below PAGE_OFFSET. In fact, any attempt to pass >> an address above PAGE_OFFSET would have failed because such an address >> would return true for kvm_is_error_hva. >> >> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > I bisected this patch to break a VM on my AMD system (3970X) > > The reason it happens, is because I have avic enabled (which uses > a private KVM memslot), but it is permanently disabled for that VM, > since I enabled nesting for that VM (+svm) and that triggers the code > in __x86_set_memory_region to set userspace_addr of the disabled > memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region > after that patch, and that makes it silently not disable the memslot, which hangs the guest. > > The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl > which discards the return value. > > > I think that the fix for this would be to either make access_ok always return > true for size==0, or __kvm_set_memory_region should treat size==0 specially > and skip that check for it. Or just set hva to 0. Deletion goes through kvm_delete_memslot so that dummy hva is not used anywhere. If we really want to poison the hva of deleted memslots we should not do it specially in __x86_set_memory_region. I'll send a patch. Paolo
On Thu, 2020-06-11 at 17:27 +0200, Paolo Bonzini wrote: > On 11/06/20 16:44, Maxim Levitsky wrote: > > On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote: > > > The userspace_addr alignment and range checks are not performed for private > > > memory slots that are prepared by KVM itself. This is unnecessary and makes > > > it questionable to use __*_user functions to access memory later on. We also > > > rely on the userspace address being aligned since we have an entire family > > > of functions to map gfn to pfn. > > > > > > Fortunately skipping the check is completely unnecessary. Only x86 uses > > > private memslots and their userspace_addr is obtained from vm_mmap, > > > therefore it must be below PAGE_OFFSET. In fact, any attempt to pass > > > an address above PAGE_OFFSET would have failed because such an address > > > would return true for kvm_is_error_hva. > > > > > > Reported-by: Linus Torvalds <torvalds@linux-foundation.org> > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > > > I bisected this patch to break a VM on my AMD system (3970X) > > > > The reason it happens, is because I have avic enabled (which uses > > a private KVM memslot), but it is permanently disabled for that VM, > > since I enabled nesting for that VM (+svm) and that triggers the code > > in __x86_set_memory_region to set userspace_addr of the disabled > > memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region > > after that patch, and that makes it silently not disable the memslot, which hangs the guest. > > > > The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl > > which discards the return value. > > > > > > I think that the fix for this would be to either make access_ok always return > > true for size==0, or __kvm_set_memory_region should treat size==0 specially > > and skip that check for it. > > Or just set hva to 0. Deletion goes through kvm_delete_memslot so that > dummy hva is not used anywhere. If we really want to poison the hva of > deleted memslots we should not do it specially in > __x86_set_memory_region. I'll send a patch. After checking exactly what access_ok does, I mostly agree with this. There is still an implicit assumption that address 0 is a valid userspace address. It is fair to assume that on x86 though. Best regards, Maxim Levitsky > > Paolo >
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 731c1e517716..08184f571669 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1223,10 +1223,9 @@ int __kvm_set_memory_region(struct kvm *kvm, if (mem->guest_phys_addr & (PAGE_SIZE - 1)) return -EINVAL; /* We can read the guest memory with __xxx_user() later on. */ - if ((id < KVM_USER_MEM_SLOTS) && - ((mem->userspace_addr & (PAGE_SIZE - 1)) || + if ((mem->userspace_addr & (PAGE_SIZE - 1)) || !access_ok((void __user *)(unsigned long)mem->userspace_addr, - mem->memory_size))) + mem->memory_size)) return -EINVAL; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM) return -EINVAL;
The userspace_addr alignment and range checks are not performed for private memory slots that are prepared by KVM itself. This is unnecessary and makes it questionable to use __*_user functions to access memory later on. We also rely on the userspace address being aligned since we have an entire family of functions to map gfn to pfn. Fortunately skipping the check is completely unnecessary. Only x86 uses private memslots and their userspace_addr is obtained from vm_mmap, therefore it must be below PAGE_OFFSET. In fact, any attempt to pass an address above PAGE_OFFSET would have failed because such an address would return true for kvm_is_error_hva. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- virt/kvm/kvm_main.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)