Message ID | 20230412213510.1220557-5-amoorthy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Improve scalability of KVM + userfaultfd live migration via annotated memory faults. | expand |
During some testing yesterday I realized that this patch actually breaks the self test, causing an error which the later self test changes cover up. Running "./demand_paging_test -b 512M -u MINOR -s shmem -v 1" from kvm/next (b3c98052d469) with just this patch applies gives the following output > # ./demand_paging_test -b 512M -u MINOR -s shmem -v 1 > Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages > guest physical test memory: [0x7fcdfffe000, 0x7fcffffe000) > Finished creating vCPUs and starting uffd threads > Started all vCPUs > ==== Test Assertion Failure ==== > demand_paging_test.c:50: false > pid=13293 tid=13297 errno=4 - Interrupted system call > // Some stack trace stuff > Invalid guest sync status: exit_reason=UNKNOWN, ucall=0 The problem is the get_ucall() part of the following block in the self test's vcpu_worker() > ret = _vcpu_run(vcpu); > TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); > if (get_ucall(vcpu, NULL) != UCALL_SYNC) { > TEST_ASSERT(false, > "Invalid guest sync status: exit_reason=%s\n", > exit_reason_str(run->exit_reason)); > } I took a look and, while get_ucall() does depend on the value of exit_reason, the error's root cause isn't clear to me yet. Moving the "exit_reason = kvm_exit_unknown" line to later in the function, right above the vcpu_run() call "fixes" the problem. I've done that for now and will bisect later to investigate: if anyone has any clues please let me know.
On Tue, May 02, 2023, Anish Moorthy wrote: > During some testing yesterday I realized that this patch actually > breaks the self test, causing an error which the later self test > changes cover up. > > Running "./demand_paging_test -b 512M -u MINOR -s shmem -v 1" from > kvm/next (b3c98052d469) with just this patch applies gives the > following output > > > # ./demand_paging_test -b 512M -u MINOR -s shmem -v 1 > > Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages > > guest physical test memory: [0x7fcdfffe000, 0x7fcffffe000) > > Finished creating vCPUs and starting uffd threads > > Started all vCPUs > > ==== Test Assertion Failure ==== > > demand_paging_test.c:50: false > > pid=13293 tid=13297 errno=4 - Interrupted system call > > // Some stack trace stuff > > Invalid guest sync status: exit_reason=UNKNOWN, ucall=0 > > The problem is the get_ucall() part of the following block in the self > test's vcpu_worker() > > > ret = _vcpu_run(vcpu); > > TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); > > if (get_ucall(vcpu, NULL) != UCALL_SYNC) { > > TEST_ASSERT(false, > > "Invalid guest sync status: exit_reason=%s\n", > > exit_reason_str(run->exit_reason)); > > } > > I took a look and, while get_ucall() does depend on the value of > exit_reason, the error's root cause isn't clear to me yet. Stating what you likely already know... On x86, the UCALL is performed via port I/O, and so the selftests framework zeros out the ucall struct if the userspace exit reason isn't KVM_EXIT_IO. > Moving the "exit_reason = kvm_exit_unknown" line to later in the > function, right above the vcpu_run() call "fixes" the problem. I've > done that for now and will bisect later to investigate: if anyone > has any clues please let me know. Clobbering vcpu->run->exit_reason before this code block is a bug: if (unlikely(vcpu->arch.complete_userspace_io)) { int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io; vcpu->arch.complete_userspace_io = NULL; r = cui(vcpu); if (r <= 0) goto out; } else { WARN_ON_ONCE(vcpu->arch.pio.count); WARN_ON_ONCE(vcpu->mmio_needed); } if (kvm_run->immediate_exit) { r = -EINTR; goto out; } For userspace I/O and MMIO, KVM requires userspace to "complete" the instruction that triggered the exit to userspace, e.g. write memory/registers and skip the instruction as needed. The immediate_exit flag is set by userspace when userspace wants to retain control and is doing KVM_RUN purely to placate KVM. In selftests, this is done by vcpu_run_complete_io(). The one part I'm a bit surprised by is that this caused ucall problems. The ucall framework invokes vcpu_run_complete_io() _after_ it grabs the information. addr = ucall_arch_get_ucall(vcpu); if (addr) { TEST_ASSERT(addr != (void *)GUEST_UCALL_FAILED, "Guest failed to allocate ucall struct"); memcpy(uc, addr, sizeof(*uc)); vcpu_run_complete_io(vcpu); } else { memset(uc, 0, sizeof(*uc)); } Making multiple calls to get_ucall() after a single guest ucall would explain everything as only the first get_ucall() would succeed, but AFAICT the test doesn't invoke get_ucall() multiple times. Aha! Found it. _vcpu_run() invokes assert_on_unhandled_exception(), which does if (get_ucall(vcpu, &uc) == UCALL_UNHANDLED) { uint64_t vector = uc.args[0]; TEST_FAIL("Unexpected vectored event in guest (vector:0x%lx)", vector); } and thus triggers vcpu_run_complete_io() before demand_paging_test's vcpu_worker() gets control and does _its_ get_ucall().
Thanks for nailing this down for me! One more question: should we be concerned about any guest memory accesses occurring in the preamble to that vcpu_run() call in kvm_arch_vcpu_ioctl_run()? I only see two spots from which an EFAULT could make it to userspace, those being the sync_regs() and cui() calls. The former looks clean but I'm not sure about the latter. As written it's not an issue per se if the cui() call tries a vCPU memory access- the kvm_populate_efault_info() helper will just not populate the run struct and WARN_ON_ONCE(). But it would be good to know about.
On Tue, May 02, 2023, Anish Moorthy wrote: > Thanks for nailing this down for me! One more question: should we be > concerned about any guest memory accesses occurring in the preamble to > that vcpu_run() call in kvm_arch_vcpu_ioctl_run()? > > I only see two spots from which an EFAULT could make it to userspace, > those being the sync_regs() and cui() calls. The former looks clean Ya, sync_regs() is a non-issue, that doesn't touch guest memory unless userspace is doing something truly bizarre. > but I'm not sure about the latter. As written it's not an issue per se > if the cui() call tries a vCPU memory access- the > kvm_populate_efault_info() helper will just not populate the run > struct and WARN_ON_ONCE(). But it would be good to know about. If KVM triggers a WARN_ON_ONCE(), then that's an issue. Though looking at the code, the cui() aspect is a moot point. As I stated in the previous discussion, the WARN_ON_ONCE() in question needs to be off-by-default. : Hmm, one idea would be to have the initial -EFAULT detection fill kvm_run.memory_fault, : but set kvm_run.exit_reason to some magic number, e.g. zero it out. Then KVM could : WARN if something tries to overwrite kvm_run.exit_reason. The WARN would need to : be buried by a Kconfig or something since kvm_run can be modified by userspace, : but other than that I think it would work.
On Tue, May 2, 2023 at 1:41 PM Sean Christopherson <seanjc@google.com> wrote: > > If KVM triggers a WARN_ON_ONCE(), then that's an issue. Though looking at the > code, the cui() aspect is a moot point. As I stated in the previous discussion, > the WARN_ON_ONCE() in question needs to be off-by-default. > > : Hmm, one idea would be to have the initial -EFAULT detection fill kvm_run.memory_fault, > : but set kvm_run.exit_reason to some magic number, e.g. zero it out. Then KVM could > : WARN if something tries to overwrite kvm_run.exit_reason. The WARN would need to > : be buried by a Kconfig or something since kvm_run can be modified by userspace, > : but other than that I think it would work. Ah, ok: I thought using WARN_ON_ONCE instead of WARN might have obviated the Kconfig. I'll go add one.
On Tue, May 02, 2023, Anish Moorthy wrote: > On Tue, May 2, 2023 at 1:41 PM Sean Christopherson <seanjc@google.com> wrote: > > > > If KVM triggers a WARN_ON_ONCE(), then that's an issue. Though looking at the > > code, the cui() aspect is a moot point. As I stated in the previous discussion, > > the WARN_ON_ONCE() in question needs to be off-by-default. > > > > : Hmm, one idea would be to have the initial -EFAULT detection fill kvm_run.memory_fault, > > : but set kvm_run.exit_reason to some magic number, e.g. zero it out. Then KVM could > > : WARN if something tries to overwrite kvm_run.exit_reason. The WARN would need to > > : be buried by a Kconfig or something since kvm_run can be modified by userspace, > > : but other than that I think it would work. > > Ah, ok: I thought using WARN_ON_ONCE instead of WARN might have > obviated the Kconfig. I'll go add one. Don't put too much effort into anything at this point. I'm not entirely convinced that it's worth carrying a Kconfig for this one-off case (my "suggestion" was mostly just me spitballing), and at a quick glance through the rest of the series, I'll definitely have more comments when I do a full review, i.e. things may change too.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 237c483b12301..ca73eb066af81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10965,6 +10965,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) kvm_run->flags = 0; kvm_load_guest_fpu(vcpu); + kvm_run->exit_reason = KVM_EXIT_UNKNOWN; kvm_vcpu_srcu_read_lock(vcpu); if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) { if (kvm_run->immediate_exit) {
Give kvm_run.exit_reason a defined initial value on entry into KVM_RUN: other architectures (riscv, arm64) already use KVM_EXIT_UNKNOWN for this purpose, so copy that convention. This gives vCPUs trying to fill the run struct a mechanism to avoid overwriting already-populated data, albeit an imperfect one. Being able to detect an already-populated KVM run struct will prevent at least some bugs in the upcoming implementation of KVM_CAP_MEMORY_FAULT_INFO, which will attempt to fill the run struct whenever a vCPU fails a guest memory access. Without the already-populated check, KVM_CAP_MEMORY_FAULT_INFO could change kvm_run in any code paths which 1. Populate kvm_run for some exit and prepare to return to userspace 2. Access guest memory for some reason (but without returning -EFAULTs to userspace) 3. Finish the return to userspace set up in (1), now with the contents of kvm_run changed to contain efault info. Signed-off-by: Anish Moorthy <amoorthy@google.com> --- arch/x86/kvm/x86.c | 1 + 1 file changed, 1 insertion(+)