Message ID | 20240228024147.41573-9-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86/mmu: Page fault and MMIO cleanups | expand |
On 28/02/2024 3:41 pm, Sean Christopherson wrote: > WARN and skip the emulated MMIO fastpath if a private, reserved page fault > is encountered, as private+reserved should be an impossible combination > (KVM should never create an MMIO SPTE for a private access). > > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/mmu/mmu.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index bd342ebd0809..9206cfa58feb 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err > error_code |= PFERR_PRIVATE_ACCESS; > > r = RET_PF_INVALID; > - if (unlikely(error_code & PFERR_RSVD_MASK)) { > + if (unlikely((error_code & PFERR_RSVD_MASK) && > + !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { > r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); > if (r == RET_PF_EMULATE) > goto emulate; It seems this will make KVM continue to call kvm_mmu_do_page_fault() when such private+reserve error code actually happens (e.g., due to bug), because @r is still RET_PF_INVALID in such case. Is it better to just return error, e.g., -EINVAL, and give up?
On Fri, Mar 01, 2024, Kai Huang wrote: > > > On 28/02/2024 3:41 pm, Sean Christopherson wrote: > > WARN and skip the emulated MMIO fastpath if a private, reserved page fault > > is encountered, as private+reserved should be an impossible combination > > (KVM should never create an MMIO SPTE for a private access). > > > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > arch/x86/kvm/mmu/mmu.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index bd342ebd0809..9206cfa58feb 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err > > error_code |= PFERR_PRIVATE_ACCESS; > > r = RET_PF_INVALID; > > - if (unlikely(error_code & PFERR_RSVD_MASK)) { > > + if (unlikely((error_code & PFERR_RSVD_MASK) && > > + !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { > > r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); > > if (r == RET_PF_EMULATE) > > goto emulate; > > It seems this will make KVM continue to call kvm_mmu_do_page_fault() when > such private+reserve error code actually happens (e.g., due to bug), because > @r is still RET_PF_INVALID in such case. Yep. > Is it better to just return error, e.g., -EINVAL, and give up? As long as there is no obvious/immediate danger to the host, no obvious way for the "bad" behavior to cause data corruption for the guest, and continuing on has a plausible chance of working, then KVM should generally try to continue on and not terminate the VM. E.g. in this case, KVM will just skip various fast paths because of the RSVD flag, and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track() would skip write tracking, which could theoretically cause data corruption, so I guess arguably it would be safer to bail? Anyone else have an opinion? This type of bug should never escape development, so I'm a-ok effectively killing the VM. Unless someone has a good argument for continuing on, I'll go with Kai's suggestion and squash this: diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index cedacb1b89c5..d796a162b2da 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err error_code |= PFERR_PRIVATE_ACCESS; r = RET_PF_INVALID; - if (unlikely((error_code & PFERR_RSVD_MASK) && - !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { + if (unlikely(error_code & PFERR_RSVD_MASK)) { + if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS)) + return -EFAULT; + r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); if (r == RET_PF_EMULATE) goto emulate;
On 1/03/2024 12:06 pm, Sean Christopherson wrote: > On Fri, Mar 01, 2024, Kai Huang wrote: >> >> >> On 28/02/2024 3:41 pm, Sean Christopherson wrote: >>> WARN and skip the emulated MMIO fastpath if a private, reserved page fault >>> is encountered, as private+reserved should be an impossible combination >>> (KVM should never create an MMIO SPTE for a private access). >>> >>> Signed-off-by: Sean Christopherson <seanjc@google.com> >>> --- >>> arch/x86/kvm/mmu/mmu.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >>> index bd342ebd0809..9206cfa58feb 100644 >>> --- a/arch/x86/kvm/mmu/mmu.c >>> +++ b/arch/x86/kvm/mmu/mmu.c >>> @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err >>> error_code |= PFERR_PRIVATE_ACCESS; >>> r = RET_PF_INVALID; >>> - if (unlikely(error_code & PFERR_RSVD_MASK)) { >>> + if (unlikely((error_code & PFERR_RSVD_MASK) && >>> + !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { >>> r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); >>> if (r == RET_PF_EMULATE) >>> goto emulate; >> >> It seems this will make KVM continue to call kvm_mmu_do_page_fault() when >> such private+reserve error code actually happens (e.g., due to bug), because >> @r is still RET_PF_INVALID in such case. > > Yep. > >> Is it better to just return error, e.g., -EINVAL, and give up? > > As long as there is no obvious/immediate danger to the host, no obvious way for > the "bad" behavior to cause data corruption for the guest, and continuing on has > a plausible chance of working, then KVM should generally try to continue on and > not terminate the VM. Agreed. But I think sometimes it is hard to tell whether there's any dangerous things waiting to happen, because that means we have to sanity check a lot of code, and when new patches arrive we need to keep that in mind too, which could be a nightmare in terms of maintenance. > > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag, > and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track() > would skip write tracking, which could theoretically cause data corruption, so I > guess arguably it would be safer to bail? > > Anyone else have an opinion? This type of bug should never escape development, > so I'm a-ok effectively killing the VM. Unless someone has a good argument for > continuing on, I'll go with Kai's suggestion and squash this: > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index cedacb1b89c5..d796a162b2da 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err > error_code |= PFERR_PRIVATE_ACCESS; > > r = RET_PF_INVALID; > - if (unlikely((error_code & PFERR_RSVD_MASK) && > - !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { > + if (unlikely(error_code & PFERR_RSVD_MASK)) { > + if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS)) > + return -EFAULT; -EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this thoroughly but do you want to return -EFAULT here?
On Fri, Mar 01, 2024, Kai Huang wrote: > On 1/03/2024 12:06 pm, Sean Christopherson wrote: > > E.g. in this case, KVM will just skip various fast paths because of the RSVD flag, > > and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track() > > would skip write tracking, which could theoretically cause data corruption, so I > > guess arguably it would be safer to bail? > > > > Anyone else have an opinion? This type of bug should never escape development, > > so I'm a-ok effectively killing the VM. Unless someone has a good argument for > > continuing on, I'll go with Kai's suggestion and squash this: > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index cedacb1b89c5..d796a162b2da 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err > > error_code |= PFERR_PRIVATE_ACCESS; > > r = RET_PF_INVALID; > > - if (unlikely((error_code & PFERR_RSVD_MASK) && > > - !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { > > + if (unlikely(error_code & PFERR_RSVD_MASK)) { > > + if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS)) > > + return -EFAULT; > > -EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this > thoroughly but do you want to return -EFAULT here? Yes, I/we do. There are many existing paths that can return -EFAULT from KVM_RUN without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT. Userspace is responsible for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared to handle a "bare" -EFAULT, where for all intents and purposes "handle" means "terminate the guest". That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists, it'd require an absurd amount of work and churn in KVM to *safely* return useful information on *all* -EFAULTs. FWIW, I had hopes and dreams of actually doing exactly this, but have long since abandoned those dreams. In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that (a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that KVM is in a state where fixing the underlying problem and resuming the guest is safe, e.g. won't corrupt the guest (because KVM is in a half-baked state).
On 5/03/2024 4:51 am, Sean Christopherson wrote: > On Fri, Mar 01, 2024, Kai Huang wrote: >> On 1/03/2024 12:06 pm, Sean Christopherson wrote: >>> E.g. in this case, KVM will just skip various fast paths because of the RSVD flag, >>> and treat the fault like a PRIVATE fault. Hmm, but page_fault_handle_page_track() >>> would skip write tracking, which could theoretically cause data corruption, so I >>> guess arguably it would be safer to bail? >>> >>> Anyone else have an opinion? This type of bug should never escape development, >>> so I'm a-ok effectively killing the VM. Unless someone has a good argument for >>> continuing on, I'll go with Kai's suggestion and squash this: >>> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c >>> index cedacb1b89c5..d796a162b2da 100644 >>> --- a/arch/x86/kvm/mmu/mmu.c >>> +++ b/arch/x86/kvm/mmu/mmu.c >>> @@ -5892,8 +5892,10 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err >>> error_code |= PFERR_PRIVATE_ACCESS; >>> r = RET_PF_INVALID; >>> - if (unlikely((error_code & PFERR_RSVD_MASK) && >>> - !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { >>> + if (unlikely(error_code & PFERR_RSVD_MASK)) { >>> + if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS)) >>> + return -EFAULT; >> >> -EFAULT is part of guest_memfd() memory fault ABI. I didn't think over this >> thoroughly but do you want to return -EFAULT here? > > Yes, I/we do. There are many existing paths that can return -EFAULT from KVM_RUN > without setting run->exit_reason to KVM_EXIT_MEMORY_FAULT. Userspace is responsible > for checking run->exit_reason on -EFAULT (and -EHWPOISON), i.e. must be prepared > to handle a "bare" -EFAULT, where for all intents and purposes "handle" means > "terminate the guest". Right. > > That's actually one of the reasons why KVM_EXIT_MEMORY_FAULT exists, it'd require > an absurd amount of work and churn in KVM to *safely* return useful information > on *all* -EFAULTs. FWIW, I had hopes and dreams of actually doing exactly this, > but have long since abandoned those dreams. I am not sure whether we need to do that. Perhaps it made you feel so after we changed to use -EFAULT to carry KVM_EXIT_MEMORY_FAULT. :-) > > In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that > (a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that > KVM is in a state where fixing the underlying problem and resuming the guest is > safe, e.g. won't corrupt the guest (because KVM is in a half-baked state). > Sure. One small issue might be that, in a later code check, we actually return KVM_EXIT_MEMORY_FAULT when private fault hits RET_PF_EMULATION -- see your patch: [PATCH 01/16] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation So here if we just return -EFAULT w/o reporting KVM_EXIT_MEMORY_FAULT when private+reserved is hit, it seems there's a little bit inconsistency here. But you may have concern of corrupting guest here as you mentioned above.
On Wed, Mar 06, 2024, Kai Huang wrote: > On 5/03/2024 4:51 am, Sean Christopherson wrote: > > In other words, KVM_EXIT_MEMORY_FAULT essentially communicates to userspace that > > (a) userspace can likely fix whatever badness triggered the -EFAULT, and (b) that > > KVM is in a state where fixing the underlying problem and resuming the guest is > > safe, e.g. won't corrupt the guest (because KVM is in a half-baked state). > > > > Sure. One small issue might be that, in a later code check, we actually > return KVM_EXIT_MEMORY_FAULT when private fault hits RET_PF_EMULATION -- see > your patch: > > [PATCH 01/16] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault > hits emulation > > So here if we just return -EFAULT w/o reporting KVM_EXIT_MEMORY_FAULT when > private+reserved is hit, it seems there's a little bit inconsistency here. It's intentionally inconsistent. -EFAULT without KVM_EXIT_MEMORY_FAULT is essentially KVM saying "something bad happened, and it can't be fixed", whereas exiting with KVM_EXIT_MEMORY_FAULT says "there's an issue, but you may be able to resolve it". The ABI is a bit messy, e.g. in some ways it would be cleaner if KVM returned '0'. But doing that in a backwards compatible way would have required a rather ugly opt-in, and it would also make it more tedious to extend KVM_EXIT_MEMORY_FAULT, e.g. pairing it with -EHWPOISON didn't require any new flags.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index bd342ebd0809..9206cfa58feb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5866,7 +5866,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err error_code |= PFERR_PRIVATE_ACCESS; r = RET_PF_INVALID; - if (unlikely(error_code & PFERR_RSVD_MASK)) { + if (unlikely((error_code & PFERR_RSVD_MASK) && + !WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))) { r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct); if (r == RET_PF_EMULATE) goto emulate;
WARN and skip the emulated MMIO fastpath if a private, reserved page fault is encountered, as private+reserved should be an impossible combination (KVM should never create an MMIO SPTE for a private access). Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/mmu/mmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)