Message ID | 20250116035008.43404-1-yosryahmed@google.com (mailing list archive) |
---|---|
State | New |
Series | KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit |
On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> nested_vmx_transition_tlb_flush() uses KVM_REQ_TLB_FLUSH_CURRENT to
> flush the TLB if VPID is enabled for both L1 and L2, but they still
> share the TLB tag. This happens if EPT is disabled and KVM fails to
> allocate a VPID for L2, so both the EPTP and VPID are shared between L1
> and L2.

Nit: Combined and guest-physical TLB tags are based on EPTRTA (the new
acronym for EP4TA), not EPTP. But, in any case, with EPT disabled, there
are no combined or guest-physical mappings. There are only linear
mappings.

> Interestingly, nested_vmx_transition_tlb_flush() uses
> KVM_REQ_TLB_FLUSH_GUEST to flush the TLB for all other cases where a
> flush is required.
>
> Taking a close look at vmx_flush_tlb_guest() and
> vmx_flush_tlb_current(), the main differences are:
> (a) vmx_flush_tlb_current() is a noop if the KVM MMU is invalid.
> (b) vmx_flush_tlb_current() uses INVEPT if EPT is enabled (instead of
>     INVVPID) to flush the guest-physical mappings as well as combined
>     mappings.
>
> The check in (a) is seemingly an optimization, and there should not be
> any TLB entries for L1 anyway if the KVM MMU is invalid. Not having this
> check in vmx_flush_tlb_guest() is not a fundamental difference, and it
> can be added there separately if needed.
>
> The difference in (b) is irrelevant in this case, because EPT being
> enabled for L1 means that its TLB tags are tagged with EPTP and cannot
> be used by L2 (regardless of whether or not EPT is enabled for L2).

The difference is also irrelevant because, as you concluded in the
first paragraph, EPT is disabled in the final block of
nested_vmx_transition_tlb_flush().

> Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> guest-physical translations, so only flushing linear and combined
> translations (i.e. guest-generated translations) is needed.

And, as I mentioned above, with EPT disabled, there are no combined or
guest-physical mappings.

> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

I think the reasoning in the commit message can be cleared up a bit, but...

Reviewed-by: Jim Mattson <mattson@google.com>

> ---
>
> I tested this by running all selftests that have "nested" in their name
> (and not svm). I was tempted to run KVM-unit-tests in an L1 guest but I
> convinced myself it's not prompted by the change :)
>
> ---
>  arch/x86/kvm/vmx/nested.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index aa78b6f38dfef..2ed454186e59c 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1241,7 +1241,7 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
>  	 * as the effective ASID is common to both L1 and L2.
>  	 */
>  	if (!nested_has_guest_tlb_tag(vcpu))
> -		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>  }
>
>  static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
> --
> 2.48.0.rc2.279.g1de40edade-goog
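For readers following the tagging nit above: per the SDM, linear mappings are
tagged with (VPID, PCID), combined mappings with (VPID, PCID, EPTRTA), and
guest-physical mappings with the EPTRTA alone, where the EPTRTA is simply the
EPT root address carried in the EPTP. A minimal illustrative sketch (not KVM
code; the macro and helper names below are made up for illustration):

#include <stdint.h>

/* Bits 51:12 of the EPTP hold the EPT root address ("EPTRTA", formerly "EP4TA"). */
#define EPTRTA_MASK	0x000ffffffffff000ULL

/*
 * The low EPTP bits (memory type, page-walk length, A/D enable) are not part
 * of the TLB tag, so two EPTPs that differ only in those bits still share the
 * same EPTRTA tag.
 */
static inline uint64_t eptp_to_eptrta(uint64_t eptp)
{
	return eptp & EPTRTA_MASK;
}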
On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > nested_vmx_transition_tlb_flush() uses KVM_REQ_TLB_FLUSH_CURRENT to
> > flush the TLB if VPID is enabled for both L1 and L2, but they still
> > share the TLB tag. This happens if EPT is disabled and KVM fails to
> > allocate a VPID for L2, so both the EPTP and VPID are shared between L1
> > and L2.
>
> Nit: Combined and guest-physical TLB tags are based on EPTRTA (the new
> acronym for EP4TA), not EPTP. But, in any case, with EPT disabled,
> there are no combined or guest-physical mappings. There are only
> linear mappings.

Interestingly, I did initially write EPTRTA, but I changed it to EPTP
because that is the terminology used in nested_has_guest_tlb_tag().
Anyway, I definitely don't mind changing it to EPTRTA.

> > Interestingly, nested_vmx_transition_tlb_flush() uses
> > KVM_REQ_TLB_FLUSH_GUEST to flush the TLB for all other cases where a
> > flush is required.
> >
> > Taking a close look at vmx_flush_tlb_guest() and
> > vmx_flush_tlb_current(), the main differences are:
> > (a) vmx_flush_tlb_current() is a noop if the KVM MMU is invalid.
> > (b) vmx_flush_tlb_current() uses INVEPT if EPT is enabled (instead of
> >     INVVPID) to flush the guest-physical mappings as well as combined
> >     mappings.
> >
> > The check in (a) is seemingly an optimization, and there should not be
> > any TLB entries for L1 anyway if the KVM MMU is invalid. Not having this
> > check in vmx_flush_tlb_guest() is not a fundamental difference, and it
> > can be added there separately if needed.
> >
> > The difference in (b) is irrelevant in this case, because EPT being
> > enabled for L1 means that its TLB tags are tagged with EPTP and cannot
> > be used by L2 (regardless of whether or not EPT is enabled for L2).
>
> The difference is also irrelevant because, as you concluded in the
> first paragraph, EPT is disabled in the final block of
> nested_vmx_transition_tlb_flush().

I was trying to explain that even if EPT is enabled, sharing
guest-physical translations between L1 and L2 should never be possible
(and hence we should never worry about flushing these translations in
nested_vmx_transition_tlb_flush()). Now that I read it again it is not
as clear as I had hoped.

> > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > guest-physical translations, so only flushing linear and combined
> > translations (i.e. guest-generated translations) is needed.
>
> And, as I mentioned above, with EPT disabled, there are no combined or
> guest-physical mappings.
>
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
>
> I think the reasoning in the commit message can be cleared up a bit, but...

Agreed :) I am sure Sean will also want changes in the commit message anyway.

> Reviewed-by: Jim Mattson <mattson@google.com>

Thanks for the quick review!
On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@google.com> wrote:
> > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > guest-physical translations, so only flushing linear and combined
> > > translations (i.e. guest-generated translations) is needed.

No, using KVM_REQ_TLB_FLUSH_CURRENT is correct. From *L1's* perspective, VPID
is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
TLBs, and thus KVM is not required to FLUSH_GUEST.

E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
guaranteed to be visible to L2.

But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
hardware TLBs are flushed. Without EPT, KVM will use different CR3s for L1 and
L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.

Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
is not required in this scenario.

static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
	++vcpu->stat.tlb_flush;

	if (!tdp_enabled) {
		/*
		 * A TLB flush on behalf of the guest is equivalent to
		 * INVPCID(all), toggling CR4.PGE, etc., which requires
		 * a forced sync of the shadow page tables.  Ensure all the
		 * roots are synced and the guest TLB in hardware is clean.
		 */
		kvm_mmu_sync_roots(vcpu);
		kvm_mmu_sync_prev_roots(vcpu);
	}

	kvm_x86_call(flush_tlb_guest)(vcpu);

	/*
	 * Flushing all "guest" TLB is always a superset of Hyper-V's fine
	 * grained flushing.
	 */
	kvm_hv_vcpu_purge_flush_tlb(vcpu);
}
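For contrast with the function quoted above, the FLUSH_CURRENT handler skips
the shadow-root synchronization entirely and only flushes the hardware TLB for
the current context. Roughly (paraphrased from arch/x86/kvm/x86.c of the same
era; the exact upstream body may differ):

static void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	++vcpu->stat.tlb_flush;

	/*
	 * No kvm_mmu_sync_roots()/kvm_mmu_sync_prev_roots() here: only the
	 * hardware TLB entries for the current context are flushed, which is
	 * exactly what the shared-ASID nested transition needs.
	 */
	kvm_x86_call(flush_tlb_current)(vcpu);
}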
On Thu, Jan 16, 2025 at 9:11 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@google.com> wrote:
> > > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > > guest-physical translations, so only flushing linear and combined
> > > > translations (i.e. guest-generated translations) is needed.
>
> No, using KVM_REQ_TLB_FLUSH_CURRENT is correct. From *L1's* perspective, VPID
> is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
> TLBs, and thus KVM is not required to FLUSH_GUEST.
>
> E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
> PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
> to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
> guaranteed to be visible to L2.
>
> But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
> hardware TLBs are flushed. Without EPT, KVM will use different CR3s for L1 and
> L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
> KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.
>
> Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
> is not required in this scenario.

Aha, I was examining vmx_flush_tlb_guest() not kvm_vcpu_flush_tlb_guest(), so
I missed the synchronization. Yeah I think it's possible that we end up
unnecessarily synchronizing the shadow page tables (or dropping them) in this
case.

Do you think it's worth expanding the comment in
nested_vmx_transition_tlb_flush()?

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2ed454186e59c..43d34e413d016 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1239,6 +1239,11 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 	 * does not have a unique TLB tag (ASID), i.e. EPT is disabled and
 	 * KVM was unable to allocate a VPID for L2, flush the current context
 	 * as the effective ASID is common to both L1 and L2.
+	 *
+	 * Note that even though TLB_FLUSH_GUEST would be correct because we
+	 * only need to flush linear mappings, it would unnecessarily
+	 * synchronize the MMU even though a TLB flush is not architecturally
+	 * required from L1's perspective.
 	 */
 	if (!nested_has_guest_tlb_tag(vcpu))
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
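For reference, the guard being discussed, nested_has_guest_tlb_tag(), answers
whether L2 gets a hardware TLB tag distinct from L1's. Roughly (paraphrased
from arch/x86/kvm/vmx/nested.c; details may differ from the exact upstream
body):

static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	/*
	 * L2 has its own tag if KVM is using EPT (L1 and L2 translations are
	 * then tagged with different EPT roots), or if L1 enabled VPID for L2
	 * and KVM managed to allocate a dedicated VPID (vpid02) for it.
	 */
	return enable_ept ||
	       (nested_cpu_has_vpid(get_vmcs12(vcpu)) && vmx->nested.vpid02);
}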
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index aa78b6f38dfef..2ed454186e59c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1241,7 +1241,7 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
 	 * as the effective ASID is common to both L1 and L2.
 	 */
 	if (!nested_has_guest_tlb_tag(vcpu))
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 }
 
 static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
nested_vmx_transition_tlb_flush() uses KVM_REQ_TLB_FLUSH_CURRENT to
flush the TLB if VPID is enabled for both L1 and L2, but they still
share the TLB tag. This happens if EPT is disabled and KVM fails to
allocate a VPID for L2, so both the EPTP and VPID are shared between L1
and L2.

Interestingly, nested_vmx_transition_tlb_flush() uses
KVM_REQ_TLB_FLUSH_GUEST to flush the TLB for all other cases where a
flush is required.

Taking a close look at vmx_flush_tlb_guest() and
vmx_flush_tlb_current(), the main differences are:
(a) vmx_flush_tlb_current() is a noop if the KVM MMU is invalid.
(b) vmx_flush_tlb_current() uses INVEPT if EPT is enabled (instead of
    INVVPID) to flush the guest-physical mappings as well as combined
    mappings.

The check in (a) is seemingly an optimization, and there should not be
any TLB entries for L1 anyway if the KVM MMU is invalid. Not having this
check in vmx_flush_tlb_guest() is not a fundamental difference, and it
can be added there separately if needed.

The difference in (b) is irrelevant in this case, because EPT being
enabled for L1 means that its TLB tags are tagged with EPTP and cannot
be used by L2 (regardless of whether or not EPT is enabled for L2).

Use KVM_REQ_TLB_FLUSH_GUEST in this case in
nested_vmx_transition_tlb_flush() for consistency. This arguably makes
more sense conceptually too -- L1 and L2 cannot share the TLB tag for
guest-physical translations, so only flushing linear and combined
translations (i.e. guest-generated translations) is needed.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---

I tested this by running all selftests that have "nested" in their name
(and not svm). I was tempted to run KVM-unit-tests in an L1 guest but I
convinced myself it's not prompted by the change :)

---
 arch/x86/kvm/vmx/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
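To make differences (a) and (b) above concrete, the two VMX callbacks look
roughly like this (paraphrased from arch/x86/kvm/vmx/vmx.c; a sketch, not the
exact upstream bodies):

static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
{
	struct kvm_mmu *mmu = vcpu->arch.mmu;
	u64 root_hpa = mmu->root.hpa;

	/* (a) No flush required if the current KVM MMU context is invalid. */
	if (!VALID_PAGE(root_hpa))
		return;

	/* (b) With EPT, INVEPT flushes combined and guest-physical mappings. */
	if (enable_ept)
		ept_sync_context(construct_eptp(vcpu, root_hpa,
						mmu->root_role.level));
	else
		vpid_sync_context(vmx_get_current_vpid(vcpu));
}

static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
	/*
	 * INVVPID flushes only linear and combined mappings for the current
	 * VPID; it is a nop if no VPID is in use, in which case VM-Enter and
	 * VM-Exit already flush the relevant mappings architecturally.
	 */
	vpid_sync_context(vmx_get_current_vpid(vcpu));
}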