
[v10,02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

Message ID 20220921152436.3673454-3-vkuznets@redhat.com (mailing list archive)
State New, archived
Series KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush features

Commit Message

Vitaly Kuznetsov Sept. 21, 2022, 3:23 p.m. UTC
In preparation for implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect the dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST/KVM_REQ_TLB_FLUSH_CURRENT are stronger operations,
clear the KVM_REQ_HV_TLB_FLUSH request in kvm_service_local_tlb_flush_requests()
when either of them is also requested.

No (real) functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/hyperv.c           |  4 ++--
 arch/x86/kvm/x86.c              | 10 ++++++++--
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Sean Christopherson Sept. 21, 2022, 4:23 p.m. UTC | #1
On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f62d5799fcd7..86504a8bfd9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>   */
>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>  		kvm_vcpu_flush_tlb_current(vcpu);
> +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
mappings in Intel terminology, whereas flush_tlb_guest() and (IIUC) Hyper-V's paravirt
TLB flush both flush "guest" TLB entries, i.e. linear and combined mappings.

Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
because I actually typed out "stage-2" first.

> +	}
>  
> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
> +		kvm_vcpu_flush_tlb_guest(vcpu);
> +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
>  		kvm_vcpu_flush_tlb_guest(vcpu);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
>  
> -- 
> 2.37.3
>
Sean Christopherson Sept. 21, 2022, 4:45 p.m. UTC | #2
On Wed, Sep 21, 2022, Sean Christopherson wrote:
> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index f62d5799fcd7..86504a8bfd9a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
> >  {
> > -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
> > +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
> >  		kvm_vcpu_flush_tlb_current(vcpu);
> > +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> 
> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
> mappings in Intel terminology, whereas flush_tlb_guest() and (IIUC) Hyper-V's paravirt
> TLB flush both flush "guest" TLB entries, i.e. linear and combined mappings.
> 
> Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
> because I actually typed out "stage-2" first.
> 
> > +	}
> >  
> > -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
> > +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
> > +		kvm_vcpu_flush_tlb_guest(vcpu);
> > +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

Looking at future patches where KVM needs to reset the FIFO when doing a "guest"
TLB flush, i.e. needs to do more than just clearing the request, what about putting
this in kvm_vcpu_flush_tlb_guest() right away?

Ah, and there's already a second caller to kvm_vcpu_flush_tlb_guest().  I doubt
KVM's paravirt TLB flush will ever collide with Hyper-V's paravirt TLB flush,
but logically a "guest" flush that is initiated through KVM's paravirt interface
should also clear Hyper-V's queue/request.
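
Something like this (a sketch only, eliding kvm_vcpu_flush_tlb_guest()'s
existing !tdp_enabled handling, not a final patch):

	static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
	{
		++vcpu->stat.tlb_flush;
		static_call(kvm_x86_flush_tlb_guest)(vcpu);

		/*
		 * Flushing all "guest" TLB entries is a superset of Hyper-V's
		 * fine-grained flushing, so drop the pending request (and, in
		 * later patches, the queued GPAs) here rather than only in the
		 * request-processing path.
		 */
		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
	}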

And for consistency, slot this in before this patch:

From: Sean Christopherson <seanjc@google.com>
Date: Wed, 21 Sep 2022 09:35:34 -0700
Subject: [PATCH] KVM: x86: Move clearing of TLB_FLUSH_CURRENT to
 kvm_vcpu_flush_tlb_all()

Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in
its sole caller that processes KVM_REQ_TLB_FLUSH.  Regardless of why/when
kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also
flushes "current" TLB entries.

Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(),
and moving the handling "requires" extra work to document the ordering
requirement, but future Hyper-V paravirt TLB flushing support will add
similar logic for flush "guest" (Hyper-V can flush a subset of "guest"
entries).  And in the Hyper-V case, KVM needs to do more than just clear
the request, the queue of GPAs to flush also needs to be purged, and doing
all of that only in the request path is undesirable as kvm_vcpu_flush_tlb_guest()
does have multiple callers (though it's unlikely KVM's paravirt TLB flush
will coincide with Hyper-V's paravirt TLB flush).

Move the logic even though it adds extra "work" so that KVM will be
consistent with how flush requests are processed when the Hyper-V support
lands.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f62d5799fcd7..3ea2e51a8cb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3383,6 +3383,9 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	++vcpu->stat.tlb_flush;
 	static_call(kvm_x86_flush_tlb_all)(vcpu);
+
+	/* Flushing all ASIDs flushes the current ASID... */
+	kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
@@ -10462,12 +10465,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_mmu_sync_roots(vcpu);
 		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
 			kvm_mmu_load_pgd(vcpu);
-		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
+
+		/*
+		 * Note, the order matters here, as flushing "all" TLB entries
+		 * also flushes the "current" TLB entries, i.e. servicing the
+		 * flush "all" will clear any request to flush "current".
+		 */
+		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
 			kvm_vcpu_flush_tlb_all(vcpu);
-
-			/* Flushing all ASIDs flushes the current ASID... */
-			kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
-		}
 		kvm_service_local_tlb_flush_requests(vcpu);
 
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {

base-commit: ed102fe0b59586397b362a849bd7fb32582b77d8
--
Vitaly Kuznetsov Sept. 22, 2022, 9:31 a.m. UTC | #3
Sean Christopherson <seanjc@google.com> writes:

> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f62d5799fcd7..86504a8bfd9a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>>   */
>>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>>  {
>> -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>> +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>>  		kvm_vcpu_flush_tlb_current(vcpu);
>> +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
> mappings in Intel terminology, whereas flush_tlb_guest() and (IIUC) Hyper-V's paravirt
> TLB flush both flush "guest" TLB entries, i.e. linear and combined
> mappings.

(Honestly, I was waiting for this comment when I first brought this up; I
even put it in a separate patch with the provocative name "KVM: x86:
KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too",
but AFAIR the only comment I got was "please merge with the patch
which clears KVM_REQ_TLB_FLUSH_GUEST", so I started thinking this was the
right thing to do :) Jokes aside,

This small optimization was done for the nSVM case. When switching from L1
to L2 and vice versa, the code does nested_svm_transition_tlb_flush()
which is

	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

On AMD, both KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST are
the same thing (.flush_tlb_current == .flush_tlb_guest ==
svm_flush_tlb_current()), flushing the whole ASID, so processing Hyper-V
TLB flush requests is certainly redundant.
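
For reference, the relevant svm_x86_ops hooks look roughly like this
(paraphrased from svm.c, not part of this patch):

	.flush_tlb_current = svm_flush_tlb_current,
	.flush_tlb_guest   = svm_flush_tlb_current,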

Now let's get to VMX and the point of my confusion (and thanks in
advance for educating me!):
AFAIU, when EPT is in use:
 KVM_REQ_TLB_FLUSH_CURRENT == invept
 KVM_REQ_TLB_FLUSH_GUEST = invvpid

For "normal" mappings (which are mapped on both stages) this is the same
thing as they're 'tagged' with both VPID and 'EPT root'. The question is
what's left. Given your comment, do I understand correctly that in case
of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
will only be tagged with VPID but not with 'EPT root' (as the CPU never
reached the second translation stage)? We certainly can't ignore
these. Another (probably purely theoretical) question is: what are the
mappings which are tagged with 'EPT root' but don't have a VPID tag? Are
these the mappings which happen when e.g. the vCPU has paging disabled? These
are probably unrelated to Hyper-V TLB flushing.

To preserve the 'small' optimization, we can probably move 
 kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

to nested_svm_transition_tlb_flush() or, in case this sounds too
hackish, we can drop it for now and add it to the (already overfull)
bucket of the "optimize nested_svm_transition_tlb_flush()".
Vitaly Kuznetsov Sept. 22, 2022, 9:35 a.m. UTC | #4
Sean Christopherson <seanjc@google.com> writes:

> On Wed, Sep 21, 2022, Sean Christopherson wrote:
>> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index f62d5799fcd7..86504a8bfd9a 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>> >   */
>> >  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>> >  {
>> > -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>> > +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>> >  		kvm_vcpu_flush_tlb_current(vcpu);
>> > +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>> 
>> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
>> mappings in Intel terminology, whereas flush_tlb_guest() and (IIUC) Hyper-V's paravirt
>> TLB flush both flush "guest" TLB entries, i.e. linear and combined mappings.
>> 
>> Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
>> because I actually typed out "stage-2" first.
>> 
>> > +	}
>> >  
>> > -	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
>> > +	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>> > +		kvm_vcpu_flush_tlb_guest(vcpu);
>> > +		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> Looking at future patches where KVM needs to reset the FIFO when doing a "guest"
> TLB flush, i.e. needs to do more than just clearing the request, what about putting
> this in kvm_vcpu_flush_tlb_guest() right away?

Will do.

>
> Ah, and there's already a second caller to kvm_vcpu_flush_tlb_guest().  I doubt
> KVM's paravirt TLB flush will ever collide with Hyper-V's paravirt TLB flush,
> but logically a "guest" flush that is initiated through KVM's paravirt interface
> should also clear Hyper-V's queue/request.

I ignored this as a case which is not worth optimizing for,
i.e. over-flushing is always correct.

>
> And for consistency, slot this in before this patch:
>

Will do, thanks!

> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 21 Sep 2022 09:35:34 -0700
> Subject: [PATCH] KVM: x86: Move clearing of TLB_FLUSH_CURRENT to
>  kvm_vcpu_flush_tlb_all()
>
> Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in
> its sole caller that processes KVM_REQ_TLB_FLUSH.  Regardless of why/when
> kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also
> flushes "current" TLB entries.
>
> Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(),
> and moving the handling "requires" extra work to document the ordering
> requirement, but future Hyper-V paravirt TLB flushing support will add
> similar logic for flush "guest" (Hyper-V can flush a subset of "guest"
> entries).  And in the Hyper-V case, KVM needs to do more than just clear
> the request, the queue of GPAs to flush also needs to be purged, and doing
> all of that only in the request path is undesirable as kvm_vcpu_flush_tlb_guest()
> does have multiple callers (though it's unlikely KVM's paravirt TLB flush
> will coincide with Hyper-V's paravirt TLB flush).
>
> Move the logic even though it adds extra "work" so that KVM will be
> consistent with how flush requests are processed when the Hyper-V support
> lands.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f62d5799fcd7..3ea2e51a8cb5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3383,6 +3383,9 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
>  {
>  	++vcpu->stat.tlb_flush;
>  	static_call(kvm_x86_flush_tlb_all)(vcpu);
> +
> +	/* Flushing all ASIDs flushes the current ASID... */
> +	kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>  }
>  
>  static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
> @@ -10462,12 +10465,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_mmu_sync_roots(vcpu);
>  		if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
>  			kvm_mmu_load_pgd(vcpu);
> -		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> +
> +		/*
> +		 * Note, the order matters here, as flushing "all" TLB entries
> +		 * also flushes the "current" TLB entries, i.e. servicing the
> +		 * flush "all" will clear any request to flush "current".
> +		 */
> +		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
>  			kvm_vcpu_flush_tlb_all(vcpu);
> -
> -			/* Flushing all ASIDs flushes the current ASID... */
> -			kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> -		}
>  		kvm_service_local_tlb_flush_requests(vcpu);
>  
>  		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
>
> base-commit: ed102fe0b59586397b362a849bd7fb32582b77d8
Sean Christopherson Sept. 22, 2022, 3:23 p.m. UTC | #5
On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
> Now let's get to VMX and the point of my confusion (and thanks in
> advance for educating me!):
> AFAIU, when EPT is in use:
>  KVM_REQ_TLB_FLUSH_CURRENT == invept
>  KVM_REQ_TLB_FLUSH_GUEST = invvpid
> 
> For "normal" mappings (which are mapped on both stages) this is the same
> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
> what's left. Given your comment, do I understand correctly that in case
> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
> will only be tagged with VPID but not with 'EPT root' (as the CPU never
> reached the second translation stage)? We certainly can't ignore
> these. Another (probably purely theoretical) question is: what are the
> mappings which are tagged with 'EPT root' but don't have a VPID tag?

Intel puts mappings into three categories, which for non-root mode equates to:

  linear         == GVA => GPA
  guest-physical == GPA => HPA
  combined       == GVA => HPA

and essentially the categories that consume the GVA are tagged with the VPID
(linear and combined), and categories that consume the GPA are tagged with the
EPTP address (guest-physical and combined).

> Are these the mappings which happen when e.g. the vCPU has paging disabled?

No, these mappings can be created at all times.  Even with CR0.PG=1, the guest
can generate GPAs without going through a GVA=>GPA translation, e.g. the page tables
themselves, RTIT (Intel PT) addresses, etc...  And even for combined/full
translations, the CPU can insert TLB entries for just the GPA=>HPA part.

E.g. when a page is allocated by/for userspace, the kernel will zero the page using
the kernel's direct map, but userspace will access the page via a different GVA.
I.e. the guest effectively aliases GPA(x) with GVA(k) and GVA(u).  By inserting
the GPA(x) => HPA(y) into the TLB, when guest userspace accesses GVA(u), the CPU
encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on GPA(x) => HPA(y).

Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
the partial TLB entries, e.g. a host change in the EPT tables will result in the
guest-physical and combined mappings being invalidated, but linear mappings can
be kept.
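
So, roughly, with EPT and VPID both in use (my summary of the above, not
gospel):

  KVM_REQ_TLB_FLUSH_CURRENT / INVEPT  -> guest-physical + combined
                                         (EPTP-tagged) entries
  KVM_REQ_TLB_FLUSH_GUEST   / INVVPID -> linear + combined
                                         (VPID-tagged) entries

i.e. on VMX only the "guest" flush is a superset of Hyper-V's paravirt
flush, which targets linear/combined mappings.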

I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
not affecting NPT translations, AMD just doesn't provide a way for the host to
flush _only_ NPT translations.  Maybe the performance benefits weren't significant
enough to justify the extra complexity?

> These are probably unrelated to Hyper-V TLB flushing.
> 
> To preserve the 'small' optimization, we can probably move 
>  kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> 
> to nested_svm_transition_tlb_flush() or, in case this sounds too
> hackish

Move it to svm_flush_tlb_current(), because the justification is that on SVM,
flushing "current" TLB entries also flushes "guest" TLB entries due to the more
coarse-grained ASID-based TLB flush.  E.g.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd599afc85f5..a86b41503723 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
        struct vcpu_svm *svm = to_svm(vcpu);
 
+       /*
+        * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
+        * A TLB flush for the current ASID flushes both "host" and "guest" TLB
+        * entries, and thus is a superset of Hyper-V's fine grained flushing.
+        */
+       kvm_hv_vcpu_purge_flush_tlb(vcpu);
+
        /*
         * Flush only the current ASID even if the TLB flush was invoked via
         * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all

> we can drop it for now and add it to the (already overfull)
> bucket of the "optimize nested_svm_transition_tlb_flush()".

I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is the
correct/desired behavior.  This doesn't really have anything to do with nSVM,
it's all about SVM not providing a way to flush only NPT entries.
Vitaly Kuznetsov Sept. 22, 2022, 3:37 p.m. UTC | #6
Sean Christopherson <seanjc@google.com> writes:

> On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
>> Now let's get to VMX and the point of my confusion (and thanks in
>> advance for educating me!):
>> AFAIU, when EPT is in use:
>>  KVM_REQ_TLB_FLUSH_CURRENT == invept
>>  KVM_REQ_TLB_FLUSH_GUEST = invvpid
>> 
>> For "normal" mappings (which are mapped on both stages) this is the same
>> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
>> what's left. Given your comment, do I understand correctly that in case
>> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
>> will only be tagged with VPID but not with 'EPT root' (as the CPU never
>> reached the second translation stage)? We certainly can't ignore
>> these. Another (probably purely theoretical) question is: what are the
>> mappings which are tagged with 'EPT root' but don't have a VPID tag?
>
> Intel puts mappings into three categories, which for non-root mode equates to:
>
>   linear         == GVA => GPA
>   guest-physical == GPA => HPA
>   combined       == GVA => HPA
>
> and essentially the categories that consume the GVA are tagged with the VPID
> (linear and combined), and categories that consume the GPA are tagged with the
> EPTP address (guest-physical and combined).
>
>> Are these the mappings which happen when e.g. the vCPU has paging disabled?
>
> No, these mappings can be created at all times.  Even with CR0.PG=1, the guest
> can generate GPAs without going through a GVA=>GPA translation, e.g. the page tables
> themselves, RTIT (Intel PT) addresses, etc...  And even for combined/full
> translations, the CPU can insert TLB entries for just the GPA=>HPA part.
>
> E.g. when a page is allocated by/for userspace, the kernel will zero the page using
> the kernel's direct map, but userspace will access the page via a different GVA.
> I.e. the guest effectively aliases GPA(x) with GVA(k) and GVA(u).  By inserting
> the GPA(x) => HPA(y) into the TLB, when guest userspace accesses GVA(u), the CPU
> encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on GPA(x) => HPA(y).
>
> Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
> the partial TLB entries, e.g. a host change in the EPT tables will result in the
> guest-physical and combined mappings being invalidated, but linear mappings can
> be kept.
>

Thanks a bunch! For some reason I thought it's always the full thing (combined)
which is tagged with both VPID/PCID and EPTP, and linear/guest-physical
are just 'corner' cases (but are still combined and tagged). Apparently,
it's not like that.

> I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
> not affecting NPT translations, AMD just doesn't provide a way for the host to
> flush _only_ NPT translations.  Maybe the performance benefits weren't significant
> enough to justify the extra complexity?
>
>> These are probably unrelated to Hyper-V TLB flushing.
>> 
>> To preserve the 'small' optimization, we can probably move 
>>  kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>> 
>> to nested_svm_transition_tlb_flush() or, in case this sounds too
>> hackish
>
> Move it to svm_flush_tlb_current(), because the justification is that on SVM,
> flushing "current" TLB entries also flushes "guest" TLB entries due to the more
> coarse-grained ASID-based TLB flush.  E.g.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd599afc85f5..a86b41503723 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
>  
> +       /*
> +        * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
> +        * A TLB flush for the current ASID flushes both "host" and "guest" TLB
> +        * entries, and thus is a superset of Hyper-V's fine grained flushing.
> +        */
> +       kvm_hv_vcpu_purge_flush_tlb(vcpu);
> +
>         /*
>          * Flush only the current ASID even if the TLB flush was invoked via
>          * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
>
>> we can drop it for now and add it to the (already overfull)
>> bucket of the "optimize nested_svm_transition_tlb_flush()".
>
> I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is the
> correct/desired behavior.  This doesn't really have anything to do with nSVM,
> it's all about SVM not providing a way to flush only NPT entries.

True that, silly me: I forgot that even without any nesting, a Hyper-V TLB
flush after svm_flush_tlb_current() makes no sense.

>

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 504daf473092..45c390c804f0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -108,6 +108,8 @@ 
 	KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HV_TLB_FLUSH \
+	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..3c0f639f6a05 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1870,11 +1870,11 @@  static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
 	 * analyze it here, flush TLB regardless of the specified address space.
 	 */
 	if (all_cpus) {
-		kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
+		kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
 	} else {
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
 
-		kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST, vcpu_mask);
+		kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
 	}
 
 ret_success:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f62d5799fcd7..86504a8bfd9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3418,11 +3418,17 @@  static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
  */
 void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
 {
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
 		kvm_vcpu_flush_tlb_current(vcpu);
+		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+	}
 
-	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
+	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
+		kvm_vcpu_flush_tlb_guest(vcpu);
+		kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+	} else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
 		kvm_vcpu_flush_tlb_guest(vcpu);
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);