[09/22] KVM: x86/mmu: Try "unprotect for retry" iff there are indirect SPs

Message ID 20240809190319.1710470-10-seanjc@google.com (mailing list archive)
State New, archived
Series KVM: x86: Fix multiple #PF RO infinite loop bugs

Commit Message

Sean Christopherson Aug. 9, 2024, 7:03 p.m. UTC
Try to unprotect shadow pages if and only if indirect_shadow_pages is non-
zero, i.e. iff there is at least one such protected shadow page.  Pre-
checking indirect_shadow_pages avoids taking mmu_lock for write when the
gfn is write-protected by a third party, i.e. not for KVM shadow paging,
and in the *extremely* unlikely case that a different task has already
unprotected the last shadow page.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Paolo Bonzini Aug. 14, 2024, 5:30 p.m. UTC | #1
On 8/9/24 21:03, Sean Christopherson wrote:
> Try to unprotect shadow pages if and only if indirect_shadow_pages is non-
> zero, i.e. iff there is at least one such protected shadow page.  Pre-
> checking indirect_shadow_pages avoids taking mmu_lock for write when the
> gfn is write-protected by a third party, i.e. not for KVM shadow paging,
> and in the *extremely* unlikely case that a different task has already
> unprotected the last shadow page.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/mmu/mmu.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 09a42dc1fe5a..358294889baa 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2736,6 +2736,9 @@ bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa)
>   	gpa_t gpa = cr2_or_gpa;
>   	bool r;
>   
> +	if (!vcpu->kvm->arch.indirect_shadow_pages)
> +		return false;

indirect_shadow_pages is accessed without a lock, so here please add a 
note that, while it may be stale, a false negative will only cause KVM 
to skip the "unprotect and retry" optimization.  (This is preexisting in 
reexecute_instruction() and goes away in patch 18, if I'm pre-reading 
that part of the series correctly).

Bonus points for opportunistically adding a READ_ONCE() here and in 
kvm_mmu_track_write().

Paolo

>   	if (!vcpu->arch.mmu->root_role.direct)
>   		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);
>
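
For reference, a sketch of what Paolo is asking for, i.e. the pre-check
annotated with a comment and wrapped in READ_ONCE() (the comment wording and
the elided remainder of the function are illustrative only, not the posted
patch):

bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa)
{
        gpa_t gpa = cr2_or_gpa;
        bool r;

        /*
         * indirect_shadow_pages is read without holding mmu_lock and so may
         * be stale.  That's fine: a false negative only causes KVM to skip
         * the "unprotect and retry" optimization.
         */
        if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
                return false;

        if (!vcpu->arch.mmu->root_role.direct)
                gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);

        /* ... rest of the function unchanged ... */
}
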
Sean Christopherson Aug. 15, 2024, 2:09 p.m. UTC | #2
On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> On 8/9/24 21:03, Sean Christopherson wrote:
> > Try to unprotect shadow pages if and only if indirect_shadow_pages is non-
> > zero, i.e. iff there is at least one such protected shadow page.  Pre-
> > checking indirect_shadow_pages avoids taking mmu_lock for write when the
> > gfn is write-protected by a third party, i.e. not for KVM shadow paging,
> > and in the *extremely* unlikely case that a different task has already
> > unprotected the last shadow page.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/mmu/mmu.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 09a42dc1fe5a..358294889baa 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -2736,6 +2736,9 @@ bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa)
> >   	gpa_t gpa = cr2_or_gpa;
> >   	bool r;
> > +	if (!vcpu->kvm->arch.indirect_shadow_pages)
> > +		return false;
> 
> indirect_shadow_pages is accessed without a lock, so here please add a note
> that, while it may be stale, a false negative will only cause KVM to skip
> the "unprotect and retry" optimization.

Correct, I'll add a comment.

> (This is preexisting in reexecute_instruction() and goes away in patch 18, if
> I'm pre-reading that part of the series correctly).
> 
> Bonus points for opportunistically adding a READ_ONCE() here and in
> kvm_mmu_track_write().

Hmm, right, this one should have a READ_ONCE(), but I don't see any reason to
add one in kvm_mmu_track_write().  If the compiler were crazy and generated multiple
loads between the smp_mb() and write_lock(), _and_ the value transitioned from
1->0, reading '0' on the second go is totally fine because it means the last
shadow page was zapped.  Amusingly, it'd actually be "better" in that it would
avoid unnecessarily taking mmu_lock.

Practically speaking, the compiler would have to be broken to generate multiple
loads in the 0->1 case, as that would mean the generated code loaded the value
but ignored the result.  But even if that were to happen, a final read of '1' is
again a-ok.

This code is different because a READ_ONCE() would ensure that indirect_shadow_pages
isn't reloaded for every check.  Though that too would be functionally ok, just
weird.

Obviously the READ_ONCE() would be harmless, but IMO it would be more confusing
than helpful, e.g. would beg the question of why kvm_vcpu_exit_request() doesn't
wrap vcpu->mode with READ_ONCE().  Heh, though arguably vcpu->mode should be
wrapped with READ_ONCE() since it's a helper and could be called multiple times
without any code in between that would guarantee a reload.
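
For context, the kvm_mmu_track_write() pattern being referenced is roughly
the following (a simplified sketch reconstructed from this discussion, not
the exact upstream code):

void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
                         int bytes)
{
        /*
         * Order the emulated guest write against the check below (pairs
         * with a barrier on the shadow page fault side).
         */
        smp_mb();

        /*
         * Lockless pre-check.  Even if the compiler emitted an extra load
         * and the count dropped from 1 to 0 in the meantime, acting on the
         * stale '0' is fine: it means the last shadow page was zapped, and
         * bailing simply avoids taking mmu_lock for nothing.
         */
        if (!vcpu->kvm->arch.indirect_shadow_pages)
                return;

        write_lock(&vcpu->kvm->mmu_lock);
        /* ... zap/update SPTEs for the written gfn ... */
        write_unlock(&vcpu->kvm->mmu_lock);
}
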
Paolo Bonzini Aug. 15, 2024, 4:48 p.m. UTC | #3
On Thu, Aug 15, 2024 at 4:09 PM Sean Christopherson <seanjc@google.com> wrote:
> > (This is preexisting in reexecute_instruction() and goes away in patch 18, if
> > I'm pre-reading that part of the series correctly).
> >
> > Bonus points for opportunistically adding a READ_ONCE() here and in
> > kvm_mmu_track_write().
>
> Hmm, right, this one should have a READ_ONCE(), but I don't see any reason to
> add one in kvm_mmu_track_write().  If the compiler were crazy and generated multiple
> loads between the smp_mb() and write_lock(), _and_ the value transitioned from
> 1->0, reading '0' on the second go is totally fine because it means the last
> shadow page was zapped.  Amusingly, it'd actually be "better" in that it would
> avoid unnecessarily taking mmu_lock.

Your call, but I have started leaning towards always using
READ_ONCE(), similar to how all atomic_t accesses are done with
atomic_read(); that is, as much as a marker for cross-thread
lock-free accesses as a way of limiting the compiler's
optimizations.

tools/memory-model/Documentation/access-marking.txt also suggests
using READ_ONCE() and WRITE_ONCE() always except in special cases.
They are also more friendly to KCSAN (though I have never used it).

This of course has the issue of being yet another unfinished transition.

> Obviously the READ_ONCE() would be harmless, but IMO it would be more confusing
> than helpful, e.g. would beg the question of why kvm_vcpu_exit_request() doesn't
> wrap vcpu->mode with READ_ONCE().  Heh, though arguably vcpu->mode should be
> wrapped with READ_ONCE() since it's a helper and could be called multiple times
> without any code in between that would guarantee a reload.

Indeed, who said I wouldn't change that one as well? :)

Paolo
Sean Christopherson Aug. 28, 2024, 11:28 p.m. UTC | #4
On Thu, Aug 15, 2024, Paolo Bonzini wrote:
> On Thu, Aug 15, 2024 at 4:09 PM Sean Christopherson <seanjc@google.com> wrote:
> > > (This is preexisting in reexecute_instruction() and goes away in patch 18, if
> > > I'm pre-reading that part of the series correctly).
> > >
> > > Bonus points for opportunistically adding a READ_ONCE() here and in
> > > kvm_mmu_track_write().
> >
> > Hmm, right, this one should have a READ_ONCE(), but I don't see any reason to
> > add one in kvm_mmu_track_write().  If the compiler were crazy and generated multiple
> > loads between the smp_mb() and write_lock(), _and_ the value transitioned from
> > 1->0, reading '0' on the second go is totally fine because it means the last
> > shadow page was zapped.  Amusingly, it'd actually be "better" in that it would
> > avoid unnecessarily taking mmu_lock.
> 
> Your call, but I have started leaning towards always using
> READ_ONCE(), similar to how all atomic_t accesses are done with
> atomic_read(); that is, as much as a marker for cross-thread
> lock-free accesses as a way of limiting the compiler's
> optimizations.
> 
> tools/memory-model/Documentation/access-marking.txt also suggests
> using READ_ONCE() and WRITE_ONCE() always except in special cases.
> They are also more friendly to KCSAN (though I have never used it).
> 
> This of course has the issue of being yet another unfinished transition.

I opted to fix the kvm_vcpu_exit_request() case[*], and add the READ_ONCE() to
this patch, but left kvm_mmu_track_write() as-is.

My reasoning, and what I think makes for a decent policy, is that while I 100%
agree lockless accesses need _some_ form of protection/documentation, I think
adding READ_ONCE() (and WRITE_ONCE()) on top adds confusion and makes the actual
requirement unclear.

In other words, if there's already an smp_rmb() or smp_wmb() (or similar), then
don't add READ/WRITE_ONCE() (unless that's also necessary for some reason) because
doing so detracts from the barriers that are actually necessary.

[*] https://lore.kernel.org/all/20240828232013.768446-1-seanjc@google.com

> > Obviously the READ_ONCE() would be harmless, but IMO it would be more confusing
> > than helpful, e.g. would beg the question of why kvm_vcpu_exit_request() doesn't
> > wrap vcpu->mode with READ_ONCE().  Heh, though arguably vcpu->mode should be
> > wrapped with READ_ONCE() since it's a helper and could be called multiple times
> > without any code in between that would guarantee a reload.
> 
> Indeed, who said I wouldn't change that one as well? :)
> 
> Paolo
>
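
Summarizing the policy Sean settles on: mark a lockless access with
READ_ONCE() when nothing else documents it, but don't stack READ_ONCE() on
top of an access whose ordering (and documentation) already comes from an
explicit barrier.  Roughly, and purely as an illustration:

        /*
         * kvm_mmu_unprotect_gfn_and_retry(): no barrier involved, so the
         * READ_ONCE() is what marks and documents the lockless read.
         */
        if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
                return false;

        /*
         * kvm_mmu_track_write(): the read is already ordered by smp_mb(),
         * and the barrier is the documentation, so the plain load stays.
         */
        smp_mb();
        if (!vcpu->kvm->arch.indirect_shadow_pages)
                return;
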

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 09a42dc1fe5a..358294889baa 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2736,6 +2736,9 @@  bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa)
 	gpa_t gpa = cr2_or_gpa;
 	bool r;
 
+	if (!vcpu->kvm->arch.indirect_shadow_pages)
+		return false;
+
 	if (!vcpu->arch.mmu->root_role.direct)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);