diff mbox series

[v2,26/43] KVM: VMX: Read Posted Interrupt "control" exactly once per loop iteration

Message ID 20211009021236.4122790-27-seanjc@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: Halt-polling and x86 APICv overhaul | expand

Commit Message

Sean Christopherson Oct. 9, 2021, 2:12 a.m. UTC
Use READ_ONCE() when loading the posted interrupt descriptor control
field to ensure "old" and "new" have the same base value.  If the
compiler emits separate loads, and loads into "new" before "old", KVM
could theoretically drop the ON bit if it were set between the loads.

Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/posted_intr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Maxim Levitsky Oct. 28, 2021, 10:58 a.m. UTC | #1
On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> Use READ_ONCE() when loading the posted interrupt descriptor control
> field to ensure "old" and "new" have the same base value.  If the
> compiler emits separate loads, and loads into "new" before "old", KVM
> could theoretically drop the ON bit if it were set between the loads.
> 
> Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/vmx/posted_intr.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 414ea6972b5c..fea343dcc011 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -53,7 +53,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  
>  	/* The full case.  */
>  	do {
> -		old.control = new.control = pi_desc->control;
> +		old.control = new.control = READ_ONCE(pi_desc->control);
>  
>  		dest = cpu_physical_id(cpu);
>  
> @@ -104,7 +104,7 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>  	     "Wakeup handler not enabled while the vCPU was blocking");
>  
>  	do {
> -		old.control = new.control = pi_desc->control;
> +		old.control = new.control = READ_ONCE(pi_desc->control);
>  
>  		dest = cpu_physical_id(vcpu->cpu);
>  
> @@ -160,7 +160,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
>  	     "Posted Interrupt Suppress Notification set before blocking");
>  
>  	do {
> -		old.control = new.control = pi_desc->control;
> +		old.control = new.control = READ_ONCE(pi_desc->control);
>  
>  		/* set 'NV' to 'wakeup vector' */
>  		new.nv = POSTED_INTR_WAKEUP_VECTOR;

I wish there was a way to mark fields in a struct, as requiring 'READ_ONCE' on them
so that compiler would complain if this isn't done, or automatically use 'READ_ONCE'
logic.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
Sean Christopherson Oct. 28, 2021, 3:55 p.m. UTC | #2
On Thu, Oct 28, 2021, Maxim Levitsky wrote:
> On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> > Use READ_ONCE() when loading the posted interrupt descriptor control
> > field to ensure "old" and "new" have the same base value.  If the
> > compiler emits separate loads, and loads into "new" before "old", KVM
> > could theoretically drop the ON bit if it were set between the loads.
> > 
> > Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/vmx/posted_intr.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > index 414ea6972b5c..fea343dcc011 100644
> > --- a/arch/x86/kvm/vmx/posted_intr.c
> > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > @@ -53,7 +53,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> >  
> >  	/* The full case.  */
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >  
> >  		dest = cpu_physical_id(cpu);
> >  
> > @@ -104,7 +104,7 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> >  	     "Wakeup handler not enabled while the vCPU was blocking");
> >  
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >  
> >  		dest = cpu_physical_id(vcpu->cpu);
> >  
> > @@ -160,7 +160,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
> >  	     "Posted Interrupt Suppress Notification set before blocking");
> >  
> >  	do {
> > -		old.control = new.control = pi_desc->control;
> > +		old.control = new.control = READ_ONCE(pi_desc->control);
> >  
> >  		/* set 'NV' to 'wakeup vector' */
> >  		new.nv = POSTED_INTR_WAKEUP_VECTOR;
> 
> I wish there was a way to mark fields in a struct, as requiring 'READ_ONCE' on them
> so that compiler would complain if this isn't done, or automatically use 'READ_ONCE'
> logic.

Hmm, I think you could make an argument that ON and thus the whole "control"
word should be volatile.  AFAICT, tagging just "on" as volatile actually works.
There's even in a clause in Documentation/process/volatile-considered-harmful.rst
that calls this out as a (potentially) legitimate use case.

  - Pointers to data structures in coherent memory which might be modified
    by I/O devices can, sometimes, legitimately be volatile.

That said, I think I actually prefer forcing the use of READ_ONCE.  The descriptor
requires more protections than what volatile provides, namely that all writes need
to be atomic.  So given that volatile alone isn't sufficient, I'd prefer to have
the code itself be more self-documenting.

E.g. this compiles and does mess up the expected size.

diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
index 7f7b2326caf5..149df3b18789 100644
--- a/arch/x86/kvm/vmx/posted_intr.h
+++ b/arch/x86/kvm/vmx/posted_intr.h
@@ -11,9 +11,9 @@ struct pi_desc {
        union {
                struct {
                                /* bit 256 - Outstanding Notification */
-                       u16     on      : 1,
+                       volatile u16    on      : 1;
                                /* bit 257 - Suppress Notification */
-                               sn      : 1,
+                       u16     sn      : 1,
                                /* bit 271:258 - Reserved */
                                rsvd_1  : 14;
                                /* bit 279:272 - Notification Vector */
@@ -23,7 +23,7 @@ struct pi_desc {
                                /* bit 319:288 - Notification Destination */
                        u32     ndst;
                };
-               u64 control;
+               volatile u64 control;
        };
        u32 rsvd[6];
 } __aligned(64);
Maxim Levitsky Oct. 31, 2021, 10:48 p.m. UTC | #3
On Thu, 2021-10-28 at 15:55 +0000, Sean Christopherson wrote:
> On Thu, Oct 28, 2021, Maxim Levitsky wrote:
> > On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> > > Use READ_ONCE() when loading the posted interrupt descriptor control
> > > field to ensure "old" and "new" have the same base value.  If the
> > > compiler emits separate loads, and loads into "new" before "old", KVM
> > > could theoretically drop the ON bit if it were set between the loads.
> > > 
> > > Fixes: 28b835d60fcc ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/vmx/posted_intr.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> > > index 414ea6972b5c..fea343dcc011 100644
> > > --- a/arch/x86/kvm/vmx/posted_intr.c
> > > +++ b/arch/x86/kvm/vmx/posted_intr.c
> > > @@ -53,7 +53,7 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
> > >  
> > >  	/* The full case.  */
> > >  	do {
> > > -		old.control = new.control = pi_desc->control;
> > > +		old.control = new.control = READ_ONCE(pi_desc->control);
> > >  
> > >  		dest = cpu_physical_id(cpu);
> > >  
> > > @@ -104,7 +104,7 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > >  	     "Wakeup handler not enabled while the vCPU was blocking");
> > >  
> > >  	do {
> > > -		old.control = new.control = pi_desc->control;
> > > +		old.control = new.control = READ_ONCE(pi_desc->control);
> > >  
> > >  		dest = cpu_physical_id(vcpu->cpu);
> > >  
> > > @@ -160,7 +160,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu)
> > >  	     "Posted Interrupt Suppress Notification set before blocking");
> > >  
> > >  	do {
> > > -		old.control = new.control = pi_desc->control;
> > > +		old.control = new.control = READ_ONCE(pi_desc->control);
> > >  
> > >  		/* set 'NV' to 'wakeup vector' */
> > >  		new.nv = POSTED_INTR_WAKEUP_VECTOR;
> > 
> > I wish there was a way to mark fields in a struct, as requiring 'READ_ONCE' on them
> > so that compiler would complain if this isn't done, or automatically use 'READ_ONCE'
> > logic.
> 
> Hmm, I think you could make an argument that ON and thus the whole "control"
> word should be volatile.  AFAICT, tagging just "on" as volatile actually works.
> There's even in a clause in Documentation/process/volatile-considered-harmful.rst
> that calls this out as a (potentially) legitimate use case.
> 
>   - Pointers to data structures in coherent memory which might be modified
>     by I/O devices can, sometimes, legitimately be volatile.
> 
> That said, I think I actually prefer forcing the use of READ_ONCE.  The descriptor
> requires more protections than what volatile provides, namely that all writes need
> to be atomic.  So given that volatile alone isn't sufficient, I'd prefer to have
> the code itself be more self-documenting.

I took a look at how READ_ONCE/WRITE_ONCE is implemented and indeed they use volatile
(the comment above __READ_ONCE is worth gold...), so there is a bit of contradiction:

volatile-considered-harmful.rst states not to mark struct members volatile since
you usually need more that than (very true often) and yet, I also heard that
READ_ONCE/WRITE_ONCE is very encouraged to be used to fields that are used in lockless
algorithms, even when not strictly needed,
so why not to just mark the field and then use it normally? I guess that
explicit READ_ONCE/WRITE_ONCE is much more readable/visible that a volatile in some header file.

Anyway this isn't something I am going to argue about or push to be changed,
just something I thought about.

Best regards,
	Maxim Levitsky



> 
> E.g. this compiles and does mess up the expected size.
> 
> diff --git a/arch/x86/kvm/vmx/posted_intr.h b/arch/x86/kvm/vmx/posted_intr.h
> index 7f7b2326caf5..149df3b18789 100644
> --- a/arch/x86/kvm/vmx/posted_intr.h
> +++ b/arch/x86/kvm/vmx/posted_intr.h
> @@ -11,9 +11,9 @@ struct pi_desc {
>         union {
>                 struct {
>                                 /* bit 256 - Outstanding Notification */
> -                       u16     on      : 1,
> +                       volatile u16    on      : 1;
>                                 /* bit 257 - Suppress Notification */
> -                               sn      : 1,
> +                       u16     sn      : 1,
>                                 /* bit 271:258 - Reserved */
>                                 rsvd_1  : 14;
>                                 /* bit 279:272 - Notification Vector */
> @@ -23,7 +23,7 @@ struct pi_desc {
>                                 /* bit 319:288 - Notification Destination */
>                         u32     ndst;
>                 };
> -               u64 control;
> +               volatile u64 control;
>         };
>         u32 rsvd[6];
>  } __aligned(64);
>
Sean Christopherson Nov. 1, 2021, 5:41 p.m. UTC | #4
On Mon, Nov 01, 2021, Maxim Levitsky wrote:
> On Thu, 2021-10-28 at 15:55 +0000, Sean Christopherson wrote:
> > On Thu, Oct 28, 2021, Maxim Levitsky wrote:
> > > On Fri, 2021-10-08 at 19:12 -0700, Sean Christopherson wrote:
> > > I wish there was a way to mark fields in a struct, as requiring 'READ_ONCE' on them
> > > so that compiler would complain if this isn't done, or automatically use 'READ_ONCE'
> > > logic.
> > 
> > Hmm, I think you could make an argument that ON and thus the whole "control"
> > word should be volatile.  AFAICT, tagging just "on" as volatile actually works.
> > There's even in a clause in Documentation/process/volatile-considered-harmful.rst
> > that calls this out as a (potentially) legitimate use case.
> > 
> >   - Pointers to data structures in coherent memory which might be modified
> >     by I/O devices can, sometimes, legitimately be volatile.
> > 
> > That said, I think I actually prefer forcing the use of READ_ONCE.  The descriptor
> > requires more protections than what volatile provides, namely that all writes need
> > to be atomic.  So given that volatile alone isn't sufficient, I'd prefer to have
> > the code itself be more self-documenting.
> 
> I took a look at how READ_ONCE/WRITE_ONCE is implemented and indeed they use volatile
> (the comment above __READ_ONCE is worth gold...), so there is a bit of contradiction:
> 
> volatile-considered-harmful.rst states not to mark struct members volatile since
> you usually need more that than (very true often) and yet, I also heard that
> READ_ONCE/WRITE_ONCE is very encouraged to be used to fields that are used in lockless
> algorithms, even when not strictly needed,
> so why not to just mark the field and then use it normally? I guess that
> explicit READ_ONCE/WRITE_ONCE is much more readable/visible that a volatile
> in some header file.

Are you asking about this PI field in particular, or for any field in general?

In this particular case, visibility and documentation is really the only difference,
functionally the result is the same.  But that's also very much related to why this
case gets the exception listed above.  The "use it normally" part is also why I
don't want to tag the field volatile since writing the field absolutely cannot be
done "normally", it must be done atomically, and volatile doesn't capture that
detail.

If you're asking about fields in general, the "volatile is harmful" guideline is
to deter usage of volatile for cases where the field/variable in question is not
intrinsically volatile.  As the docs call out, using volatile in those cases often
leads to worse code generation because the compiler is disallowed from optimizing
accesses that are protected through other mechanisms.

A good example in x86 KVM is the READ_ONCE(sp->unsync) in mmu_try_to_unsync_pages() to
force the compiler to emit a load of sp->unsync after acquiring mmu_unsync_pages_lock.
Tagging "unsync" as volatile is unnecessary since the vast majority of its usage is
protected by holding a spinlock for write, and would prevent optimizing references in
kvm_mmu_get_page() and other flows that are protected by mmu_lock in the legacy MMU.
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 414ea6972b5c..fea343dcc011 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -53,7 +53,7 @@  void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 
 	/* The full case.  */
 	do {
-		old.control = new.control = pi_desc->control;
+		old.control = new.control = READ_ONCE(pi_desc->control);
 
 		dest = cpu_physical_id(cpu);
 
@@ -104,7 +104,7 @@  static void __pi_post_block(struct kvm_vcpu *vcpu)
 	     "Wakeup handler not enabled while the vCPU was blocking");
 
 	do {
-		old.control = new.control = pi_desc->control;
+		old.control = new.control = READ_ONCE(pi_desc->control);
 
 		dest = cpu_physical_id(vcpu->cpu);
 
@@ -160,7 +160,7 @@  int pi_pre_block(struct kvm_vcpu *vcpu)
 	     "Posted Interrupt Suppress Notification set before blocking");
 
 	do {
-		old.control = new.control = pi_desc->control;
+		old.control = new.control = READ_ONCE(pi_desc->control);
 
 		/* set 'NV' to 'wakeup vector' */
 		new.nv = POSTED_INTR_WAKEUP_VECTOR;