diff mbox series

[v1,2/3] KVM: VMX: Do not change PID.NDST when loading a blocked vCPU

Message ID 20191106175602.4515-3-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: VMX: Posted Interrupts fixes | expand

Commit Message

Joao Martins Nov. 6, 2019, 5:56 p.m. UTC
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
(wakeup_handler()) is responsible to kick (unblock) any vCPU on that
linked list that now has pending posted interrupts.

While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
PID.SN but will also *overwrite PID.NDST to this different pCPU*.
Instead of keeping it with original pCPU which vCPU had entered block
phase on.

This results in an issue because when a posted interrupt is delivered,
the wakeup_handler() will be executed and fail to find blocked vCPU on
its per pCPU linked list of all vCPUs that are blocked on this pCPU.
Which is due to the vCPU being placed on a *different* per pCPU
linked list than the original pCPU that it had entered block phase.

The regression is introduced by commit c112b5f50232 ("KVM: x86:
Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
avoiding changing PID.NDST when loading a blocked vCPU.

Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
 arch/x86/kvm/vmx/vmx.h |  6 ++++++
 2 files changed, 20 insertions(+)

Comments

Joao Martins Nov. 6, 2019, 9:54 p.m. UTC | #1
On 11/6/19 5:56 PM, Joao Martins wrote:
> When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
> linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
> changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
> (wakeup_handler()) is responsible to kick (unblock) any vCPU on that
> linked list that now has pending posted interrupts.
> 
> While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
> will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
> scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
> PID.SN but will also *overwrite PID.NDST to this different pCPU*.
> Instead of keeping it with original pCPU which vCPU had entered block
> phase on.
> 
> This results in an issue because when a posted interrupt is delivered,
> the wakeup_handler() will be executed and fail to find blocked vCPU on
> its per pCPU linked list of all vCPUs that are blocked on this pCPU.
> Which is due to the vCPU being placed on a *different* per pCPU
> linked list than the original pCPU that it had entered block phase.
> 
> The regression is introduced by commit c112b5f50232 ("KVM: x86:
> Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
> it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
> avoiding changing PID.NDST when loading a blocked vCPU.
> 
> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>

Sorry, I missed a Tb tag:

	Tested-by: Nathan Ni <nathan.ni@oracle.com>

I can resend it, if preferred.

> ---
>  arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
>  arch/x86/kvm/vmx/vmx.h |  6 ++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 18b0bee662a5..75d903455e1c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1274,6 +1274,18 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
>  		return;
>  
> +	/*
> +	 * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
> +	 * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
> +	 * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
> +	 * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up
> +	 * correctly.
> +	 */
> +	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
> +		pi_clear_sn(pi_desc);
> +		goto after_clear_sn;
> +	}
> +
>  	/* The full case.  */
>  	do {
>  		old.control = new.control = pi_desc->control;
> @@ -1289,6 +1301,8 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	} while (cmpxchg64(&pi_desc->control, old.control,
>  			   new.control) != old.control);
>  
> +after_clear_sn:
> +
>  	/*
>  	 * Clear SN before reading the bitmap.  The VT-d firmware
>  	 * writes the bitmap and reads SN atomically (5.2.3 in the
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index bee16687dc0b..1e32ab54fc2d 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -373,6 +373,12 @@ static inline void pi_clear_on(struct pi_desc *pi_desc)
>  		(unsigned long *)&pi_desc->control);
>  }
>  
> +static inline void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> +	clear_bit(POSTED_INTR_SN,
> +		(unsigned long *)&pi_desc->control);
> +}
> +
>  static inline int pi_test_on(struct pi_desc *pi_desc)
>  {
>  	return test_bit(POSTED_INTR_ON,
>
Paolo Bonzini Nov. 11, 2019, 2:39 p.m. UTC | #2
On 06/11/19 18:56, Joao Martins wrote:
> When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
> linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
> changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
> (wakeup_handler()) is responsible to kick (unblock) any vCPU on that
> linked list that now has pending posted interrupts.
> 
> While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
> will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
> scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
> PID.SN but will also *overwrite PID.NDST to this different pCPU*.
> Instead of keeping it with original pCPU which vCPU had entered block
> phase on.
> 
> This results in an issue because when a posted interrupt is delivered,
> the wakeup_handler() will be executed and fail to find blocked vCPU on
> its per pCPU linked list of all vCPUs that are blocked on this pCPU.
> Which is due to the vCPU being placed on a *different* per pCPU
> linked list than the original pCPU that it had entered block phase.
> 
> The regression is introduced by commit c112b5f50232 ("KVM: x86:
> Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
> it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
> avoiding changing PID.NDST when loading a blocked vCPU.
> 
> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>

Something wrong in the SoB line?

Otherwise looks good.

Paolo

> ---
>  arch/x86/kvm/vmx/vmx.c | 14 ++++++++++++++
>  arch/x86/kvm/vmx/vmx.h |  6 ++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 18b0bee662a5..75d903455e1c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1274,6 +1274,18 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
>  		return;
>  
> +	/*
> +	 * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
> +	 * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
> +	 * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
> +	 * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up
> +	 * correctly.
> +	 */
> +	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
> +		pi_clear_sn(pi_desc);
> +		goto after_clear_sn;
> +	}
> +
>  	/* The full case.  */
>  	do {
>  		old.control = new.control = pi_desc->control;
> @@ -1289,6 +1301,8 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	} while (cmpxchg64(&pi_desc->control, old.control,
>  			   new.control) != old.control);
>  
> +after_clear_sn:
> +
>  	/*
>  	 * Clear SN before reading the bitmap.  The VT-d firmware
>  	 * writes the bitmap and reads SN atomically (5.2.3 in the
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index bee16687dc0b..1e32ab54fc2d 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -373,6 +373,12 @@ static inline void pi_clear_on(struct pi_desc *pi_desc)
>  		(unsigned long *)&pi_desc->control);
>  }
>  
> +static inline void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> +	clear_bit(POSTED_INTR_SN,
> +		(unsigned long *)&pi_desc->control);
> +}
> +
>  static inline int pi_test_on(struct pi_desc *pi_desc)
>  {
>  	return test_bit(POSTED_INTR_ON,
>
Joao Martins Nov. 11, 2019, 2:48 p.m. UTC | #3
On 11/11/19 2:39 PM, Paolo Bonzini wrote:
> On 06/11/19 18:56, Joao Martins wrote:
>> When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU
>> linked list of all vCPUs that are blocked on this pCPU. Afterwards, it
>> changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler
>> (wakeup_handler()) is responsible to kick (unblock) any vCPU on that
>> linked list that now has pending posted interrupts.
>>
>> While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which
>> will cause vmx_vcpu_pi_put() to set PID.SN.  If later the vCPU will be
>> scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear
>> PID.SN but will also *overwrite PID.NDST to this different pCPU*.
>> Instead of keeping it with original pCPU which vCPU had entered block
>> phase on.
>>
>> This results in an issue because when a posted interrupt is delivered,
>> the wakeup_handler() will be executed and fail to find blocked vCPU on
>> its per pCPU linked list of all vCPUs that are blocked on this pCPU.
>> Which is due to the vCPU being placed on a *different* per pCPU
>> linked list than the original pCPU that it had entered block phase.
>>
>> The regression is introduced by commit c112b5f50232 ("KVM: x86:
>> Recompute PID.ON when clearing PID.SN"). Therefore, partially revert
>> it and reintroduce the condition in vmx_vcpu_pi_load() responsible for
>> avoiding changing PID.NDST when loading a blocked vCPU.
>>
>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> 
> Something wrong in the SoB line?
> 
I can't spot any mistake; at least it looks chained correctly for me. What's the
issue you see with the Sob line?

The only thing I forgot was a:

Tested-by: Nathan Ni <nathan.ni@oracle.com>

> Otherwise looks good.

If you want I can resubmit the series with the Tb and Jag Rb, unless you think
it's OK doing on commit? Otherwise I can resubmit.

	Joao
Paolo Bonzini Nov. 11, 2019, 2:50 p.m. UTC | #4
On 11/11/19 15:48, Joao Martins wrote:
>>>
>>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>> Something wrong in the SoB line?
>>
> I can't spot any mistake; at least it looks chained correctly for me. What's the
> issue you see with the Sob line?

Liran's line after yours is confusing.  Did he help with the analysis or
anything like that?

Paolo

> The only thing I forgot was a:
> 
> Tested-by: Nathan Ni <nathan.ni@oracle.com>
> 
>> Otherwise looks good.
> If you want I can resubmit the series with the Tb and Jag Rb, unless you think
> it's OK doing on commit? Otherwise I can resubmit.
Joao Martins Nov. 11, 2019, 2:56 p.m. UTC | #5
On 11/11/19 2:50 PM, Paolo Bonzini wrote:
> On 11/11/19 15:48, Joao Martins wrote:
>>>>
>>>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>> Something wrong in the SoB line?
>>>
>> I can't spot any mistake; at least it looks chained correctly for me. What's the
>> issue you see with the Sob line?
> 
> Liran's line after yours is confusing.  Did he help with the analysis or
> anything like that?
> 
He was initially reviewing my patches, but then helped improving the problem
description in the commit messages so felt correct to give credit.

	Joao
Liran Alon Nov. 11, 2019, 2:59 p.m. UTC | #6
> On 11 Nov 2019, at 16:56, Joao Martins <joao.m.martins@oracle.com> wrote:
> 
> On 11/11/19 2:50 PM, Paolo Bonzini wrote:
>> On 11/11/19 15:48, Joao Martins wrote:
>>>>> 
>>>>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>>> Something wrong in the SoB line?
>>>> 
>>> I can't spot any mistake; at least it looks chained correctly for me. What's the
>>> issue you see with the Sob line?
>> 
>> Liran's line after yours is confusing.  Did he help with the analysis or
>> anything like that?
>> 
> He was initially reviewing my patches, but then helped improving the problem
> description in the commit messages so felt correct to give credit.
> 
> 	Joao

I think proper action is to just remove me from the SoB line.

-Liran
Sean Christopherson Nov. 11, 2019, 3:53 p.m. UTC | #7
On Mon, Nov 11, 2019 at 04:59:20PM +0200, Liran Alon wrote:
> 
> 
> > On 11 Nov 2019, at 16:56, Joao Martins <joao.m.martins@oracle.com> wrote:
> > 
> > On 11/11/19 2:50 PM, Paolo Bonzini wrote:
> >> On 11/11/19 15:48, Joao Martins wrote:
> >>>>> 
> >>>>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
> >>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >>>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
> >>>> Something wrong in the SoB line?
> >>>> 
> >>> I can't spot any mistake; at least it looks chained correctly for me. What's the
> >>> issue you see with the Sob line?
> >> 
> >> Liran's line after yours is confusing.  Did he help with the analysis or
> >> anything like that?
> >> 
> > He was initially reviewing my patches, but then helped improving the problem
> > description in the commit messages so felt correct to give credit.
> > 
> > 	Joao
> 
> I think proper action is to just remove me from the SoB line.

Use Co-developed-by to attribute multiple authors.  Note, the SoB ordering
should show the chronological history of the patch when possible, e.g. the
person sending the patch should always have their SoB last.

Documentation/process/submitting-patches.rst and 
Documentation/process/5.Posting.rst have more details.
Joao Martins Nov. 11, 2019, 5:27 p.m. UTC | #8
On 11/11/19 3:53 PM, Sean Christopherson wrote:
> On Mon, Nov 11, 2019 at 04:59:20PM +0200, Liran Alon wrote:
>>
>>
>>> On 11 Nov 2019, at 16:56, Joao Martins <joao.m.martins@oracle.com> wrote:
>>>
>>> On 11/11/19 2:50 PM, Paolo Bonzini wrote:
>>>> On 11/11/19 15:48, Joao Martins wrote:
>>>>>>>
>>>>>>> Fixes: c112b5f50232 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
>>>>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>>>>> Signed-off-by: Liran Alon <liran.alon@oracle.com>
>>>>>> Something wrong in the SoB line?
>>>>>>
>>>>> I can't spot any mistake; at least it looks chained correctly for me. What's the
>>>>> issue you see with the Sob line?
>>>>
>>>> Liran's line after yours is confusing.  Did he help with the analysis or
>>>> anything like that?
>>>>
>>> He was initially reviewing my patches, but then helped improving the problem
>>> description in the commit messages so felt correct to give credit.
>>>
>>> 	Joao
>>
>> I think proper action is to just remove me from the SoB line.
> 
> Use Co-developed-by to attribute multiple authors.  Note, the SoB ordering
> should show the chronological history of the patch when possible, e.g. the
> person sending the patch should always have their SoB last.
> 
> Documentation/process/submitting-patches.rst and 
> Documentation/process/5.Posting.rst have more details.
> 
The Sob chain on the first two patches were broken (regardless of any use of
Co-developed-by). Fixed it up on v2, alongside the rest of the comments.

Cheers,
	Joao
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 18b0bee662a5..75d903455e1c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1274,6 +1274,18 @@  static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
 		return;
 
+	/*
+	 * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
+	 * PI.NDST: pi_post_block is the one expected to change PID.NDST and the
+	 * wakeup handler expects the vCPU to be on the blocked_vcpu_list that
+	 * matches PI.NDST. Otherwise, a vcpu may not be able to be woken up
+	 * correctly.
+	 */
+	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || vcpu->cpu == cpu) {
+		pi_clear_sn(pi_desc);
+		goto after_clear_sn;
+	}
+
 	/* The full case.  */
 	do {
 		old.control = new.control = pi_desc->control;
@@ -1289,6 +1301,8 @@  static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
 	} while (cmpxchg64(&pi_desc->control, old.control,
 			   new.control) != old.control);
 
+after_clear_sn:
+
 	/*
 	 * Clear SN before reading the bitmap.  The VT-d firmware
 	 * writes the bitmap and reads SN atomically (5.2.3 in the
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bee16687dc0b..1e32ab54fc2d 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -373,6 +373,12 @@  static inline void pi_clear_on(struct pi_desc *pi_desc)
 		(unsigned long *)&pi_desc->control);
 }
 
+static inline void pi_clear_sn(struct pi_desc *pi_desc)
+{
+	clear_bit(POSTED_INTR_SN,
+		(unsigned long *)&pi_desc->control);
+}
+
 static inline int pi_test_on(struct pi_desc *pi_desc)
 {
 	return test_bit(POSTED_INTR_ON,