diff mbox

[v3,1/4] KVM: Recover IRTE to remapped mode if the interrupt is not single-destination

Message ID 1453254177-103002-2-git-send-email-feng.wu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wu, Feng Jan. 20, 2016, 1:42 a.m. UTC
When the interrupt is not single destination any more, we need
to change back IRTE to remapped mode explicitly.

Signed-off-by: Feng Wu <feng.wu@intel.com>
---
 arch/x86/kvm/vmx.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Yang Zhang Jan. 21, 2016, 3:05 a.m. UTC | #1
On 2016/1/20 9:42, Feng Wu wrote:
> When the interrupt is not single destination any more, we need
> to change back IRTE to remapped mode explicitly.
>
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
>   arch/x86/kvm/vmx.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e2951b6..13d14d4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
>   		 */
>
>   		kvm_set_msi_irq(e, &irq);
> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> +			/*
> +			 * Make sure the IRTE is in remapped mode if
> +			 * we don't handle it in posted mode.
> +			 */
> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> +
>   			continue;
> +		}
>
>   		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
>   		vcpu_info.vector = irq.vector;
>

I am still feel weird with this change: according the semantic of VT-d 
posted interrupt, the interrupt will injected to guest through posted 
notification and /proc/interrupts shows the same meaning. But now, 
without being aware of user, the interrupt changes to legacy way and it 
appears on different entry on /proc/interrupts. It looks weird.

Any comments? Paolo.
Wu, Feng Jan. 21, 2016, 3:14 a.m. UTC | #2
> -----Original Message-----
> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> Sent: Thursday, January 21, 2016 11:06 AM
> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> rkrcmar@redhat.com
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> On 2016/1/20 9:42, Feng Wu wrote:
> > When the interrupt is not single destination any more, we need
> > to change back IRTE to remapped mode explicitly.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> >   arch/x86/kvm/vmx.c | 11 ++++++++++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index e2951b6..13d14d4 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
> *kvm, unsigned int host_irq,
> >   		 */
> >
> >   		kvm_set_msi_irq(e, &irq);
> > -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> > +			/*
> > +			 * Make sure the IRTE is in remapped mode if
> > +			 * we don't handle it in posted mode.
> > +			 */
> > +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> > +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> > +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> > +
> >   			continue;
> > +		}
> >
> >   		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
> >   		vcpu_info.vector = irq.vector;
> >
> 
> I am still feel weird with this change: according the semantic of VT-d
> posted interrupt, the interrupt will injected to guest through posted
> notification and /proc/interrupts shows the same meaning. But now,
> without being aware of user, the interrupt changes to legacy way and it
> appears on different entry on /proc/interrupts. It looks weird.

I don't think it has problem here, IMO, this is exactly how it works.
There should be different entry for the interrupts in VT-d PI mode
and leagcy mode.

For VT-d PI mode, it is delivered by notification event, for legacy mode,
it is delivered by VFIO.

Thanks,
Feng

> 
> Any comments? Paolo.
> 
> --
> best regards
> yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 21, 2016, 3:34 a.m. UTC | #3
On 2016/1/21 11:14, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>> Sent: Thursday, January 21, 2016 11:06 AM
>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>> rkrcmar@redhat.com
>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>> interrupt is not single-destination
>>
>> On 2016/1/20 9:42, Feng Wu wrote:
>>> When the interrupt is not single destination any more, we need
>>> to change back IRTE to remapped mode explicitly.
>>>
>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>> ---
>>>    arch/x86/kvm/vmx.c | 11 ++++++++++-
>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index e2951b6..13d14d4 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
>> *kvm, unsigned int host_irq,
>>>    		 */
>>>
>>>    		kvm_set_msi_irq(e, &irq);
>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>> +			/*
>>> +			 * Make sure the IRTE is in remapped mode if
>>> +			 * we don't handle it in posted mode.
>>> +			 */
>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
>>> +
>>>    			continue;
>>> +		}
>>>
>>>    		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
>>>    		vcpu_info.vector = irq.vector;
>>>
>>
>> I am still feel weird with this change: according the semantic of VT-d
>> posted interrupt, the interrupt will injected to guest through posted
>> notification and /proc/interrupts shows the same meaning. But now,
>> without being aware of user, the interrupt changes to legacy way and it
>> appears on different entry on /proc/interrupts. It looks weird.
>
> I don't think it has problem here, IMO, this is exactly how it works.
> There should be different entry for the interrupts in VT-d PI mode
> and leagcy mode.

I am not saying any problem here. Just feel weird. From a normal user's 
point, he has turned on the VT-d pi and according the semantic of VT-d 
pi, he should not observe the interrupt through legacy mode, but now he 
do see it. Maybe print out a message here will be helpful, like what you 
did for disabled lapic found during irq injection.

>
> For VT-d PI mode, it is delivered by notification event, for legacy mode,
> it is delivered by VFIO.
>
> Thanks,
> Feng
>
>>
>> Any comments? Paolo.
>>
>> --
>> best regards
>> yang
Wu, Feng Jan. 21, 2016, 4:42 a.m. UTC | #4
> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Yang Zhang
> Sent: Thursday, January 21, 2016 11:35 AM
> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> rkrcmar@redhat.com
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> On 2016/1/21 11:14, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> >> Sent: Thursday, January 21, 2016 11:06 AM
> >> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >> rkrcmar@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> >> interrupt is not single-destination
> >>
> >> On 2016/1/20 9:42, Feng Wu wrote:
> >>> When the interrupt is not single destination any more, we need
> >>> to change back IRTE to remapped mode explicitly.
> >>>
> >>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> >>> ---
> >>>    arch/x86/kvm/vmx.c | 11 ++++++++++-
> >>>    1 file changed, 10 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>> index e2951b6..13d14d4 100644
> >>> --- a/arch/x86/kvm/vmx.c
> >>> +++ b/arch/x86/kvm/vmx.c
> >>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
> >> *kvm, unsigned int host_irq,
> >>>    		 */
> >>>
> >>>    		kvm_set_msi_irq(e, &irq);
> >>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> >>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> >>> +			/*
> >>> +			 * Make sure the IRTE is in remapped mode if
> >>> +			 * we don't handle it in posted mode.
> >>> +			 */
> >>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> >>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> >>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> >>> +
> >>>    			continue;
> >>> +		}
> >>>
> >>>    		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
> >>>    		vcpu_info.vector = irq.vector;
> >>>
> >>
> >> I am still feel weird with this change: according the semantic of VT-d
> >> posted interrupt, the interrupt will injected to guest through posted
> >> notification and /proc/interrupts shows the same meaning. But now,
> >> without being aware of user, the interrupt changes to legacy way and it
> >> appears on different entry on /proc/interrupts. It looks weird.
> >
> > I don't think it has problem here, IMO, this is exactly how it works.
> > There should be different entry for the interrupts in VT-d PI mode
> > and leagcy mode.
> 
> I am not saying any problem here. Just feel weird. From a normal user's
> point, he has turned on the VT-d pi and according the semantic of VT-d
> pi, he should not observe the interrupt through legacy mode, but now he
> do see it. Maybe print out a message here will be helpful, like what you
> did for disabled lapic found during irq injection.

Even VT-d PI is on, not all interrupts can be handled by it, the reason the
interrupts is changed back to legacy mode is because the user changes
the affinity, and it cannot be handle in PI mode, and hence legacy mode
is used. It is the user's behavior that cause this mode change, seems it is
not so weird to me. But add some message here is good idea, just like
what I did later in this function, I can also add the following trace
message here.

                trace_kvm_pi_irte_update(vcpu->vcpu_id, host_irq, e->gsi,
                                vcpu_info.vector, vcpu_info.pi_desc_addr, set);

Thanks,
Feng

> 
> >
> > For VT-d PI mode, it is delivered by notification event, for legacy mode,
> > it is delivered by VFIO.
> >
> > Thanks,
> > Feng
> >
> >>
> >> Any comments? Paolo.
> >>
> >> --
> >> best regards
> >> yang
> 
> 
> --
> best regards
> yang
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tian, Kevin Jan. 21, 2016, 4:54 a.m. UTC | #5
> From: Wu, Feng
> Sent: Thursday, January 21, 2016 12:43 PM
> 
> > -----Original Message-----
> > From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> > Behalf Of Yang Zhang
> > Sent: Thursday, January 21, 2016 11:35 AM
> > To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> > rkrcmar@redhat.com
> > Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> > Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> > interrupt is not single-destination
> >
> > On 2016/1/21 11:14, Wu, Feng wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> > >> Sent: Thursday, January 21, 2016 11:06 AM
> > >> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> > >> rkrcmar@redhat.com
> > >> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> > >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> > >> interrupt is not single-destination
> > >>
> > >> On 2016/1/20 9:42, Feng Wu wrote:
> > >>> When the interrupt is not single destination any more, we need
> > >>> to change back IRTE to remapped mode explicitly.
> > >>>
> > >>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> > >>> ---
> > >>>    arch/x86/kvm/vmx.c | 11 ++++++++++-
> > >>>    1 file changed, 10 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > >>> index e2951b6..13d14d4 100644
> > >>> --- a/arch/x86/kvm/vmx.c
> > >>> +++ b/arch/x86/kvm/vmx.c
> > >>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
> > >> *kvm, unsigned int host_irq,
> > >>>    		 */
> > >>>
> > >>>    		kvm_set_msi_irq(e, &irq);
> > >>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > >>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> > >>> +			/*
> > >>> +			 * Make sure the IRTE is in remapped mode if
> > >>> +			 * we don't handle it in posted mode.
> > >>> +			 */
> > >>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> > >>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> > >>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> > >>> +
> > >>>    			continue;
> > >>> +		}
> > >>>
> > >>>    		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
> > >>>    		vcpu_info.vector = irq.vector;
> > >>>
> > >>
> > >> I am still feel weird with this change: according the semantic of VT-d
> > >> posted interrupt, the interrupt will injected to guest through posted
> > >> notification and /proc/interrupts shows the same meaning. But now,
> > >> without being aware of user, the interrupt changes to legacy way and it
> > >> appears on different entry on /proc/interrupts. It looks weird.
> > >
> > > I don't think it has problem here, IMO, this is exactly how it works.
> > > There should be different entry for the interrupts in VT-d PI mode
> > > and leagcy mode.
> >
> > I am not saying any problem here. Just feel weird. From a normal user's
> > point, he has turned on the VT-d pi and according the semantic of VT-d
> > pi, he should not observe the interrupt through legacy mode, but now he
> > do see it. Maybe print out a message here will be helpful, like what you
> > did for disabled lapic found during irq injection.
> 
> Even VT-d PI is on, not all interrupts can be handled by it, the reason the
> interrupts is changed back to legacy mode is because the user changes
> the affinity, and it cannot be handle in PI mode, and hence legacy mode
> is used. It is the user's behavior that cause this mode change, seems it is
> not so weird to me. But add some message here is good idea, just like
> what I did later in this function, I can also add the following trace
> message here.
> 
>                 trace_kvm_pi_irte_update(vcpu->vcpu_id, host_irq, e->gsi,
>                                 vcpu_info.vector, vcpu_info.pi_desc_addr, set);
> 
> Thanks,
> Feng
> 

Right. Whether the interrupt is delivered into guest directly with PI or with 
legacy mode, is completely agnostic to the guest. Guest can't tell
or set any expectation on the underlying delivering method, so there is no
guest visible impact. 

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 21, 2016, 4:59 a.m. UTC | #6
On 2016/1/21 12:42, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
>> Behalf Of Yang Zhang
>> Sent: Thursday, January 21, 2016 11:35 AM
>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>> rkrcmar@redhat.com
>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>> interrupt is not single-destination
>>
>> On 2016/1/21 11:14, Wu, Feng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>> Sent: Thursday, January 21, 2016 11:06 AM
>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>> rkrcmar@redhat.com
>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>>>> interrupt is not single-destination
>>>>
>>>> On 2016/1/20 9:42, Feng Wu wrote:
>>>>> When the interrupt is not single destination any more, we need
>>>>> to change back IRTE to remapped mode explicitly.
>>>>>
>>>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>>>> ---
>>>>>     arch/x86/kvm/vmx.c | 11 ++++++++++-
>>>>>     1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index e2951b6..13d14d4 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
>>>> *kvm, unsigned int host_irq,
>>>>>     		 */
>>>>>
>>>>>     		kvm_set_msi_irq(e, &irq);
>>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>>>> +			/*
>>>>> +			 * Make sure the IRTE is in remapped mode if
>>>>> +			 * we don't handle it in posted mode.
>>>>> +			 */
>>>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
>>>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
>>>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
>>>>> +
>>>>>     			continue;
>>>>> +		}
>>>>>
>>>>>     		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
>>>>>     		vcpu_info.vector = irq.vector;
>>>>>
>>>>
>>>> I am still feel weird with this change: according the semantic of VT-d
>>>> posted interrupt, the interrupt will injected to guest through posted
>>>> notification and /proc/interrupts shows the same meaning. But now,
>>>> without being aware of user, the interrupt changes to legacy way and it
>>>> appears on different entry on /proc/interrupts. It looks weird.
>>>
>>> I don't think it has problem here, IMO, this is exactly how it works.
>>> There should be different entry for the interrupts in VT-d PI mode
>>> and leagcy mode.
>>
>> I am not saying any problem here. Just feel weird. From a normal user's
>> point, he has turned on the VT-d pi and according the semantic of VT-d
>> pi, he should not observe the interrupt through legacy mode, but now he
>> do see it. Maybe print out a message here will be helpful, like what you
>> did for disabled lapic found during irq injection.
>
> Even VT-d PI is on, not all interrupts can be handled by it, the reason the

No, we can handle it but we don't do it due to the complexity.For 
example, we can use wake up vector to delivery the interrupt which still 
is in PI mode but doesn't require any mode change.

> interrupts is changed back to legacy mode is because the user changes
> the affinity, and it cannot be handle in PI mode, and hence legacy mode
> is used. It is the user's behavior that cause this mode change, seems it is
> not so weird to me. But add some message here is good idea, just like

Why user's behavior can change the mode? According the current design, 
there is no way for user to turn on/off dynamically.Why we need to 
rollback to legacy mode is we don't want to handle multi-destination 
interrupt in PI mode but it doesn't mean we cannot do it like i said before.
Wu, Feng Jan. 21, 2016, 5:07 a.m. UTC | #7
> -----Original Message-----
> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> Sent: Thursday, January 21, 2016 1:00 PM
> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> rkrcmar@redhat.com
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> On 2016/1/21 12:42, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> On
> >> Behalf Of Yang Zhang
> >> Sent: Thursday, January 21, 2016 11:35 AM
> >> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >> rkrcmar@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> >> interrupt is not single-destination
> >>
> >> On 2016/1/21 11:14, Wu, Feng wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> >>>> Sent: Thursday, January 21, 2016 11:06 AM
> >>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >>>> rkrcmar@redhat.com
> >>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
> the
> >>>> interrupt is not single-destination
> >>>>
> >>>> On 2016/1/20 9:42, Feng Wu wrote:
> >>>>> When the interrupt is not single destination any more, we need
> >>>>> to change back IRTE to remapped mode explicitly.
> >>>>>
> >>>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> >>>>> ---
> >>>>>     arch/x86/kvm/vmx.c | 11 ++++++++++-
> >>>>>     1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>> index e2951b6..13d14d4 100644
> >>>>> --- a/arch/x86/kvm/vmx.c
> >>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
> >>>> *kvm, unsigned int host_irq,
> >>>>>     		 */
> >>>>>
> >>>>>     		kvm_set_msi_irq(e, &irq);
> >>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> >>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> >>>>> +			/*
> >>>>> +			 * Make sure the IRTE is in remapped mode if
> >>>>> +			 * we don't handle it in posted mode.
> >>>>> +			 */
> >>>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> >>>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> >>>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> >>>>> +
> >>>>>     			continue;
> >>>>> +		}
> >>>>>
> >>>>>     		vcpu_info.pi_desc_addr =
> __pa(vcpu_to_pi_desc(vcpu));
> >>>>>     		vcpu_info.vector = irq.vector;
> >>>>>
> >>>>
> >>>> I am still feel weird with this change: according the semantic of VT-d
> >>>> posted interrupt, the interrupt will injected to guest through posted
> >>>> notification and /proc/interrupts shows the same meaning. But now,
> >>>> without being aware of user, the interrupt changes to legacy way and it
> >>>> appears on different entry on /proc/interrupts. It looks weird.
> >>>
> >>> I don't think it has problem here, IMO, this is exactly how it works.
> >>> There should be different entry for the interrupts in VT-d PI mode
> >>> and leagcy mode.
> >>
> >> I am not saying any problem here. Just feel weird. From a normal user's
> >> point, he has turned on the VT-d pi and according the semantic of VT-d
> >> pi, he should not observe the interrupt through legacy mode, but now he
> >> do see it. Maybe print out a message here will be helpful, like what you
> >> did for disabled lapic found during irq injection.
> >
> > Even VT-d PI is on, not all interrupts can be handled by it, the reason the
> 
> No, we can handle it but we don't do it due to the complexity.For
> example, we can use wake up vector to delivery the interrupt which still
> is in PI mode but doesn't require any mode change.

I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.

> 
> > interrupts is changed back to legacy mode is because the user changes
> > the affinity, and it cannot be handle in PI mode, and hence legacy mode
> > is used. It is the user's behavior that cause this mode change, seems it is
> > not so weird to me. But add some message here is good idea, just like
> 
> Why user's behavior can change the mode? 

Like you mentioned before, if the interrupt is changed from single-destination
to multiple-destination by guest. And this is the reason of adding the rollback
logic here, right?

Thanks,
Feng

> According the current design,
> there is no way for user to turn on/off dynamically.Why we need to
> rollback to legacy mode is we don't want to handle multi-destination
> interrupt in PI mode but it doesn't mean we cannot do it like i said before.
> 
> 
> --
> best regards
> yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 21, 2016, 5:35 a.m. UTC | #8
On 2016/1/21 13:07, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>> Sent: Thursday, January 21, 2016 1:00 PM
>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>> rkrcmar@redhat.com
>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>> interrupt is not single-destination
>>
>> On 2016/1/21 12:42, Wu, Feng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>> On
>>>> Behalf Of Yang Zhang
>>>> Sent: Thursday, January 21, 2016 11:35 AM
>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>> rkrcmar@redhat.com
>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>>>> interrupt is not single-destination
>>>>
>>>> On 2016/1/21 11:14, Wu, Feng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>>>> Sent: Thursday, January 21, 2016 11:06 AM
>>>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>>>> rkrcmar@redhat.com
>>>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
>> the
>>>>>> interrupt is not single-destination
>>>>>>
>>>>>> On 2016/1/20 9:42, Feng Wu wrote:
>>>>>>> When the interrupt is not single destination any more, we need
>>>>>>> to change back IRTE to remapped mode explicitly.
>>>>>>>
>>>>>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>>>>>> ---
>>>>>>>      arch/x86/kvm/vmx.c | 11 ++++++++++-
>>>>>>>      1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>> index e2951b6..13d14d4 100644
>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
>>>>>> *kvm, unsigned int host_irq,
>>>>>>>      		 */
>>>>>>>
>>>>>>>      		kvm_set_msi_irq(e, &irq);
>>>>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>>>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>>>>>> +			/*
>>>>>>> +			 * Make sure the IRTE is in remapped mode if
>>>>>>> +			 * we don't handle it in posted mode.
>>>>>>> +			 */
>>>>>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
>>>>>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
>>>>>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
>>>>>>> +
>>>>>>>      			continue;
>>>>>>> +		}
>>>>>>>
>>>>>>>      		vcpu_info.pi_desc_addr =
>> __pa(vcpu_to_pi_desc(vcpu));
>>>>>>>      		vcpu_info.vector = irq.vector;
>>>>>>>
>>>>>>
>>>>>> I am still feel weird with this change: according the semantic of VT-d
>>>>>> posted interrupt, the interrupt will injected to guest through posted
>>>>>> notification and /proc/interrupts shows the same meaning. But now,
>>>>>> without being aware of user, the interrupt changes to legacy way and it
>>>>>> appears on different entry on /proc/interrupts. It looks weird.
>>>>>
>>>>> I don't think it has problem here, IMO, this is exactly how it works.
>>>>> There should be different entry for the interrupts in VT-d PI mode
>>>>> and leagcy mode.
>>>>
>>>> I am not saying any problem here. Just feel weird. From a normal user's
>>>> point, he has turned on the VT-d pi and according the semantic of VT-d
>>>> pi, he should not observe the interrupt through legacy mode, but now he
>>>> do see it. Maybe print out a message here will be helpful, like what you
>>>> did for disabled lapic found during irq injection.
>>>
>>> Even VT-d PI is on, not all interrupts can be handled by it, the reason the
>>
>> No, we can handle it but we don't do it due to the complexity.For
>> example, we can use wake up vector to delivery the interrupt which still
>> is in PI mode but doesn't require any mode change.
>
> I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.

We may have different understanding on PI mode. My understanding is if 
we set the IRTE to PI format, than the subsequent interrupt will be 
handled in PI mode. multi-cast and broadcast interrupts cannot be 
injected to guest directly but it doesn't mean cannot be handled in PI 
mode. As i said, we can handle it in wake up vector or via other 
approach.But it is much complexity.

I agree that rollback to legacy mode is the best choice, but may need 
some additional messages to tell the user(host administrator) why we 
change to legacy mode. I think not all of them are familiar with the 
detail of VT-d PI. If they find there are still some interrupts goto 
legacy mode even they have turned on PI, they may get confused.

>
>>
>>> interrupts is changed back to legacy mode is because the user changes
>>> the affinity, and it cannot be handle in PI mode, and hence legacy mode
>>> is used. It is the user's behavior that cause this mode change, seems it is
>>> not so weird to me. But add some message here is good idea, just like
>>
>> Why user's behavior can change the mode?
>
> Like you mentioned before, if the interrupt is changed from single-destination
> to multiple-destination by guest. And this is the reason of adding the rollback
> logic here, right?

The user means the host administrator.

>
> Thanks,
> Feng
>
>> According the current design,
>> there is no way for user to turn on/off dynamically.Why we need to
>> rollback to legacy mode is we don't want to handle multi-destination
>> interrupt in PI mode but it doesn't mean we cannot do it like i said before.
>>
>>
>> --
>> best regards
>> yang
Wu, Feng Jan. 21, 2016, 5:41 a.m. UTC | #9
> -----Original Message-----
> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> Sent: Thursday, January 21, 2016 1:36 PM
> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> rkrcmar@redhat.com
> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> On 2016/1/21 13:07, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> >> Sent: Thursday, January 21, 2016 1:00 PM
> >> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >> rkrcmar@redhat.com
> >> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> >> interrupt is not single-destination
> >>
> >> On 2016/1/21 12:42, Wu, Feng wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> >> On
> >>>> Behalf Of Yang Zhang
> >>>> Sent: Thursday, January 21, 2016 11:35 AM
> >>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >>>> rkrcmar@redhat.com
> >>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
> the
> >>>> interrupt is not single-destination
> >>>>
> >>>> On 2016/1/21 11:14, Wu, Feng wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
> >>>>>> Sent: Thursday, January 21, 2016 11:06 AM
> >>>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
> >>>>>> rkrcmar@redhat.com
> >>>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
> >>>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
> >> the
> >>>>>> interrupt is not single-destination
> >>>>>>
> >>>>>> On 2016/1/20 9:42, Feng Wu wrote:
> >>>>>>> When the interrupt is not single destination any more, we need
> >>>>>>> to change back IRTE to remapped mode explicitly.
> >>>>>>>
> >>>>>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
> >>>>>>> ---
> >>>>>>>      arch/x86/kvm/vmx.c | 11 ++++++++++-
> >>>>>>>      1 file changed, 10 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>>> index e2951b6..13d14d4 100644
> >>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct
> kvm
> >>>>>> *kvm, unsigned int host_irq,
> >>>>>>>      		 */
> >>>>>>>
> >>>>>>>      		kvm_set_msi_irq(e, &irq);
> >>>>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> >>>>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> >>>>>>> +			/*
> >>>>>>> +			 * Make sure the IRTE is in remapped mode if
> >>>>>>> +			 * we don't handle it in posted mode.
> >>>>>>> +			 */
> >>>>>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> >>>>>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> >>>>>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> >>>>>>> +
> >>>>>>>      			continue;
> >>>>>>> +		}
> >>>>>>>
> >>>>>>>      		vcpu_info.pi_desc_addr =
> >> __pa(vcpu_to_pi_desc(vcpu));
> >>>>>>>      		vcpu_info.vector = irq.vector;
> >>>>>>>
> >>>>>>
> >>>>>> I am still feel weird with this change: according the semantic of VT-d
> >>>>>> posted interrupt, the interrupt will injected to guest through posted
> >>>>>> notification and /proc/interrupts shows the same meaning. But now,
> >>>>>> without being aware of user, the interrupt changes to legacy way and
> it
> >>>>>> appears on different entry on /proc/interrupts. It looks weird.
> >>>>>
> >>>>> I don't think it has problem here, IMO, this is exactly how it works.
> >>>>> There should be different entry for the interrupts in VT-d PI mode
> >>>>> and leagcy mode.
> >>>>
> >>>> I am not saying any problem here. Just feel weird. From a normal user's
> >>>> point, he has turned on the VT-d pi and according the semantic of VT-d
> >>>> pi, he should not observe the interrupt through legacy mode, but now
> he
> >>>> do see it. Maybe print out a message here will be helpful, like what you
> >>>> did for disabled lapic found during irq injection.
> >>>
> >>> Even VT-d PI is on, not all interrupts can be handled by it, the reason the
> >>
> >> No, we can handle it but we don't do it due to the complexity.For
> >> example, we can use wake up vector to delivery the interrupt which still
> >> is in PI mode but doesn't require any mode change.
> >
> > I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.
> 
> We may have different understanding on PI mode. My understanding is if
> we set the IRTE to PI format, than the subsequent interrupt will be
> handled in PI mode. multi-cast and broadcast interrupts cannot be
> injected to guest directly but it doesn't mean cannot be handled in PI
> mode. As i said, we can handle it in wake up vector or via other
> approach.But it is much complexity.

For the multicast/broastcast, we cannot set the related IRTE in PI
mode, since we cannot set only one destination in IRTE. If an interrupt
is for multiple destination, how can you use VT-d PI to injection it
to all the destinations?

Thanks,
Feng
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 21, 2016, 5:44 a.m. UTC | #10
On 2016/1/21 13:41, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>> Sent: Thursday, January 21, 2016 1:36 PM
>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>> rkrcmar@redhat.com
>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>> interrupt is not single-destination
>>
>> On 2016/1/21 13:07, Wu, Feng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>> Sent: Thursday, January 21, 2016 1:00 PM
>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>> rkrcmar@redhat.com
>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
>>>> interrupt is not single-destination
>>>>
>>>> On 2016/1/21 12:42, Wu, Feng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
>>>> On
>>>>>> Behalf Of Yang Zhang
>>>>>> Sent: Thursday, January 21, 2016 11:35 AM
>>>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>>>> rkrcmar@redhat.com
>>>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
>> the
>>>>>> interrupt is not single-destination
>>>>>>
>>>>>> On 2016/1/21 11:14, Wu, Feng wrote:
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>>>>>> Sent: Thursday, January 21, 2016 11:06 AM
>>>>>>>> To: Wu, Feng <feng.wu@intel.com>; pbonzini@redhat.com;
>>>>>>>> rkrcmar@redhat.com
>>>>>>>> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org
>>>>>>>> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if
>>>> the
>>>>>>>> interrupt is not single-destination
>>>>>>>>
>>>>>>>> On 2016/1/20 9:42, Feng Wu wrote:
>>>>>>>>> When the interrupt is not single destination any more, we need
>>>>>>>>> to change back IRTE to remapped mode explicitly.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Feng Wu <feng.wu@intel.com>
>>>>>>>>> ---
>>>>>>>>>       arch/x86/kvm/vmx.c | 11 ++++++++++-
>>>>>>>>>       1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>>>>> index e2951b6..13d14d4 100644
>>>>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>>>>> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct
>> kvm
>>>>>>>> *kvm, unsigned int host_irq,
>>>>>>>>>       		 */
>>>>>>>>>
>>>>>>>>>       		kvm_set_msi_irq(e, &irq);
>>>>>>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>>>>>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>>>>>>>> +			/*
>>>>>>>>> +			 * Make sure the IRTE is in remapped mode if
>>>>>>>>> +			 * we don't handle it in posted mode.
>>>>>>>>> +			 */
>>>>>>>>> +			pi_set_sn(vcpu_to_pi_desc(vcpu));
>>>>>>>>> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
>>>>>>>>> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
>>>>>>>>> +
>>>>>>>>>       			continue;
>>>>>>>>> +		}
>>>>>>>>>
>>>>>>>>>       		vcpu_info.pi_desc_addr =
>>>> __pa(vcpu_to_pi_desc(vcpu));
>>>>>>>>>       		vcpu_info.vector = irq.vector;
>>>>>>>>>
>>>>>>>>
>>>>>>>> I am still feel weird with this change: according the semantic of VT-d
>>>>>>>> posted interrupt, the interrupt will injected to guest through posted
>>>>>>>> notification and /proc/interrupts shows the same meaning. But now,
>>>>>>>> without being aware of user, the interrupt changes to legacy way and
>> it
>>>>>>>> appears on different entry on /proc/interrupts. It looks weird.
>>>>>>>
>>>>>>> I don't think it has problem here, IMO, this is exactly how it works.
>>>>>>> There should be different entry for the interrupts in VT-d PI mode
>>>>>>> and leagcy mode.
>>>>>>
>>>>>> I am not saying any problem here. Just feel weird. From a normal user's
>>>>>> point, he has turned on the VT-d pi and according the semantic of VT-d
>>>>>> pi, he should not observe the interrupt through legacy mode, but now
>> he
>>>>>> do see it. Maybe print out a message here will be helpful, like what you
>>>>>> did for disabled lapic found during irq injection.
>>>>>
>>>>> Even VT-d PI is on, not all interrupts can be handled by it, the reason the
>>>>
>>>> No, we can handle it but we don't do it due to the complexity.For
>>>> example, we can use wake up vector to delivery the interrupt which still
>>>> is in PI mode but doesn't require any mode change.
>>>
>>> I mean, multi-cast and broadcast interrupts cannot be handled in PI mode.
>>
>> We may have different understanding on PI mode. My understanding is if
>> we set the IRTE to PI format, than the subsequent interrupt will be
>> handled in PI mode. multi-cast and broadcast interrupts cannot be
>> injected to guest directly but it doesn't mean cannot be handled in PI
>> mode. As i said, we can handle it in wake up vector or via other
>> approach.But it is much complexity.
>
> For the multicast/broastcast, we cannot set the related IRTE in PI
> mode, since we cannot set only one destination in IRTE. If an interrupt
> is for multiple destination, how can you use VT-d PI to injection it
> to all the destinations?

You may still not get my point. Anyway, it doesn't matter. Rollback to 
legacy mode still is the best choice so far.
Radim Krčmář Jan. 21, 2016, 4:19 p.m. UTC | #11
2016-01-20 09:42+0800, Feng Wu:
> When the interrupt is not single destination any more, we need
> to change back IRTE to remapped mode explicitly.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> +			/*
> +			 * Make sure the IRTE is in remapped mode if
> +			 * we don't handle it in posted mode.
> +			 */
> +			pi_set_sn(vcpu_to_pi_desc(vcpu));

What could go wrong if we didn't suppress notifications here?

Thanks.

> +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> +
>  			continue;
> +		}
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Jan. 21, 2016, 4:35 p.m. UTC | #12
2016-01-21 13:44+0800, Yang Zhang:
> On 2016/1/21 13:41, Wu, Feng wrote:
>>>From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>We may have different understanding on PI mode. My understanding is if
>>>we set the IRTE to PI format, than the subsequent interrupt will be
>>>handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>injected to guest directly but it doesn't mean cannot be handled in PI
>>>mode. As i said, we can handle it in wake up vector or via other
>>>approach.But it is much complexity.

KVM has to intercept the interrupt, so we'd need to trigger a deferred
work from the notification handler to send the multicast.
Reusing existing PI vectors would mean slowing them down, so we should
define a new PI notification vector just for this purpose, which would
be confusing in /proc/interrupts anyway.
On top of that, we'd need to define new PIRR array(s) and create unique
PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
stored in IRTE ... it's going a bit too far, I guess.

>>For the multicast/broastcast, we cannot set the related IRTE in PI
>>mode, since we cannot set only one destination in IRTE. If an interrupt
>>is for multiple destination, how can you use VT-d PI to injection it
>>to all the destinations?
> 
> You may still not get my point. Anyway, it doesn't matter. Rollback to
> legacy mode still is the best choice so far.

I think we can't do much better than we do now.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wu, Feng Jan. 22, 2016, 1:49 a.m. UTC | #13
> -----Original Message-----
> From: Radim Kr?má? [mailto:rkrcmar@redhat.com]
> Sent: Friday, January 22, 2016 12:20 AM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: pbonzini@redhat.com; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> 2016-01-20 09:42+0800, Feng Wu:
> > When the interrupt is not single destination any more, we need
> > to change back IRTE to remapped mode explicitly.
> >
> > Signed-off-by: Feng Wu <feng.wu@intel.com>
> > ---
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > @@ -10764,8 +10764,17 @@ static int vmx_update_pi_irte(struct kvm
> *kvm, unsigned int host_irq,
> > -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
> > +			/*
> > +			 * Make sure the IRTE is in remapped mode if
> > +			 * we don't handle it in posted mode.
> > +			 */
> > +			pi_set_sn(vcpu_to_pi_desc(vcpu));
> 
> What could go wrong if we didn't suppress notifications here?

This is a good question. I also thought about this before, but after
thinking it a bit more, seems we don't need to do this. 
If we don't do this, the in-flight interrupts will continue to be
delivered in PI mode while we are changing it to remapped
mode in IRTE. Even if we do this, the in-flight interrupts are
also delivered in PI mode before setting 'SN' anyway, so seems
we really don't need this, what is your opinion?

Thanks,
Feng

Thanks,
Feng

> 
> Thanks.
> 
> > +			ret = irq_set_vcpu_affinity(host_irq, NULL);
> > +			pi_clear_sn(vcpu_to_pi_desc(vcpu));
> > +
> >  			continue;
> > +		}
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 22, 2016, 2:03 a.m. UTC | #14
On 2016/1/22 0:35, rkrcmar@redhat.com wrote:
> 2016-01-21 13:44+0800, Yang Zhang:
>> On 2016/1/21 13:41, Wu, Feng wrote:
>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>> We may have different understanding on PI mode. My understanding is if
>>>> we set the IRTE to PI format, than the subsequent interrupt will be
>>>> handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>> injected to guest directly but it doesn't mean cannot be handled in PI
>>>> mode. As i said, we can handle it in wake up vector or via other
>>>> approach.But it is much complexity.
>
> KVM has to intercept the interrupt, so we'd need to trigger a deferred
> work from the notification handler to send the multicast.
> Reusing existing PI vectors would mean slowing them down, so we should
> define a new PI notification vector just for this purpose, which would
> be confusing in /proc/interrupts anyway.
> On top of that, we'd need to define new PIRR array(s) and create unique
> PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
> stored in IRTE ... it's going a bit too far, I guess.

Not so complicated. We can reuse the wake up vector and check whether 
the interrupt is multicast when one of destination vcpu handles it. If 
it is multicast, then also notifies other vcpus. It is totally handed in 
PI mode and we already have the wakeup vector in /proc/interrupts.

>
>>> For the multicast/broastcast, we cannot set the related IRTE in PI
>>> mode, since we cannot set only one destination in IRTE. If an interrupt
>>> is for multiple destination, how can you use VT-d PI to injection it
>>> to all the destinations?
>>
>> You may still not get my point. Anyway, it doesn't matter. Rollback to
>> legacy mode still is the best choice so far.
>
> I think we can't do much better than we do now.
Radim Krčmář Jan. 22, 2016, 1:05 p.m. UTC | #15
2016-01-22 01:49+0000, Wu, Feng:
>> From: Radim Kr?má? [mailto:rkrcmar@redhat.com]
>> 2016-01-20 09:42+0800, Feng Wu:
>> > -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>> > +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>> > +			/*
>> > +			 * Make sure the IRTE is in remapped mode if
>> > +			 * we don't handle it in posted mode.
>> > +			 */
>> > +			pi_set_sn(vcpu_to_pi_desc(vcpu));
>> 
>> What could go wrong if we didn't suppress notifications here?
> 
> This is a good question. I also thought about this before, but after
> thinking it a bit more, seems we don't need to do this. 
> If we don't do this, the in-flight interrupts will continue to be
> delivered in PI mode while we are changing it to remapped
> mode in IRTE. Even if we do this, the in-flight interrupts are
> also delivered in PI mode before setting 'SN' anyway, so seems
> we really don't need this, what is your opinion?

I'd remove it.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Jan. 22, 2016, 1:31 p.m. UTC | #16
2016-01-22 10:03+0800, Yang Zhang:
> On 2016/1/22 0:35, rkrcmar@redhat.com wrote:
>>2016-01-21 13:44+0800, Yang Zhang:
>>>On 2016/1/21 13:41, Wu, Feng wrote:
>>>>>From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>>>We may have different understanding on PI mode. My understanding is if
>>>>>we set the IRTE to PI format, than the subsequent interrupt will be
>>>>>handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>>>injected to guest directly but it doesn't mean cannot be handled in PI
>>>>>mode. As i said, we can handle it in wake up vector or via other
>>>>>approach.But it is much complexity.
>>
>>KVM has to intercept the interrupt, so we'd need to trigger a deferred
>>work from the notification handler to send the multicast.
>>Reusing existing PI vectors would mean slowing them down, so we should
>>define a new PI notification vector just for this purpose, which would
>>be confusing in /proc/interrupts anyway.
>>On top of that, we'd need to define new PIRR array(s) and create unique
>>PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
>>stored in IRTE ... it's going a bit too far, I guess.
> 
> Not so complicated. We can reuse the wake up vector and check whether the
> interrupt is multicast when one of destination vcpu handles it.

I'm not sure what you mean now ... I guess it is:
- Deliver the interrupt to a guest VCPU and relay the multicast to other
  VCPUs.  No, it's strictly worse than intercepting it in the host.

- Modify host's wakeup vector handler to send the multicast.
  It's so complicated, because all information you start with in the
  host is a vector number.  You start with no idea what the multicast
  interrupt should be.

  We could add per-multicast PID to the list of parsed PIDs in
  wakeup_handler and use PID->multicast interrupt mapping to tell which
  interrupt we should send, but that seems worse than just delivering a
  non-remapped interrupt.

  Also, if wakeup vector were used for wakeup and multicast, we'd be
  uselessly doing work, because we can't tell which reason triggered the
  interrupt before finishing one part -- using separate vectors for that
  would be a bit nicer.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 25, 2016, 1:49 a.m. UTC | #17
On 2016/1/22 21:31, rkrcmar@redhat.com wrote:
> 2016-01-22 10:03+0800, Yang Zhang:
>> On 2016/1/22 0:35, rkrcmar@redhat.com wrote:
>>> 2016-01-21 13:44+0800, Yang Zhang:
>>>> On 2016/1/21 13:41, Wu, Feng wrote:
>>>>>> From: Yang Zhang [mailto:yang.zhang.wz@gmail.com]
>>>>>> We may have different understanding on PI mode. My understanding is if
>>>>>> we set the IRTE to PI format, than the subsequent interrupt will be
>>>>>> handled in PI mode. multi-cast and broadcast interrupts cannot be
>>>>>> injected to guest directly but it doesn't mean cannot be handled in PI
>>>>>> mode. As i said, we can handle it in wake up vector or via other
>>>>>> approach.But it is much complexity.
>>>
>>> KVM has to intercept the interrupt, so we'd need to trigger a deferred
>>> work from the notification handler to send the multicast.
>>> Reusing existing PI vectors would mean slowing them down, so we should
>>> define a new PI notification vector just for this purpose, which would
>>> be confusing in /proc/interrupts anyway.
>>> On top of that, we'd need to define new PIRR array(s) and create unique
>>> PID for every IRTE, to avoid parsing those PIRR arrays as the vector is
>>> stored in IRTE ... it's going a bit too far, I guess.
>>
>> Not so complicated. We can reuse the wake up vector and check whether the
>> interrupt is multicast when one of destination vcpu handles it.
>
> I'm not sure what you mean now ... I guess it is:
> - Deliver the interrupt to a guest VCPU and relay the multicast to other
>    VCPUs.  No, it's strictly worse than intercepting it in the host.

It is still handled in host context not guest context. The wakeup event 
cannot be consumed like posted event. So it relies on hypervisor to 
inject the interrupt to guest. We can add the check at this point.


>
> - Modify host's wakeup vector handler to send the multicast.
>    It's so complicated, because all information you start with in the
>    host is a vector number.  You start with no idea what the multicast
>    interrupt should be.
>
>    We could add per-multicast PID to the list of parsed PIDs in
>    wakeup_handler and use PID->multicast interrupt mapping to tell which
>    interrupt we should send, but that seems worse than just delivering a
>    non-remapped interrupt.
>
>    Also, if wakeup vector were used for wakeup and multicast, we'd be
>    uselessly doing work, because we can't tell which reason triggered the
>    interrupt before finishing one part -- using separate vectors for that
>    would be a bit nicer.
>
Paolo Bonzini Jan. 25, 2016, 12:22 p.m. UTC | #18
On 22/01/2016 14:05, Radim Krcmár wrote:
> > This is a good question. I also thought about this before, but after
> > thinking it a bit more, seems we don't need to do this. 
> > If we don't do this, the in-flight interrupts will continue to be
> > delivered in PI mode while we are changing it to remapped
> > mode in IRTE. Even if we do this, the in-flight interrupts are
> > also delivered in PI mode before setting 'SN' anyway, so seems
> > we really don't need this, what is your opinion?
> I'd remove it.

It may be necessary because IRTE writes (128 bits) are not atomic.

If so, no need to send v5, I'll add it back.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wu, Feng Jan. 25, 2016, 12:26 p.m. UTC | #19
DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogUGFvbG8gQm9uemluaSBb
bWFpbHRvOnBhb2xvLmJvbnppbmlAZ21haWwuY29tXSBPbiBCZWhhbGYgT2YgUGFvbG8NCj4gQm9u
emluaQ0KPiBTZW50OiBNb25kYXksIEphbnVhcnkgMjUsIDIwMTYgODoyMyBQTQ0KPiBUbzogUmFk
aW0gS3JjbcOhciA8cmtyY21hckByZWRoYXQuY29tPjsgV3UsIEZlbmcgPGZlbmcud3VAaW50ZWwu
Y29tPg0KPiBDYzogbGludXgta2VybmVsQHZnZXIua2VybmVsLm9yZzsga3ZtQHZnZXIua2VybmVs
Lm9yZw0KPiBTdWJqZWN0OiBSZTogW1BBVENIIHYzIDEvNF0gS1ZNOiBSZWNvdmVyIElSVEUgdG8g
cmVtYXBwZWQgbW9kZSBpZiB0aGUNCj4gaW50ZXJydXB0IGlzIG5vdCBzaW5nbGUtZGVzdGluYXRp
b24NCj4gDQo+IA0KPiANCj4gT24gMjIvMDEvMjAxNiAxNDowNSwgUmFkaW0gS3JjbcOhciB3cm90
ZToNCj4gPiA+IFRoaXMgaXMgYSBnb29kIHF1ZXN0aW9uLiBJIGFsc28gdGhvdWdodCBhYm91dCB0
aGlzIGJlZm9yZSwgYnV0IGFmdGVyDQo+ID4gPiB0aGlua2luZyBpdCBhIGJpdCBtb3JlLCBzZWVt
cyB3ZSBkb24ndCBuZWVkIHRvIGRvIHRoaXMuDQo+ID4gPiBJZiB3ZSBkb24ndCBkbyB0aGlzLCB0
aGUgaW4tZmxpZ2h0IGludGVycnVwdHMgd2lsbCBjb250aW51ZSB0byBiZQ0KPiA+ID4gZGVsaXZl
cmVkIGluIFBJIG1vZGUgd2hpbGUgd2UgYXJlIGNoYW5naW5nIGl0IHRvIHJlbWFwcGVkDQo+ID4g
PiBtb2RlIGluIElSVEUuIEV2ZW4gaWYgd2UgZG8gdGhpcywgdGhlIGluLWZsaWdodCBpbnRlcnJ1
cHRzIGFyZQ0KPiA+ID4gYWxzbyBkZWxpdmVyZWQgaW4gUEkgbW9kZSBiZWZvcmUgc2V0dGluZyAn
U04nIGFueXdheSwgc28gc2VlbXMNCj4gPiA+IHdlIHJlYWxseSBkb24ndCBuZWVkIHRoaXMsIHdo
YXQgaXMgeW91ciBvcGluaW9uPw0KPiA+IEknZCByZW1vdmUgaXQuDQo+IA0KPiBJdCBtYXkgYmUg
bmVjZXNzYXJ5IGJlY2F1c2UgSVJURSB3cml0ZXMgKDEyOCBiaXRzKSBhcmUgbm90IGF0b21pYy4N
Cg0KSVJURSBpcyB1cGRhdGVkIGF0b21pY2FsbHksIEkgYWRkZWQgdGhlIHBhdGNoIHRvIHN1cHBv
cnQgdGhpcy4gUGxlYXNlDQpyZWZlciB0byAzNDRjYjRlMGI2ZjNhMGRiZWYwNjQzZWFjYjQ5NDYz
MzhlYjIyOGMwLg0KDQpUaGFua3MsDQpGZW5nDQoNCj4gDQo+IElmIHNvLCBubyBuZWVkIHRvIHNl
bmQgdjUsIEknbGwgYWRkIGl0IGJhY2suDQo+IA0KPiBQYW9sbw0K
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Jan. 25, 2016, 12:38 p.m. UTC | #20
On 25/01/2016 13:26, Wu, Feng wrote:
>> > It may be necessary because IRTE writes (128 bits) are not atomic.
> IRTE is updated atomically, I added the patch to support this. Please
> refer to 344cb4e0b6f3a0dbef0643eacb4946338eb228c0.

Great, I hadn't noticed that patch.  Thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wu, Feng Jan. 25, 2016, 12:48 p.m. UTC | #21
> -----Original Message-----

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]

> Sent: Monday, January 25, 2016 8:39 PM

> To: Wu, Feng <feng.wu@intel.com>; Radim Krcmár <rkrcmar@redhat.com>

> Cc: linux-kernel@vger.kernel.org; kvm@vger.kernel.org

> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the

> interrupt is not single-destination

> 

> 

> 

> On 25/01/2016 13:26, Wu, Feng wrote:

> >> > It may be necessary because IRTE writes (128 bits) are not atomic.

> > IRTE is updated atomically, I added the patch to support this. Please

> > refer to 344cb4e0b6f3a0dbef0643eacb4946338eb228c0.

> 

> Great, I hadn't noticed that patch.  Thanks.


I might forget to cc KVM mailing list, sorry for that!

Thanks,
Feng

> 

> Paolo
Radim Krčmář Jan. 25, 2016, 1:59 p.m. UTC | #22
2016-01-25 09:49+0800, Yang Zhang:
> On 2016/1/22 21:31, rkrcmar@redhat.com wrote:
>>2016-01-22 10:03+0800, Yang Zhang:
>>>Not so complicated. We can reuse the wake up vector and check whether the
>>>interrupt is multicast when one of destination vcpu handles it.
>>
>>I'm not sure what you mean now ... I guess it is:
>>- Deliver the interrupt to a guest VCPU and relay the multicast to other
>>   VCPUs.  No, it's strictly worse than intercepting it in the host.
> 
> It is still handled in host context not guest context. The wakeup event
> cannot be consumed like posted event.

Ok.  ("when one of destination vcpu handles it" confused me into
thinking that you'd like to handle it with the notification vector.)

>                                       So it relies on hypervisor to inject
> the interrupt to guest. We can add the check at this point.

Yes, but I don't think we want to do that, because of following
drawbacks:

>>- Modify host's wakeup vector handler to send the multicast.
>>   It's so complicated, because all information you start with in the
>>   host is a vector number.  You start with no idea what the multicast
>>   interrupt should be.
>>
>>   We could add per-multicast PID to the list of parsed PIDs in
>>   wakeup_handler and use PID->multicast interrupt mapping to tell which
>>   interrupt we should send, but that seems worse than just delivering a
>>   non-remapped interrupt.

(should have been "remapped, but non-posted".)

>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>   uselessly doing work, because we can't tell which reason triggered the
>>   interrupt before finishing one part -- using separate vectors for that
>>   would be a bit nicer.

(imprecise -- we would always have to check for ON bit of all PIDs from
 blocked VCPUs, for the original meaning of wakeup vector, and always
 either read the PIRR or check for ON bit of all PIDs that encode
 multicast interrupts;  then we have to clear ON bits for multicasts.)


---
There might be a benefit of using posted interrupts for host interrupts
when we run out of free interrupt vectors:  we could start using vectors
by multiple sources through posted interrupts, if using posted
interrupts is the fastest way to distinguish the interrupt source.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Jan. 25, 2016, 2:05 p.m. UTC | #23
2016-01-25 12:26+0000, Wu, Feng:
>> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
>> It may be necessary because IRTE writes (128 bits) are not atomic.
> 
> IRTE is updated atomically, I added the patch to support this. Please
> refer to 344cb4e0b6f3a0dbef0643eacb4946338eb228c0.

I also think that SN bit is not affected by atomicity: if the IRTE could
have been read half-updated while changing from posted to non-posted,
then it wouldn't point to the correct PID, because its address is not
within 64 bits, so the SN bit wouldn't matter.

IRTE invalidation seems important in VT-d ...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wu, Feng Jan. 26, 2016, 12:57 a.m. UTC | #24
> -----Original Message-----
> From: Radim Krcmár [mailto:rkrcmar@redhat.com]
> Sent: Monday, January 25, 2016 10:06 PM
> To: Wu, Feng <feng.wu@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; linux-kernel@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: Re: [PATCH v3 1/4] KVM: Recover IRTE to remapped mode if the
> interrupt is not single-destination
> 
> 2016-01-25 12:26+0000, Wu, Feng:
> >> From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo
> >> It may be necessary because IRTE writes (128 bits) are not atomic.
> >
> > IRTE is updated atomically, I added the patch to support this. Please
> > refer to 344cb4e0b6f3a0dbef0643eacb4946338eb228c0.
> 
> I also think that SN bit is not affected by atomicity: if the IRTE could
> have been read half-updated while changing from posted to non-posted,
> then it wouldn't point to the correct PID, because its address is not
> within 64 bits, so the SN bit wouldn't matter.

Yes, like the comments in the commit, we should atomically update the
IRTE in PI case (PI -> non-PI, non-PI -> PI), without which, it cannot
guarantee the correctness, since 'pda' is not within 64 bits, like Radim
pointed out above.

Thanks,
Feng

> 
> IRTE invalidation seems important in VT-d ...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 26, 2016, 1:44 a.m. UTC | #25
On 2016/1/25 21:59, rkrcmar@redhat.com wrote:
> 2016-01-25 09:49+0800, Yang Zhang:
>> On 2016/1/22 21:31, rkrcmar@redhat.com wrote:
>>> 2016-01-22 10:03+0800, Yang Zhang:
>>>> Not so complicated. We can reuse the wake up vector and check whether the
>>>> interrupt is multicast when one of destination vcpu handles it.
>>>
>>> I'm not sure what you mean now ... I guess it is:
>>> - Deliver the interrupt to a guest VCPU and relay the multicast to other
>>>    VCPUs.  No, it's strictly worse than intercepting it in the host.
>>
>> It is still handled in host context not guest context. The wakeup event
>> cannot be consumed like posted event.
>
> Ok.  ("when one of destination vcpu handles it" confused me into
> thinking that you'd like to handle it with the notification vector.)

Sorry for my poor english. :(

>
>>                                        So it relies on hypervisor to inject
>> the interrupt to guest. We can add the check at this point.
>
> Yes, but I don't think we want to do that, because of following
> drawbacks:
>
>>> - Modify host's wakeup vector handler to send the multicast.
>>>    It's so complicated, because all information you start with in the
>>>    host is a vector number.  You start with no idea what the multicast
>>>    interrupt should be.
>>>
>>>    We could add per-multicast PID to the list of parsed PIDs in
>>>    wakeup_handler and use PID->multicast interrupt mapping to tell which
>>>    interrupt we should send, but that seems worse than just delivering a
>>>    non-remapped interrupt.
>
> (should have been "remapped, but non-posted".)
>
>>>    Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>    uselessly doing work, because we can't tell which reason triggered the
>>>    interrupt before finishing one part -- using separate vectors for that
>>>    would be a bit nicer.
>
> (imprecise -- we would always have to check for ON bit of all PIDs from
>   blocked VCPUs, for the original meaning of wakeup vector, and always

This is what KVM does currently.

>   either read the PIRR or check for ON bit of all PIDs that encode
>   multicast interrupts;  then we have to clear ON bits for multicasts.)

Also, most part of work is covered by current logic except checking the 
multicast.

>
>
> ---
> There might be a benefit of using posted interrupts for host interrupts
> when we run out of free interrupt vectors:  we could start using vectors
> by multiple sources through posted interrupts, if using posted

Do you mean per vcpu posted interrupts?

> interrupts is the fastest way to distinguish the interrupt source.
Radim Krčmář Jan. 26, 2016, 6:22 p.m. UTC | #26
2016-01-26 09:44+0800, Yang Zhang:
> On 2016/1/25 21:59, rkrcmar@redhat.com wrote:
>>2016-01-25 09:49+0800, Yang Zhang:
>>>On 2016/1/22 21:31, rkrcmar@redhat.com wrote:
>>>>2016-01-22 10:03+0800, Yang Zhang:
>>>>>Not so complicated. We can reuse the wake up vector and check whether the
>>>>>interrupt is multicast when one of destination vcpu handles it.
>>>>
>>>>I'm not sure what you mean now ... I guess it is:
>>>>- Deliver the interrupt to a guest VCPU and relay the multicast to other
>>>>   VCPUs.  No, it's strictly worse than intercepting it in the host.
>>>
>>>It is still handled in host context not guest context. The wakeup event
>>>cannot be consumed like posted event.
>>
>>Ok.  ("when one of destination vcpu handles it" confused me into
>>thinking that you'd like to handle it with the notification vector.)
> 
> Sorry for my poor english. :(

It's good.  Ambiguity is hard to avoid if a reader doesn't want to
assume only the most likely meaning.

>>>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>>   uselessly doing work, because we can't tell which reason triggered the
>>>>   interrupt before finishing one part -- using separate vectors for that
>>>>   would be a bit nicer.
>>
>>(imprecise -- we would always have to check for ON bit of all PIDs from
>>  blocked VCPUs, for the original meaning of wakeup vector, and always
> 
> This is what KVM does currently.

Yep.

>>  either read the PIRR or check for ON bit of all PIDs that encode
>>  multicast interrupts;  then we have to clear ON bits for multicasts.)
> 
> Also, most part of work is covered by current logic except checking the
> multicast.

We could reuse the setup that gets us to wakeup_handler, but there is
nothing to share in the handler itself.  Sharing a handler means that we
always have to execute both parts.

We must create new PID anyway and compared to the extra work needed for
multicast handling, a new vector + handler is a relatively small code
investment that adds clarity to the design (and performance).

(Taking the vector splitting to the extreme, we'd improve performance if
 we added a vector per assigned device.  That is practically the same as
 non-posted mode, just more complicated.)

>>---
>>There might be a benefit of using posted interrupts for host interrupts
>>when we run out of free interrupt vectors:  we could start using vectors
>>by multiple sources through posted interrupts, if using posted
> 
> Do you mean per vcpu posted interrupts?

I mean using posting for host device interrupts (no virt involved).

Let's say we have 300 devices for one CPU and CPU has 200 useable
vectors.  We have 100 device interrupts that need to be shared in some
vectors and using posting might be faster than directly checking
multiple devices.

(I couldn't come up with a plausible scenario where we might want to use
 posting for host interrupts.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yang Zhang Jan. 27, 2016, 2:07 a.m. UTC | #27
On 2016/1/27 2:22, rkrcmar@redhat.com wrote:
> 2016-01-26 09:44+0800, Yang Zhang:
>> On 2016/1/25 21:59, rkrcmar@redhat.com wrote:
>>> 2016-01-25 09:49+0800, Yang Zhang:
>>>> On 2016/1/22 21:31, rkrcmar@redhat.com wrote:
>>>>> 2016-01-22 10:03+0800, Yang Zhang:
>>>>>> Not so complicated. We can reuse the wake up vector and check whether the
>>>>>> interrupt is multicast when one of destination vcpu handles it.
>>>>>
>>>>> I'm not sure what you mean now ... I guess it is:
>>>>> - Deliver the interrupt to a guest VCPU and relay the multicast to other
>>>>>    VCPUs.  No, it's strictly worse than intercepting it in the host.
>>>>
>>>> It is still handled in host context not guest context. The wakeup event
>>>> cannot be consumed like posted event.
>>>
>>> Ok.  ("when one of destination vcpu handles it" confused me into
>>> thinking that you'd like to handle it with the notification vector.)
>>
>> Sorry for my poor english. :(
>
> It's good.  Ambiguity is hard to avoid if a reader doesn't want to
> assume only the most likely meaning.
>
>>>>>    Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>>>    uselessly doing work, because we can't tell which reason triggered the
>>>>>    interrupt before finishing one part -- using separate vectors for that
>>>>>    would be a bit nicer.
>>>
>>> (imprecise -- we would always have to check for ON bit of all PIDs from
>>>   blocked VCPUs, for the original meaning of wakeup vector, and always
>>
>> This is what KVM does currently.
>
> Yep.
>
>>>   either read the PIRR or check for ON bit of all PIDs that encode
>>>   multicast interrupts;  then we have to clear ON bits for multicasts.)
>>
>> Also, most part of work is covered by current logic except checking the
>> multicast.
>
> We could reuse the setup that gets us to wakeup_handler, but there is
> nothing to share in the handler itself.  Sharing a handler means that we
> always have to execute both parts.

I don't quite understand it. There is nothing need to be modified for 
wakeup logic. The only thing we need to do is add the checking before 
the vcpu pick up the pending interrupt(This is happened in VCPU context, 
not in handler).

>
> We must create new PID anyway and compared to the extra work needed for
> multicast handling, a new vector + handler is a relatively small code
> investment that adds clarity to the design (and performance).

No new PID is needed. If the target vcpu is running, no additional work 
is required in wakeup handler. If target vcpu is not running, the 
current logic will wake up the vcpu, then let vcpu itself to check 
whether pending interrupt is a multicast and handle it in vcpu's context.

>
> (Taking the vector splitting to the extreme, we'd improve performance if
>   we added a vector per assigned device.  That is practically the same as
>   non-posted mode, just more complicated.)
>
>>> ---
>>> There might be a benefit of using posted interrupts for host interrupts
>>> when we run out of free interrupt vectors:  we could start using vectors
>>> by multiple sources through posted interrupts, if using posted
>>
>> Do you mean per vcpu posted interrupts?
>
> I mean using posting for host device interrupts (no virt involved).
>
> Let's say we have 300 devices for one CPU and CPU has 200 useable
> vectors.  We have 100 device interrupts that need to be shared in some
> vectors and using posting might be faster than directly checking
> multiple devices.

Yes, this is an good point.
Radim Krčmář Jan. 27, 2016, 3:05 p.m. UTC | #28
2016-01-27 10:07+0800, Yang Zhang:
> On 2016/1/27 2:22, rkrcmar@redhat.com wrote:
>>2016-01-26 09:44+0800, Yang Zhang:
>>>On 2016/1/25 21:59, rkrcmar@redhat.com wrote:
>>>>>>   Also, if wakeup vector were used for wakeup and multicast, we'd be
>>>>>>   uselessly doing work, because we can't tell which reason triggered the
>>>>>>   interrupt before finishing one part -- using separate vectors for that
>>>>>>   would be a bit nicer.
>>>>
>>>>(imprecise -- we would always have to check for ON bit of all PIDs from
>>>>  blocked VCPUs, for the original meaning of wakeup vector, and always
>>>>  either read the PIRR or check for ON bit of all PIDs that encode
>>>>  multicast interrupts;  then we have to clear ON bits for multicasts.)
>>>
>>>Also, most part of work is covered by current logic except checking the
>>>multicast.
>>
>>We could reuse the setup that gets us to wakeup_handler, but there is
>>nothing to share in the handler itself.  Sharing a handler means that we
>>always have to execute both parts.
> 
> I don't quite understand it. There is nothing need to be modified for wakeup
> logic. The only thing we need to do is add the checking before the vcpu pick
> up the pending interrupt(This is happened in VCPU context, not in handler).

I see, there are few problems with that.

>>We must create new PID anyway and compared to the extra work needed for
>>multicast handling, a new vector + handler is a relatively small code
>>investment that adds clarity to the design (and performance).
> 
> No new PID is needed. If the target vcpu is running, no additional work is
> required in wakeup handler. If target vcpu is not running, the current logic
> will wake up the vcpu, then let vcpu itself to check whether pending
> interrupt is a multicast and handle it in vcpu's context.

We do need a new PID.  The existing VCPU PID switches between wakeup
vector and notification vector, so if the VCPU was running when the
device triggered an interrupt, we'd deliver the posted interrupt without
exiting, but we need to handle the interrupt in the host.

=> We need at least one PID that is never set to notification vector.

Reusing VCPU's PIRR is in new PID(s) is not doable.
Parsing PIRR would be our only option of recognizing multicast
interrupts and if the guest configured many sources to send the same
vector, we'd have to do unacceptable things to tell which one was
triggered.

=> We also need at least on one new PIRR.

Handling the interrupt in VCPU context doesn't pose any advantage and we
even want to do it outside, because all VCPUs can be running when the
interrupt arrives and can therefore be posted further.

I hope I covered other disadvantages of PIDs and PIRRs earlier.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e2951b6..13d14d4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10764,8 +10764,17 @@  static int vmx_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 		 */
 
 		kvm_set_msi_irq(e, &irq);
-		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
+			/*
+			 * Make sure the IRTE is in remapped mode if
+			 * we don't handle it in posted mode.
+			 */
+			pi_set_sn(vcpu_to_pi_desc(vcpu));
+			ret = irq_set_vcpu_affinity(host_irq, NULL);
+			pi_clear_sn(vcpu_to_pi_desc(vcpu));
+
 			continue;
+		}
 
 		vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu));
 		vcpu_info.vector = irq.vector;