
[v3,06/26] iommu, x86: No need to migrate irq for VT-d Posted-Interrupts

Message ID 1418397300-10870-7-git-send-email-feng.wu@intel.com (mailing list archive)
State New, archived

Commit Message

Wu, Feng Dec. 12, 2014, 3:14 p.m. UTC
We don't need to migrate the irqs for VT-d Posted-Interrupts here.
When 'pst' is set in the IRTE, the associated irq is posted to the
guest directly instead of being remapped. The destination of the
interrupt is set in the Posted-Interrupts Descriptor, and the
migration happens during vCPU scheduling.

However, we still update the cached irte here, so it can be used
when changing back to remapping mode.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Zhang, Yang Z Dec. 18, 2014, 2:26 p.m. UTC | #1
Feng Wu wrote on 2014-12-12:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in the IRTE, the associated irq is posted to the
> guest directly instead of being remapped. The destination of the
> interrupt is set in the Posted-Interrupts Descriptor, and the
> migration happens during vCPU scheduling.
> 
> However, we still update the cached irte here, so it can be used
> when changing back to remapping mode.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  drivers/iommu/intel_irq_remapping.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 48c2051..ab9057a 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -977,6 +977,7 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>  	struct intel_ir_data *ir_data = data->chip_data;
>  	struct irte *irte = &ir_data->irte_entry;
> +	struct irte_pi *irte_pi = (struct irte_pi *)irte;
>  	struct irq_cfg *cfg = irqd_cfg(data);
>  	struct irq_data *parent = data->parent_data;
>  	int ret;
> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  	 */
>  	irte->vector = cfg->vector;
>  	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> -	modify_irte(&ir_data->irq_2_iommu, irte);
> +
> +	/* We don't need to modify irte if the interrupt is for posting. */
> +	if (irte_pi->pst != 1)
> +		modify_irte(&ir_data->irq_2_iommu, irte);

What happens if the user changes the IRQ affinity manually?

> 
>  	/*
>  	 * After this point, all the interrupts will start arriving


Best regards,
Yang


Wu, Feng Dec. 19, 2014, 1:40 a.m. UTC | #2
Zhang, Yang Z wrote on 2014-12-18:
> Feng Wu wrote on 2014-12-12:
> > [...]
> > +	/* We don't need to modify irte if the interrupt is for posting. */
> > +	if (irte_pi->pst != 1)
> > +		modify_irte(&ir_data->irq_2_iommu, irte);
> 
> What happens if the user changes the IRQ affinity manually?

If the IRQ is posted, its affinity is controlled by the guest (irq <---> vCPU <---> pCPU);
changing its affinity in the host has no effect.

Thanks,
Feng


Zhang, Yang Z Dec. 19, 2014, 1:46 a.m. UTC | #3
Wu, Feng wrote on 2014-12-19:
> Zhang, Yang Z wrote on 2014-12-18:
>> Feng Wu wrote on 2014-12-12:
>>> [...]
>>> +	/* We don't need to modify irte if the interrupt is for posting. */
>>> +	if (irte_pi->pst != 1)
>>> +		modify_irte(&ir_data->irq_2_iommu, irte);
>> 
>> What happens if the user changes the IRQ affinity manually?
> 
> If the IRQ is posted, its affinity is controlled by the guest (irq <--->
> vCPU <---> pCPU); changing its affinity in the host has no effect.

That's the problem: the user is able to change it in the host, but the change never takes effect, since the affinity is actually controlled by the guest. I guess it will break IRQ balancing too.



Best regards,
Yang


Paolo Bonzini Dec. 19, 2014, 11:59 a.m. UTC | #4
On 19/12/2014 02:46, Zhang, Yang Z wrote:
>> If the IRQ is posted, its affinity is controlled by the guest (irq <--->
>> vCPU <---> pCPU); changing its affinity in the host has no effect.
> 
> That's the problem: the user is able to change it in the host, but the
> change never takes effect, since the affinity is actually controlled by
> the guest. I guess it will break IRQ balancing too.

I don't think that's a problem.

Controlling the affinity in the host affects which CPU in the host takes
care of signaling the guest.

If this signaling is done directly by the chipset, there is no need to
do anything in the host and thus the host affinity can be bypassed.

Paolo
Zhang, Yang Z Dec. 23, 2014, 12:37 a.m. UTC | #5
Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 19/12/2014 02:46, Zhang, Yang Z wrote:
>>> If the IRQ is posted, its affinity is controlled by the guest (irq
>>> <---> vCPU <---> pCPU); changing its affinity in the host has no effect.
>> 
>> That's the problem: the user is able to change it in the host, but the
>> change never takes effect, since the affinity is actually controlled by
>> the guest. I guess it will break IRQ balancing too.
> 
> I don't think that's a problem.
> 
> Controlling the affinity in the host affects which CPU in the host
> takes care of signaling the guest.
> 
> If this signaling is done directly by the chipset, there is no need to
> do anything in the host and thus the host affinity can be bypassed.

I don't quite understand it. If the user sets an interrupt's affinity to a CPU but still sees the interrupt delivered to other CPUs in the host, do you think that is the right behavior?

> 
> Paolo


Best regards,
Yang

Paolo Bonzini Dec. 23, 2014, 8:47 a.m. UTC | #6
On 23/12/2014 01:37, Zhang, Yang Z wrote:
> I don't quite understand it. If the user sets an interrupt's affinity
> to a CPU but still sees the interrupt delivered to other CPUs in the
> host, do you think that is the right behavior?

No, the interrupt is not delivered at all in the host.  Normally you'd have:

- interrupt delivered to CPU from host affinity

- VFIO interrupt handler writes to irqfd

- interrupt delivered to vCPU from guest affinity

Here, you just skip the first two steps.  The interrupt is delivered to
the thread that is running the vCPU directly, so the host affinity is
bypassed entirely.
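
To make the skipped steps concrete, here is a minimal sketch of step 2, the
VFIO interrupt handler kicking the irqfd (illustrative kernel-style code; the
handler and variable names are assumptions here, not the exact VFIO source):

#include <linux/interrupt.h>
#include <linux/eventfd.h>

/* Step 2 above: the host-side handler merely signals the irqfd; KVM's
 * irqfd code then injects the interrupt according to the *guest*
 * affinity (step 3). */
static irqreturn_t vfio_msi_handler_sketch(int irq, void *arg)
{
	struct eventfd_ctx *trigger = arg;	/* irqfd bound to this irq */

	eventfd_signal(trigger, 1);
	return IRQ_HANDLED;
}

With posting enabled, the chipset delivers straight to the vCPU, so neither
this handler nor the host-affinity step before it ever runs.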

... unless you are considering the case where the vCPU is blocked and
the host is processing the posted interrupt wakeup vector.  In that case
yes, it would be better to set NDST to a CPU matching the host affinity.
But that would be handled in patch 24.  We also have the same problem
with lowest-priority interrupts; likely the host has configured the
interrupt affinity to allow any CPU.  So we can do it later, when we add
vector hashing support.  In the meantime, Feng, please add a FIXME comment.

Does this make sense?

Paolo
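
For reference, the requested FIXME might look like this on top of the hunk
in question (the comment wording is assumed here, not taken from the series):

	/* We don't need to modify irte if the interrupt is for posting. */
	if (irte_pi->pst != 1)
		modify_irte(&ir_data->irq_2_iommu, irte);
	/*
	 * FIXME: while the vCPU is blocked, notifications arrive at the
	 * posted-interrupt wakeup vector on the pCPU the vCPU blocked on,
	 * so the host affinity set above is bypassed. Revisit this (and
	 * lowest-priority delivery) when vector hashing support is added.
	 */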
Wu, Feng Dec. 23, 2014, 9:07 a.m. UTC | #7
Paolo Bonzini wrote on 2014-12-23:
> On 23/12/2014 01:37, Zhang, Yang Z wrote:
> > I don't quite understand it. If the user sets an interrupt's affinity
> > to a CPU but still sees the interrupt delivered to other CPUs in the
> > host, do you think that is the right behavior?
> 
> No, the interrupt is not delivered at all in the host.  Normally you'd have:
> 
> - interrupt delivered to CPU from host affinity
> 
> - VFIO interrupt handler writes to irqfd
> 
> - interrupt delivered to vCPU from guest affinity
> 
> Here, you just skip the first two steps.  The interrupt is delivered to
> the thread that is running the vCPU directly, so the host affinity is
> bypassed entirely.
> 
> ... unless you are considering the case where the vCPU is blocked and
> the host is processing the posted interrupt wakeup vector.  In that case
> yes, it would be better to set NDST to a CPU matching the host affinity.

In my understanding, the wakeup vector should have no relationship with
the host affinity of the irq. The wakeup notification event should be
delivered to the pCPU that the vCPU was blocked on. And from the kernel's
point of view, the irq is not associated with the wakeup vector, right?

Thanks,
Feng

>  But that would be handled in patch 24. We also have the same problem
> with lowest-priority interrupts; likely the host has configured the
> interrupt affinity to allow any CPU. So we can do it later, when we add
> vector hashing support. In the meantime, Feng, please add a FIXME
> comment.
> 
> Does this make sense?
> 
> Paolo
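
As a sketch of the blocking case being discussed (patterned on what patch 24
of this series is expected to do; the pi_desc fields and the name
POSTED_INTR_WAKEUP_VECTOR are assumptions here, not code from this patch):

/* Before a vCPU blocks, retarget its posted-interrupt descriptor so the
 * wakeup notification lands on the pCPU doing the blocking -- the irq's
 * host affinity plays no part in this. */
static void pi_pre_block_sketch(struct pi_desc *pi_desc)
{
	int cpu = smp_processor_id();		/* pCPU the vCPU blocks on */

	pi_desc->ndst = cpu_physical_id(cpu);	/* NDST := blocking pCPU */
	pi_desc->nv = POSTED_INTR_WAKEUP_VECTOR;	/* global wakeup vector */
}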
Paolo Bonzini Dec. 23, 2014, 9:34 a.m. UTC | #8
On 23/12/2014 10:07, Wu, Feng wrote:
>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>> I don't quite understand it. If the user sets an interrupt's affinity
>>> to a CPU but still sees the interrupt delivered to other CPUs in the
>>> host, do you think that is the right behavior?
>>
>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>
>> - interrupt delivered to CPU from host affinity
>>
>> - VFIO interrupt handler writes to irqfd
>>
>> - interrupt delivered to vCPU from guest affinity
>>
>> Here, you just skip the first two steps.  The interrupt is delivered to
>> the thread that is running the vCPU directly, so the host affinity is
>> bypassed entirely.
>>
>> ... unless you are considering the case where the vCPU is blocked and
>> the host is processing the posted interrupt wakeup vector.  In that case
>> yes, it would be better to set NDST to a CPU matching the host affinity.
> 
> In my understanding, the wakeup vector should have no relationship with
> the host affinity of the irq. The wakeup notification event should be
> delivered to the pCPU that the vCPU was blocked on. And from the kernel's
> point of view, the irq is not associated with the wakeup vector, right?

That is correct indeed.  The irq is not associated with the wakeup vector,
hence this patch is right, I think.

However, the wakeup vector has the same function as the VFIO interrupt
handler, so you could argue that it is tied to the host affinity rather
than the guest.  Let's wait for Yang to answer.

Paolo
Zhang, Yang Z Dec. 24, 2014, 1:38 a.m. UTC | #9
Paolo Bonzini wrote on 2014-12-23:
> 
> 
> On 23/12/2014 10:07, Wu, Feng wrote:
>>> [...]
>> 
>> In my understanding, the wakeup vector should have no relationship
>> with the host affinity of the irq. The wakeup notification event should
>> be delivered to the pCPU that the vCPU was blocked on. And from the
>> kernel's point of view, the irq is not associated with the wakeup vector, right?
> 
> That is correct indeed.  The irq is not associated with the wakeup
> vector, hence this patch is right, I think.
> 
> However, the wakeup vector has the same function as the VFIO interrupt
> handler, so you could argue that it is tied to the host affinity
> rather than the guest.  Let's wait for Yang to answer.

Actually, that's my original question too. I am wondering what happens if the user changes the assigned device's affinity in the host's /proc/irq/. If ignoring it is acceptable, then this patch is ok. But this discussion seems out of my scope; we need some experts to weigh in, since it will impact the user experience.

> 
> Paolo


Best regards,
Yang


Jiang Liu Dec. 24, 2014, 2:12 a.m. UTC | #10
On 2014/12/24 9:38, Zhang, Yang Z wrote:
> Paolo Bonzini wrote on 2014-12-23:
>> [...]
> 
> Actually, that's my original question too. I am wondering what happens if the user changes the assigned device's affinity in the host's /proc/irq/. If ignoring it is acceptable, then this patch is ok. But this discussion seems out of my scope; we need some experts to weigh in, since it will impact the user experience.
Hi Yang,
	Originally we had a proposal to return failure when the user
sets IRQ affinity through native OS interfaces while an IRQ is in PI
mode. But that proposal would break CPU hot-removal, because the OS
needs to migrate away all IRQs bound to the CPU being offlined. So
instead we propose saving the user's IRQ affinity setting without
changing the hardware configuration (keeping the PI configuration).
Later, when PI mode is disabled, the cached affinity setting is used
to set up the IRQ destination for the native OS. On the other hand,
an IRQ in PI mode won't be delivered to the native OS, so the user
may not notice that the IRQ is delivered to CPUs other than those in
the affinity set. In that respect, I think it's acceptable :)
Regards!
Gerry
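
In code terms, the caching scheme Gerry describes has two halves; a sketch
using the names from this patch (the switch-back helper itself is
hypothetical, standing in for the later remapped-mode path):

/* 1) set_affinity (this patch): the cached IRTE is always refreshed, but
 *    the hardware write is skipped while the entry is in posted mode. */
static int set_affinity_excerpt(struct intel_ir_data *ir_data,
				struct irq_cfg *cfg)
{
	struct irte *irte = &ir_data->irte_entry;
	struct irte_pi *irte_pi = (struct irte_pi *)irte;

	irte->vector = cfg->vector;
	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
	if (irte_pi->pst != 1)
		modify_irte(&ir_data->irq_2_iommu, irte);
	return 0;
}

/* 2) Hypothetical switch-back helper: when posting is disabled again,
 *    the cached affinity takes effect by rewriting the hardware IRTE. */
static void leave_posted_mode_sketch(struct intel_ir_data *ir_data)
{
	modify_irte(&ir_data->irq_2_iommu, &ir_data->irte_entry);
}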
Zhang, Yang Z Dec. 24, 2014, 2:32 a.m. UTC | #11
Jiang Liu wrote on 2014-12-24:
> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>> [...]
> Hi Yang,

Hi Jiang,

> 	Originally we had a proposal to return failure when the user sets
> IRQ affinity through native OS interfaces while an IRQ is in PI mode.
> But that proposal would break CPU hot-removal, because the OS needs to
> migrate away all IRQs bound to the CPU being offlined. So instead we
> propose saving the user's IRQ affinity setting without changing the
> hardware configuration (keeping the PI configuration). Later, when PI
> mode is disabled, the cached affinity setting is used to set up the IRQ
> destination for the native OS. On the other hand, an IRQ in PI mode
> won't be delivered to the native OS, so the user may not notice that
> the IRQ is delivered to CPUs other than those in the affinity set.

The IRQ is still there, but it will be delivered to the host in the form of a PI event (if the vCPU is running in root mode). I am not sure whether those interrupts should be reflected in /proc/interrupts. If the answer is yes, which entry should be used: a new PI entry, or the original IRQ entry?

> In that respect, I think it's acceptable :) Regards!

Yes, if all of you (especially the IRQ maintainer) think it is acceptable, then we can follow the current implementation and document it.



Best regards,
Yang


Wu, Feng Dec. 24, 2014, 3:08 a.m. UTC | #12
Zhang, Yang Z wrote on 2014-12-24:
> Jiang Liu wrote on 2014-12-24:
> > [...]
> 
> The IRQ is still there, but it will be delivered to the host in the form of
> a PI event (if the vCPU is running in root mode). I am not sure whether
> those interrupts should be reflected in /proc/interrupts. If the answer is
> yes, which entry should be used: a new PI entry, or the original IRQ entry?

Even so, setting the affinity of the IRQ in the host should not affect
the destination of the PI event (normal notification event or wakeup
notification event), because the destination of the PI event is
determined by the NDST field of the posted-interrupts descriptor, and
the PI notification vector is global. I just had a discussion with
Jiang offline; maybe we can add statistics for the notification vector
to /proc/interrupts, just like for any other global interrupt.

Thanks,
Feng
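
For context, the NDST and notification-vector fields live in the
posted-interrupt descriptor; a sketch of its layout per the VT-d spec
(the field names below are assumptions drawn from the spec, not from
this series):

/* Posted-interrupt descriptor, 64-byte aligned. The chipset reads
 * NDST/NV from here at delivery time, which is why the host's IRTE
 * affinity setting cannot influence a posted interrupt. */
struct pi_desc {
	u32 pir[8];		/* posted interrupt requests, 256 bits */
	union {
		struct {
			u16	on	: 1,	/* outstanding notification */
				sn	: 1,	/* suppress notification */
				rsvd_1	: 14;
			u8	nv;		/* notification vector (global) */
			u8	rsvd_2;
			u32	ndst;	/* notification destination (pCPU) */
		};
		u64 control;
	};
	u32 rsvd[6];
} __aligned(64);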

Zhang, Yang Z Dec. 24, 2014, 4:04 a.m. UTC | #13
Wu, Feng wrote on 2014-12-24:
> Zhang, Yang Z wrote on 2014-12-24:
>> [...]
> 
> Even so, setting the affinity of the IRQ in the host should not affect
> the destination of the PI event (normal notification event or wakeup

That is your implementation. To me, disabling PI when the vCPU is going to
run on a CPU outside the IRQ affinity bitmap would also be acceptable, and
it would keep the user interface looking the same as before.

Hi Thomas, Ingo, and Peter,

Can you help to review this patch? I would really appreciate your feedback.

> notification event), because the destination of the PI event is
> determined by the NDST field of the posted-interrupts descriptor, and
> the PI notification vector is global. I just had a discussion with
> Jiang offline; maybe we can add statistics for the notification vector
> to /proc/interrupts, just like for any other global interrupt.
> 
> Thanks,
> Feng
> 


Best regards,
Yang


Jiang Liu Dec. 24, 2014, 4:54 a.m. UTC | #14
On 2014/12/24 10:32, Zhang, Yang Z wrote:
> Jiang Liu wrote on 2014-12-24:
>> [...]
> 
> The IRQ is still there, but it will be delivered to the host in the form of a PI event (if the vCPU is running in root mode). I am not sure whether those interrupts should be reflected in /proc/interrupts. If the answer is yes, which entry should be used: a new PI entry, or the original IRQ entry?

You are right, the native interrupt statistics will become inaccurate.
Some documentation of this behavior is probably warranted.

> 
>> In that respect, I think it's acceptable :) Regards!
> 
> Yes, if all of you (especially the IRQ maintainer) think it is acceptable, then we can follow the current implementation and document it.
Good suggestion, we will send an email to Thomas for advice after the New Year.
David Woodhouse Jan. 28, 2015, 3:29 p.m. UTC | #15
On Fri, 2014-12-12 at 23:14 +0800, Feng Wu wrote:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in the IRTE, the associated irq is posted to the
> guest directly instead of being remapped. The destination of the
> interrupt is set in the Posted-Interrupts Descriptor, and the
> migration happens during vCPU scheduling.
> 
> However, we still update the cached irte here, so it can be used
> when changing back to remapping mode.
> 
> Signed-off-by: Feng Wu <feng.wu@intel.com>
> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: David Woodhouse <David.Woodhouse@intel.com>

Patch

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 48c2051..ab9057a 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -977,6 +977,7 @@  intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 {
 	struct intel_ir_data *ir_data = data->chip_data;
 	struct irte *irte = &ir_data->irte_entry;
+	struct irte_pi *irte_pi = (struct irte_pi *)irte;
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct irq_data *parent = data->parent_data;
 	int ret;
@@ -991,7 +992,10 @@  intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	 */
 	irte->vector = cfg->vector;
 	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
-	modify_irte(&ir_data->irq_2_iommu, irte);
+
+	/* We don't need to modify irte if the interrupt is for posting. */
+	if (irte_pi->pst != 1)
+		modify_irte(&ir_data->irq_2_iommu, irte);
 
 	/*
 	 * After this point, all the interrupts will start arriving
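
For readers without the rest of the series: struct irte_pi is the
posted-format view of the IRTE introduced by an earlier patch in this
series. A rough sketch of its layout (bit-field names assumed from the
series and the VT-d spec's posted IRTE format, not verified against the
final code):

/* Posted-format IRTE: a reinterpretation of struct irte when pst = 1.
 * Sketch only; the authoritative definition is in an earlier patch. */
struct irte_pi {
	u64	present	: 1,
		fpd	: 1,	/* fault processing disable */
		__res0	: 6,
		avail	: 4,
		__res1	: 2,
		urg	: 1,	/* urgent */
		pst	: 1,	/* posted mode -- the bit this patch checks */
		vector	: 8,
		__res2	: 14,
		pda_l	: 26;	/* posted descriptor address, low part */

	u64	sid	: 16,	/* source id */
		sq	: 2,
		svt	: 2,
		__res3	: 12,
		pda_h	: 32;	/* posted descriptor address, high part */
};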