diff mbox series

irqchip/gic-v3-its: Don't try to move a disabled irq

Message ID 20200529015501.15771-1-alisaidi@amazon.com (mailing list archive)
State New, archived
Headers show
Series irqchip/gic-v3-its: Don't try to move a disabled irq | expand

Commit Message

Ali Saidi May 29, 2020, 1:55 a.m. UTC
If an interrupt is disabled the ITS driver has sent a discard removing
the DeviceID and EventID from the ITT. After this occurs it can't be
moved to another collection with a MOVI and a command error occurs if
attempted. Before issuing the MOVI command make sure that the IRQ isn't
disabled and change the activate code to try and use the previous
affinity.

Signed-off-by: Ali Saidi <alisaidi@amazon.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Zenghui Yu May 29, 2020, 4:07 a.m. UTC | #1
Hi,

On 2020/5/29 9:55, Ali Saidi wrote:
> If an interrupt is disabled the ITS driver has sent a discard removing
> the DeviceID and EventID from the ITT. After this occurs it can't be
> moved to another collection with a MOVI and a command error occurs if
> attempted. Before issuing the MOVI command make sure that the IRQ isn't
> disabled and change the activate code to try and use the previous
> affinity.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> ---
>   drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 124251b0ccba..1235dd9a2fb2 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>   	/* don't set the affinity when the target cpu is same as current one */
>   	if (cpu != its_dev->event_map.col_map[id]) {
>   		target_col = &its_dev->its->collections[cpu];
> -		its_send_movi(its_dev, target_col, id);
> +
> +		/* If the IRQ is disabled a discard was sent so don't move */
> +		if (!irqd_irq_disabled(d))
> +			its_send_movi(its_dev, target_col, id);

It looks to me that if the IRQ is disabled, we mask the enable bit in
the corresponding LPI configuration table entry, but not sending DISCARD
to remove the DevID/EventID mapping. And moving a disabled LPI is
actually allowed by the GIC architecture, right?

> +
>   		its_dev->event_map.col_map[id] = cpu;
>   		irq_data_update_effective_affinity(d, cpumask_of(cpu));
>   	}
> @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>   	if (its_dev->its->numa_node >= 0)
>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>   
> -	/* Bind the LPI to the first possible CPU */
> -	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> +	/* If the cpu set to a different CPU that is still online use it */
> +	cpu = its_dev->event_map.col_map[event];
> +
> +	cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
> +
> +	if (!cpumask_test_cpu(cpu, cpu_mask)) {
> +		/* Bind the LPI to the first possible CPU */
> +		cpu = cpumask_first(cpu_mask);
> +	}

I'd like to know what actual problem you had seen and the way to
reproduce it :-)


Thanks,
Zenghui
Marc Zyngier May 29, 2020, 8:32 a.m. UTC | #2
Hi Ali,

On 2020-05-29 02:55, Ali Saidi wrote:
> If an interrupt is disabled the ITS driver has sent a discard removing
> the DeviceID and EventID from the ITT. After this occurs it can't be
> moved to another collection with a MOVI and a command error occurs if
> attempted. Before issuing the MOVI command make sure that the IRQ isn't
> disabled and change the activate code to try and use the previous
> affinity.
> 
> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 124251b0ccba..1235dd9a2fb2 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val,
>  	/* don't set the affinity when the target cpu is same as current one 
> */
>  	if (cpu != its_dev->event_map.col_map[id]) {
>  		target_col = &its_dev->its->collections[cpu];
> -		its_send_movi(its_dev, target_col, id);
> +
> +		/* If the IRQ is disabled a discard was sent so don't move */
> +		if (!irqd_irq_disabled(d))
> +			its_send_movi(its_dev, target_col, id);
> +

This looks wrong. What you are testing here is whether the interrupt
is masked, not that there isn't a valid translation.

In the commit message, you're saying that we've issued a discard. This
hints at doing a set_affinity on an interrupt that has been deactivated
(mapping removed). Is that actually the case? If so, why was it 
deactivated
the first place?

>  		its_dev->event_map.col_map[id] = cpu;
>  		irq_data_update_effective_affinity(d, cpumask_of(cpu));
>  	}
> @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct
> irq_domain *domain,
>  	if (its_dev->its->numa_node >= 0)
>  		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
> 
> -	/* Bind the LPI to the first possible CPU */
> -	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> +	/* If the cpu set to a different CPU that is still online use it */
> +	cpu = its_dev->event_map.col_map[event];
> +
> +	cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
> +
> +	if (!cpumask_test_cpu(cpu, cpu_mask)) {
> +		/* Bind the LPI to the first possible CPU */
> +		cpu = cpumask_first(cpu_mask);
> +	}
> +
>  	if (cpu >= nr_cpu_ids) {
>  		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
>  			return -EINVAL;

So you deactivate an interrupt, do a set_affinity that doesn't issue
a MOVI but preserves the affinity, then reactivate it and hope that
the new mapping will target the "right" CPU.

That seems a bit mad, but I presume this isn't the whole story...

Thanks,

         M.
Ali Saidi May 29, 2020, 12:36 p.m. UTC | #3
Hi Marc,

> On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@kernel.org> wrote:
> 
> Hi Ali,
> 
>> On 2020-05-29 02:55, Ali Saidi wrote:
>> If an interrupt is disabled the ITS driver has sent a discard removing
>> the DeviceID and EventID from the ITT. After this occurs it can't be
>> moved to another collection with a MOVI and a command error occurs if
>> attempted. Before issuing the MOVI command make sure that the IRQ isn't
>> disabled and change the activate code to try and use the previous
>> affinity.
>> 
>> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
>> ---
>> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>> 1 file changed, 15 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c
>> b/drivers/irqchip/irq-gic-v3-its.c
>> index 124251b0ccba..1235dd9a2fb2 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
>> const struct cpumask *mask_val,
>>      /* don't set the affinity when the target cpu is same as current one
>> */
>>      if (cpu != its_dev->event_map.col_map[id]) {
>>              target_col = &its_dev->its->collections[cpu];
>> -             its_send_movi(its_dev, target_col, id);
>> +
>> +             /* If the IRQ is disabled a discard was sent so don't move */
>> +             if (!irqd_irq_disabled(d))
>> +                     its_send_movi(its_dev, target_col, id);
>> +
> 
> This looks wrong. What you are testing here is whether the interrupt
> is masked, not that there isn't a valid translation.
I’m not exactly sure the correct condition, but what I’m looking for is interrupts which are deactivated and we have thus sent a discard. 

> 
> In the commit message, you're saying that we've issued a discard. This
> hints at doing a set_affinity on an interrupt that has been deactivated
> (mapping removed). Is that actually the case? If so, why was it
> deactivated
> the first place?
This is the case. If we down a NIC, that interface’s MSIs will be deactivated but remain allocated until the device is unbound from the driver or the NIC is brought up. 

While stressing down/up a device I’ve found that irqbalance can move interrupts and you end up with the situation described. The device is downed, the interrupts are deactivated but still present and then trying to move one results in sending a MOVI after the DISCARD which is an error per the GIC spec. 

> 
>>              its_dev->event_map.col_map[id] = cpu;
>>              irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>      }
>> @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct
>> irq_domain *domain,
>>      if (its_dev->its->numa_node >= 0)
>>              cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>> 
>> -     /* Bind the LPI to the first possible CPU */
>> -     cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
>> +     /* If the cpu set to a different CPU that is still online use it */
>> +     cpu = its_dev->event_map.col_map[event];
>> +
>> +     cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
>> +
>> +     if (!cpumask_test_cpu(cpu, cpu_mask)) {
>> +             /* Bind the LPI to the first possible CPU */
>> +             cpu = cpumask_first(cpu_mask);
>> +     }
>> +
>>      if (cpu >= nr_cpu_ids) {
>>              if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
>>                      return -EINVAL;
> 
> So you deactivate an interrupt, do a set_affinity that doesn't issue
> a MOVI but preserves the affinity, then reactivate it and hope that
> the new mapping will target the "right" CPU.
> 
> That seems a bit mad, but I presume this isn't the whole story...
Doing some experiments it appears as though other interrupts controllers do preserve affinity across deactivate/activate, so this is my attempt at doing the same. 

Thanks,
Ali
Marc Zyngier May 30, 2020, 4:49 p.m. UTC | #4
Hi Ali,

On Fri, 29 May 2020 12:36:42 +0000
"Saidi, Ali" <alisaidi@amazon.com> wrote:

> Hi Marc,
> 
> > On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@kernel.org> wrote:
> > 
> > Hi Ali,
> >   
> >> On 2020-05-29 02:55, Ali Saidi wrote:
> >> If an interrupt is disabled the ITS driver has sent a discard removing
> >> the DeviceID and EventID from the ITT. After this occurs it can't be
> >> moved to another collection with a MOVI and a command error occurs if
> >> attempted. Before issuing the MOVI command make sure that the IRQ isn't
> >> disabled and change the activate code to try and use the previous
> >> affinity.
> >> 
> >> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
> >> ---
> >> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
> >> 1 file changed, 15 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c
> >> b/drivers/irqchip/irq-gic-v3-its.c
> >> index 124251b0ccba..1235dd9a2fb2 100644
> >> --- a/drivers/irqchip/irq-gic-v3-its.c
> >> +++ b/drivers/irqchip/irq-gic-v3-its.c
> >> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
> >> const struct cpumask *mask_val,
> >>      /* don't set the affinity when the target cpu is same as current one
> >> */
> >>      if (cpu != its_dev->event_map.col_map[id]) {
> >>              target_col = &its_dev->its->collections[cpu];
> >> -             its_send_movi(its_dev, target_col, id);
> >> +
> >> +             /* If the IRQ is disabled a discard was sent so don't move */
> >> +             if (!irqd_irq_disabled(d))
> >> +                     its_send_movi(its_dev, target_col, id);
> >> +  
> > 
> > This looks wrong. What you are testing here is whether the interrupt
> > is masked, not that there isn't a valid translation.  
> I’m not exactly sure the correct condition, but what I’m looking for
> is interrupts which are deactivated and we have thus sent a discard. 

That looks like IRQD_IRQ_STARTED not being set in this case.

> > 
> > In the commit message, you're saying that we've issued a discard.
> > This hints at doing a set_affinity on an interrupt that has been
> > deactivated (mapping removed). Is that actually the case? If so,
> > why was it deactivated
> > the first place?  
> This is the case. If we down a NIC, that interface’s MSIs will be
> deactivated but remain allocated until the device is unbound from the
> driver or the NIC is brought up. 
> 
> While stressing down/up a device I’ve found that irqbalance can move
> interrupts and you end up with the situation described. The device is
> downed, the interrupts are deactivated but still present and then
> trying to move one results in sending a MOVI after the DISCARD which
> is an error per the GIC spec. 

Not great indeed. But this is not, as far as I can tell, a GIC
driver problem.

The semantic of activate/deactivate (which maps to started/shutdown
in the IRQ code) is that the HW resources for a given interrupt are
only committed when the interrupt is activated. Trying to perform
actions involving the HW on an interrupt that isn't active cannot be
guaranteed to take effect.

I'd rather address it in the core code, by preventing set_affinity (and
potentially others) to take place when the interrupt is not in the
STARTED state. Userspace would get an error, which is perfectly
legitimate, and which it already has to deal with it for plenty of other
reasons.

> 
> >   
> >>              its_dev->event_map.col_map[id] = cpu;
> >>              irq_data_update_effective_affinity(d,
> >> cpumask_of(cpu)); }
> >> @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct
> >> irq_domain *domain,
> >>      if (its_dev->its->numa_node >= 0)
> >>              cpu_mask = cpumask_of_node(its_dev->its->numa_node);
> >> 
> >> -     /* Bind the LPI to the first possible CPU */
> >> -     cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> >> +     /* If the cpu set to a different CPU that is still online
> >> use it */
> >> +     cpu = its_dev->event_map.col_map[event];
> >> +
> >> +     cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
> >> +
> >> +     if (!cpumask_test_cpu(cpu, cpu_mask)) {
> >> +             /* Bind the LPI to the first possible CPU */
> >> +             cpu = cpumask_first(cpu_mask);
> >> +     }
> >> +
> >>      if (cpu >= nr_cpu_ids) {
> >>              if (its_dev->its->flags &
> >> ITS_FLAGS_WORKAROUND_CAVIUM_23144) return -EINVAL;  
> > 
> > So you deactivate an interrupt, do a set_affinity that doesn't issue
> > a MOVI but preserves the affinity, then reactivate it and hope that
> > the new mapping will target the "right" CPU.
> > 
> > That seems a bit mad, but I presume this isn't the whole story...  
> Doing some experiments it appears as though other interrupts
> controllers do preserve affinity across deactivate/activate, so this
> is my attempt at doing the same. 

I believe this is only an artefact of these other controllers not
requiring any resource to be committed into the HW (SPIs wouldn't care,
for example).

Thanks,

	M.
kernel test robot May 31, 2020, 2:53 a.m. UTC | #5
Hi Ali,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on linux/master v5.7-rc7]
[cannot apply to tip/irq/core arm-jcooper/irqchip/for-next next-20200529]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Ali-Saidi/irqchip-gic-v3-its-Don-t-try-to-move-a-disabled-irq/20200531-043957
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 86852175b016f0c6873dcbc24b93d12b7b246612
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>, old ones prefixed by <<):

drivers/irqchip/irq-gic-v3-its.c: In function 'its_irq_domain_activate':
>> drivers/irqchip/irq-gic-v3-its.c:3449:14: warning: passing argument 1 of 'cpumask_and' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
3449 |  cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
|              ^~~~~~~~
In file included from include/linux/rcupdate.h:31,
from include/linux/radix-tree.h:15,
from include/linux/idr.h:15,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/of.h:17,
from include/linux/irqdomain.h:35,
from include/linux/acpi.h:13,
from drivers/irqchip/irq-gic-v3-its.c:7:
include/linux/cpumask.h:424:47: note: expected 'struct cpumask *' but argument is of type 'const struct cpumask *'
424 | static inline int cpumask_and(struct cpumask *dstp,
|                               ~~~~~~~~~~~~~~~~^~~~
In file included from include/linux/bits.h:23,
from include/linux/ioport.h:15,
from include/linux/acpi.h:12,
from drivers/irqchip/irq-gic-v3-its.c:7:
drivers/irqchip/irq-gic-v3-its.c: In function 'its_init_vpe_domain':
include/linux/bits.h:26:28: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
|                            ^
include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
|                                                              ^
include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^~~~~~~~~~~~~~~~~~~
drivers/irqchip/irq-gic-v3-its.c:4765:10: note: in expansion of macro 'GENMASK'
4765 |  devid = GENMASK(device_ids(its) - 1, 0);
|          ^~~~~~~
include/linux/bits.h:26:40: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
|                                        ^
include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
|                                                              ^
include/linux/bits.h:39:3: note: in expansion of macro 'GENMASK_INPUT_CHECK'
39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
|   ^~~~~~~~~~~~~~~~~~~
drivers/irqchip/irq-gic-v3-its.c:4765:10: note: in expansion of macro 'GENMASK'
4765 |  devid = GENMASK(device_ids(its) - 1, 0);
|          ^~~~~~~

vim +3449 drivers/irqchip/irq-gic-v3-its.c

  3433	
  3434	static int its_irq_domain_activate(struct irq_domain *domain,
  3435					   struct irq_data *d, bool reserve)
  3436	{
  3437		struct its_device *its_dev = irq_data_get_irq_chip_data(d);
  3438		u32 event = its_get_event_id(d);
  3439		const struct cpumask *cpu_mask = cpu_online_mask;
  3440		int cpu;
  3441	
  3442		/* get the cpu_mask of local node */
  3443		if (its_dev->its->numa_node >= 0)
  3444			cpu_mask = cpumask_of_node(its_dev->its->numa_node);
  3445	
  3446		/* If the cpu set to a different CPU that is still online use it */
  3447		cpu = its_dev->event_map.col_map[event];
  3448	
> 3449		cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
  3450	
  3451		if (!cpumask_test_cpu(cpu, cpu_mask)) {
  3452			/* Bind the LPI to the first possible CPU */
  3453			cpu = cpumask_first(cpu_mask);
  3454		}
  3455	
  3456		if (cpu >= nr_cpu_ids) {
  3457			if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
  3458				return -EINVAL;
  3459	
  3460			cpu = cpumask_first(cpu_online_mask);
  3461		}
  3462	
  3463		its_dev->event_map.col_map[event] = cpu;
  3464		irq_data_update_effective_affinity(d, cpumask_of(cpu));
  3465	
  3466		/* Map the GIC IRQ and event to the device */
  3467		its_send_mapti(its_dev, d->hwirq, event);
  3468		return 0;
  3469	}
  3470	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Marc Zyngier May 31, 2020, 11:09 a.m. UTC | #6
On 2020-05-30 17:49, Marc Zyngier wrote:
> Hi Ali,
> 
> On Fri, 29 May 2020 12:36:42 +0000
> "Saidi, Ali" <alisaidi@amazon.com> wrote:
> 
>> Hi Marc,
>> 
>> > On May 29, 2020, at 3:33 AM, Marc Zyngier <maz@kernel.org> wrote:
>> >
>> > Hi Ali,
>> >
>> >> On 2020-05-29 02:55, Ali Saidi wrote:
>> >> If an interrupt is disabled the ITS driver has sent a discard removing
>> >> the DeviceID and EventID from the ITT. After this occurs it can't be
>> >> moved to another collection with a MOVI and a command error occurs if
>> >> attempted. Before issuing the MOVI command make sure that the IRQ isn't
>> >> disabled and change the activate code to try and use the previous
>> >> affinity.
>> >>
>> >> Signed-off-by: Ali Saidi <alisaidi@amazon.com>
>> >> ---
>> >> drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
>> >> 1 file changed, 15 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/irqchip/irq-gic-v3-its.c
>> >> b/drivers/irqchip/irq-gic-v3-its.c
>> >> index 124251b0ccba..1235dd9a2fb2 100644
>> >> --- a/drivers/irqchip/irq-gic-v3-its.c
>> >> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> >> @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
>> >> const struct cpumask *mask_val,
>> >>      /* don't set the affinity when the target cpu is same as current one
>> >> */
>> >>      if (cpu != its_dev->event_map.col_map[id]) {
>> >>              target_col = &its_dev->its->collections[cpu];
>> >> -             its_send_movi(its_dev, target_col, id);
>> >> +
>> >> +             /* If the IRQ is disabled a discard was sent so don't move */
>> >> +             if (!irqd_irq_disabled(d))
>> >> +                     its_send_movi(its_dev, target_col, id);
>> >> +
>> >
>> > This looks wrong. What you are testing here is whether the interrupt
>> > is masked, not that there isn't a valid translation.
>> I’m not exactly sure the correct condition, but what I’m looking for
>> is interrupts which are deactivated and we have thus sent a discard.
> 
> That looks like IRQD_IRQ_STARTED not being set in this case.
> 
>> >
>> > In the commit message, you're saying that we've issued a discard.
>> > This hints at doing a set_affinity on an interrupt that has been
>> > deactivated (mapping removed). Is that actually the case? If so,
>> > why was it deactivated
>> > the first place?
>> This is the case. If we down a NIC, that interface’s MSIs will be
>> deactivated but remain allocated until the device is unbound from the
>> driver or the NIC is brought up.
>> 
>> While stressing down/up a device I’ve found that irqbalance can move
>> interrupts and you end up with the situation described. The device is
>> downed, the interrupts are deactivated but still present and then
>> trying to move one results in sending a MOVI after the DISCARD which
>> is an error per the GIC spec.
> 
> Not great indeed. But this is not, as far as I can tell, a GIC
> driver problem.
> 
> The semantic of activate/deactivate (which maps to started/shutdown
> in the IRQ code) is that the HW resources for a given interrupt are
> only committed when the interrupt is activated. Trying to perform
> actions involving the HW on an interrupt that isn't active cannot be
> guaranteed to take effect.
> 
> I'd rather address it in the core code, by preventing set_affinity (and
> potentially others) to take place when the interrupt is not in the
> STARTED state. Userspace would get an error, which is perfectly
> legitimate, and which it already has to deal with it for plenty of 
> other
> reasons.

How about this:

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 453a8a0f4804..1a2ac1392c0f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -147,7 +147,8 @@ cpumask_var_t irq_default_affinity;
  static bool __irq_can_set_affinity(struct irq_desc *desc)
  {
  	if (!desc || !irqd_can_balance(&desc->irq_data) ||
-	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity)
+	    !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity ||
+	    !irqd_is_started(&desc->irq_data))
  		return false;
  	return true;
  }

Thanks,

         M.
Ali Saidi June 1, 2020, 12:10 a.m. UTC | #7
Marc, 

> On May 31, 2020, at 6:10 AM, Marc Zyngier <maz@kernel.org> wrote:
>> Not great indeed. But this is not, as far as I can tell, a GIC
>> driver problem.
>> 
>> The semantic of activate/deactivate (which maps to started/shutdown
>> in the IRQ code) is that the HW resources for a given interrupt are
>> only committed when the interrupt is activated. Trying to perform
>> actions involving the HW on an interrupt that isn't active cannot be
>> guaranteed to take effect.

Yes, then it absolutely makes sense to address it outside the GIC. 
>> 
>> I'd rather address it in the core code, by preventing set_affinity (and
>> potentially others) to take place when the interrupt is not in the
>> STARTED state. Userspace would get an error, which is perfectly
>> legitimate, and which it already has to deal with it for plenty of
>> other
>> reasons.
> 
> How about this:
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 453a8a0f4804..1a2ac1392c0f 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -147,7 +147,8 @@ cpumask_var_t irq_default_affinity;
> static bool __irq_can_set_affinity(struct irq_desc *desc)
> {
>       if (!desc || !irqd_can_balance(&desc->irq_data) ||
> -           !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity)
> +           !desc->irq_data.chip || !desc->irq_data.chip->irq_set_affinity ||
> +           !irqd_is_started(&desc->irq_data))
>               return false;
>       return true;
> }

Confirmed I can’t reproduce the issue with your fix. 

Thanks,
Ali
Herrenschmidt, Benjamin June 1, 2020, 2:40 a.m. UTC | #8
On Sun, 2020-05-31 at 12:09 +0100, Marc Zyngier wrote:
> 
> 
> > Not great indeed. But this is not, as far as I can tell, a GIC
> > driver problem.
> > 
> > The semantic of activate/deactivate (which maps to started/shutdown
> > in the IRQ code) is that the HW resources for a given interrupt are
> > only committed when the interrupt is activated. Trying to perform
> > actions involving the HW on an interrupt that isn't active cannot be
> > guaranteed to take effect.
> > 
> > I'd rather address it in the core code, by preventing set_affinity (and
> > potentially others) to take place when the interrupt is not in the
> > STARTED state. Userspace would get an error, which is perfectly
> > legitimate, and which it already has to deal with it for plenty of
> > other
> > reasons.

So I finally found time to dig a bit in there :) Code has changed a bit
since last I looked. But I have memories of the startup code messing
around with the affinity, and here it is. In irq_startup() :


		switch (__irq_startup_managed(desc, aff, force)) {
		case IRQ_STARTUP_NORMAL:
			ret = __irq_startup(desc);
			irq_setup_affinity(desc);
			break;
		case IRQ_STARTUP_MANAGED:
			irq_do_set_affinity(d, aff, false);
			ret = __irq_startup(desc);
			break;
		case IRQ_STARTUP_ABORT:
			irqd_set_managed_shutdown(d);
			return 0;

So we have two cases here. Normal and managed.

In the managed case, we set the affinity before startup. I feel like your
patch might break that or am I missing something ?

Additionally, your patch would break any userspace program that expects to
be able to change the affinity on an interrupt before it's been started.
I don't know if such a thing exsits but the fact that we hit that bug
makes me think it might.

Now most controller drivers (at least that I'm familiar with, which doesn't
include GiC at this point) can deal with that just fine.

Now there's also another possible issue:

Your patch checks irqd_is_started(). Now I always mixup irqd vs irq_state these
days so I may be wrong but irq_state_set_started() is only done in __irq_startup
which will *not* be called if the interrupt has NOAUTOEN.

Is that ok ? Do we intend for affinity setting not to work until the first
enable_irq() for such an interrupt ? We could check activated instead of
started I suppose. (again provided I didn't mixup two different things
between the irqd and the irq_state stuff).

For these reasons my gut feeling is we should just fix GIC as Ali wanted to
do initially.

The basic idea is simply to defer the HW configuration until the interrupt
has been started. I don't see why that would be an issue. Have set_affinity just
store the mask (and apply whatever other sanity checking it might want to do)
until the itnerrupt is started and when started, apply things to HW.

I might be missing a reason why it's more complicated than that :) But I do
feel a bit uncomfortable with your approach.

Cheers,
Ben.
Thomas Gleixner June 2, 2020, 8:54 p.m. UTC | #9
"Herrenschmidt, Benjamin" <benh@amazon.com> writes:
> On Sun, 2020-05-31 at 12:09 +0100, Marc Zyngier wrote:
>> > The semantic of activate/deactivate (which maps to started/shutdown
>> > in the IRQ code) is that the HW resources for a given interrupt are
>> > only committed when the interrupt is activated. Trying to perform
>> > actions involving the HW on an interrupt that isn't active cannot be
>> > guaranteed to take effect.
>> > 
>> > I'd rather address it in the core code, by preventing set_affinity (and
>> > potentially others) to take place when the interrupt is not in the
>> > STARTED state. Userspace would get an error, which is perfectly
>> > legitimate, and which it already has to deal with it for plenty of
>> > other
>> > reasons.
>
> So I finally found time to dig a bit in there :) Code has changed a bit
> since last I looked. But I have memories of the startup code messing
> around with the affinity, and here it is. In irq_startup() :
>
>
> 		switch (__irq_startup_managed(desc, aff, force)) {
> 		case IRQ_STARTUP_NORMAL:
> 			ret = __irq_startup(desc);
> 			irq_setup_affinity(desc);
> 			break;
> 		case IRQ_STARTUP_MANAGED:
> 			irq_do_set_affinity(d, aff, false);
> 			ret = __irq_startup(desc);
> 			break;
> 		case IRQ_STARTUP_ABORT:
> 			irqd_set_managed_shutdown(d);
> 			return 0;
>
> So we have two cases here. Normal and managed.
>
> In the managed case, we set the affinity before startup. I feel like your
> patch might break that or am I missing something ?

It will break stuff because the affinity is not stored in case that the
interrupt is not started.

I think we can fix this in the core code but that needs more thought.
__irq_can_set_affinity() is definitely the wrong place.

Thanks,

        tglx
Marc Zyngier June 3, 2020, 12:44 p.m. UTC | #10
On 2020-06-02 21:54, Thomas Gleixner wrote:
> "Herrenschmidt, Benjamin" <benh@amazon.com> writes:
>> On Sun, 2020-05-31 at 12:09 +0100, Marc Zyngier wrote:
>>> > The semantic of activate/deactivate (which maps to started/shutdown
>>> > in the IRQ code) is that the HW resources for a given interrupt are
>>> > only committed when the interrupt is activated. Trying to perform
>>> > actions involving the HW on an interrupt that isn't active cannot be
>>> > guaranteed to take effect.
>>> >
>>> > I'd rather address it in the core code, by preventing set_affinity (and
>>> > potentially others) to take place when the interrupt is not in the
>>> > STARTED state. Userspace would get an error, which is perfectly
>>> > legitimate, and which it already has to deal with it for plenty of
>>> > other
>>> > reasons.
>> 
>> So I finally found time to dig a bit in there :) Code has changed a 
>> bit
>> since last I looked. But I have memories of the startup code messing
>> around with the affinity, and here it is. In irq_startup() :
>> 
>> 
>> 		switch (__irq_startup_managed(desc, aff, force)) {
>> 		case IRQ_STARTUP_NORMAL:
>> 			ret = __irq_startup(desc);
>> 			irq_setup_affinity(desc);
>> 			break;
>> 		case IRQ_STARTUP_MANAGED:
>> 			irq_do_set_affinity(d, aff, false);
>> 			ret = __irq_startup(desc);

Grump. Nice catch. In hindsight, this is obvious, as managed interrupts
may have been allocated to target CPUs that have been hot-plugged off.

>> 			break;
>> 		case IRQ_STARTUP_ABORT:
>> 			irqd_set_managed_shutdown(d);
>> 			return 0;
>> 
>> So we have two cases here. Normal and managed.
>> 
>> In the managed case, we set the affinity before startup. I feel like 
>> your
>> patch might break that or am I missing something ?
> 
> It will break stuff because the affinity is not stored in case that the
> interrupt is not started.
> 
> I think we can fix this in the core code but that needs more thought.
> __irq_can_set_affinity() is definitely the wrong place.

Indeed. I completely missed the above. Back to square one.

Thanks,

         M.
diff mbox series

Patch

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 124251b0ccba..1235dd9a2fb2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1540,7 +1540,11 @@  static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	/* don't set the affinity when the target cpu is same as current one */
 	if (cpu != its_dev->event_map.col_map[id]) {
 		target_col = &its_dev->its->collections[cpu];
-		its_send_movi(its_dev, target_col, id);
+
+		/* If the IRQ is disabled a discard was sent so don't move */
+		if (!irqd_irq_disabled(d))
+			its_send_movi(its_dev, target_col, id);
+
 		its_dev->event_map.col_map[id] = cpu;
 		irq_data_update_effective_affinity(d, cpumask_of(cpu));
 	}
@@ -3439,8 +3443,16 @@  static int its_irq_domain_activate(struct irq_domain *domain,
 	if (its_dev->its->numa_node >= 0)
 		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
 
-	/* Bind the LPI to the first possible CPU */
-	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
+	/* If the cpu set to a different CPU that is still online use it */
+	cpu = its_dev->event_map.col_map[event];
+
+	cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
+
+	if (!cpumask_test_cpu(cpu, cpu_mask)) {
+		/* Bind the LPI to the first possible CPU */
+		cpu = cpumask_first(cpu_mask);
+	}
+
 	if (cpu >= nr_cpu_ids) {
 		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
 			return -EINVAL;