diff mbox series

[tip:irq/core,v1] genirq: remove auto-set of the mask when setting the hint

Message ID 20210501021832.743094-1-jesse.brandeburg@intel.com (mailing list archive)
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series [tip:irq/core,v1] genirq: remove auto-set of the mask when setting the hint | expand

Commit Message

Jesse Brandeburg May 1, 2021, 2:18 a.m. UTC
It was pointed out by Nitesh that the original work I did in 2014
to automatically set the interrupt affinity when requesting a
mask is no longer necessary. The kernel has moved on and no
longer has the original problem, BUT the original patch
introduced a subtle bug when booting a system with reserved or
excluded CPUs. Drivers calling this function with a mask value
that included a CPU that was currently or in the future
unavailable would generally not update the hint.

I'm sure there are a million ways to solve this, but the simplest
one is to just remove a little code that tries to force the
affinity, as Nitesh has shown it fixes the bug and doesn't seem
to introduce immediate side effects.

While I'm here, introduce a kernel-doc for the hint function.

Ref: https://lore.kernel.org/lkml/CAFki+L=_dd+JgAR12_eBPX0kZO2_6=1dGdgkwHE=u=K6chMeLQ@mail.gmail.com/
Cc: netdev@vger.kernel.org
Fixes: 4fe7ffb7e17c ("genirq: Fix null pointer reference in irq_set_affinity_hint()")
Fixes: e2e64a932556 ("genirq: Set initial affinity in irq_set_affinity_hint()")
Reported-by: Nitesh Lal <nilal@redhat.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---

!!! NOTE: Compile tested only, would appreciate feedback

---
 kernel/irq/manage.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)


base-commit: 765822e1569a37aab5e69736c52d4ad4a289eba6

Comments

Robin Murphy May 4, 2021, 12:15 p.m. UTC | #1
On 2021-05-01 03:18, Jesse Brandeburg wrote:
> It was pointed out by Nitesh that the original work I did in 2014
> to automatically set the interrupt affinity when requesting a
> mask is no longer necessary. The kernel has moved on and no
> longer has the original problem, BUT the original patch
> introduced a subtle bug when booting a system with reserved or
> excluded CPUs. Drivers calling this function with a mask value
> that included a CPU that was currently or in the future
> unavailable would generally not update the hint.
> 
> I'm sure there are a million ways to solve this, but the simplest
> one is to just remove a little code that tries to force the
> affinity, as Nitesh has shown it fixes the bug and doesn't seem
> to introduce immediate side effects.

Unfortunately, I think there are quite a few other drivers now relying 
on this behaviour, since they are really using irq_set_affinity_hint() 
as a proxy for irq_set_affinity(). Partly since the latter isn't 
exported to modules, but also I have a vague memory of it being said 
that it's nice to update the user-visible hint to match when the 
affinity does have to be forced to something specific.

Robin.

> While I'm here, introduce a kernel-doc for the hint function.
> 
> Ref: https://lore.kernel.org/lkml/CAFki+L=_dd+JgAR12_eBPX0kZO2_6=1dGdgkwHE=u=K6chMeLQ@mail.gmail.com/
> Cc: netdev@vger.kernel.org
> Fixes: 4fe7ffb7e17c ("genirq: Fix null pointer reference in irq_set_affinity_hint()")
> Fixes: e2e64a932556 ("genirq: Set initial affinity in irq_set_affinity_hint()")
> Reported-by: Nitesh Lal <nilal@redhat.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> 
> !!! NOTE: Compile tested only, would appreciate feedback
> 
> ---
>   kernel/irq/manage.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index e976c4927b25..a31df64662d5 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -456,6 +456,16 @@ int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
>   	return ret;
>   }
>   
> +/**
> + * 	irq_set_affinity_hint - set the hint for an irq
> + *	@irq:	Interrupt for which to set the hint
> + *	@m:	Mask to indicate which CPUs to suggest for the interrupt, use
> + *		NULL here to indicate to clear the value.
> + *
> + *	Use this function to recommend which CPU should handle the
> + *	interrupt to any userspace that uses /proc/irq/nn/smp_affinity_hint
> + *	in order to align interrupts. Pass NULL as the mask to clear the hint.
> + */
>   int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
>   {
>   	unsigned long flags;
> @@ -465,9 +475,6 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
>   		return -EINVAL;
>   	desc->affinity_hint = m;
>   	irq_put_desc_unlock(desc, flags);
> -	/* set the initial affinity to prevent every interrupt being on CPU0 */
> -	if (m)
> -		__irq_set_affinity(irq, m, false);
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
> 
> base-commit: 765822e1569a37aab5e69736c52d4ad4a289eba6
>
Nitesh Lal May 4, 2021, 2:29 p.m. UTC | #2
On Tue, May 4, 2021 at 8:15 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-05-01 03:18, Jesse Brandeburg wrote:
> > It was pointed out by Nitesh that the original work I did in 2014
> > to automatically set the interrupt affinity when requesting a
> > mask is no longer necessary. The kernel has moved on and no
> > longer has the original problem, BUT the original patch
> > introduced a subtle bug when booting a system with reserved or
> > excluded CPUs. Drivers calling this function with a mask value
> > that included a CPU that was currently or in the future
> > unavailable would generally not update the hint.
> >
> > I'm sure there are a million ways to solve this, but the simplest
> > one is to just remove a little code that tries to force the
> > affinity, as Nitesh has shown it fixes the bug and doesn't seem
> > to introduce immediate side effects.
>
> Unfortunately, I think there are quite a few other drivers now relying
> on this behaviour, since they are really using irq_set_affinity_hint()
> as a proxy for irq_set_affinity().

That's true.

> Partly since the latter isn't
> exported to modules, but also I have a vague memory of it being said
> that it's nice to update the user-visible hint to match when the
> affinity does have to be forced to something specific.

If you see the downside of it we are forcing the affinity to match the hint
mask without considering the default SMP affinity mask.

Also, we are repeating things here. First, we set certain mask for a device
IRQ via request_irq code path which does consider the default SMP mask but
then we are letting the driver over-write it.

If we want to set the IRQ mask in a certain way then it should be done at
the time of initial setup itself.

Do you know about a workload/use case that can show the benefit of
this behavior? As then we can try fixing it in the right way.

--
Thanks
Nitesh
Jesse Brandeburg May 4, 2021, 4:23 p.m. UTC | #3
Robin Murphy wrote:

> On 2021-05-01 03:18, Jesse Brandeburg wrote:
> > It was pointed out by Nitesh that the original work I did in 2014
> > to automatically set the interrupt affinity when requesting a
> > mask is no longer necessary. The kernel has moved on and no
> > longer has the original problem, BUT the original patch
> > introduced a subtle bug when booting a system with reserved or
> > excluded CPUs. Drivers calling this function with a mask value
> > that included a CPU that was currently or in the future
> > unavailable would generally not update the hint.
> > 
> > I'm sure there are a million ways to solve this, but the simplest
> > one is to just remove a little code that tries to force the
> > affinity, as Nitesh has shown it fixes the bug and doesn't seem
> > to introduce immediate side effects.
> 
> Unfortunately, I think there are quite a few other drivers now relying 
> on this behaviour, since they are really using irq_set_affinity_hint() 
> as a proxy for irq_set_affinity(). Partly since the latter isn't 
> exported to modules, but also I have a vague memory of it being said 
> that it's nice to update the user-visible hint to match when the 
> affinity does have to be forced to something specific.
> 
> Robin.

Thanks for your feedback Robin, but there is definitely a bug here that
is being exposed by this code. The fact that people are using this
function means they're all exposed to this bug.

Not sure if you saw, but this analysis from Nitesh explains what
happened chronologically to the kernel w.r.t this code, it's a useful
analysis! [1]

I'd add in addition that irqbalance daemon *stopped* paying attention
to hints quite a while ago, so I'm not quite sure what purpose they
serve.

[1]
https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/
Nitesh Lal May 17, 2021, 4:57 p.m. UTC | #4
On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Robin Murphy wrote:
>
> > On 2021-05-01 03:18, Jesse Brandeburg wrote:
> > > It was pointed out by Nitesh that the original work I did in 2014
> > > to automatically set the interrupt affinity when requesting a
> > > mask is no longer necessary. The kernel has moved on and no
> > > longer has the original problem, BUT the original patch
> > > introduced a subtle bug when booting a system with reserved or
> > > excluded CPUs. Drivers calling this function with a mask value
> > > that included a CPU that was currently or in the future
> > > unavailable would generally not update the hint.
> > >
> > > I'm sure there are a million ways to solve this, but the simplest
> > > one is to just remove a little code that tries to force the
> > > affinity, as Nitesh has shown it fixes the bug and doesn't seem
> > > to introduce immediate side effects.
> >
> > Unfortunately, I think there are quite a few other drivers now relying
> > on this behaviour, since they are really using irq_set_affinity_hint()
> > as a proxy for irq_set_affinity(). Partly since the latter isn't
> > exported to modules, but also I have a vague memory of it being said
> > that it's nice to update the user-visible hint to match when the
> > affinity does have to be forced to something specific.
> >
> > Robin.
>
> Thanks for your feedback Robin, but there is definitely a bug here that
> is being exposed by this code. The fact that people are using this
> function means they're all exposed to this bug.
>
> Not sure if you saw, but this analysis from Nitesh explains what
> happened chronologically to the kernel w.r.t this code, it's a useful
> analysis! [1]
>
> I'd add in addition that irqbalance daemon *stopped* paying attention
> to hints quite a while ago, so I'm not quite sure what purpose they
> serve.
>
> [1]
> https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/
>

Wanted to follow up to see if there are any more objections or even
suggestions to take this forward?
Robin Murphy May 17, 2021, 5:26 p.m. UTC | #5
On 2021-05-17 17:57, Nitesh Lal wrote:
> On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg
> <jesse.brandeburg@intel.com> wrote:
>>
>> Robin Murphy wrote:
>>
>>> On 2021-05-01 03:18, Jesse Brandeburg wrote:
>>>> It was pointed out by Nitesh that the original work I did in 2014
>>>> to automatically set the interrupt affinity when requesting a
>>>> mask is no longer necessary. The kernel has moved on and no
>>>> longer has the original problem, BUT the original patch
>>>> introduced a subtle bug when booting a system with reserved or
>>>> excluded CPUs. Drivers calling this function with a mask value
>>>> that included a CPU that was currently or in the future
>>>> unavailable would generally not update the hint.
>>>>
>>>> I'm sure there are a million ways to solve this, but the simplest
>>>> one is to just remove a little code that tries to force the
>>>> affinity, as Nitesh has shown it fixes the bug and doesn't seem
>>>> to introduce immediate side effects.
>>>
>>> Unfortunately, I think there are quite a few other drivers now relying
>>> on this behaviour, since they are really using irq_set_affinity_hint()
>>> as a proxy for irq_set_affinity(). Partly since the latter isn't
>>> exported to modules, but also I have a vague memory of it being said
>>> that it's nice to update the user-visible hint to match when the
>>> affinity does have to be forced to something specific.
>>>
>>> Robin.
>>
>> Thanks for your feedback Robin, but there is definitely a bug here that
>> is being exposed by this code. The fact that people are using this
>> function means they're all exposed to this bug.
>>
>> Not sure if you saw, but this analysis from Nitesh explains what
>> happened chronologically to the kernel w.r.t this code, it's a useful
>> analysis! [1]
>>
>> I'd add in addition that irqbalance daemon *stopped* paying attention
>> to hints quite a while ago, so I'm not quite sure what purpose they
>> serve.
>>
>> [1]
>> https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/
>>
> 
> Wanted to follow up to see if there are any more objections or even
> suggestions to take this forward?

Oops, sorry, seems I got distracted before getting round to actually 
typing up my response :)

I'm not implying that there isn't a bug, or that this code ever made 
sense in the first place, just that fixing it will unfortunately be a 
bit more involved than a simple revert. This patch as-is *will* subtly 
break at least the system PMU drivers currently using 
irq_set_affinity_hint() - those I know require the IRQ affinity to 
follow whichever CPU the PMU context is bound to, in order to meet perf 
core's assumptions about mutual exclusion.

As far as the consistency argument goes, maybe that's just backwards and 
it should be irq_set_affinity() that also sets the hint, to indicate to 
userspace that the affinity has been forced by the kernel? Either way 
we'll need to do a little more diligence to figure out which callers 
actually care about more than just the hint, and sort them out first.

Robin.
Thomas Gleixner May 17, 2021, 6:08 p.m. UTC | #6
On Mon, May 17 2021 at 18:26, Robin Murphy wrote:
> On 2021-05-17 17:57, Nitesh Lal wrote:
> I'm not implying that there isn't a bug, or that this code ever made 
> sense in the first place, just that fixing it will unfortunately be a 
> bit more involved than a simple revert. This patch as-is *will* subtly 
> break at least the system PMU drivers currently using

s/using/abusing/

> irq_set_affinity_hint() - those I know require the IRQ affinity to 
> follow whichever CPU the PMU context is bound to, in order to meet perf 
> core's assumptions about mutual exclusion.

Which driver is that?

Thanks,

        tglx
Nitesh Lal May 17, 2021, 6:21 p.m. UTC | #7
On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-05-17 17:57, Nitesh Lal wrote:
> > On Tue, May 4, 2021 at 12:25 PM Jesse Brandeburg
> > <jesse.brandeburg@intel.com> wrote:
> >>
> >> Robin Murphy wrote:
> >>
> >>> On 2021-05-01 03:18, Jesse Brandeburg wrote:
> >>>> It was pointed out by Nitesh that the original work I did in 2014
> >>>> to automatically set the interrupt affinity when requesting a
> >>>> mask is no longer necessary. The kernel has moved on and no
> >>>> longer has the original problem, BUT the original patch
> >>>> introduced a subtle bug when booting a system with reserved or
> >>>> excluded CPUs. Drivers calling this function with a mask value
> >>>> that included a CPU that was currently or in the future
> >>>> unavailable would generally not update the hint.
> >>>>
> >>>> I'm sure there are a million ways to solve this, but the simplest
> >>>> one is to just remove a little code that tries to force the
> >>>> affinity, as Nitesh has shown it fixes the bug and doesn't seem
> >>>> to introduce immediate side effects.
> >>>
> >>> Unfortunately, I think there are quite a few other drivers now relying
> >>> on this behaviour, since they are really using irq_set_affinity_hint()
> >>> as a proxy for irq_set_affinity(). Partly since the latter isn't
> >>> exported to modules, but also I have a vague memory of it being said
> >>> that it's nice to update the user-visible hint to match when the
> >>> affinity does have to be forced to something specific.
> >>>
> >>> Robin.
> >>
> >> Thanks for your feedback Robin, but there is definitely a bug here that
> >> is being exposed by this code. The fact that people are using this
> >> function means they're all exposed to this bug.
> >>
> >> Not sure if you saw, but this analysis from Nitesh explains what
> >> happened chronologically to the kernel w.r.t this code, it's a useful
> >> analysis! [1]
> >>
> >> I'd add in addition that irqbalance daemon *stopped* paying attention
> >> to hints quite a while ago, so I'm not quite sure what purpose they
> >> serve.
> >>
> >> [1]
> >> https://lore.kernel.org/lkml/CAFki+Lm0W_brLu31epqD3gAV+WNKOJfVDfX2M8ZM__aj3nv9uA@mail.gmail.com/
> >>
> >
> > Wanted to follow up to see if there are any more objections or even
> > suggestions to take this forward?
>
> Oops, sorry, seems I got distracted before getting round to actually
> typing up my response :)

No worries.

>
> I'm not implying that there isn't a bug, or that this code ever made
> sense in the first place, just that fixing it will unfortunately be a
> bit more involved than a simple revert.

Fair point.

> This patch as-is *will* subtly
> break at least the system PMU drivers currently using
> irq_set_affinity_hint() - those I know require the IRQ affinity to
> follow whichever CPU the PMU context is bound to, in order to meet perf
> core's assumptions about mutual exclusion.

Thanks for bringing this up.
Please correct me if I am wrong, so the PMU driver(s) is/are written
in a way that
it uses the hint API to overwrite the previously set affinity mask with a
CPU to which the PMU context is bound to?

Is this context information exposed in the userspace and can we modify the
IRQ affinity mask from the userspace based on that?
I do understand that this is a behavior change from the PMU drivers
perspective.

>
> As far as the consistency argument goes, maybe that's just backwards and
> it should be irq_set_affinity() that also sets the hint, to indicate to
> userspace that the affinity has been forced by the kernel? Either way
> we'll need to do a little more diligence to figure out which callers
> actually care about more than just the hint, and sort them out first.
>

We can use irq_set_affinity() to set the hint mask as well, however, maybe
there is a specific reason behind separating those two in the
first place (maybe not?).
But even in this case, we have to either modify the PMU drivers' IRQs
affinity from the userspace or we will have to make changes in the existing
request_irq code path.
I am not sure about the latter because we already have the required controls
to adjust the device IRQ mask (by using default_smp_affinity or by modifying
them manually).
Robin Murphy May 17, 2021, 6:50 p.m. UTC | #8
On 2021-05-17 19:08, Thomas Gleixner wrote:
> On Mon, May 17 2021 at 18:26, Robin Murphy wrote:
>> On 2021-05-17 17:57, Nitesh Lal wrote:
>> I'm not implying that there isn't a bug, or that this code ever made
>> sense in the first place, just that fixing it will unfortunately be a
>> bit more involved than a simple revert. This patch as-is *will* subtly
>> break at least the system PMU drivers currently using
> 
> s/using/abusing/
> 
>> irq_set_affinity_hint() - those I know require the IRQ affinity to
>> follow whichever CPU the PMU context is bound to, in order to meet perf
>> core's assumptions about mutual exclusion.
> 
> Which driver is that?

Right now, any driver which wants to control an IRQ's affinity and also 
build as a module, for one thing. I'm familiar with drivers/perf/ where 
a basic pattern has been widely copied; some of the callers in other 
subsystems appear to *expect* it to set the underlying affinity as well, 
but whether any of those added within the last 6 years represent a 
functional dependency rather than just a performance concern I don't know.

Robin.
Thomas Gleixner May 17, 2021, 7:08 p.m. UTC | #9
On Mon, May 17 2021 at 19:50, Robin Murphy wrote:

> On 2021-05-17 19:08, Thomas Gleixner wrote:
>> On Mon, May 17 2021 at 18:26, Robin Murphy wrote:
>>> On 2021-05-17 17:57, Nitesh Lal wrote:
>>> I'm not implying that there isn't a bug, or that this code ever made
>>> sense in the first place, just that fixing it will unfortunately be a
>>> bit more involved than a simple revert. This patch as-is *will* subtly
>>> break at least the system PMU drivers currently using
>> 
>> s/using/abusing/
>> 
>>> irq_set_affinity_hint() - those I know require the IRQ affinity to
>>> follow whichever CPU the PMU context is bound to, in order to meet perf
>>> core's assumptions about mutual exclusion.
>> 
>> Which driver is that?
>
> Right now, any driver which wants to control an IRQ's affinity and also 
> build as a module, for one thing. I'm familiar with drivers/perf/ where 
> a basic pattern has been widely copied;

Bah. Why the heck can't people talk and just go and rumage until they
find something which hopefully does what they want...

The name of that function should have rang all alarm bells...

> some of the callers in other subsystems appear to *expect* it to set
> the underlying affinity as well, but whether any of those added within
> the last 6 years represent a functional dependency rather than just a
> performance concern I don't know.

Sigh. Let me do yet another tree wide audit...

Thanks,

        tglx
Thomas Gleixner May 17, 2021, 7:43 p.m. UTC | #10
On Mon, May 17 2021 at 21:08, Thomas Gleixner wrote:
> On Mon, May 17 2021 at 19:50, Robin Murphy wrote:
>> some of the callers in other subsystems appear to *expect* it to set
>> the underlying affinity as well, but whether any of those added within
>> the last 6 years represent a functional dependency rather than just a
>> performance concern I don't know.
>
> Sigh. Let me do yet another tree wide audit...

It's clearly only the perf muck which has a functional dependency.

None of the other usage sites has IRQF_NOBALANCING set which clearly
makes this a hint because user space can freely muck with the affinity.

Thanks,

        tglx
Thomas Gleixner May 17, 2021, 7:47 p.m. UTC | #11
Nitesh,

On Mon, May 17 2021 at 14:21, Nitesh Lal wrote:
> On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> We can use irq_set_affinity() to set the hint mask as well, however, maybe
> there is a specific reason behind separating those two in the
> first place (maybe not?).

Yes, because kernel side settings might overwrite the hint.

> But even in this case, we have to either modify the PMU drivers' IRQs
> affinity from the userspace or we will have to make changes in the existing
> request_irq code path.

Adjusting them from user space does not work for various reasons,
especially CPU hotplug.

> I am not sure about the latter because we already have the required controls
> to adjust the device IRQ mask (by using default_smp_affinity or by modifying
> them manually).

default_smp_affinity does not help at all and there is nothing a module
can modify manually.

I'll send out a patch series which cleans that up soon.

Thanks,

        tglx
Thomas Gleixner May 17, 2021, 8:18 p.m. UTC | #12
On Mon, May 17 2021 at 21:08, Thomas Gleixner wrote:
> On Mon, May 17 2021 at 19:50, Robin Murphy wrote:
>> On 2021-05-17 19:08, Thomas Gleixner wrote:
>>> On Mon, May 17 2021 at 18:26, Robin Murphy wrote:
>>>> On 2021-05-17 17:57, Nitesh Lal wrote:
>>>> I'm not implying that there isn't a bug, or that this code ever made
>>>> sense in the first place, just that fixing it will unfortunately be a
>>>> bit more involved than a simple revert. This patch as-is *will* subtly
>>>> break at least the system PMU drivers currently using
>>> 
>>> s/using/abusing/
>>> 
>>>> irq_set_affinity_hint() - those I know require the IRQ affinity to
>>>> follow whichever CPU the PMU context is bound to, in order to meet perf
>>>> core's assumptions about mutual exclusion.
>>> 
>>> Which driver is that?
>>
>> Right now, any driver which wants to control an IRQ's affinity and also 
>> build as a module, for one thing. I'm familiar with drivers/perf/ where 
>> a basic pattern has been widely copied;
>
> Bah. Why the heck can't people talk and just go and rumage until they
> find something which hopefully does what they want...
>
> The name of that function should have rang all alarm bells...

Aside of that all the warnings around the return value are useless cargo
cult. Why?

The only reason why this function returns an error code is when there is
no irq descriptor assigned to the interrupt number, which is well close
to impossible in that context.

But it does _NOT_ return an error when the actual affinity setting
fails...

Thanks,

        tglx
Thomas Gleixner May 17, 2021, 8:48 p.m. UTC | #13
On Tue, May 04 2021 at 09:23, Jesse Brandeburg wrote:
> I'd add in addition that irqbalance daemon *stopped* paying attention
> to hints quite a while ago, so I'm not quite sure what purpose they
> serve.

The hint was added so that userspace has a better understanding where it
should place the interrupt. So if irqbalanced ignores it anyway, then
what's the point of the hint? IOW, why is it still used drivers?

Now there is another aspect to that. What happens if irqbalanced does
not run at all and a driver relies on the side effect of the hint
setting the initial affinity. Bah...

While none of the drivers (except the perf muck) actually prevents
userspace from fiddling with the affinity (via IRQF_NOBALANCING) a
deeper inspection shows that they actually might rely on the current
behaviour if irqbalanced is disabled. Of course every driver has its own
convoluted way to do that and all of those functions are well
documented. What a mess.

If the hint still serves a purpose then we can provide a variant which
solely applies the hint and does not fiddle with the actual affinity,
but if the hint is useless anyway then we have a way better option to
clean that up.

Most users are in networking, there are a few in crypto, a couple of
leftovers in scsi, virtio and a handfull of oddball drivers.

The perf muck wants to be cleaned up anyway as it's just crystal clear
abuse.

Thanks,

        tglx
Nitesh Lal May 17, 2021, 9:13 p.m. UTC | #14
On Mon, May 17, 2021 at 3:47 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Mon, May 17 2021 at 14:21, Nitesh Lal wrote:
> > On Mon, May 17, 2021 at 1:26 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > We can use irq_set_affinity() to set the hint mask as well, however, maybe
> > there is a specific reason behind separating those two in the
> > first place (maybe not?).
>
> Yes, because kernel side settings might overwrite the hint.
>
> > But even in this case, we have to either modify the PMU drivers' IRQs
> > affinity from the userspace or we will have to make changes in the existing
> > request_irq code path.
>
> Adjusting them from user space does not work for various reasons,
> especially CPU hotplug.
>
> > I am not sure about the latter because we already have the required controls
> > to adjust the device IRQ mask (by using default_smp_affinity or by modifying
> > them manually).
>
> default_smp_affinity does not help at all and there is nothing a module
> can modify manually.

Right, it will not help a module.

>
> I'll send out a patch series which cleans that up soon.

Ack, thanks.

>
> Thanks,
>
>         tglx
>
>


--
Nitesh
Nitesh Lal May 17, 2021, 10:44 p.m. UTC | #15
On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Tue, May 04 2021 at 09:23, Jesse Brandeburg wrote:
> > I'd add in addition that irqbalance daemon *stopped* paying attention
> > to hints quite a while ago, so I'm not quite sure what purpose they
> > serve.
>
> The hint was added so that userspace has a better understanding where it
> should place the interrupt. So if irqbalanced ignores it anyway, then
> what's the point of the hint? IOW, why is it still used drivers?
>

Took a quick look at the irqbalance repo and saw the following commit:

dcc411e7bf    remove affinity_hint infrastructure

The commit message mentions that "PJ is redesiging how affinity hinting
works in the kernel, the future model will just tell us to ignore an IRQ,
and the kernel will handle placement for us.  As such we can remove the
affinity_hint recognition entirely".

This does indicate that apparently, irqbalance moved away from the usage of
affinity_hint. However, the next question is what was this future model?
I don't know but I can surely look into it if that helps or maybe someone
here already knows about it?

> Now there is another aspect to that. What happens if irqbalanced does
> not run at all and a driver relies on the side effect of the hint
> setting the initial affinity. Bah...
>

Right, but if they only rely on this API so that the IRQs are spread across
all the CPUs then that issue is already resolved and these other drivers
should not regress because of changing this behavior. Isn't it?

> While none of the drivers (except the perf muck) actually prevents
> userspace from fiddling with the affinity (via IRQF_NOBALANCING) a
> deeper inspection shows that they actually might rely on the current
> behaviour if irqbalanced is disabled. Of course every driver has its own
> convoluted way to do that and all of those functions are well
> documented. What a mess.
>
> If the hint still serves a purpose then we can provide a variant which
> solely applies the hint and does not fiddle with the actual affinity,
> but if the hint is useless anyway then we have a way better option to
> clean that up.
>

+1
Thomas Gleixner May 18, 2021, 12:03 a.m. UTC | #16
On Mon, May 17 2021 at 18:44, Nitesh Lal wrote:
> On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> The hint was added so that userspace has a better understanding where it
>> should place the interrupt. So if irqbalanced ignores it anyway, then
>> what's the point of the hint? IOW, why is it still used drivers?
>>
> Took a quick look at the irqbalance repo and saw the following commit:
>
> dcc411e7bf    remove affinity_hint infrastructure
>
> The commit message mentions that "PJ is redesiging how affinity hinting
> works in the kernel, the future model will just tell us to ignore an IRQ,
> and the kernel will handle placement for us.  As such we can remove the
> affinity_hint recognition entirely".

No idea who PJ is. I really love useful commit messages. Maybe Neil can
shed some light on that.

> This does indicate that apparently, irqbalance moved away from the usage of
> affinity_hint. However, the next question is what was this future
> model?

I might have missed something in the last 5 years, but that's the first
time I hear about someone trying to cleanup that thing.

> I don't know but I can surely look into it if that helps or maybe someone
> here already knows about it?

I CC'ed Neil :)

>> Now there is another aspect to that. What happens if irqbalanced does
>> not run at all and a driver relies on the side effect of the hint
>> setting the initial affinity. Bah...
>>
>
> Right, but if they only rely on this API so that the IRQs are spread across
> all the CPUs then that issue is already resolved and these other drivers
> should not regress because of changing this behavior. Isn't it?

Is that true for all architectures?

>> While none of the drivers (except the perf muck) actually prevents
>> userspace from fiddling with the affinity (via IRQF_NOBALANCING) a
>> deeper inspection shows that they actually might rely on the current
>> behaviour if irqbalanced is disabled. Of course every driver has its own
>> convoluted way to do that and all of those functions are well
>> documented. What a mess.
>>
>> If the hint still serves a purpose then we can provide a variant which
>> solely applies the hint and does not fiddle with the actual affinity,
>> but if the hint is useless anyway then we have a way better option to
>> clean that up.
>>
>
> +1

= 1

Thanks,

        tglx
Nitesh Lal May 18, 2021, 12:23 a.m. UTC | #17
On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, May 17 2021 at 18:44, Nitesh Lal wrote:
> > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >> The hint was added so that userspace has a better understanding where it
> >> should place the interrupt. So if irqbalanced ignores it anyway, then
> >> what's the point of the hint? IOW, why is it still used drivers?
> >>
> > Took a quick look at the irqbalance repo and saw the following commit:
> >
> > dcc411e7bf    remove affinity_hint infrastructure
> >
> > The commit message mentions that "PJ is redesiging how affinity hinting
> > works in the kernel, the future model will just tell us to ignore an IRQ,
> > and the kernel will handle placement for us.  As such we can remove the
> > affinity_hint recognition entirely".
>
> No idea who PJ is. I really love useful commit messages. Maybe Neil can
> shed some light on that.
>
> > This does indicate that apparently, irqbalance moved away from the usage of
> > affinity_hint. However, the next question is what was this future
> > model?
>
> I might have missed something in the last 5 years, but that's the first
> time I hear about someone trying to cleanup that thing.
>
> > I don't know but I can surely look into it if that helps or maybe someone
> > here already knows about it?
>
> I CC'ed Neil :)

Thanks, I have added PJ Waskiewicz as well who I think was referred in
that commit message as PJ.

>
> >> Now there is another aspect to that. What happens if irqbalanced does
> >> not run at all and a driver relies on the side effect of the hint
> >> setting the initial affinity. Bah...
> >>
> >
> > Right, but if they only rely on this API so that the IRQs are spread across
> > all the CPUs then that issue is already resolved and these other drivers
> > should not regress because of changing this behavior. Isn't it?
>
> Is that true for all architectures?

Unfortunately, I don't know and that's probably why we have to be careful.

--
Nitesh
Nitesh Lal May 20, 2021, 9:57 p.m. UTC | #18
On Mon, May 17, 2021 at 8:23 PM Nitesh Lal <nilal@redhat.com> wrote:
>
> On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > On Mon, May 17 2021 at 18:44, Nitesh Lal wrote:
> > > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >> The hint was added so that userspace has a better understanding where it
> > >> should place the interrupt. So if irqbalanced ignores it anyway, then
> > >> what's the point of the hint? IOW, why is it still used drivers?
> > >>
> > > Took a quick look at the irqbalance repo and saw the following commit:
> > >
> > > dcc411e7bf    remove affinity_hint infrastructure
> > >
> > > The commit message mentions that "PJ is redesiging how affinity hinting
> > > works in the kernel, the future model will just tell us to ignore an IRQ,
> > > and the kernel will handle placement for us.  As such we can remove the
> > > affinity_hint recognition entirely".
> >
> > No idea who PJ is. I really love useful commit messages. Maybe Neil can
> > shed some light on that.
> >
> > > This does indicate that apparently, irqbalance moved away from the usage of
> > > affinity_hint. However, the next question is what was this future
> > > model?
> >
> > I might have missed something in the last 5 years, but that's the first
> > time I hear about someone trying to cleanup that thing.
> >
> > > I don't know but I can surely look into it if that helps or maybe someone
> > > here already knows about it?
> >
> > I CC'ed Neil :)
>
> Thanks, I have added PJ Waskiewicz as well who I think was referred in
> that commit message as PJ.
>
> >
> > >> Now there is another aspect to that. What happens if irqbalanced does
> > >> not run at all and a driver relies on the side effect of the hint
> > >> setting the initial affinity. Bah...
> > >>
> > >
> > > Right, but if they only rely on this API so that the IRQs are spread across
> > > all the CPUs then that issue is already resolved and these other drivers
> > > should not regress because of changing this behavior. Isn't it?
> >
> > Is that true for all architectures?
>
> Unfortunately, I don't know and that's probably why we have to be careful.

I think here to ensure that we are not breaking any of the drivers we have
to first analyze all the existing drivers and understand how they are using
this API.
AFAIK there are three possible scenarios:

- A driver use this API to spread the IRQs
  + For this case we should be safe considering the spreading is naturally
    done from the IRQ subsystem itself.

- A driver use this API to actually set the hint
  + These drivers should have no functional impact because of this revert

- Driver use this API to force a certain affinity mask
  + In this case we have to replace the API with the irq_force_affinity()

I can start looking into the individual drivers, however, testing them will
be a challenge.

Any thoughts?

--
Thanks
Nitesh
Nitesh Lal May 21, 2021, 12:03 a.m. UTC | #19
On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote:
>
> On Mon, May 17, 2021 at 8:23 PM Nitesh Lal <nilal@redhat.com> wrote:
> >
> > On Mon, May 17, 2021 at 8:04 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > On Mon, May 17 2021 at 18:44, Nitesh Lal wrote:
> > > > On Mon, May 17, 2021 at 4:48 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >> The hint was added so that userspace has a better understanding where it
> > > >> should place the interrupt. So if irqbalanced ignores it anyway, then
> > > >> what's the point of the hint? IOW, why is it still used drivers?
> > > >>
> > > > Took a quick look at the irqbalance repo and saw the following commit:
> > > >
> > > > dcc411e7bf    remove affinity_hint infrastructure
> > > >
> > > > The commit message mentions that "PJ is redesiging how affinity hinting
> > > > works in the kernel, the future model will just tell us to ignore an IRQ,
> > > > and the kernel will handle placement for us.  As such we can remove the
> > > > affinity_hint recognition entirely".
> > >
> > > No idea who PJ is. I really love useful commit messages. Maybe Neil can
> > > shed some light on that.
> > >
> > > > This does indicate that apparently, irqbalance moved away from the usage of
> > > > affinity_hint. However, the next question is what was this future
> > > > model?
> > >
> > > I might have missed something in the last 5 years, but that's the first
> > > time I hear about someone trying to cleanup that thing.
> > >
> > > > I don't know but I can surely look into it if that helps or maybe someone
> > > > here already knows about it?
> > >
> > > I CC'ed Neil :)
> >
> > Thanks, I have added PJ Waskiewicz as well who I think was referred in
> > that commit message as PJ.
> >
> > >
> > > >> Now there is another aspect to that. What happens if irqbalanced does
> > > >> not run at all and a driver relies on the side effect of the hint
> > > >> setting the initial affinity. Bah...
> > > >>
> > > >
> > > > Right, but if they only rely on this API so that the IRQs are spread across
> > > > all the CPUs then that issue is already resolved and these other drivers
> > > > should not regress because of changing this behavior. Isn't it?
> > >
> > > Is that true for all architectures?
> >
> > Unfortunately, I don't know and that's probably why we have to be careful.
>
> I think here to ensure that we are not breaking any of the drivers we have
> to first analyze all the existing drivers and understand how they are using
> this API.
> AFAIK there are three possible scenarios:
>
> - A driver use this API to spread the IRQs
>   + For this case we should be safe considering the spreading is naturally
>     done from the IRQ subsystem itself.

Forgot to mention another thing in the above case is to determine whether
it is true for all architectures or not as Thomas mentioned.

>
> - A driver use this API to actually set the hint
>   + These drivers should have no functional impact because of this revert
>
> - Driver use this API to force a certain affinity mask
>   + In this case we have to replace the API with the irq_force_affinity()
>
> I can start looking into the individual drivers, however, testing them will
> be a challenge.
>
> Any thoughts?
>
> --
> Thanks
> Nitesh
Thomas Gleixner May 21, 2021, 11:56 a.m. UTC | #20
Nitesh,

On Thu, May 20 2021 at 20:03, Nitesh Lal wrote:
> On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote:
>> I think here to ensure that we are not breaking any of the drivers we have
>> to first analyze all the existing drivers and understand how they are using
>> this API.
>> AFAIK there are three possible scenarios:
>>
>> - A driver use this API to spread the IRQs
>>   + For this case we should be safe considering the spreading is naturally
>>     done from the IRQ subsystem itself.
>
> Forgot to mention another thing in the above case is to determine whether
> it is true for all architectures or not as Thomas mentioned.

Yes.

>>
>> - A driver use this API to actually set the hint
>>   + These drivers should have no functional impact because of this revert

Correct.


>> - Driver use this API to force a certain affinity mask
>>   + In this case we have to replace the API with the irq_force_affinity()

irq_set_affinity() or irq_set_affinity_and_hint()

>> I can start looking into the individual drivers, however, testing them will
>> be a challenge.

The only way to do that is to have the core infrastructure added and
then send patches changing it in the way you think. The relevant
maintainers/developers should be able to tell you when your analysis
went south. :)

Been there, done that. It's just lots of work :)

Thanks,

        tglx
Nitesh Lal May 21, 2021, 1:46 p.m. UTC | #21
On Fri, May 21, 2021 at 7:56 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Nitesh,
>
> On Thu, May 20 2021 at 20:03, Nitesh Lal wrote:
> > On Thu, May 20, 2021 at 5:57 PM Nitesh Lal <nilal@redhat.com> wrote:
> >> I think here to ensure that we are not breaking any of the drivers we have
> >> to first analyze all the existing drivers and understand how they are using
> >> this API.
> >> AFAIK there are three possible scenarios:
> >>
> >> - A driver use this API to spread the IRQs
> >>   + For this case we should be safe considering the spreading is naturally
> >>     done from the IRQ subsystem itself.
> >
> > Forgot to mention another thing in the above case is to determine whether
> > it is true for all architectures or not as Thomas mentioned.
>
> Yes.
>
> >>
> >> - A driver use this API to actually set the hint
> >>   + These drivers should have no functional impact because of this revert
>
> Correct.
>
>
> >> - Driver use this API to force a certain affinity mask
> >>   + In this case we have to replace the API with the irq_force_affinity()
>
> irq_set_affinity() or irq_set_affinity_and_hint()

Ah yes! my bad. _force_ doesn't check the mask against the online CPUs.
Hmm, I didn't realize that you also added irq_set_affinity_and_hint()
in your last patchset.

>
> >> I can start looking into the individual drivers, however, testing them will
> >> be a challenge.
>
> The only way to do that is to have the core infrastructure added and

Right.

> then send patches changing it in the way you think. The relevant
> maintainers/developers should be able to tell you when your analysis
> went south. :)

Ack will start looking into this.


--
Thanks
Nitesh
Thomas Gleixner May 21, 2021, 3:15 p.m. UTC | #22
On Fri, May 21 2021 at 09:46, Nitesh Lal wrote:
> On Fri, May 21, 2021 at 7:56 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >> - Driver use this API to force a certain affinity mask
>> >>   + In this case we have to replace the API with the irq_force_affinity()
>>
>> irq_set_affinity() or irq_set_affinity_and_hint()
>
> Ah yes! my bad. _force_ doesn't check the mask against the online CPUs.
> Hmm, I didn't realize that you also added irq_set_affinity_and_hint()
> in your last patchset.

I did not. It just exposed irq_set_affinity().

See https://lore.kernel.org/r/87wnrs9tvp.ffs@nanos.tec.linutronix.de
for the new hint interface I came up with.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index e976c4927b25..a31df64662d5 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -456,6 +456,16 @@  int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 	return ret;
 }
 
+/**
+ * 	irq_set_affinity_hint - set the hint for an irq
+ *	@irq:	Interrupt for which to set the hint
+ *	@m:	Mask to indicate which CPUs to suggest for the interrupt, use
+ *		NULL here to indicate to clear the value.
+ *
+ *	Use this function to recommend which CPU should handle the
+ *	interrupt to any userspace that uses /proc/irq/nn/smp_affinity_hint
+ *	in order to align interrupts. Pass NULL as the mask to clear the hint.
+ */
 int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 {
 	unsigned long flags;
@@ -465,9 +475,6 @@  int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
 		return -EINVAL;
 	desc->affinity_hint = m;
 	irq_put_desc_unlock(desc, flags);
-	/* set the initial affinity to prevent every interrupt being on CPU0 */
-	if (m)
-		__irq_set_affinity(irq, m, false);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(irq_set_affinity_hint);