diff mbox

[v4,1/2] PCI: Add tango MSI controller support

Message ID b10a211d-1f4a-1cc1-7139-9b48a90dd1c6@free.fr (mailing list archive)
State New, archived
Headers show

Commit Message

Mason May 31, 2017, 2:18 p.m. UTC
On 24/05/2017 12:22, Marc Zyngier wrote:
> On 24/05/17 11:00, Robin Murphy wrote:
>> On 23/05/17 20:15, Mason wrote:
>>> On 23/05/2017 20:03, Robin Murphy wrote:
>>>> On 23/05/17 18:54, Mason wrote:
>>>>> On 23/05/2017 19:03, Bjorn Helgaas wrote:
>>>>>> On Wed, May 17, 2017 at 04:56:08PM +0200, Marc Gonzalez wrote:
>>>>>>> On 20/04/2017 16:28, Marc Gonzalez wrote:
>>>>>>>
>>>>>>>> +static int tango_set_affinity(struct irq_data *data,
>>>>>>>> +		const struct cpumask *mask, bool force)
>>>>>>>> +{
>>>>>>>> +	return -EINVAL;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static struct irq_chip tango_chip = {
>>>>>>>> +	.irq_ack		= tango_ack,
>>>>>>>> +	.irq_mask		= tango_mask,
>>>>>>>> +	.irq_unmask		= tango_unmask,
>>>>>>>> +	.irq_set_affinity	= tango_set_affinity,
>>>>>>>> +	.irq_compose_msi_msg	= tango_compose_msi_msg,
>>>>>>>> +};
>>>>>>>
>>>>>>> Hmmm... I'm wondering why .irq_set_affinity is required.
>>>>>>>
>>>>>>> static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
>>>>>>> first calls __irq_can_set_affinity() to check whether
>>>>>>> desc->irq_data.chip->irq_set_affinity) exists.
>>>>>>>
>>>>>>> then calls irq_do_set_affinity(&desc->irq_data, mask, false);
>>>>>>> which calls chip->irq_set_affinity(data, mask, force);
>>>>>>> = msi_domain_set_affinity()
>>>>>>> which calls parent->chip->irq_set_affinity() unconditionally.
>>>>>>>
>>>>>>> Would it make sense to test that the callback is implemented
>>>>>>> before calling it?
>>>>>>>
>>>>>>>
>>>>>>> [    0.723895] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>>>>>>
>>>>>> I'm not sure what you're asking.
>>>>>>
>>>>>> Is this a bug report for the v4 tango driver?
>>>>>
>>>>> No.
>>>>>
>>>>>> Or are you asking whether msi_domain_set_affinity() should be changed
>>>>>> to check whether parent->chip->irq_set_affinity is implemented?
>>>>>
>>>>> Yes. The way things are implemented now, drivers are forced
>>>>> to define an irq_set_affinity callback, even if it just returns
>>>>> an error, otherwise, the kernel crashes, because of the
>>>>> unconditional function pointer deref. 
>>>>>
>>>>>> msi_domain_set_affinity() has called parent->chip->irq_set_affinity()
>>>>>> without checking since it was added in 2014 by f3cf8bb0d6c3 ("genirq: Add
>>>>>> generic msi irq domain support"), so if there's a problem here, it's most
>>>>>> likely in the tango code.
>>>>>
>>>>> The issue is having to define an "empty" function.
>>>>> (Unnecessary code bloat and maintenance.)
>>>>
>>>> AFAICS, only one driver (other than this one) implements a "do nothing"
>>>> set_affinity callback - everyone else who doesn't do anything hardware
>>>> specific just passes it along via irq_chip_set_affinity_parent().
>>>
>>> I counted 4. Where did I mess up?
>>>
>>> advk_msi_set_affinity
>>> altera_msi_set_affinity
>>> nwl_msi_set_affinity
>>> vmd_irq_set_affinity
>>> tango_set_affinity
>>
>> Fair point - I went through drivers/irqchip and the various
>> arch-specific instances and found ls_scfg_msi_set_affinity(), but
>> somehow skipped over drivers/pci. Anyway, I think the question stands of
>> why are these handful of drivers *not* using irq_chip_set_affinity_parent()?
> 
> Probably because they don't have a parent, in the hierarchical sense.
> All they have is a chained interrupt (*puke*). Which implies that
> changing the affinity of one MSI would affect all of them, completely
> confusing unsuspecting userspace such as irqbalance.
> 
>> As an outsider, it naively seems logical that the affinity of an MSI
>> which just gets translated to a wired interrupt should propagate through
>> to the affinity of that wired interrupt, but maybe there are reasons not
>> to; I really don't know.
> 
> See above. The main issue is that HW folks haven't understood the actual
> use of MSIs, and persist in implementing them as an afterthought, based
> on some cascading interrupt controller design.
> 
> Sad. So sad.

For the record, below is the patch I had in mind:




Then, it would be safe to remove "do-nothing" .irq_set_affinity
callbacks in drivers/pci/host

Regards.

Comments

Bjorn Helgaas May 31, 2017, 5:34 p.m. UTC | #1
On Wed, May 31, 2017 at 04:18:44PM +0200, Mason wrote:
> On 24/05/2017 12:22, Marc Zyngier wrote:
> > On 24/05/17 11:00, Robin Murphy wrote:
> >> On 23/05/17 20:15, Mason wrote:
> >>> On 23/05/2017 20:03, Robin Murphy wrote:
> >>>> On 23/05/17 18:54, Mason wrote:
> >>>>> On 23/05/2017 19:03, Bjorn Helgaas wrote:
> >>>>>> On Wed, May 17, 2017 at 04:56:08PM +0200, Marc Gonzalez wrote:
> >>>>>>> On 20/04/2017 16:28, Marc Gonzalez wrote:
> >>>>>>>
> >>>>>>>> +static int tango_set_affinity(struct irq_data *data,
> >>>>>>>> +		const struct cpumask *mask, bool force)
> >>>>>>>> +{
> >>>>>>>> +	return -EINVAL;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static struct irq_chip tango_chip = {
> >>>>>>>> +	.irq_ack		= tango_ack,
> >>>>>>>> +	.irq_mask		= tango_mask,
> >>>>>>>> +	.irq_unmask		= tango_unmask,
> >>>>>>>> +	.irq_set_affinity	= tango_set_affinity,
> >>>>>>>> +	.irq_compose_msi_msg	= tango_compose_msi_msg,
> >>>>>>>> +};
> >>>>>>>
> >>>>>>> Hmmm... I'm wondering why .irq_set_affinity is required.
> >>>>>>>
> >>>>>>> static int setup_affinity(struct irq_desc *desc, struct cpumask *mask)
> >>>>>>> first calls __irq_can_set_affinity() to check whether
> >>>>>>> desc->irq_data.chip->irq_set_affinity) exists.
> >>>>>>>
> >>>>>>> then calls irq_do_set_affinity(&desc->irq_data, mask, false);
> >>>>>>> which calls chip->irq_set_affinity(data, mask, force);
> >>>>>>> = msi_domain_set_affinity()
> >>>>>>> which calls parent->chip->irq_set_affinity() unconditionally.
> >>>>>>>
> >>>>>>> Would it make sense to test that the callback is implemented
> >>>>>>> before calling it?
> >>>>>>>
> >>>>>>>
> >>>>>>> [    0.723895] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >>>>>>
> >>>>>> I'm not sure what you're asking.
> >>>>>>
> >>>>>> Is this a bug report for the v4 tango driver?
> >>>>>
> >>>>> No.
> >>>>>
> >>>>>> Or are you asking whether msi_domain_set_affinity() should be changed
> >>>>>> to check whether parent->chip->irq_set_affinity is implemented?
> >>>>>
> >>>>> Yes. The way things are implemented now, drivers are forced
> >>>>> to define an irq_set_affinity callback, even if it just returns
> >>>>> an error, otherwise, the kernel crashes, because of the
> >>>>> unconditional function pointer deref. 
> >>>>>
> >>>>>> msi_domain_set_affinity() has called parent->chip->irq_set_affinity()
> >>>>>> without checking since it was added in 2014 by f3cf8bb0d6c3 ("genirq: Add
> >>>>>> generic msi irq domain support"), so if there's a problem here, it's most
> >>>>>> likely in the tango code.
> >>>>>
> >>>>> The issue is having to define an "empty" function.
> >>>>> (Unnecessary code bloat and maintenance.)
> >>>>
> >>>> AFAICS, only one driver (other than this one) implements a "do nothing"
> >>>> set_affinity callback - everyone else who doesn't do anything hardware
> >>>> specific just passes it along via irq_chip_set_affinity_parent().
> >>>
> >>> I counted 4. Where did I mess up?
> >>>
> >>> advk_msi_set_affinity
> >>> altera_msi_set_affinity
> >>> nwl_msi_set_affinity
> >>> vmd_irq_set_affinity
> >>> tango_set_affinity
> >>
> >> Fair point - I went through drivers/irqchip and the various
> >> arch-specific instances and found ls_scfg_msi_set_affinity(), but
> >> somehow skipped over drivers/pci. Anyway, I think the question stands of
> >> why are these handful of drivers *not* using irq_chip_set_affinity_parent()?
> > 
> > Probably because they don't have a parent, in the hierarchical sense.
> > All they have is a chained interrupt (*puke*). Which implies that
> > changing the affinity of one MSI would affect all of them, completely
> > confusing unsuspecting userspace such as irqbalance.
> > 
> >> As an outsider, it naively seems logical that the affinity of an MSI
> >> which just gets translated to a wired interrupt should propagate through
> >> to the affinity of that wired interrupt, but maybe there are reasons not
> >> to; I really don't know.
> > 
> > See above. The main issue is that HW folks haven't understood the actual
> > use of MSIs, and persist in implementing them as an afterthought, based
> > on some cascading interrupt controller design.
> > 
> > Sad. So sad.
> 
> For the record, below is the patch I had in mind:
> 
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 8a3e872798f3..edfc95575a37 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -91,9 +91,11 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
>  {
>         struct irq_data *parent = irq_data->parent_data;
>         struct msi_msg msg;
> -       int ret;
> +       int ret = -EINVAL;
> +
> +       if (parent->chip->irq_set_affinity)
> +               ret = parent->chip->irq_set_affinity(parent, mask, force);
>  
> -       ret = parent->chip->irq_set_affinity(parent, mask, force);
>         if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
>                 BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
>                 irq_chip_write_msi_msg(irq_data, &msg);
> 
> 
> 
> Then, it would be safe to remove "do-nothing" .irq_set_affinity
> callbacks in drivers/pci/host

I assume it's obvious that nobody else is going to take this and run
with it, so nothing will happen with this unless you finish it up by
doing the cleanup (removing the "do-nothing" callbacks) and adding a
changelog and signed-off-by.

This would be more an IRQ patch than a PCI patch, but if I were
reviewing it, I would look for assurance that *all* the no-op
.irq_set_affinity callbacks were cleaned up, not just those in
drivers/pci/host.

Bjorn
Mason May 31, 2017, 6:49 p.m. UTC | #2
On 31/05/2017 19:34, Bjorn Helgaas wrote:
> On Wed, May 31, 2017 at 04:18:44PM +0200, Mason wrote:
>
>> For the record, below is the patch I had in mind:
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 8a3e872798f3..edfc95575a37 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -91,9 +91,11 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
>>  {
>>         struct irq_data *parent = irq_data->parent_data;
>>         struct msi_msg msg;
>> -       int ret;
>> +       int ret = -EINVAL;
>> +
>> +       if (parent->chip->irq_set_affinity)
>> +               ret = parent->chip->irq_set_affinity(parent, mask, force);
>>  
>> -       ret = parent->chip->irq_set_affinity(parent, mask, force);
>>         if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
>>                 BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
>>                 irq_chip_write_msi_msg(irq_data, &msg);
>>
>>
>>
>> Then, it would be safe to remove "do-nothing" .irq_set_affinity
>> callbacks in drivers/pci/host
> 
> I assume it's obvious that nobody else is going to take this and run
> with it, so nothing will happen with this unless you finish it up by
> doing the cleanup (removing the "do-nothing" callbacks) and adding a
> changelog and signed-off-by.

Smirk.

> This would be more an IRQ patch than a PCI patch, but if I were
> reviewing it, I would look for assurance that *all* the no-op
> .irq_set_affinity callbacks were cleaned up, not just those in
> drivers/pci/host.

Are you saying the patch is *wrong* if not all "do-nothing"
callbacks are cleaned up?

Regards.
Bjorn Helgaas May 31, 2017, 7 p.m. UTC | #3
On Wed, May 31, 2017 at 08:49:04PM +0200, Mason wrote:
> On 31/05/2017 19:34, Bjorn Helgaas wrote:
> ...

> > This would be more an IRQ patch than a PCI patch, but if I were
> > reviewing it, I would look for assurance that *all* the no-op
> > .irq_set_affinity callbacks were cleaned up, not just those in
> > drivers/pci/host.
> 
> Are you saying the patch is *wrong* if not all "do-nothing"
> callbacks are cleaned up?

I'm saying that (1) this probably wouldn't be applied via the PCI
tree, and (2) if it *were* applied via PCI, I would ask that all the
no-op callbacks were cleaned up at the same time.

Huh, that sounds a lot like what I wrote above.  Was I unclear?

Bjorn
Bjorn Helgaas May 31, 2017, 7:12 p.m. UTC | #4
On Wed, May 31, 2017 at 02:00:37PM -0500, Bjorn Helgaas wrote:
> On Wed, May 31, 2017 at 08:49:04PM +0200, Mason wrote:
> > On 31/05/2017 19:34, Bjorn Helgaas wrote:
> > ...
> 
> > > This would be more an IRQ patch than a PCI patch, but if I were
> > > reviewing it, I would look for assurance that *all* the no-op
> > > .irq_set_affinity callbacks were cleaned up, not just those in
> > > drivers/pci/host.
> > 
> > Are you saying the patch is *wrong* if not all "do-nothing"
> > callbacks are cleaned up?
> 
> I'm saying that (1) this probably wouldn't be applied via the PCI
> tree, and (2) if it *were* applied via PCI, I would ask that all the
> no-op callbacks were cleaned up at the same time.
> 
> Huh, that sounds a lot like what I wrote above.  Was I unclear?

I'm afraid this sounded snarky, which isn't my intention.  It seems
like there's a useful patch here, and I didn't want to see it get
ignored for lack of following the usual process.  If this is all
obvious to you, my apologies and please ignore my suggestion.

Bjorn
Mason May 31, 2017, 7:27 p.m. UTC | #5
On 31/05/2017 21:12, Bjorn Helgaas wrote:
> On Wed, May 31, 2017 at 02:00:37PM -0500, Bjorn Helgaas wrote:
>> On Wed, May 31, 2017 at 08:49:04PM +0200, Mason wrote:
>>> On 31/05/2017 19:34, Bjorn Helgaas wrote:
>>> ...
>>
>>>> This would be more an IRQ patch than a PCI patch, but if I were
>>>> reviewing it, I would look for assurance that *all* the no-op
>>>> .irq_set_affinity callbacks were cleaned up, not just those in
>>>> drivers/pci/host.
>>>
>>> Are you saying the patch is *wrong* if not all "do-nothing"
>>> callbacks are cleaned up?
>>
>> I'm saying that (1) this probably wouldn't be applied via the PCI
>> tree, and (2) if it *were* applied via PCI, I would ask that all the
>> no-op callbacks were cleaned up at the same time.
>>
>> Huh, that sounds a lot like what I wrote above.  Was I unclear?
> 
> I'm afraid this sounded snarky, which isn't my intention.  It seems
> like there's a useful patch here, and I didn't want to see it get
> ignored for lack of following the usual process.  If this is all
> obvious to you, my apologies and please ignore my suggestion.

Thanks for clearing things up. I had indeed assumed from
your first reply that the patch was pointless.

Writing a script locating all candidates will be an
interesting exercise.

Regards.
Bjorn Helgaas May 31, 2017, 8:04 p.m. UTC | #6
On Wed, May 31, 2017 at 09:27:50PM +0200, Mason wrote:
> On 31/05/2017 21:12, Bjorn Helgaas wrote:
> > On Wed, May 31, 2017 at 02:00:37PM -0500, Bjorn Helgaas wrote:
> >> On Wed, May 31, 2017 at 08:49:04PM +0200, Mason wrote:
> >>> On 31/05/2017 19:34, Bjorn Helgaas wrote:
> >>> ...
> >>
> >>>> This would be more an IRQ patch than a PCI patch, but if I were
> >>>> reviewing it, I would look for assurance that *all* the no-op
> >>>> .irq_set_affinity callbacks were cleaned up, not just those in
> >>>> drivers/pci/host.
> >>>
> >>> Are you saying the patch is *wrong* if not all "do-nothing"
> >>> callbacks are cleaned up?
> >>
> >> I'm saying that (1) this probably wouldn't be applied via the PCI
> >> tree, and (2) if it *were* applied via PCI, I would ask that all the
> >> no-op callbacks were cleaned up at the same time.
> >>
> >> Huh, that sounds a lot like what I wrote above.  Was I unclear?
> > 
> > I'm afraid this sounded snarky, which isn't my intention.  It seems
> > like there's a useful patch here, and I didn't want to see it get
> > ignored for lack of following the usual process.  If this is all
> > obvious to you, my apologies and please ignore my suggestion.
> 
> Thanks for clearing things up. I had indeed assumed from
> your first reply that the patch was pointless.
> 
> Writing a script locating all candidates will be an
> interesting exercise.

Cscope only sees 94 definitions of irq_set_affinity.  I know *I* could
never write a script faster than looking at them manually.  While
doing that, I noticed irq_chip_set_affinity_parent(), which is used in
14 cases and appears similar to your patch.
diff mbox

Patch

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8a3e872798f3..edfc95575a37 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -91,9 +91,11 @@  int msi_domain_set_affinity(struct irq_data *irq_data,
 {
        struct irq_data *parent = irq_data->parent_data;
        struct msi_msg msg;
-       int ret;
+       int ret = -EINVAL;
+
+       if (parent->chip->irq_set_affinity)
+               ret = parent->chip->irq_set_affinity(parent, mask, force);
 
-       ret = parent->chip->irq_set_affinity(parent, mask, force);
        if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
                BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
                irq_chip_write_msi_msg(irq_data, &msg);