diff mbox series

[v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

Message ID 20210608080409.1729276-1-javierm@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated | expand

Commit Message

Javier Martinez Canillas June 8, 2021, 8:04 a.m. UTC
IRQ handlers that are registered for shared interrupts can be called at
any time after have been registered using the request_irq() function.

It's up to drivers to ensure that's always safe for these to be called.

Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
their handlers are registered very early in the probe function, an error
later can lead to these handlers being executed before all the required
resources have been properly setup.

For example, the rockchip_pcie_read() function used by these IRQ handlers
expects that some PCIe clocks will already be enabled, otherwise trying
to access the PCIe registers causes the read to hang and never return.

The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
shared interrupt handlers being called, by generating a spurious interrupt
just before a shared interrupt handler is unregistered.

But this means that if the option is enabled, any error in the probe path
of this driver could lead to one of the IRQ handlers to be executed.

In a rockpro64 board, the following sequence of events happens:

  1) "pcie-sys" IRQ is requested and its handler registered.
  2) "pcie-client" IRQ is requested and its handler registered.
  3) probe later fails due readl_poll_timeout() returning a timeout.
  4) the "pcie-sys" IRQ is unregistered.
  5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
  6) "pcie-client" IRQ handler is called for this spurious interrupt.
  7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
  8) the machine hangs because rockchip_pcie_read() call never returns.

To avoid cases like this, the handlers don't have to be registered until
very late in the probe function, once all the resources have been setup.

So let's just move all the IRQ init before the pci_host_probe() call, that
will prevent issues like this and seems to be the correct thing to do too.

Reported-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
---

Changes in v2:
- Add missing word in the commit message.
- Include Shawn Lin's Acked-by tag.

 drivers/pci/controller/pcie-rockchip-host.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Peter Robinson June 12, 2021, 10:02 p.m. UTC | #1
On Tue, Jun 8, 2021 at 9:04 AM Javier Martinez Canillas
<javierm@redhat.com> wrote:
>
> IRQ handlers that are registered for shared interrupts can be called at
> any time after have been registered using the request_irq() function.
>
> It's up to drivers to ensure that's always safe for these to be called.
>
> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> their handlers are registered very early in the probe function, an error
> later can lead to these handlers being executed before all the required
> resources have been properly setup.
>
> For example, the rockchip_pcie_read() function used by these IRQ handlers
> expects that some PCIe clocks will already be enabled, otherwise trying
> to access the PCIe registers causes the read to hang and never return.
>
> The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
> shared interrupt handlers being called, by generating a spurious interrupt
> just before a shared interrupt handler is unregistered.
>
> But this means that if the option is enabled, any error in the probe path
> of this driver could lead to one of the IRQ handlers to be executed.
>
> In a rockpro64 board, the following sequence of events happens:
>
>   1) "pcie-sys" IRQ is requested and its handler registered.
>   2) "pcie-client" IRQ is requested and its handler registered.
>   3) probe later fails due readl_poll_timeout() returning a timeout.
>   4) the "pcie-sys" IRQ is unregistered.
>   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
>   6) "pcie-client" IRQ handler is called for this spurious interrupt.
>   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
>   8) the machine hangs because rockchip_pcie_read() call never returns.
>
> To avoid cases like this, the handlers don't have to be registered until
> very late in the probe function, once all the resources have been setup.
>
> So let's just move all the IRQ init before the pci_host_probe() call, that
> will prevent issues like this and seems to be the correct thing to do too.
>
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
Tested-by: Peter Robinson <pbrobinson@gmail.com>

Tested on a Rock960, Firefly3399 and a Pinebook Pro

> ---
>
> Changes in v2:
> - Add missing word in the commit message.
> - Include Shawn Lin's Acked-by tag.
>
>  drivers/pci/controller/pcie-rockchip-host.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index f1d08a1b159..78d04ac29cd 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -592,10 +592,6 @@ static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
>         if (err)
>                 return err;
>
> -       err = rockchip_pcie_setup_irq(rockchip);
> -       if (err)
> -               return err;
> -
>         rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
>         if (IS_ERR(rockchip->vpcie12v)) {
>                 if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
> @@ -973,8 +969,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>         if (err)
>                 goto err_vpcie;
>
> -       rockchip_pcie_enable_interrupts(rockchip);
> -
>         err = rockchip_pcie_init_irq_domain(rockchip);
>         if (err < 0)
>                 goto err_deinit_port;
> @@ -992,6 +986,12 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>         bridge->sysdata = rockchip;
>         bridge->ops = &rockchip_pcie_ops;
>
> +       err = rockchip_pcie_setup_irq(rockchip);
> +       if (err)
> +               goto err_remove_irq_domain;
> +
> +       rockchip_pcie_enable_interrupts(rockchip);
> +
>         err = pci_host_probe(bridge);
>         if (err < 0)
>                 goto err_remove_irq_domain;
> --
> 2.31.1
>
Lorenzo Pieralisi June 22, 2021, 10:31 a.m. UTC | #2
On Tue, 8 Jun 2021 10:04:09 +0200, Javier Martinez Canillas wrote:
> IRQ handlers that are registered for shared interrupts can be called at
> any time after have been registered using the request_irq() function.
> 
> It's up to drivers to ensure that's always safe for these to be called.
> 
> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> their handlers are registered very early in the probe function, an error
> later can lead to these handlers being executed before all the required
> resources have been properly setup.
> 
> [...]

Applied to pci/rockchip, thanks!

[1/1] PCI: rockchip: Avoid accessing PCIe registers with clocks gated
      https://git.kernel.org/lpieralisi/pci/c/c025f5b2e5

Thanks,
Lorenzo
Bjorn Helgaas June 24, 2021, 9:57 p.m. UTC | #3
On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> IRQ handlers that are registered for shared interrupts can be called at
> any time after have been registered using the request_irq() function.
> 
> It's up to drivers to ensure that's always safe for these to be called.
> 
> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> their handlers are registered very early in the probe function, an error
> later can lead to these handlers being executed before all the required
> resources have been properly setup.
> 
> For example, the rockchip_pcie_read() function used by these IRQ handlers
> expects that some PCIe clocks will already be enabled, otherwise trying
> to access the PCIe registers causes the read to hang and never return.

The read *never* completes?  That might be a bit problematic because
it implies that we may not be able to recover from PCIe errors.  Most
controllers will timeout eventually, log an error, and either
fabricate some data (typically ~0) to complete the CPU's read or cause
some kind of abort or machine check.

Just asking in case there's some controller configuration that should
be tweaked.
Bjorn Helgaas June 24, 2021, 10:40 p.m. UTC | #4
[+cc Michal, Ley Foon, Jingoo, Thierry, Jonathan]

On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> IRQ handlers that are registered for shared interrupts can be called at
> any time after have been registered using the request_irq() function.
> 
> It's up to drivers to ensure that's always safe for these to be called.
> 
> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> their handlers are registered very early in the probe function, an error
> later can lead to these handlers being executed before all the required
> resources have been properly setup.
> 
> For example, the rockchip_pcie_read() function used by these IRQ handlers
> expects that some PCIe clocks will already be enabled, otherwise trying
> to access the PCIe registers causes the read to hang and never return.
> 
> The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
> shared interrupt handlers being called, by generating a spurious interrupt
> just before a shared interrupt handler is unregistered.
> 
> But this means that if the option is enabled, any error in the probe path
> of this driver could lead to one of the IRQ handlers to be executed.

I'm not an IRQ expert, but I think this is an issue regardless of
CONFIG_DEBUG_SHIRQ, isn't it?  Anything used by an IRQ handler should
be initialized before the handler is registered.  CONFIG_DEBUG_SHIRQ
is just a way to help find latent problems.

> In a rockpro64 board, the following sequence of events happens:
> 
>   1) "pcie-sys" IRQ is requested and its handler registered.
>   2) "pcie-client" IRQ is requested and its handler registered.
>   3) probe later fails due readl_poll_timeout() returning a timeout.
>   4) the "pcie-sys" IRQ is unregistered.
>   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
>   6) "pcie-client" IRQ handler is called for this spurious interrupt.
>   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
>   8) the machine hangs because rockchip_pcie_read() call never returns.
> 
> To avoid cases like this, the handlers don't have to be registered until
> very late in the probe function, once all the resources have been setup.
> 
> So let's just move all the IRQ init before the pci_host_probe() call, that
> will prevent issues like this and seems to be the correct thing to do too.

Previously we registered rockchip_pcie_subsys_irq_handler() and
rockchip_pcie_client_irq_handler() before the PCIe clocks were
enabled.  That's a problem because they depend on those clocks being
enabled, and your patch fixes that.

rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
which isn't initialized until rockchip_pcie_init_irq_domain().
Previously we registered rockchip_pcie_legacy_int_handler() as the
handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().

I think you patch *also* fixes that problem, right?

I think this is also an issue with the following other drivers.  They all
set the handler to something that uses an IRQ domain before they
actually initialize the domain:

  # nwl --------------------------------------------------------------
  nwl_pcie_probe
    nwl_pcie_parse_dt
      pcie->irq_intx = platform_get_irq_byname(pdev, "intx")
      irq_set_chained_handler_and_data(pcie->irq_intx, nwl_pcie_leg_handler)
    nwl_pcie_bridge_init
      devm_request_irq
    nwl_pcie_init_irq_domain
      pcie->legacy_irq_domain = irq_domain_add_linear()

  nwl_pcie_leg_handler
    irq_find_mapping(pcie->legacy_irq_domain)


  # altera --------------------------------------------------------------
  altera_pcie_probe
    altera_pcie_parse_dt
      irq_set_chained_handler_and_data(altera_pcie_isr)
    altera_pcie_init_irq_domain
      pcie->irq_domain = irq_domain_add_linear()

  altera_pcie_isr
    irq_find_mapping(pcie->irq_domain)


  # ks --------------------------------------------------------------
  ks_pcie_config_legacy_irq
    irq_set_chained_handler_and_data(ks_pcie_legacy_irq_handler)
    ks_pcie->legacy_irq_domain = irq_domain_add_linear()

  ks_pcie_legacy_irq_handler
    ks_pcie_handle_legacy_irqirq_linear_revmap(ks_pcie->legacy_irq_domain)


  # tegra --------------------------------------------------------------
  tegra_pcie_msi_setup
    if (IS_ENABLED(CONFIG_PCI_MSI)) {
      tegra_allocate_domains
	parent = irq_domain_create_linear()
	msi->domain = pci_msi_create_irq_domain(parent)
    }
    irq_set_chained_handler_and_data(tegra_pcie_msi_irq)

  tegra_pcie_msi_irq
    irq_find_mapping(msi->domain->parent)


Tegra is a little wierd because the IS_ENABLED(CONFIG_PCI_MSI) only
covers the IRQ domain alloc; it doesn't cover setting the handler.  So
if CONFIG_PCI_MSI is not set, we don't alloc the IRQ domain, but we
still set a handler that *uses* the IRQ domain.  That seems somewhat
broken.

> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> Acked-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
> 
> Changes in v2:
> - Add missing word in the commit message.
> - Include Shawn Lin's Acked-by tag.
> 
>  drivers/pci/controller/pcie-rockchip-host.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index f1d08a1b159..78d04ac29cd 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -592,10 +592,6 @@ static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
>  	if (err)
>  		return err;
>  
> -	err = rockchip_pcie_setup_irq(rockchip);
> -	if (err)
> -		return err;
> -
>  	rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
>  	if (IS_ERR(rockchip->vpcie12v)) {
>  		if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
> @@ -973,8 +969,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>  	if (err)
>  		goto err_vpcie;
>  
> -	rockchip_pcie_enable_interrupts(rockchip);
> -
>  	err = rockchip_pcie_init_irq_domain(rockchip);
>  	if (err < 0)
>  		goto err_deinit_port;
> @@ -992,6 +986,12 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>  	bridge->sysdata = rockchip;
>  	bridge->ops = &rockchip_pcie_ops;
>  
> +	err = rockchip_pcie_setup_irq(rockchip);
> +	if (err)
> +		goto err_remove_irq_domain;
> +
> +	rockchip_pcie_enable_interrupts(rockchip);
> +
>  	err = pci_host_probe(bridge);
>  	if (err < 0)
>  		goto err_remove_irq_domain;
> -- 
> 2.31.1
>
Robin Murphy June 24, 2021, 11:18 p.m. UTC | #5
On 2021-06-24 22:57, Bjorn Helgaas wrote:
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>> IRQ handlers that are registered for shared interrupts can be called at
>> any time after have been registered using the request_irq() function.
>>
>> It's up to drivers to ensure that's always safe for these to be called.
>>
>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>> their handlers are registered very early in the probe function, an error
>> later can lead to these handlers being executed before all the required
>> resources have been properly setup.
>>
>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>> expects that some PCIe clocks will already be enabled, otherwise trying
>> to access the PCIe registers causes the read to hang and never return.
> 
> The read *never* completes?  That might be a bit problematic because
> it implies that we may not be able to recover from PCIe errors.  Most
> controllers will timeout eventually, log an error, and either
> fabricate some data (typically ~0) to complete the CPU's read or cause
> some kind of abort or machine check.
> 
> Just asking in case there's some controller configuration that should
> be tweaked.

If I'm following correctly, that'll be a read transaction to the native 
side of the controller itself; it can't complete that read, or do 
anything else either, because it's clock-gated, and thus completely 
oblivious (it might be that if another CPU was able to enable the clocks 
then everything would carry on as normal, or it might end up totally 
deadlocking the SoC interconnect). I think it's safe to assume that in 
that state nothing of importance would be happening on the PCIe side, 
and even if it was we'd never get to know about it.

The only relevant configuration would be "don't turn the clocks off if 
you're using the thing", which in actual operation can be taken for 
granted. It's a fairly typical bug to register an IRQ as shared but 
assume in the handler that you'll only ever be called for your own 
device's IRQ while it's powered up/clocked/etc. in its normal 
operational state, hence CONFIG_DEBUG_SHIRQ helps flush those kinds of 
unreliable assumptions out.

Robin.

(this reminds me of the "fun" I once had where a machine was locking up 
during boot, but simply connecting an external debugger to find out 
exactly where it was stuck happened to automatically enable the 
offending power domain and un-stick it)
Bjorn Helgaas June 24, 2021, 11:28 p.m. UTC | #6
On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
> On 2021-06-24 22:57, Bjorn Helgaas wrote:
> > On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> > > IRQ handlers that are registered for shared interrupts can be called at
> > > any time after have been registered using the request_irq() function.
> > > 
> > > It's up to drivers to ensure that's always safe for these to be called.
> > > 
> > > Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> > > their handlers are registered very early in the probe function, an error
> > > later can lead to these handlers being executed before all the required
> > > resources have been properly setup.
> > > 
> > > For example, the rockchip_pcie_read() function used by these IRQ handlers
> > > expects that some PCIe clocks will already be enabled, otherwise trying
> > > to access the PCIe registers causes the read to hang and never return.
> > 
> > The read *never* completes?  That might be a bit problematic because
> > it implies that we may not be able to recover from PCIe errors.  Most
> > controllers will timeout eventually, log an error, and either
> > fabricate some data (typically ~0) to complete the CPU's read or cause
> > some kind of abort or machine check.
> > 
> > Just asking in case there's some controller configuration that should
> > be tweaked.
> 
> If I'm following correctly, that'll be a read transaction to the native side
> of the controller itself; it can't complete that read, or do anything else
> either, because it's clock-gated, and thus completely oblivious (it might be
> that if another CPU was able to enable the clocks then everything would
> carry on as normal, or it might end up totally deadlocking the SoC
> interconnect). I think it's safe to assume that in that state nothing of
> importance would be happening on the PCIe side, and even if it was we'd
> never get to know about it.

Oh, right, that makes sense.  I was thinking about the PCIe side, but
if the controller itself isn't working, of course we wouldn't get that
far.

I would expect that the CPU itself would have some kind of timeout for
the read, but that's far outside of the PCI world.

Bjorn
Robin Murphy June 24, 2021, 11:51 p.m. UTC | #7
On 2021-06-25 00:28, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
>> On 2021-06-24 22:57, Bjorn Helgaas wrote:
>>> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>>>> IRQ handlers that are registered for shared interrupts can be called at
>>>> any time after have been registered using the request_irq() function.
>>>>
>>>> It's up to drivers to ensure that's always safe for these to be called.
>>>>
>>>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>>>> their handlers are registered very early in the probe function, an error
>>>> later can lead to these handlers being executed before all the required
>>>> resources have been properly setup.
>>>>
>>>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>>>> expects that some PCIe clocks will already be enabled, otherwise trying
>>>> to access the PCIe registers causes the read to hang and never return.
>>>
>>> The read *never* completes?  That might be a bit problematic because
>>> it implies that we may not be able to recover from PCIe errors.  Most
>>> controllers will timeout eventually, log an error, and either
>>> fabricate some data (typically ~0) to complete the CPU's read or cause
>>> some kind of abort or machine check.
>>>
>>> Just asking in case there's some controller configuration that should
>>> be tweaked.
>>
>> If I'm following correctly, that'll be a read transaction to the native side
>> of the controller itself; it can't complete that read, or do anything else
>> either, because it's clock-gated, and thus completely oblivious (it might be
>> that if another CPU was able to enable the clocks then everything would
>> carry on as normal, or it might end up totally deadlocking the SoC
>> interconnect). I think it's safe to assume that in that state nothing of
>> importance would be happening on the PCIe side, and even if it was we'd
>> never get to know about it.
> 
> Oh, right, that makes sense.  I was thinking about the PCIe side, but
> if the controller itself isn't working, of course we wouldn't get that
> far.
> 
> I would expect that the CPU itself would have some kind of timeout for
> the read, but that's far outside of the PCI world.

Nah, in AMBA I'm not sure if it's even legal to abandon a transaction 
without waiting for the handshake to complete. If you're lucky the 
interconnect might have a clock/power domain bridge which can reply with 
an error when it knows its other side isn't running, otherwise the 
initiator will just happily sit there waiting for a response to come 
back "in a timely manner" :)

Robin.
Javier Martinez Canillas June 25, 2021, 7:09 a.m. UTC | #8
Hello Bjorn,

On 6/25/21 12:40 AM, Bjorn Helgaas wrote:
> [+cc Michal, Ley Foon, Jingoo, Thierry, Jonathan]
> 
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>> IRQ handlers that are registered for shared interrupts can be called at
>> any time after have been registered using the request_irq() function.
>>
>> It's up to drivers to ensure that's always safe for these to be called.
>>
>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>> their handlers are registered very early in the probe function, an error
>> later can lead to these handlers being executed before all the required
>> resources have been properly setup.
>>
>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>> expects that some PCIe clocks will already be enabled, otherwise trying
>> to access the PCIe registers causes the read to hang and never return.
>>
>> The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
>> shared interrupt handlers being called, by generating a spurious interrupt
>> just before a shared interrupt handler is unregistered.
>>
>> But this means that if the option is enabled, any error in the probe path
>> of this driver could lead to one of the IRQ handlers to be executed.
> 
> I'm not an IRQ expert, but I think this is an issue regardless of
> CONFIG_DEBUG_SHIRQ, isn't it?  Anything used by an IRQ handler should
> be initialized before the handler is registered.  CONFIG_DEBUG_SHIRQ
> is just a way to help find latent problems.
>

Yes, it's an issue regardless. It's just that this debug option tests if the
drivers aren't making the wrong assumption, exactly to find issues like this.

>> In a rockpro64 board, the following sequence of events happens:
>>
>>   1) "pcie-sys" IRQ is requested and its handler registered.
>>   2) "pcie-client" IRQ is requested and its handler registered.
>>   3) probe later fails due readl_poll_timeout() returning a timeout.
>>   4) the "pcie-sys" IRQ is unregistered.
>>   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
>>   6) "pcie-client" IRQ handler is called for this spurious interrupt.
>>   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
>>   8) the machine hangs because rockchip_pcie_read() call never returns.
>>
>> To avoid cases like this, the handlers don't have to be registered until
>> very late in the probe function, once all the resources have been setup.
>>
>> So let's just move all the IRQ init before the pci_host_probe() call, that
>> will prevent issues like this and seems to be the correct thing to do too.
> 
> Previously we registered rockchip_pcie_subsys_irq_handler() and
> rockchip_pcie_client_irq_handler() before the PCIe clocks were
> enabled.  That's a problem because they depend on those clocks being
> enabled, and your patch fixes that.
>
> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
> which isn't initialized until rockchip_pcie_init_irq_domain().
> Previously we registered rockchip_pcie_legacy_int_handler() as the
> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
> 
> I think you patch *also* fixes that problem, right?
>

Correct, that's why I moved the initialization and IRQ enable after that.
 
> I think this is also an issue with the following other drivers.  They all
> set the handler to something that uses an IRQ domain before they
> actually initialize the domain:

Yes, I agreed with your assessment and also noticed that others drivers have
similar issues. I just don't have any of those platforms to try to reproduce
the bugs and test a fix.

Best regards,
Bjorn Helgaas June 25, 2021, 2:32 p.m. UTC | #9
On Fri, Jun 25, 2021 at 09:09:36AM +0200, Javier Martinez Canillas wrote:
> On 6/25/21 12:40 AM, Bjorn Helgaas wrote:

> > I think this is also an issue with the following other drivers.  They all
> > set the handler to something that uses an IRQ domain before they
> > actually initialize the domain:
> 
> Yes, I agreed with your assessment and also noticed that others drivers have
> similar issues. I just don't have any of those platforms to try to reproduce
> the bugs and test a fix.

Even if you don't have other platforms for testing, I'm thrilled when
folks point out issues with them and (given time and inclination) post
patches for them.

I'd much rather fix *all* instances of the problem than just one, even
if we can't test them all.  Frequently driver maintainers will review
and test patches for their hardware even if we can't.

Bjorn
Javier Martinez Canillas June 25, 2021, 6:34 p.m. UTC | #10
On 6/25/21 4:32 PM, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 09:09:36AM +0200, Javier Martinez Canillas wrote:
>> On 6/25/21 12:40 AM, Bjorn Helgaas wrote:
> 
>>> I think this is also an issue with the following other drivers.  They all
>>> set the handler to something that uses an IRQ domain before they
>>> actually initialize the domain:
>>
>> Yes, I agreed with your assessment and also noticed that others drivers have
>> similar issues. I just don't have any of those platforms to try to reproduce
>> the bugs and test a fix.
> 
> Even if you don't have other platforms for testing, I'm thrilled when
> folks point out issues with them and (given time and inclination) post
> patches for them.
> 
> I'd much rather fix *all* instances of the problem than just one, even
> if we can't test them all.  Frequently driver maintainers will review
> and test patches for their hardware even if we can't.
>

Ok. I'll try to make some time next week to look at these drivers then
and post at least RFC/RFT patches for people with available hardware to
test the changes.

Best regards,
Bjorn Helgaas June 29, 2021, 12:38 a.m. UTC | #11
On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
> [+cc Michal, Ley Foon, Jingoo, Thierry, Jonathan]
> 
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> > IRQ handlers that are registered for shared interrupts can be called at
> > any time after have been registered using the request_irq() function.
> > 
> > It's up to drivers to ensure that's always safe for these to be called.
> > 
> > Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> > their handlers are registered very early in the probe function, an error
> > later can lead to these handlers being executed before all the required
> > resources have been properly setup.
> > 
> > For example, the rockchip_pcie_read() function used by these IRQ handlers
> > expects that some PCIe clocks will already be enabled, otherwise trying
> > to access the PCIe registers causes the read to hang and never return.
> > 
> > The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
> > shared interrupt handlers being called, by generating a spurious interrupt
> > just before a shared interrupt handler is unregistered.
> > 
> > But this means that if the option is enabled, any error in the probe path
> > of this driver could lead to one of the IRQ handlers to be executed.
> 
> I'm not an IRQ expert, but I think this is an issue regardless of
> CONFIG_DEBUG_SHIRQ, isn't it?  Anything used by an IRQ handler should
> be initialized before the handler is registered.  CONFIG_DEBUG_SHIRQ
> is just a way to help find latent problems.
> 
> > In a rockpro64 board, the following sequence of events happens:
> > 
> >   1) "pcie-sys" IRQ is requested and its handler registered.
> >   2) "pcie-client" IRQ is requested and its handler registered.
> >   3) probe later fails due readl_poll_timeout() returning a timeout.
> >   4) the "pcie-sys" IRQ is unregistered.
> >   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
> >   6) "pcie-client" IRQ handler is called for this spurious interrupt.
> >   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
> >   8) the machine hangs because rockchip_pcie_read() call never returns.
> > 
> > To avoid cases like this, the handlers don't have to be registered until
> > very late in the probe function, once all the resources have been setup.
> > 
> > So let's just move all the IRQ init before the pci_host_probe() call, that
> > will prevent issues like this and seems to be the correct thing to do too.
> 
> Previously we registered rockchip_pcie_subsys_irq_handler() and
> rockchip_pcie_client_irq_handler() before the PCIe clocks were
> enabled.  That's a problem because they depend on those clocks being
> enabled, and your patch fixes that.
> 
> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
> which isn't initialized until rockchip_pcie_init_irq_domain().
> Previously we registered rockchip_pcie_legacy_int_handler() as the
> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
> 
> I think your patch *also* fixes that problem, right?

The lack of consistency in how we use
irq_set_chained_handler_and_data() really bugs me.

Your patch fixes the ordering issue where we installed
rockchip_pcie_legacy_int_handler() before initializing data
(rockchip->irq_domain) that it depends on.

But AFAICT, rockchip still has the problem that we don't *unregister*
rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
removed.  Doesn't this mean that if we unload the module, then receive 
an interrupt from the device, we'll try to call a function that is no
longer present?

> > diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> > index f1d08a1b159..78d04ac29cd 100644
> > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > @@ -592,10 +592,6 @@ static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
> >  	if (err)
> >  		return err;
> >  
> > -	err = rockchip_pcie_setup_irq(rockchip);
> > -	if (err)
> > -		return err;
> > -
> >  	rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
> >  	if (IS_ERR(rockchip->vpcie12v)) {
> >  		if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
> > @@ -973,8 +969,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> >  	if (err)
> >  		goto err_vpcie;
> >  
> > -	rockchip_pcie_enable_interrupts(rockchip);
> > -
> >  	err = rockchip_pcie_init_irq_domain(rockchip);
> >  	if (err < 0)
> >  		goto err_deinit_port;
> > @@ -992,6 +986,12 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> >  	bridge->sysdata = rockchip;
> >  	bridge->ops = &rockchip_pcie_ops;
> >  
> > +	err = rockchip_pcie_setup_irq(rockchip);
> > +	if (err)
> > +		goto err_remove_irq_domain;
> > +
> > +	rockchip_pcie_enable_interrupts(rockchip);
> > +
> >  	err = pci_host_probe(bridge);
> >  	if (err < 0)
> >  		goto err_remove_irq_domain;
> > -- 
> > 2.31.1
> >
Javier Martinez Canillas June 29, 2021, 6:17 a.m. UTC | #12
On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
> On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:

[snip]

>>>
>>> So let's just move all the IRQ init before the pci_host_probe() call, that
>>> will prevent issues like this and seems to be the correct thing to do too.
>>
>> Previously we registered rockchip_pcie_subsys_irq_handler() and
>> rockchip_pcie_client_irq_handler() before the PCIe clocks were
>> enabled.  That's a problem because they depend on those clocks being
>> enabled, and your patch fixes that.
>>
>> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
>> which isn't initialized until rockchip_pcie_init_irq_domain().
>> Previously we registered rockchip_pcie_legacy_int_handler() as the
>> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
>>
>> I think your patch *also* fixes that problem, right?
> 
> The lack of consistency in how we use
> irq_set_chained_handler_and_data() really bugs me.
> 
> Your patch fixes the ordering issue where we installed
> rockchip_pcie_legacy_int_handler() before initializing data
> (rockchip->irq_domain) that it depends on.
> 
> But AFAICT, rockchip still has the problem that we don't *unregister*
> rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
> removed.  Doesn't this mean that if we unload the module, then receive 
> an interrupt from the device, we'll try to call a function that is no
> longer present?
> 

Good question, I don't to be honest. I'll have to dig deeper on this but
my experience is that the module removal (and device unbind) is not that
well tested on ARM device drivers in general.

Best regards,
Robin Murphy June 29, 2021, 10:52 a.m. UTC | #13
On 2021-06-29 07:17, Javier Martinez Canillas wrote:
> On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
>> On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
> 
> [snip]
> 
>>>>
>>>> So let's just move all the IRQ init before the pci_host_probe() call, that
>>>> will prevent issues like this and seems to be the correct thing to do too.
>>>
>>> Previously we registered rockchip_pcie_subsys_irq_handler() and
>>> rockchip_pcie_client_irq_handler() before the PCIe clocks were
>>> enabled.  That's a problem because they depend on those clocks being
>>> enabled, and your patch fixes that.
>>>
>>> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
>>> which isn't initialized until rockchip_pcie_init_irq_domain().
>>> Previously we registered rockchip_pcie_legacy_int_handler() as the
>>> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
>>>
>>> I think your patch *also* fixes that problem, right?
>>
>> The lack of consistency in how we use
>> irq_set_chained_handler_and_data() really bugs me.
>>
>> Your patch fixes the ordering issue where we installed
>> rockchip_pcie_legacy_int_handler() before initializing data
>> (rockchip->irq_domain) that it depends on.
>>
>> But AFAICT, rockchip still has the problem that we don't *unregister*
>> rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
>> removed.  Doesn't this mean that if we unload the module, then receive
>> an interrupt from the device, we'll try to call a function that is no
>> longer present?
>>
> 
> Good question, I don't to be honest. I'll have to dig deeper on this but
> my experience is that the module removal (and device unbind) is not that
> well tested on ARM device drivers in general.

Well, it does use devm_request_irq() so the handler should be 
unregistered by devres *after* ->remove has finished, however that does 
still leave a potential race window in which a pending IRQ could be 
taken during the later part of rockchip_pcie_remove() after it has 
started turning off critical things. Unless the clocks and regulators 
can also be delegated to devres, it might be more robust to explicitly 
manage the IRQs as well. Mixing the two schemes can be problematic when 
the exact order of both setup and teardown matters.

Robin.
Bjorn Helgaas June 29, 2021, 11:14 p.m. UTC | #14
On Tue, Jun 29, 2021 at 11:52:44AM +0100, Robin Murphy wrote:
> On 2021-06-29 07:17, Javier Martinez Canillas wrote:
> > On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
> > > On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
> > 
> > [snip]
> > 
> > > > > 
> > > > > So let's just move all the IRQ init before the pci_host_probe() call, that
> > > > > will prevent issues like this and seems to be the correct thing to do too.
> > > > 
> > > > Previously we registered rockchip_pcie_subsys_irq_handler() and
> > > > rockchip_pcie_client_irq_handler() before the PCIe clocks were
> > > > enabled.  That's a problem because they depend on those clocks being
> > > > enabled, and your patch fixes that.
> > > > 
> > > > rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
> > > > which isn't initialized until rockchip_pcie_init_irq_domain().
> > > > Previously we registered rockchip_pcie_legacy_int_handler() as the
> > > > handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
> > > > 
> > > > I think your patch *also* fixes that problem, right?
> > > 
> > > The lack of consistency in how we use
> > > irq_set_chained_handler_and_data() really bugs me.
> > > 
> > > Your patch fixes the ordering issue where we installed
> > > rockchip_pcie_legacy_int_handler() before initializing data
> > > (rockchip->irq_domain) that it depends on.
> > > 
> > > But AFAICT, rockchip still has the problem that we don't *unregister*
> > > rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
> > > removed.  Doesn't this mean that if we unload the module, then receive
> > > an interrupt from the device, we'll try to call a function that is no
> > > longer present?
> > 
> > Good question, I don't to be honest. I'll have to dig deeper on this but
> > my experience is that the module removal (and device unbind) is not that
> > well tested on ARM device drivers in general.
> 
> Well, it does use devm_request_irq() so the handler should be unregistered
> by devres *after* ->remove has finished, however that does still leave a
> potential race window in which a pending IRQ could be taken during the later
> part of rockchip_pcie_remove() after it has started turning off critical
> things. Unless the clocks and regulators can also be delegated to devres, it
> might be more robust to explicitly manage the IRQs as well. Mixing the two
> schemes can be problematic when the exact order of both setup and teardown
> matters.

I don't understand the devm_request_irq() connection.  I'm looking at
this irq_set_chained_handler_and_data() call [1]:

  static int rockchip_pcie_setup_irq(struct rockchip_pcie *rockchip)
  {
    ...
    irq = platform_get_irq_byname(pdev, "legacy");
    irq_set_chained_handler_and_data(irq,
				     rockchip_pcie_legacy_int_handler,
				     rockchip);

    irq = platform_get_irq_byname(pdev, "client");
    ...

We look up "irq", pass it to irq_set_chained_handler_and_data(), and
throw it away without saving it anywhere.  How would anything know how
to unregister rockchip_pcie_legacy_int_handler()?

I could imagine irq_set_chained_handler_and_data() saving what's
needed for unregistration, but I would think that would require a
device pointer, which we don't give it.

I'm IRQ-illiterate, so please educate me!

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-rockchip-host.c?id=v5.13#n562
Robin Murphy June 30, 2021, 9:44 a.m. UTC | #15
On 2021-06-30 00:14, Bjorn Helgaas wrote:
> On Tue, Jun 29, 2021 at 11:52:44AM +0100, Robin Murphy wrote:
>> On 2021-06-29 07:17, Javier Martinez Canillas wrote:
>>> On 6/29/21 2:38 AM, Bjorn Helgaas wrote:
>>>> On Thu, Jun 24, 2021 at 05:40:40PM -0500, Bjorn Helgaas wrote:
>>>
>>> [snip]
>>>
>>>>>>
>>>>>> So let's just move all the IRQ init before the pci_host_probe() call, that
>>>>>> will prevent issues like this and seems to be the correct thing to do too.
>>>>>
>>>>> Previously we registered rockchip_pcie_subsys_irq_handler() and
>>>>> rockchip_pcie_client_irq_handler() before the PCIe clocks were
>>>>> enabled.  That's a problem because they depend on those clocks being
>>>>> enabled, and your patch fixes that.
>>>>>
>>>>> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
>>>>> which isn't initialized until rockchip_pcie_init_irq_domain().
>>>>> Previously we registered rockchip_pcie_legacy_int_handler() as the
>>>>> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
>>>>>
>>>>> I think your patch *also* fixes that problem, right?
>>>>
>>>> The lack of consistency in how we use
>>>> irq_set_chained_handler_and_data() really bugs me.
>>>>
>>>> Your patch fixes the ordering issue where we installed
>>>> rockchip_pcie_legacy_int_handler() before initializing data
>>>> (rockchip->irq_domain) that it depends on.
>>>>
>>>> But AFAICT, rockchip still has the problem that we don't *unregister*
>>>> rockchip_pcie_legacy_int_handler() when the rockchip-pcie module is
>>>> removed.  Doesn't this mean that if we unload the module, then receive
>>>> an interrupt from the device, we'll try to call a function that is no
>>>> longer present?
>>>
>>> Good question, I don't to be honest. I'll have to dig deeper on this but
>>> my experience is that the module removal (and device unbind) is not that
>>> well tested on ARM device drivers in general.
>>
>> Well, it does use devm_request_irq() so the handler should be unregistered
>> by devres *after* ->remove has finished, however that does still leave a
>> potential race window in which a pending IRQ could be taken during the later
>> part of rockchip_pcie_remove() after it has started turning off critical
>> things. Unless the clocks and regulators can also be delegated to devres, it
>> might be more robust to explicitly manage the IRQs as well. Mixing the two
>> schemes can be problematic when the exact order of both setup and teardown
>> matters.
> 
> I don't understand the devm_request_irq() connection.  I'm looking at
> this irq_set_chained_handler_and_data() call [1]:
> 
>    static int rockchip_pcie_setup_irq(struct rockchip_pcie *rockchip)
>    {
>      ...
>      irq = platform_get_irq_byname(pdev, "legacy");
>      irq_set_chained_handler_and_data(irq,
> 				     rockchip_pcie_legacy_int_handler,
> 				     rockchip);
> 
>      irq = platform_get_irq_byname(pdev, "client");
>      ...
> 
> We look up "irq", pass it to irq_set_chained_handler_and_data(), and
> throw it away without saving it anywhere.  How would anything know how
> to unregister rockchip_pcie_legacy_int_handler()?
> 
> I could imagine irq_set_chained_handler_and_data() saving what's
> needed for unregistration, but I would think that would require a
> device pointer, which we don't give it.
> 
> I'm IRQ-illiterate, so please educate me!

Oh, my mistake, I was looking at the "sys" and "client" IRQs and spoke 
too soon; the "legacy" IRQ does appear to be completely leaked as you 
say, so there are two degrees of issue here.

Robin.

> 
> Bjorn
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pcie-rockchip-host.c?id=v5.13#n562
>
Bjorn Helgaas June 30, 2021, 6:49 p.m. UTC | #16
On Tue, Jun 29, 2021 at 11:52:44AM +0100, Robin Murphy wrote:

> Well, it does use devm_request_irq() so the handler should be unregistered
> by devres *after* ->remove has finished, however that does still leave a
> potential race window in which a pending IRQ could be taken during the later
> part of rockchip_pcie_remove() after it has started turning off critical
> things. Unless the clocks and regulators can also be delegated to devres, it
> might be more robust to explicitly manage the IRQs as well. Mixing the two
> schemes can be problematic when the exact order of both setup and teardown
> matters.

Thanks for this.  I missed this problem.  We have lots of PCI
controller drivers that use some devm interfaces but use the non-devm
clk_prepare_enable(), and they generally turn things off in their
.remove() methods, which is obviously before any devm unregistration.

Many .suspend() methods turn off clocks and power while leaving the
IRQ handler installed.  Should we get an interrupt from our device
after .suspend()?  *Probably* not, but it makes me a little queasy to
rely on that, or to rely on the assumption that the IRQ is not shared.

Bjorn
Bjorn Helgaas June 30, 2021, 6:59 p.m. UTC | #17
[+cc Michal, Jingoo, Thierry, Jonathan]

On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
> IRQ handlers that are registered for shared interrupts can be called at
> any time after have been registered using the request_irq() function.
> 
> It's up to drivers to ensure that's always safe for these to be called.
> 
> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
> their handlers are registered very early in the probe function, an error
> later can lead to these handlers being executed before all the required
> resources have been properly setup.
> 
> For example, the rockchip_pcie_read() function used by these IRQ handlers
> expects that some PCIe clocks will already be enabled, otherwise trying
> to access the PCIe registers causes the read to hang and never return.
> 
> The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
> shared interrupt handlers being called, by generating a spurious interrupt
> just before a shared interrupt handler is unregistered.
> 
> But this means that if the option is enabled, any error in the probe path
> of this driver could lead to one of the IRQ handlers to be executed.
> 
> In a rockpro64 board, the following sequence of events happens:
> 
>   1) "pcie-sys" IRQ is requested and its handler registered.
>   2) "pcie-client" IRQ is requested and its handler registered.
>   3) probe later fails due readl_poll_timeout() returning a timeout.
>   4) the "pcie-sys" IRQ is unregistered.
>   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
>   6) "pcie-client" IRQ handler is called for this spurious interrupt.
>   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
>   8) the machine hangs because rockchip_pcie_read() call never returns.
> 
> To avoid cases like this, the handlers don't have to be registered until
> very late in the probe function, once all the resources have been setup.
> 
> So let's just move all the IRQ init before the pci_host_probe() call, that
> will prevent issues like this and seems to be the correct thing to do too.
> 
> Reported-by: Peter Robinson <pbrobinson@gmail.com>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> Acked-by: Shawn Lin <shawn.lin@rock-chips.com>

I think the above commit log is perfectly accurate, but all the
details might suggest that this is something specific to rockchip or
CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
fundamental problem, which is actually very simple: we registered IRQ
handlers before we were ready for them to be called.

I propose the following commit log in the hope that it would help
other driver authors to make similar fixes:

    PCI: rockchip: Register IRQ handlers after device and data are ready

    An IRQ handler may be called at any time after it is registered, so
    anything it relies on must be ready before registration.

    rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
    read registers in the PCIe controller, but we registered them before
    turning on clocks to the controller.  If either is called before the clocks
    are turned on, the register reads fail and the machine hangs.

    Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
    but we installed it before initializing irq_domain.

    Register IRQ handlers after their data structures are initialized and
    clocks are enabled.

If this is inaccurate or omits something important, let me know.  I
can make any updates locally.

Bjorn

> ---
> 
> Changes in v2:
> - Add missing word in the commit message.
> - Include Shawn Lin's Acked-by tag.
> 
>  drivers/pci/controller/pcie-rockchip-host.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
> index f1d08a1b159..78d04ac29cd 100644
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -592,10 +592,6 @@ static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
>  	if (err)
>  		return err;
>  
> -	err = rockchip_pcie_setup_irq(rockchip);
> -	if (err)
> -		return err;
> -
>  	rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
>  	if (IS_ERR(rockchip->vpcie12v)) {
>  		if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
> @@ -973,8 +969,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>  	if (err)
>  		goto err_vpcie;
>  
> -	rockchip_pcie_enable_interrupts(rockchip);
> -
>  	err = rockchip_pcie_init_irq_domain(rockchip);
>  	if (err < 0)
>  		goto err_deinit_port;
> @@ -992,6 +986,12 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>  	bridge->sysdata = rockchip;
>  	bridge->ops = &rockchip_pcie_ops;
>  
> +	err = rockchip_pcie_setup_irq(rockchip);
> +	if (err)
> +		goto err_remove_irq_domain;
> +
> +	rockchip_pcie_enable_interrupts(rockchip);
> +
>  	err = pci_host_probe(bridge);
>  	if (err < 0)
>  		goto err_remove_irq_domain;
> -- 
> 2.31.1
>
Javier Martinez Canillas June 30, 2021, 7:59 p.m. UTC | #18
On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> [+cc Michal, Jingoo, Thierry, Jonathan]

[snip]

> 
> I think the above commit log is perfectly accurate, but all the
> details might suggest that this is something specific to rockchip or
> CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> fundamental problem, which is actually very simple: we registered IRQ
> handlers before we were ready for them to be called.
> 
> I propose the following commit log in the hope that it would help
> other driver authors to make similar fixes:
>
>     PCI: rockchip: Register IRQ handlers after device and data are ready
> 
>     An IRQ handler may be called at any time after it is registered, so
>     anything it relies on must be ready before registration.
> 
>     rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
>     read registers in the PCIe controller, but we registered them before
>     turning on clocks to the controller.  If either is called before the clocks
>     are turned on, the register reads fail and the machine hangs.
> 
>     Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
>     but we installed it before initializing irq_domain.
> 
>     Register IRQ handlers after their data structures are initialized and
>     clocks are enabled.
> 
> If this is inaccurate or omits something important, let me know.  I
> can make any updates locally.
>

I think your description is accurate and agree that the commit message may
be misleading. As you said, this is a general problem and the fact that an
IRQ is shared and CONFIG_DEBUG_SHIRQ fires a spurious interrupt just make
the assumptions in the driver to fall apart.

But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
option and shared interrupts? That way, other driver authors could know that
by enabling this an underlying problem might be exposed for them to fix.

Best regards,
Bjorn Helgaas June 30, 2021, 8:30 p.m. UTC | #19
On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
> On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> > [+cc Michal, Jingoo, Thierry, Jonathan]
> 
> [snip]
> 
> > 
> > I think the above commit log is perfectly accurate, but all the
> > details might suggest that this is something specific to rockchip or
> > CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> > fundamental problem, which is actually very simple: we registered IRQ
> > handlers before we were ready for them to be called.
> > 
> > I propose the following commit log in the hope that it would help
> > other driver authors to make similar fixes:
> >
> >     PCI: rockchip: Register IRQ handlers after device and data are ready
> > 
> >     An IRQ handler may be called at any time after it is registered, so
> >     anything it relies on must be ready before registration.
> > 
> >     rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
> >     read registers in the PCIe controller, but we registered them before
> >     turning on clocks to the controller.  If either is called before the clocks
> >     are turned on, the register reads fail and the machine hangs.
> > 
> >     Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
> >     but we installed it before initializing irq_domain.
> > 
> >     Register IRQ handlers after their data structures are initialized and
> >     clocks are enabled.
> > 
> > If this is inaccurate or omits something important, let me know.  I
> > can make any updates locally.
> >
> 
> I think your description is accurate and agree that the commit message may
> be misleading. As you said, this is a general problem and the fact that an
> IRQ is shared and CONFIG_DEBUG_SHIRQ fires a spurious interrupt just make
> the assumptions in the driver to fall apart.
> 
> But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
> option and shared interrupts? That way, other driver authors could know that
> by enabling this an underlying problem might be exposed for them to fix.

Good idea, thanks!  I added this; is it something like what you had in
mind?

    Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
    is being unregistered.  An error during the probe path might cause this
    unregistration and IRQ handler execution before the device or data
    structure init has finished.

Bjorn
Peter Robinson June 30, 2021, 8:46 p.m. UTC | #20
On Wed, Jun 30, 2021 at 9:30 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
> > On 6/30/21 8:59 PM, Bjorn Helgaas wrote:
> > > [+cc Michal, Jingoo, Thierry, Jonathan]
> >
> > [snip]
> >
> > >
> > > I think the above commit log is perfectly accurate, but all the
> > > details might suggest that this is something specific to rockchip or
> > > CONFIG_DEBUG_SHIRQ, which it isn't, and they might obscure the
> > > fundamental problem, which is actually very simple: we registered IRQ
> > > handlers before we were ready for them to be called.
> > >
> > > I propose the following commit log in the hope that it would help
> > > other driver authors to make similar fixes:
> > >
> > >     PCI: rockchip: Register IRQ handlers after device and data are ready
> > >
> > >     An IRQ handler may be called at any time after it is registered, so
> > >     anything it relies on must be ready before registration.
> > >
> > >     rockchip_pcie_subsys_irq_handler() and rockchip_pcie_client_irq_handler()
> > >     read registers in the PCIe controller, but we registered them before
> > >     turning on clocks to the controller.  If either is called before the clocks
> > >     are turned on, the register reads fail and the machine hangs.
> > >
> > >     Similarly, rockchip_pcie_legacy_int_handler() uses rockchip->irq_domain,
> > >     but we installed it before initializing irq_domain.
> > >
> > >     Register IRQ handlers after their data structures are initialized and
> > >     clocks are enabled.
> > >
> > > If this is inaccurate or omits something important, let me know.  I
> > > can make any updates locally.
> > >
> >
> > I think your description is accurate and agree that the commit message may
> > be misleading. As you said, this is a general problem and the fact that an
> > IRQ is shared and CONFIG_DEBUG_SHIRQ fires a spurious interrupt just make
> > the assumptions in the driver to fall apart.
> >
> > But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
> > option and shared interrupts? That way, other driver authors could know that
> > by enabling this an underlying problem might be exposed for them to fix.
>
> Good idea, thanks!  I added this; is it something like what you had in
> mind?
>
>     Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
>     is being unregistered.  An error during the probe path might cause this
>     unregistration and IRQ handler execution before the device or data
>     structure init has finished.

Would it make sense to enable CONFIG_DEBUG_SHIRQ in defconfig to
better pick up these problems?

Peter
Javier Martinez Canillas June 30, 2021, 10:09 p.m. UTC | #21
On 6/30/21 10:30 PM, Bjorn Helgaas wrote:
> On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:

[snip]

>>
>> But maybe you can also add a paragraph that mentions the CONFIG_DEBUG_SHIRQ
>> option and shared interrupts? That way, other driver authors could know that
>> by enabling this an underlying problem might be exposed for them to fix.
> 
> Good idea, thanks!  I added this; is it something like what you had in
> mind?
> 

Thanks a lot for doing this rewording. I just have a small nit for the text.

>     Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handler when it
>     is being unregistered.  An error during the probe path might cause this
>     unregistration and IRQ handler execution before the device or data
>     structure init has finished.
> 

The IRQ handler is not called when unregistered, but it is called when another
handler for the shared IRQ is unregistered. In this particular driver, both a
"pcie-sys" and "pcie-client" handlers are registered, then an error leads to
"pcie-sys" being unregistered and the handler for "pcie-client" being called.

So maybe the following instead?

    Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ handlers when a
    handler for the shared IRQ is unregistered. An error during the probe path
    might cause this unregistration and handler execution before the device or
    data structure init has finished.

Best regards,
Bjorn Helgaas July 1, 2021, 1:59 p.m. UTC | #22
On Thu, Jul 01, 2021 at 12:09:58AM +0200, Javier Martinez Canillas wrote:
> On 6/30/21 10:30 PM, Bjorn Helgaas wrote:
> > On Wed, Jun 30, 2021 at 09:59:58PM +0200, Javier Martinez Canillas wrote:
> 
> [snip]
> 
> >> But maybe you can also add a paragraph that mentions the
> >> CONFIG_DEBUG_SHIRQ option and shared interrupts? That way, other
> >> driver authors could know that by enabling this an underlying
> >> problem might be exposed for them to fix.
> > 
> > Good idea, thanks!  I added this; is it something like what you
> > had in mind?
> 
> Thanks a lot for doing this rewording. I just have a small nit for
> the text.
> 
> >     Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ
> >     handler when it is being unregistered.  An error during the
> >     probe path might cause this unregistration and IRQ handler
> >     execution before the device or data structure init has
> >     finished.
> 
> The IRQ handler is not called when unregistered, but it is called
> when another handler for the shared IRQ is unregistered. In this
> particular driver, both a "pcie-sys" and "pcie-client" handlers are
> registered, then an error leads to "pcie-sys" being unregistered and
> the handler for "pcie-client" being called.

Is this really true?  I think that would mean CONFIG_DEBUG_SHIRQ would
not find this kind of bug unless we actually registered two or more
handlers for the shared IRQ, but it's still a bug even only one
handler is registered.

Looking at __free_irq() [1], my impression is that "action" is what
we're removing and action->handler() is the IRQ handler we call when
CONFIG_DEBUG_SHIRQ, so it doesn't look like it's calling the remaining
handlers after removing one of them.

> So maybe the following instead?
> 
>     Found by enabling CONFIG_DEBUG_SHIRQ, which calls the IRQ
>     handlers when a handler for the shared IRQ is unregistered. An
>     error during the probe path might cause this unregistration and
>     handler execution before the device or data structure init has
>     finished.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq/manage.c?id=v5.13#n1805
Javier Martinez Canillas July 1, 2021, 2:59 p.m. UTC | #23
On 7/1/21 3:59 PM, Bjorn Helgaas wrote:

[snip]

>> The IRQ handler is not called when unregistered, but it is called
>> when another handler for the shared IRQ is unregistered. In this
>> particular driver, both a "pcie-sys" and "pcie-client" handlers are
>> registered, then an error leads to "pcie-sys" being unregistered and
>> the handler for "pcie-client" being called.
> 
> Is this really true?  I think that would mean CONFIG_DEBUG_SHIRQ would
> not find this kind of bug unless we actually registered two or more
> handlers for the shared IRQ, but it's still a bug even only one
> handler is registered.
> 
> Looking at __free_irq() [1], my impression is that "action" is what
> we're removing and action->handler() is the IRQ handler we call when
> CONFIG_DEBUG_SHIRQ, so it doesn't look like it's calling the remaining
> handlers after removing one of them.
> 

Oh, you are completely right. I wrongly assumed that it was for the other
registered IRQ handlers but reading the source is clearly how you say it.

I now wonder why when debugging this I saw that the "pcie-client" handler
was called when "pcie-sys" was unregistered...

But anyways, you are correct and I'm OK with the text you shared.

Best regards,
diff mbox series

Patch

diff --git a/drivers/pci/controller/pcie-rockchip-host.c b/drivers/pci/controller/pcie-rockchip-host.c
index f1d08a1b159..78d04ac29cd 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -592,10 +592,6 @@  static int rockchip_pcie_parse_host_dt(struct rockchip_pcie *rockchip)
 	if (err)
 		return err;
 
-	err = rockchip_pcie_setup_irq(rockchip);
-	if (err)
-		return err;
-
 	rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
 	if (IS_ERR(rockchip->vpcie12v)) {
 		if (PTR_ERR(rockchip->vpcie12v) != -ENODEV)
@@ -973,8 +969,6 @@  static int rockchip_pcie_probe(struct platform_device *pdev)
 	if (err)
 		goto err_vpcie;
 
-	rockchip_pcie_enable_interrupts(rockchip);
-
 	err = rockchip_pcie_init_irq_domain(rockchip);
 	if (err < 0)
 		goto err_deinit_port;
@@ -992,6 +986,12 @@  static int rockchip_pcie_probe(struct platform_device *pdev)
 	bridge->sysdata = rockchip;
 	bridge->ops = &rockchip_pcie_ops;
 
+	err = rockchip_pcie_setup_irq(rockchip);
+	if (err)
+		goto err_remove_irq_domain;
+
+	rockchip_pcie_enable_interrupts(rockchip);
+
 	err = pci_host_probe(bridge);
 	if (err < 0)
 		goto err_remove_irq_domain;