diff mbox series

net: txgbe: fix GPIO interrupt blocking

Message ID 20240206070824.17460-1-jiawenwu@trustnetic.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: txgbe: fix GPIO interrupt blocking | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1048 this patch: 1048
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 1065 this patch: 1065
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1065 this patch: 1065
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 57 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-02-07--00-00 (tests: 684)

Commit Message

Jiawen Wu Feb. 6, 2024, 7:08 a.m. UTC
GPIO interrupt is generated before MAC IRQ is enabled, it causes
subsequent GPIO interrupts that can no longer be reported if it is
not cleared in time. So clear GPIO interrupt status at the right
time. And executing function txgbe_gpio_irq_ack() manually since
handle_nested_irq() does not call .irq_ack for irq_chip.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
---
 .../net/ethernet/wangxun/txgbe/txgbe_main.c   |  1 +
 .../net/ethernet/wangxun/txgbe/txgbe_phy.c    | 29 +++++++++++++++++++
 .../net/ethernet/wangxun/txgbe/txgbe_phy.h    |  1 +
 3 files changed, 31 insertions(+)

Comments

Andrew Lunn Feb. 6, 2024, 3:29 p.m. UTC | #1
On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> GPIO interrupt is generated before MAC IRQ is enabled, it causes
> subsequent GPIO interrupts that can no longer be reported if it is
> not cleared in time. So clear GPIO interrupt status at the right
> time.

This does not sound correct. Since this is an interrupt controller, it
is a level interrupt. If its not cleared, as soon as the parent
interrupt is re-enabled, is should cause another interrupt at the
parent level. Servicing that interrupt, should case a descent to the
child, which will service the interrupt, and atomically clear the
interrupt status.

Is something wrong here, like you are using edge interrupts, not
level?

> And executing function txgbe_gpio_irq_ack() manually since
> handle_nested_irq() does not call .irq_ack for irq_chip.

I don't know the interrupt code too well, so could you explain this in
more detail. Your explanation sounds odd to me.

What is the big picture problem here? Do you have the PHY interrupt
connected to a GPIO and you are loosing PHY interrupts?

	Andrew
Jiawen Wu Feb. 18, 2024, 9:04 a.m. UTC | #2
On Tue, Feb 6, 2024 11:29 PM, Andrew Lunn wrote:
> On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> > GPIO interrupt is generated before MAC IRQ is enabled, it causes
> > subsequent GPIO interrupts that can no longer be reported if it is
> > not cleared in time. So clear GPIO interrupt status at the right
> > time.
> 
> This does not sound correct. Since this is an interrupt controller, it
> is a level interrupt. If its not cleared, as soon as the parent
> interrupt is re-enabled, is should cause another interrupt at the
> parent level. Servicing that interrupt, should case a descent to the
> child, which will service the interrupt, and atomically clear the
> interrupt status.
> 
> Is something wrong here, like you are using edge interrupts, not
> level?

Yes, it is edge interrupt.

> 
> > And executing function txgbe_gpio_irq_ack() manually since
> > handle_nested_irq() does not call .irq_ack for irq_chip.
> 
> I don't know the interrupt code too well, so could you explain this in
> more detail. Your explanation sounds odd to me.

This is because I changed the interrupt controller in
https://git.kernel.org/netdev/net-next/c/aefd013624a1.
In the previous interrupt controller, .irq_ack in struct irq_chip is called
to clear the interrupt after the GPIO interrupt is handled. But I found
that in the current interrupt controller, this .irq_ack is not called. Maybe
I don't know enough about this interrupt code, I have to manually add
txgbe_gpio_irq_ack() to clear the interrupt in the handler.

> 
> What is the big picture problem here? Do you have the PHY interrupt
> connected to a GPIO and you are loosing PHY interrupts?

No, PHY interrupt is connected to the LINK UP/DOWN filed in the MAC
interrupt. The problem I encountered was that the GPIO interrupts were
not cleaned up in time and could not continue to generate the next GPIO
interrupt.
Andrew Lunn Feb. 18, 2024, 4:44 p.m. UTC | #3
On Sun, Feb 18, 2024 at 05:04:52PM +0800, Jiawen Wu wrote:
> On Tue, Feb 6, 2024 11:29 PM, Andrew Lunn wrote:
> > On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> > > GPIO interrupt is generated before MAC IRQ is enabled, it causes
> > > subsequent GPIO interrupts that can no longer be reported if it is
> > > not cleared in time. So clear GPIO interrupt status at the right
> > > time.
> > 
> > This does not sound correct. Since this is an interrupt controller, it
> > is a level interrupt. If its not cleared, as soon as the parent
> > interrupt is re-enabled, is should cause another interrupt at the
> > parent level. Servicing that interrupt, should case a descent to the
> > child, which will service the interrupt, and atomically clear the
> > interrupt status.
> > 
> > Is something wrong here, like you are using edge interrupts, not
> > level?
> 
> Yes, it is edge interrupt.

So fix this first, use level interrupts.

> > > And executing function txgbe_gpio_irq_ack() manually since
> > > handle_nested_irq() does not call .irq_ack for irq_chip.
> > 
> > I don't know the interrupt code too well, so could you explain this in
> > more detail. Your explanation sounds odd to me.
> 
> This is because I changed the interrupt controller in
> https://git.kernel.org/netdev/net-next/c/aefd013624a1.
> In the previous interrupt controller, .irq_ack in struct irq_chip is called
> to clear the interrupt after the GPIO interrupt is handled. But I found
> that in the current interrupt controller, this .irq_ack is not called. Maybe
> I don't know enough about this interrupt code, I have to manually add
> txgbe_gpio_irq_ack() to clear the interrupt in the handler.

You should dig deeper into interrupts.
[goes and digs]

https://elixir.bootlin.com/linux/latest/source/include/linux/irq.h#L461
 * @irq_ack:		start of a new interrupt

The comment makes it sound like irq_ack will be the first callback
used when handling an interrupt.

static inline void mask_ack_irq(struct irq_desc *desc)
{
        if (desc->irq_data.chip->irq_mask_ack) {
                desc->irq_data.chip->irq_mask_ack(&desc->irq_data);
                irq_state_set_masked(desc);
        } else {
                mask_irq(desc);
                if (desc->irq_data.chip->irq_ack)
                        desc->irq_data.chip->irq_ack(&desc->irq_data);
        }
}

So the comment might be a little misleading. It will first mask the
interrupt, and then ack it.

/**
 *      handle_level_irq - Level type irq handler
 *      @desc:  the interrupt description structure for this irq
 *
 *      Level type interrupts are active as long as the hardware line has
 *      the active level. This may require to mask the interrupt and unmask
 *      it after the associated handler has acknowledged the device, so the
 *      interrupt line is back to inactive.
 */
void handle_level_irq(struct irq_desc *desc)
{
        raw_spin_lock(&desc->lock);
        mask_ack_irq(desc);

So when handling a level interrupt, mask and then ack is the first
thing done. And it is unconditional.

edge interrupts are different:

/**
 *      handle_edge_irq - edge type IRQ handler
 *      @desc:  the interrupt description structure for this irq
 *
 *      Interrupt occurs on the falling and/or rising edge of a hardware
 *      signal. The occurrence is latched into the irq controller hardware
 *      and must be acked in order to be reenabled. After the ack another
 *      interrupt can happen on the same source even before the first one
 *      is handled by the associated event handler. If this happens it
 *      might be necessary to disable (mask) the interrupt depending on the
 *      controller hardware. This requires to reenable the interrupt inside
 *      of the loop which handles the interrupts which have arrived while
 *      the handler was running. If all pending interrupts are handled, the
 *      loop is left.
 */
void handle_edge_irq(struct irq_desc *desc)
{
        raw_spin_lock(&desc->lock);

        desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);

        if (!irq_may_run(desc)) {
                desc->istate |= IRQS_PENDING;
                mask_ack_irq(desc);
                goto out_unlock;
        }

        /*
         * If its disabled or no action available then mask it and get
         * out of here.
         */
        if (irqd_irq_disabled(&desc->irq_data) || !desc->action) {
                desc->istate |= IRQS_PENDING;
                mask_ack_irq(desc);
                goto out_unlock;
        }

       /* Start handling the irq */
        desc->irq_data.chip->irq_ack(&desc->irq_data);

So if the interrupt handler is already running, it will mask and ack
it, but not handle it, since there is already a handler running.
Otherwise it will ack the interrupt and then handle it. And it loops
handling the interrupt while IRQS_PENDING is set, i.e. another
interrupt has arrived while the handler was running.

I would suggest you first get it using level interrupts, and then dig
into how level interrupts are used, which do appear to be simpler than
edge.

	Andrew
Jiawen Wu Feb. 20, 2024, 9:25 a.m. UTC | #4
On Mon, Feb 19, 2024 12:45 AM, Andrew Lunn wrote:
> On Sun, Feb 18, 2024 at 05:04:52PM +0800, Jiawen Wu wrote:
> > On Tue, Feb 6, 2024 11:29 PM, Andrew Lunn wrote:
> > > On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> > > > GPIO interrupt is generated before MAC IRQ is enabled, it causes
> > > > subsequent GPIO interrupts that can no longer be reported if it is
> > > > not cleared in time. So clear GPIO interrupt status at the right
> > > > time.
> > >
> > > This does not sound correct. Since this is an interrupt controller, it
> > > is a level interrupt. If its not cleared, as soon as the parent
> > > interrupt is re-enabled, is should cause another interrupt at the
> > > parent level. Servicing that interrupt, should case a descent to the
> > > child, which will service the interrupt, and atomically clear the
> > > interrupt status.
> > >
> > > Is something wrong here, like you are using edge interrupts, not
> > > level?
> >
> > Yes, it is edge interrupt.
> 
> So fix this first, use level interrupts.

I have a question here.

I've been setting the interrupt type in chip->irq_set_type. The 'type' is
passed as IRQ_TYPE_EDGE_BOTH. Then I config GPIO registers based on
this type, and use edge interrupts. Who decides this type? Can I change
it at will?

> 
> > > > And executing function txgbe_gpio_irq_ack() manually since
> > > > handle_nested_irq() does not call .irq_ack for irq_chip.
> > >
> > > I don't know the interrupt code too well, so could you explain this in
> > > more detail. Your explanation sounds odd to me.
> >
> > This is because I changed the interrupt controller in
> > https://git.kernel.org/netdev/net-next/c/aefd013624a1.
> > In the previous interrupt controller, .irq_ack in struct irq_chip is called
> > to clear the interrupt after the GPIO interrupt is handled. But I found
> > that in the current interrupt controller, this .irq_ack is not called. Maybe
> > I don't know enough about this interrupt code, I have to manually add
> > txgbe_gpio_irq_ack() to clear the interrupt in the handler.
> 
> You should dig deeper into interrupts.
> [goes and digs]
> 
> https://elixir.bootlin.com/linux/latest/source/include/linux/irq.h#L461
>  * @irq_ack:		start of a new interrupt
> 
> The comment makes it sound like irq_ack will be the first callback
> used when handling an interrupt.
> 
> static inline void mask_ack_irq(struct irq_desc *desc)
> {
>         if (desc->irq_data.chip->irq_mask_ack) {
>                 desc->irq_data.chip->irq_mask_ack(&desc->irq_data);
>                 irq_state_set_masked(desc);
>         } else {
>                 mask_irq(desc);
>                 if (desc->irq_data.chip->irq_ack)
>                         desc->irq_data.chip->irq_ack(&desc->irq_data);
>         }
> }
> 
> So the comment might be a little misleading. It will first mask the
> interrupt, and then ack it.
> 
> /**
>  *      handle_level_irq - Level type irq handler
>  *      @desc:  the interrupt description structure for this irq
>  *
>  *      Level type interrupts are active as long as the hardware line has
>  *      the active level. This may require to mask the interrupt and unmask
>  *      it after the associated handler has acknowledged the device, so the
>  *      interrupt line is back to inactive.
>  */
> void handle_level_irq(struct irq_desc *desc)
> {
>         raw_spin_lock(&desc->lock);
>         mask_ack_irq(desc);
> 
> So when handling a level interrupt, mask and then ack is the first
> thing done. And it is unconditional.
> 
> edge interrupts are different:
> 
> /**
>  *      handle_edge_irq - edge type IRQ handler
>  *      @desc:  the interrupt description structure for this irq
>  *
>  *      Interrupt occurs on the falling and/or rising edge of a hardware
>  *      signal. The occurrence is latched into the irq controller hardware
>  *      and must be acked in order to be reenabled. After the ack another
>  *      interrupt can happen on the same source even before the first one
>  *      is handled by the associated event handler. If this happens it
>  *      might be necessary to disable (mask) the interrupt depending on the
>  *      controller hardware. This requires to reenable the interrupt inside
>  *      of the loop which handles the interrupts which have arrived while
>  *      the handler was running. If all pending interrupts are handled, the
>  *      loop is left.
>  */
> void handle_edge_irq(struct irq_desc *desc)
> {
>         raw_spin_lock(&desc->lock);
> 
>         desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
> 
>         if (!irq_may_run(desc)) {
>                 desc->istate |= IRQS_PENDING;
>                 mask_ack_irq(desc);
>                 goto out_unlock;
>         }
> 
>         /*
>          * If its disabled or no action available then mask it and get
>          * out of here.
>          */
>         if (irqd_irq_disabled(&desc->irq_data) || !desc->action) {
>                 desc->istate |= IRQS_PENDING;
>                 mask_ack_irq(desc);
>                 goto out_unlock;
>         }
> 
>        /* Start handling the irq */
>         desc->irq_data.chip->irq_ack(&desc->irq_data);
> 
> So if the interrupt handler is already running, it will mask and ack
> it, but not handle it, since there is already a handler running.
> Otherwise it will ack the interrupt and then handle it. And it loops
> handling the interrupt while IRQS_PENDING is set, i.e. another
> interrupt has arrived while the handler was running.
> 
> I would suggest you first get it using level interrupts, and then dig
> into how level interrupts are used, which do appear to be simpler than
> edge.
> 
> 	Andrew
>
Andrew Lunn Feb. 21, 2024, 2:35 p.m. UTC | #5
On Tue, Feb 20, 2024 at 05:25:26PM +0800, Jiawen Wu wrote:
> On Mon, Feb 19, 2024 12:45 AM, Andrew Lunn wrote:
> > On Sun, Feb 18, 2024 at 05:04:52PM +0800, Jiawen Wu wrote:
> > > On Tue, Feb 6, 2024 11:29 PM, Andrew Lunn wrote:
> > > > On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> > > > > GPIO interrupt is generated before MAC IRQ is enabled, it causes
> > > > > subsequent GPIO interrupts that can no longer be reported if it is
> > > > > not cleared in time. So clear GPIO interrupt status at the right
> > > > > time.
> > > >
> > > > This does not sound correct. Since this is an interrupt controller, it
> > > > is a level interrupt. If its not cleared, as soon as the parent
> > > > interrupt is re-enabled, is should cause another interrupt at the
> > > > parent level. Servicing that interrupt, should case a descent to the
> > > > child, which will service the interrupt, and atomically clear the
> > > > interrupt status.
> > > >
> > > > Is something wrong here, like you are using edge interrupts, not
> > > > level?
> > >
> > > Yes, it is edge interrupt.
> > 
> > So fix this first, use level interrupts.
> 
> I have a question here.
> 
> I've been setting the interrupt type in chip->irq_set_type. The 'type' is
> passed as IRQ_TYPE_EDGE_BOTH. Then I config GPIO registers based on
> this type, and use edge interrupts. Who decides this type? Can I change
> it at will?

There are a few different mechanism. In DT you can specify it as part
of the phandle reference. You can also pass flags to

request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
	    const char *name, void *dev)

#define IRQF_TRIGGER_NONE	0x00000000
#define IRQF_TRIGGER_RISING	0x00000001
#define IRQF_TRIGGER_FALLING	0x00000002
#define IRQF_TRIGGER_HIGH	0x00000004
#define IRQF_TRIGGER_LOW	0x00000008

	Andrew
Jiawen Wu Feb. 22, 2024, 1:56 a.m. UTC | #6
On Wed, Feb 21, 2024 10:36 PM, Andrew Lunn wrote:
> On Tue, Feb 20, 2024 at 05:25:26PM +0800, Jiawen Wu wrote:
> > On Mon, Feb 19, 2024 12:45 AM, Andrew Lunn wrote:
> > > On Sun, Feb 18, 2024 at 05:04:52PM +0800, Jiawen Wu wrote:
> > > > On Tue, Feb 6, 2024 11:29 PM, Andrew Lunn wrote:
> > > > > On Tue, Feb 06, 2024 at 03:08:24PM +0800, Jiawen Wu wrote:
> > > > > > GPIO interrupt is generated before MAC IRQ is enabled, it causes
> > > > > > subsequent GPIO interrupts that can no longer be reported if it is
> > > > > > not cleared in time. So clear GPIO interrupt status at the right
> > > > > > time.
> > > > >
> > > > > This does not sound correct. Since this is an interrupt controller, it
> > > > > is a level interrupt. If its not cleared, as soon as the parent
> > > > > interrupt is re-enabled, is should cause another interrupt at the
> > > > > parent level. Servicing that interrupt, should case a descent to the
> > > > > child, which will service the interrupt, and atomically clear the
> > > > > interrupt status.
> > > > >
> > > > > Is something wrong here, like you are using edge interrupts, not
> > > > > level?
> > > >
> > > > Yes, it is edge interrupt.
> > >
> > > So fix this first, use level interrupts.
> >
> > I have a question here.
> >
> > I've been setting the interrupt type in chip->irq_set_type. The 'type' is
> > passed as IRQ_TYPE_EDGE_BOTH. Then I config GPIO registers based on
> > this type, and use edge interrupts. Who decides this type? Can I change
> > it at will?
> 
> There are a few different mechanism. In DT you can specify it as part
> of the phandle reference. You can also pass flags to
> 
> request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
> 	    const char *name, void *dev)
> 
> #define IRQF_TRIGGER_NONE	0x00000000
> #define IRQF_TRIGGER_RISING	0x00000001
> #define IRQF_TRIGGER_FALLING	0x00000002
> #define IRQF_TRIGGER_HIGH	0x00000004
> #define IRQF_TRIGGER_LOW	0x00000008

There are flags passed in sfp.c:

err = devm_request_threaded_irq(sfp->dev, sfp->gpio_irq[i],
				NULL, sfp_irq,
				IRQF_ONESHOT |
				IRQF_TRIGGER_RISING |
				IRQF_TRIGGER_FALLING,
				sfp_irq_name, sfp);

So specific GPIO IRQs are request as edge interrupts. It looks like I can't
change it.
Andrew Lunn Feb. 22, 2024, 3:07 p.m. UTC | #7
> There are flags passed in sfp.c:
> 
> err = devm_request_threaded_irq(sfp->dev, sfp->gpio_irq[i],
> 				NULL, sfp_irq,
> 				IRQF_ONESHOT |
> 				IRQF_TRIGGER_RISING |
> 				IRQF_TRIGGER_FALLING,
> 				sfp_irq_name, sfp);

Does you hardware support edges for GPIOs? And by that, i mean the
whole chain of interrupt controllers? So your GPIO controller notices
an edge in the GPIO. It then passed a notification to the interrupt
controller within the GPIO controller. It then sets a bit to indicate
an interrupt has happened. At that point you have a level
interrupt. That bit causes a level interrupt to the interrupt
controller above in the chain. And it needs to be level all way up.

	Andrew
Jiawen Wu Feb. 23, 2024, 9:14 a.m. UTC | #8
On Thu, Feb 22, 2024 11:08 PM, Andrew Lunn wrote:
> > There are flags passed in sfp.c:
> >
> > err = devm_request_threaded_irq(sfp->dev, sfp->gpio_irq[i],
> > 				NULL, sfp_irq,
> > 				IRQF_ONESHOT |
> > 				IRQF_TRIGGER_RISING |
> > 				IRQF_TRIGGER_FALLING,
> > 				sfp_irq_name, sfp);
> 
> Does you hardware support edges for GPIOs? And by that, i mean the
> whole chain of interrupt controllers? So your GPIO controller notices
> an edge in the GPIO. It then passed a notification to the interrupt
> controller within the GPIO controller. It then sets a bit to indicate
> an interrupt has happened. At that point you have a level
> interrupt. That bit causes a level interrupt to the interrupt
> controller above in the chain. And it needs to be level all way up.

My hardware is required to configure GPIOs as edge-sensitive.
But I think I got something wrong. There were two problems to be solved
in this patch:

1) The register of GPIO interrupt status is masked before MAC IRQ enabled.

This is because of hardware deficiency. I need to manually clear the interrupt
status before using them. Otherwise, GPIO interrupts will never be reported
again. So there is a workaround for clearing interrupts to set GPIOs EOI in
txgbe_up_complete().

2) GPIO EOI is not set to clear interrupt status after handling the interrupt,
it  should be done in chip->irq_ack, but this ops is not called.

This is because I used handle_nested_irq() in txgbe_gpio_irq_handler() to
handle the IRQ of specific GPIO line. Since the IRQ is requested as threaded
IRQ and only action->thread_fn is created in sfp.c, the highlevel irq-events
handler (handle_level_irq() or handle_edge_irq() set in gpio irq chip) is not
called. Both level and edge type will call chip->irq_ack, but they are not called.

So I should use generic_handle_domain_irq() instead of handle_nested_irq()
to handle GPIO IRQ. But there is call trace when I do it,

[   86.784113] ------------[ cut here ]------------
[   86.784114] irq 154 handler irq_default_primary_handler+0x0/0x10 enabled interrupts
[   86.784122] WARNING: CPU: 0 PID: 3383 at kernel/irq/handle.c:161 __handle_irq_event_percpu+0x150/0x1a0
[   86.784125] Modules linked in: i2c_designware_platform sfp i2c_designware_core txgbe libwx fuse vfat fat nouveau
snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel intel_rapl_msr snd_intel_dspcfg intel_rapl_common
snd_hda_codec snd_hda_core edac_mce_amd snd_hwdep eeepc_wmi crc32_pclmul asus_wmi snd_pcm ledtrig_audio platform_profile
ghash_clmulni_intel sparse_keymap snd_seq_dummy sha512_ssse3 rfkill wmi_bmof drm_gpuvm snd_seq_oss mxm_wmi drm_exec snd_seq_midi
binfmt_misc snd_seq_midi_event gpu_sched snd_rawmidi aesni_intel i2c_algo_bit crypto_simd bridge cryptd snd_seq drm_display_helper
snd_seq_device drm_ttm_helper snd_timer ttm stp drm_kms_helper snd llc acpi_cpufreq k10temp ccp video soundcore wmi squashfs loop
sch_fq_codel drm parport_pc ppdev lp parport ramoops reed_solomon ip_tables ext4 mbcache jbd2 mdio_i2c nvme ahci nvme_core libahci
t10_pi i2c_piix4 libata pcs_xpcs crc32c_intel crc64_rocksoft i2c_core crc64 crc_t10dif r8169 crct10dif_generic crct10dif_pclmul
phylink
[   86.784200]  crct10dif_common realtek [last unloaded: i2c_designware_core]
[   86.784204] CPU: 0 PID: 3383 Comm: irq/126-eth%d Not tainted 6.8.0-rc1+ #147
[   86.784206] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 4403 04/28/2022
[   86.784207] RIP: 0010:__handle_irq_event_percpu+0x150/0x1a0
[   86.784210] Code: 44 00 00 e9 09 ff ff ff 80 3d 17 15 7e 01 00 75 1b 48 8b 13 44 89 ee 48 c7 c7 f8 e7 82 a8 c6 05 01 15 7e 01 01
e8 00 d8 f6 ff <0f> 0b fa 0f 1f 44 00 00 e9 fc fe ff ff f0 48 0f ba 6b 40 01 0f 82
[   86.784211] RSP: 0018:ffffa04300d6bd68 EFLAGS: 00010282
[   86.784213] RAX: 0000000000000000 RBX: ffff8defda7bc000 RCX: 0000000000000000
[   86.784214] RDX: 0000000000000002 RSI: ffffffffa88644c3 RDI: 00000000ffffffff
[   86.784215] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffa04300d6bc00
[   86.784216] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[   86.784217] R13: 000000000000009a R14: ffff8def8ff50a00 R15: ffff8defda7bc300
[   86.784219] FS:  0000000000000000(0000) GS:ffff8df68ea00000(0000) knlGS:0000000000000000
[   86.784220] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   86.784221] CR2: 000000c00081f000 CR3: 00000001821b6000 CR4: 0000000000750ef0
[   86.784222] PKRU: 55555554
[   86.784223] Call Trace:
[   86.784225]  <TASK>
[   86.784227]  ? __warn+0x80/0x130
[   86.784231]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784233]  ? report_bug+0x1f4/0x200
[   86.784236]  ? srso_alias_return_thunk+0x5/0xfbef5
[   86.784240]  ? handle_bug+0x42/0x70
[   86.784243]  ? exc_invalid_op+0x14/0x70
[   86.784245]  ? asm_exc_invalid_op+0x16/0x20
[   86.784249]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784251]  ? __handle_irq_event_percpu+0x150/0x1a0
[   86.784253]  ? __pfx_irq_thread_fn+0x10/0x10
[   86.784255]  handle_irq_event_percpu+0x10/0x50
[   86.784257]  handle_irq_event+0x34/0x60
[   86.784260]  handle_level_irq+0xa5/0x120
[   86.784263]  handle_irq_desc+0x3a/0x50
[   86.784266]  txgbe_gpio_irq_handler+0x82/0x140 [txgbe]
[   86.784271]  ? __pfx_irq_thread_fn+0x10/0x10
[   86.784273]  handle_nested_irq+0xaf/0x100
[   86.784275]  txgbe_misc_irq_handle+0x60/0x80 [txgbe]
[   86.784279]  irq_thread_fn+0x20/0x60
[   86.784282]  irq_thread+0xe2/0x190
[   86.784284]  ? srso_alias_return_thunk+0x5/0xfbef5
[   86.784286]  ? __pfx_irq_thread_dtor+0x10/0x10
[   86.784288]  ? __pfx_irq_thread+0x10/0x10
[   86.784290]  kthread+0xf0/0x120
[   86.784294]  ? __pfx_kthread+0x10/0x10
[   86.784296]  ret_from_fork+0x30/0x50
[   86.784299]  ? __pfx_kthread+0x10/0x10
[   86.784301]  ret_from_fork_asm+0x1b/0x30
[   86.784306]  </TASK>
[   86.784307] ---[ end trace 0000000000000000 ]---

This is due to default primary handler in the irq action chain.
I'm not quite sure what I dig here, am I missing any flags?
diff mbox series

Patch

diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
index e67a21294158..bd4624d14ca0 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c
@@ -81,6 +81,7 @@  static void txgbe_up_complete(struct wx *wx)
 {
 	struct net_device *netdev = wx->netdev;
 
+	txgbe_reinit_gpio_intr(wx);
 	wx_control_hw(wx, true);
 	wx_configure_vectors(wx);
 
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
index bae0a8ee7014..93295916b1d2 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
@@ -475,8 +475,10 @@  irqreturn_t txgbe_gpio_irq_handler(int irq, void *data)
 	gc = txgbe->gpio;
 	for_each_set_bit(hwirq, &gpioirq, gc->ngpio) {
 		int gpio = irq_find_mapping(gc->irq.domain, hwirq);
+		struct irq_data *d = irq_get_irq_data(gpio);
 		u32 irq_type = irq_get_trigger_type(gpio);
 
+		txgbe_gpio_irq_ack(d);
 		handle_nested_irq(gpio);
 
 		if ((irq_type & IRQ_TYPE_SENSE_MASK) == IRQ_TYPE_EDGE_BOTH) {
@@ -489,6 +491,33 @@  irqreturn_t txgbe_gpio_irq_handler(int irq, void *data)
 	return IRQ_HANDLED;
 }
 
+void txgbe_reinit_gpio_intr(struct wx *wx)
+{
+	struct txgbe *txgbe = wx->priv;
+	irq_hw_number_t hwirq;
+	unsigned long gpioirq;
+	struct gpio_chip *gc;
+	unsigned long flags;
+
+	/* for gpio interrupt pending before irq enable */
+	gpioirq = rd32(wx, WX_GPIO_INTSTATUS);
+
+	gc = txgbe->gpio;
+	for_each_set_bit(hwirq, &gpioirq, gc->ngpio) {
+		int gpio = irq_find_mapping(gc->irq.domain, hwirq);
+		struct irq_data *d = irq_get_irq_data(gpio);
+		u32 irq_type = irq_get_trigger_type(gpio);
+
+		txgbe_gpio_irq_ack(d);
+
+		if ((irq_type & IRQ_TYPE_SENSE_MASK) == IRQ_TYPE_EDGE_BOTH) {
+			raw_spin_lock_irqsave(&wx->gpio_lock, flags);
+			txgbe_toggle_trigger(gc, hwirq);
+			raw_spin_unlock_irqrestore(&wx->gpio_lock, flags);
+		}
+	}
+}
+
 static int txgbe_gpio_init(struct txgbe *txgbe)
 {
 	struct gpio_irq_chip *girq;
diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.h b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.h
index 9855d44076cb..8a026d804fe2 100644
--- a/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.h
+++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_phy.h
@@ -5,6 +5,7 @@ 
 #define _TXGBE_PHY_H_
 
 irqreturn_t txgbe_gpio_irq_handler(int irq, void *data);
+void txgbe_reinit_gpio_intr(struct wx *wx);
 irqreturn_t txgbe_link_irq_handler(int irq, void *data);
 int txgbe_init_phy(struct txgbe *txgbe);
 void txgbe_remove_phy(struct txgbe *txgbe);