diff mbox series

mt76: mt7921e: fix crash in chip reset fail

Message ID 727eb5ffd3c7c805245e512da150ecf0a7154020.1659452909.git.deren.wu@mediatek.com (mailing list archive)
State Accepted
Delegated to: Felix Fietkau
Headers show
Series mt76: mt7921e: fix crash in chip reset fail | expand

Commit Message

Deren Wu Aug. 2, 2022, 3:15 p.m. UTC
From: Deren Wu <deren.wu@mediatek.com>

In case of drv own fail in reset, we may need to run mac_reset several
times. The sequence would trigger system crash as the log below.

Because we do not re-enable/schedule "tx_napi" before disable it again,
the process would keep waiting for state change in napi_diable(). To
avoid the problem and keep status synchronize for each run, goto final
resource handling if drv own failed.

[ 5857.353423] mt7921e 0000:3b:00.0: driver own failed
[ 5858.433427] mt7921e 0000:3b:00.0: Timeout for driver own
[ 5859.633430] mt7921e 0000:3b:00.0: driver own failed
[ 5859.633444] ------------[ cut here ]------------
[ 5859.633446] WARNING: CPU: 6 at kernel/kthread.c:659 kthread_park+0x11d
[ 5859.633717] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 5859.633728] RIP: 0010:kthread_park+0x11d/0x150
[ 5859.633736] RSP: 0018:ffff8881b676fc68 EFLAGS: 00010202
......
[ 5859.633766] Call Trace:
[ 5859.633768]  <TASK>
[ 5859.633771]  mt7921e_mac_reset+0x176/0x6f0 [mt7921e]
[ 5859.633778]  mt7921_mac_reset_work+0x184/0x3a0 [mt7921_common]
[ 5859.633785]  ? mt7921_mac_set_timing+0x520/0x520 [mt7921_common]
[ 5859.633794]  ? __kasan_check_read+0x11/0x20
[ 5859.633802]  process_one_work+0x7ee/0x1320
[ 5859.633810]  worker_thread+0x53c/0x1240
[ 5859.633818]  kthread+0x2b8/0x370
[ 5859.633824]  ? process_one_work+0x1320/0x1320
[ 5859.633828]  ? kthread_complete_and_exit+0x30/0x30
[ 5859.633834]  ret_from_fork+0x1f/0x30
[ 5859.633842]  </TASK>

Fixes: 0efaf31dec57 ("mt76: mt7921: fix MT7921E reset failure")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
---
 drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

sean wang Aug. 25, 2022, 1:45 a.m. UTC | #1
Hi Kalle,

If the patch looks good to you, could you help apply the patch to
wireless-drivers.git because there are getting more users reporting
the issue with the stable kernel such as [1]. I would like to backport
it earlier once it appears in the Linus tree to solve the indefinite
hang issue.

[1] https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Sean

On Tue, Aug 2, 2022 at 8:20 AM Deren Wu <Deren.Wu@mediatek.com> wrote:
>
> From: Deren Wu <deren.wu@mediatek.com>
>
> In case of drv own fail in reset, we may need to run mac_reset several
> times. The sequence would trigger system crash as the log below.
>
> Because we do not re-enable/schedule "tx_napi" before disable it again,
> the process would keep waiting for state change in napi_diable(). To
> avoid the problem and keep status synchronize for each run, goto final
> resource handling if drv own failed.
>
> [ 5857.353423] mt7921e 0000:3b:00.0: driver own failed
> [ 5858.433427] mt7921e 0000:3b:00.0: Timeout for driver own
> [ 5859.633430] mt7921e 0000:3b:00.0: driver own failed
> [ 5859.633444] ------------[ cut here ]------------
> [ 5859.633446] WARNING: CPU: 6 at kernel/kthread.c:659 kthread_park+0x11d
> [ 5859.633717] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
> [ 5859.633728] RIP: 0010:kthread_park+0x11d/0x150
> [ 5859.633736] RSP: 0018:ffff8881b676fc68 EFLAGS: 00010202
> ......
> [ 5859.633766] Call Trace:
> [ 5859.633768]  <TASK>
> [ 5859.633771]  mt7921e_mac_reset+0x176/0x6f0 [mt7921e]
> [ 5859.633778]  mt7921_mac_reset_work+0x184/0x3a0 [mt7921_common]
> [ 5859.633785]  ? mt7921_mac_set_timing+0x520/0x520 [mt7921_common]
> [ 5859.633794]  ? __kasan_check_read+0x11/0x20
> [ 5859.633802]  process_one_work+0x7ee/0x1320
> [ 5859.633810]  worker_thread+0x53c/0x1240
> [ 5859.633818]  kthread+0x2b8/0x370
> [ 5859.633824]  ? process_one_work+0x1320/0x1320
> [ 5859.633828]  ? kthread_complete_and_exit+0x30/0x30
> [ 5859.633834]  ret_from_fork+0x1f/0x30
> [ 5859.633842]  </TASK>
>
> Fixes: 0efaf31dec57 ("mt76: mt7921: fix MT7921E reset failure")
> Signed-off-by: Deren Wu <deren.wu@mediatek.com>
> ---
>  drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> index e1800674089a..576a0149251b 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> +++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> @@ -261,7 +261,7 @@ int mt7921e_mac_reset(struct mt7921_dev *dev)
>
>         err = mt7921e_driver_own(dev);
>         if (err)
> -               return err;
> +               goto out;
>
>         err = mt7921_run_firmware(dev);
>         if (err)
> --
> 2.18.0
>
Sean Wang Aug. 26, 2022, 10:50 a.m. UTC | #2
Hi Johannes,

Kalle seemed not available this week, so I would like to look for help from you.
If the patch looks good to you, could you help apply the patch to
wireless-drivers.git because there are getting more users reporting
the issue with the stable kernel such as [1]. I would like to backport
it sooner once it appears in the Linus tree to solve the indefinite
hang issue. Sorry for the hurry request, I knew you just sent the pull
request one moment ago :(

[1] https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Sean

On Wed, Aug 24, 2022 at 6:45 PM sean wang <objelf@gmail.com> wrote:
>
> Hi Kalle,
>
> If the patch looks good to you, could you help apply the patch to
> wireless-drivers.git because there are getting more users reporting
> the issue with the stable kernel such as [1]. I would like to backport
> it earlier once it appears in the Linus tree to solve the indefinite
> hang issue.
>
> [1] https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/
>
> Sean
>
> On Tue, Aug 2, 2022 at 8:20 AM Deren Wu <Deren.Wu@mediatek.com> wrote:
> >
> > From: Deren Wu <deren.wu@mediatek.com>
> >
> > In case of drv own fail in reset, we may need to run mac_reset several
> > times. The sequence would trigger system crash as the log below.
> >
> > Because we do not re-enable/schedule "tx_napi" before disable it again,
> > the process would keep waiting for state change in napi_diable(). To
> > avoid the problem and keep status synchronize for each run, goto final
> > resource handling if drv own failed.
> >
> > [ 5857.353423] mt7921e 0000:3b:00.0: driver own failed
> > [ 5858.433427] mt7921e 0000:3b:00.0: Timeout for driver own
> > [ 5859.633430] mt7921e 0000:3b:00.0: driver own failed
> > [ 5859.633444] ------------[ cut here ]------------
> > [ 5859.633446] WARNING: CPU: 6 at kernel/kthread.c:659 kthread_park+0x11d
> > [ 5859.633717] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
> > [ 5859.633728] RIP: 0010:kthread_park+0x11d/0x150
> > [ 5859.633736] RSP: 0018:ffff8881b676fc68 EFLAGS: 00010202
> > ......
> > [ 5859.633766] Call Trace:
> > [ 5859.633768]  <TASK>
> > [ 5859.633771]  mt7921e_mac_reset+0x176/0x6f0 [mt7921e]
> > [ 5859.633778]  mt7921_mac_reset_work+0x184/0x3a0 [mt7921_common]
> > [ 5859.633785]  ? mt7921_mac_set_timing+0x520/0x520 [mt7921_common]
> > [ 5859.633794]  ? __kasan_check_read+0x11/0x20
> > [ 5859.633802]  process_one_work+0x7ee/0x1320
> > [ 5859.633810]  worker_thread+0x53c/0x1240
> > [ 5859.633818]  kthread+0x2b8/0x370
> > [ 5859.633824]  ? process_one_work+0x1320/0x1320
> > [ 5859.633828]  ? kthread_complete_and_exit+0x30/0x30
> > [ 5859.633834]  ret_from_fork+0x1f/0x30
> > [ 5859.633842]  </TASK>
> >
> > Fixes: 0efaf31dec57 ("mt76: mt7921: fix MT7921E reset failure")
> > Signed-off-by: Deren Wu <deren.wu@mediatek.com>
> > ---
> >  drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> > index e1800674089a..576a0149251b 100644
> > --- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> > +++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
> > @@ -261,7 +261,7 @@ int mt7921e_mac_reset(struct mt7921_dev *dev)
> >
> >         err = mt7921e_driver_own(dev);
> >         if (err)
> > -               return err;
> > +               goto out;
> >
> >         err = mt7921_run_firmware(dev);
> >         if (err)
> > --
> > 2.18.0
> >
Kalle Valo Aug. 29, 2022, 4:10 p.m. UTC | #3
Sean Wang <sean.wang@kernel.org> writes:

> Kalle seemed not available this week, so I would like to look for help from you.
> If the patch looks good to you, could you help apply the patch to
> wireless-drivers.git because there are getting more users reporting
> the issue with the stable kernel such as [1]. I would like to backport
> it sooner once it appears in the Linus tree to solve the indefinite
> hang issue. Sorry for the hurry request, I knew you just sent the pull
> request one moment ago :(
>
> [1]
> https://lore.kernel.org/linux-wireless/VE1PR04MB64945C660A81D38F290E4A4BE59F9@VE1PR04MB6494.eurprd04.prod.outlook.com/T/

Johannes applied this now:

https://git.kernel.org/wireless/wireless/c/fa3fbe640378
diff mbox series

Patch

diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
index e1800674089a..576a0149251b 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci_mac.c
@@ -261,7 +261,7 @@  int mt7921e_mac_reset(struct mt7921_dev *dev)
 
 	err = mt7921e_driver_own(dev);
 	if (err)
-		return err;
+		goto out;
 
 	err = mt7921_run_firmware(dev);
 	if (err)