
[net,v2] stmmac: Clear variable when destroying workqueue

Message ID 20240226164231.145848-1-j.raczynski@samsung.com (mailing list archive)
State Accepted
Commit 8af411bbba1f457c33734795f024d0ef26d0963f
Delegated to: Netdev Maintainers
Series [net,v2] stmmac: Clear variable when destroying workqueue

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 956 this patch: 956
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 4 blamed authors not CCed: weifeng.voon@intel.com tee.min.tan@intel.com boon.leong.ong@intel.com mohammad.athari.ismail@intel.com; 9 maintainers not CCed: pabeni@redhat.com edumazet@google.com tee.min.tan@intel.com mcoquelin.stm32@gmail.com linux-stm32@st-md-mailman.stormreply.com linux-arm-kernel@lists.infradead.org boon.leong.ong@intel.com mohammad.athari.ismail@intel.com weifeng.voon@intel.com
netdev/build_clang success Errors and warnings before: 973 this patch: 973
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 973 this patch: 973
netdev/checkpatch warning WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure")'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-02-27--03-00 (tests: 1456)
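The checkpatch warning asks for the `Fixes: <12 chars of sha1> ("<title line>")` form. The following is a small sketch of building that trailer, assuming the commit id and subject line are already known. `format_fixes_tag()` is a hypothetical helper and is not part of checkpatch or any other kernel tooling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a Fixes: trailer in the form checkpatch
 * asks for, 'Fixes: <12 chars of sha1> ("<title line>")'.
 * Returns the snprintf() result, or -1 if the commit id is too short. */
static int format_fixes_tag(char *buf, size_t len, const char *sha1,
			    const char *title)
{
	assert(buf != NULL && sha1 != NULL && title != NULL);

	/* Fewer than 12 characters cannot produce the requested form. */
	if (strlen(sha1) < 12)
		return -1;

	/* %.12s copies at most the first 12 characters of the commit id. */
	return snprintf(buf, len, "Fixes: %.12s (\"%s\")", sha1, title);
}
```

Passing the 13-character id `5a5586112b929` from this patch yields `Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure")`, which is the form the checkpatch warning suggests.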

Commit Message

Jakub Raczynski Feb. 26, 2024, 4:42 p.m. UTC
Currently, when the driver is suspended and the FPE workqueue is stopped,
the code checks whether the workqueue pointer is non-NULL and, if so,
destroys the workqueue.
destroy_workqueue() drains the queue and frees the workqueue, but it does
not set the caller's pointer to NULL. The stale pointer still passes the
NULL check, so a later attempt to destroy a workqueue that was already
destroyed and never re-created can cause a kernel/module panic.
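A minimal userspace sketch of the pointer problem follows. It assumes malloc()/free() as stand-ins for the kernel's workqueue allocation and destroy_workqueue(). `struct fake_priv`, `fake_destroy_workqueue()` and `fake_fpe_stop_wq()` are hypothetical names, not driver code. Freeing the object leaves the pointer value unchanged, so only an explicit `priv->fpe_wq = NULL;` keeps the `if (priv->fpe_wq)` guard meaningful on a second call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's types or APIs. */
struct fake_wq {
	int unused;
};

struct fake_priv {
	struct fake_wq *fpe_wq;
};

/* Counts how many times a workqueue object has actually been freed. */
static int destroy_calls;

/* Stand-in for destroy_workqueue(): it frees the object it is given but,
 * like the real function, has no way to clear the caller's pointer. */
static void fake_destroy_workqueue(struct fake_wq *wq)
{
	assert(wq != NULL);
	destroy_calls++;
	free(wq);
}

/* Mirrors the patched stmmac_fpe_stop_wq(): destroy, then clear the
 * pointer so the NULL check turns any later call into a no-op. */
static void fake_fpe_stop_wq(struct fake_priv *priv)
{
	if (priv->fpe_wq) {
		fake_destroy_workqueue(priv->fpe_wq);
		priv->fpe_wq = NULL;
	}
}
```

Calling `fake_fpe_stop_wq()` twice on the same `struct fake_priv` frees the object once and leaves the pointer NULL. Without the clearing line, the second call would pass the guard and free a dangling pointer. In userspace that is undefined behaviour; in the kernel it corresponds to the double destroy in the backtrace.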

This scenario is possible when resuming a suspended driver in
stmmac_resume(), because a failure of stmmac_hw_setup() is not handled
there. stmmac_hw_setup() returns early if the DMA engine fails to
initialize, and the workqueue is only created after the DMA engine has
been initialized.
If DMA initialization fails, resume proceeds as if it had succeeded, but
the interface does not work and the TX queue eventually times out,
triggering a 'Reset adapter' error.
The reset process then destroys the workqueue again, and because its
re-creation was skipped during resume, this causes a kernel/module panic.

To protect against this crash, set the workqueue pointer to NULL after
destroying the workqueue.

Log/backtrace from the crash:
[88.031977]------------[ cut here ]------------
[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
           <Skipping backtrace for watchdog timeout>
[88.032251]---[ end trace e70de432e4d5c2c0 ]---
[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
[88.036359]------------[ cut here ]------------
[88.036519]Call trace:
[88.036523] flush_workqueue+0x3e4/0x430
[88.036528] drain_workqueue+0xc4/0x160
[88.036533] destroy_workqueue+0x40/0x270
[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
[88.036541] stmmac_release+0x278/0x280
[88.036546] __dev_close_many+0xcc/0x158
[88.036551] dev_close_many+0xbc/0x190
[88.036555] dev_close.part.0+0x70/0xc0
[88.036560] dev_close+0x24/0x30
[88.036564] stmmac_service_task+0x110/0x140
[88.036569] process_one_work+0x1d8/0x4a0
[88.036573] worker_thread+0x54/0x408
[88.036578] kthread+0x164/0x170
[88.036583] ret_from_fork+0x10/0x20
[88.036588]---[ end trace e70de432e4d5c2c1 ]---
[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004

Fixes: 5a5586112b92 ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jiri Pirko Feb. 27, 2024, 7:06 a.m. UTC | #1
Mon, Feb 26, 2024 at 05:42:32PM CET, j.raczynski@samsung.com wrote:
>Currently when suspending driver and stopping workqueue it is checked whether
>workqueue is not NULL and if so, it is destroyed.
>[...]
>Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Next time, send v2 as a separate email starting a new thread. Thanks!
patchwork-bot+netdevbpf@kernel.org Feb. 28, 2024, 11:30 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 26 Feb 2024 17:42:32 +0100 you wrote:
> Currently when suspending driver and stopping workqueue it is checked whether
> workqueue is not NULL and if so, it is destroyed.
> Function destroy_workqueue() does drain queue and does clear variable, but
> it does not set workqueue variable to NULL. This can cause kernel/module
> panic if code attempts to clear workqueue that was not initialized.
> 
> This scenario is possible when resuming suspended driver in stmmac_resume(),
> because there is no handling for failed stmmac_hw_setup(),
> which can fail and return if DMA engine has failed to initialize,
> and workqueue is initialized after DMA engine.
> Should DMA engine fail to initialize, resume will proceed normally,
> but interface won't work and TX queue will eventually timeout,
> causing 'Reset adapter' error.
> This then does destroy workqueue during reset process.
> And since workqueue is initialized after DMA engine and can be skipped,
> it will cause kernel/module panic.
> 
> [...]

Here is the summary with links:
  - [net,v2] stmmac: Clear variable when destroying workqueue
    https://git.kernel.org/netdev/net/c/8af411bbba1f

You are awesome, thank you!

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 75d029704503..0681029a2489 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4005,8 +4005,10 @@  static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-	if (priv->fpe_wq)
+	if (priv->fpe_wq) {
 		destroy_workqueue(priv->fpe_wq);
+		priv->fpe_wq = NULL;
+	}
 
 	netdev_info(priv->dev, "FPE workqueue stop");
 }