stmmac: Clear variable when destroying workqueue

Message ID 20240221143233.54350-1-j.raczynski@samsung.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series stmmac: Clear variable when destroying workqueue

Checks

Context | Check | Description
netdev/series_format | warning | Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection | success | Guessed tree name to be net-next
netdev/ynl | success | Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present | success | Fixes tag not required for -next series
netdev/header_inline | success | No static functions without inline keyword in header files
netdev/build_32bit | success | Errors and warnings before: 940 this patch: 940
netdev/build_tools | success | No tools touched, skip
netdev/cc_maintainers | warning | 6 maintainers not CCed: mcoquelin.stm32@gmail.com pabeni@redhat.com linux-stm32@st-md-mailman.stormreply.com linux-arm-kernel@lists.infradead.org kuba@kernel.org edumazet@google.com
netdev/build_clang | success | Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer
netdev/deprecated_api | success | None detected
netdev/check_selftest | success | No net selftest shell script
netdev/verify_fixes | success | No Fixes tag
netdev/build_allmodconfig_warn | success | Errors and warnings before: 957 this patch: 957
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_clang_rust | success | No Rust files in patch. Skipping build
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0
netdev/source_inline | success | Was 0 now: 0
netdev/contest | success | net-next-2024-02-23--03-00 (tests: 1457)

Commit Message

Jakub Raczynski Feb. 21, 2024, 2:32 p.m. UTC
Currently, when the driver is suspended and the workqueue is stopped, the
code checks whether the workqueue pointer is non-NULL and, if so, destroys
the workqueue.
destroy_workqueue() drains the queue and releases it, but it does not set
the workqueue variable to NULL. This can cause a kernel/module panic if the
code later attempts to destroy a workqueue that was not initialized.

This scenario is possible when resuming a suspended driver in
stmmac_resume(), because a failure of stmmac_hw_setup() is not handled
there. stmmac_hw_setup() can fail and return early if the DMA engine fails
to initialize, and the workqueue is initialized after the DMA engine.
Should the DMA engine fail to initialize, resume proceeds normally, but the
interface does not work and the TX queue eventually times out, causing a
'Reset adapter' error.
The reset process then destroys the workqueue.
Since the workqueue is initialized after the DMA engine and its
initialization can be skipped, this causes a kernel/module panic.

This commit sets the workqueue variable to NULL when destroying the
workqueue, which guards against that possible driver crash.

Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
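
The idea described in the commit message can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the driver code:
struct mock_wq, struct mock_priv, mock_alloc_wq(), mock_destroy_wq() and
mock_stop_wq() are hypothetical stand-ins, and malloc/free only approximate
what alloc/destroy of a real workqueue does. It shows that resetting the
pointer after destruction makes a second stop call a no-op instead of a
second destroy on a stale pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct workqueue_struct. */
struct mock_wq {
	int id;
};

/* Hypothetical stand-in for the driver's private data. */
struct mock_priv {
	struct mock_wq *fpe_wq;
};

/* Counts how many times the mock destructor actually ran. */
static int destroy_calls;

static struct mock_wq *mock_alloc_wq(void)
{
	return calloc(1, sizeof(struct mock_wq));
}

/*
 * Like destroy_workqueue(), this receives the pointer by value, so it
 * cannot reset the caller's copy of the pointer.
 */
static void mock_destroy_wq(struct mock_wq *wq)
{
	assert(wq != NULL);
	destroy_calls++;
	free(wq);
}

/* Mirrors the shape of the patched stop function: destroy, then clear. */
static void mock_stop_wq(struct mock_priv *priv)
{
	if (priv->fpe_wq) {
		mock_destroy_wq(priv->fpe_wq);
		priv->fpe_wq = NULL;	/* a later call now sees "no workqueue" */
	}
}
```

Calling mock_stop_wq() twice on the same mock_priv runs the destructor only
once. Without the assignment to NULL, the second call would pass a dangling
pointer to mock_destroy_wq().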

Comments

Jakub Kicinski Feb. 24, 2024, 12:32 a.m. UTC | #1
On Wed, 21 Feb 2024 15:32:33 +0100 Jakub Raczynski wrote:
> Currently when suspending driver and stopping workqueue it is checked whether
> workqueue is not NULL and if so, it is destroyed.
> Function destroy_workqueue() does drain queue and does clear variable, but
> it does not set workqueue variable to NULL. This can cause kernel/module
> panic if code attempts to clear workqueue that was not initialized.

Is there no risk that we'll try to queue_work() on the uninitialized
workqueue?  I wonder if we should also set __FPE_REMOVING when the
queue allocation fails, just to make sure we never try to queue?

Please repost with a Fixes tag added (pointing to oldest commit where
the problem may happen), and with [PATCH net v2] as the subject tag.

Jakub Raczynski Feb. 26, 2024, 11:06 a.m. UTC | #2
On Wed, 21 Feb 2024 15:32:33 +0100 Jakub Raczynski wrote:
>> Currently when suspending driver and stopping workqueue it is checked
>> whether workqueue is not NULL and if so, it is destroyed.
>> Function destroy_workqueue() does drain queue and does clear variable,
>> but it does not set workqueue variable to NULL. This can cause
>> kernel/module panic if code attempts to clear workqueue that was not
>> initialized.

> Adding __FPE_REMOVING for allocation, it would be something, but failure
> here is less than likely. DMA engine start can happen since some Synopsys
> IP have specific clock timing requirements for DMA, which sometimes must
> be provided by another driver (if for example PHY is driven by GPIO or
> PHY uses low-power mode during suspend).
> As for queued work, you are right, additional check for __FPE_REMOVING
> and NULL check should be added to stmmac_service_event_schedule(), as is
> in stmmac_fpe_event_status().
> Will re-test that and resend patch as requested.

Scratch that, confused main workqueue with fpe_workqueue in that message.
Proposed commit should not introduce problem with fpe_workqueue, since in
stmmac_fpe_event_status() there is check for both NULL and __FPE_REMOVING
before queueing work.
Will re-check if there are additional calls before submitting commit.

As for the addition of __FPE_REMOVING to initialization fail, would prefer
that in different commit.

Jakub Kicinski Feb. 26, 2024, 2:55 p.m. UTC | #3
On Mon, 26 Feb 2024 12:06:02 +0100 Jakub Raczynski wrote:
> Scratch that, confused main workqueue with fpe_workqueue in that message.
> Proposed commit should not introduce problem with fpe_workqueue, since in
> stmmac_fpe_event_status() there is check for both NULL and __FPE_REMOVING
> before queueing work.

You're right, I missed the NULL check.. if there's no other use of
fpe_wq - v2 just needs the Fixes tag.
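
The guard discussed in this thread (check for both a NULL workqueue and the
removing flag before queueing work) can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the code of
stmmac_fpe_event_status(): struct mock_fpe_state, MOCK_FPE_REMOVING and
mock_fpe_try_queue() are hypothetical names, and the plain bit test stands
in for the kernel's atomic test_bit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for __FPE_REMOVING. */
#define MOCK_FPE_REMOVING (1UL << 0)

/* Hypothetical stand-in for the relevant fields of the private data. */
struct mock_fpe_state {
	void *fpe_wq;			/* NULL when no workqueue exists */
	unsigned long task_state;	/* bit flags, non-atomic in this sketch */
	int queued;			/* number of work items "queued" */
};

/*
 * Queue work only if a workqueue exists and teardown has not started.
 * Returns true if work was queued.
 */
static bool mock_fpe_try_queue(struct mock_fpe_state *st)
{
	assert(st != NULL);

	if (!st->fpe_wq)
		return false;	/* never allocated, or destroyed and cleared */
	if (st->task_state & MOCK_FPE_REMOVING)
		return false;	/* stop path is running or has run */

	st->queued++;		/* placeholder for queue_work() */
	return true;
}
```

With such a guard in place, setting the pointer to NULL in the stop path
does not open a window for queueing on a destroyed workqueue, since both the
NULL pointer and the removing flag make the queueing attempt return early.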

Patch

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 75d029704503..0681029a2489 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4005,8 +4005,10 @@  static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-	if (priv->fpe_wq)
+	if (priv->fpe_wq) {
 		destroy_workqueue(priv->fpe_wq);
+		priv->fpe_wq = NULL;
+	}
 
 	netdev_info(priv->dev, "FPE workqueue stop");
 }