Message ID | 20240221143233.54350-1-j.raczynski@samsung.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | stmmac: Clear variable when destroying workqueue |
On Wed, 21 Feb 2024 15:32:33 +0100 Jakub Raczynski wrote:
> Currently when suspending driver and stopping workqueue it is checked whether
> workqueue is not NULL and if so, it is destroyed.
> Function destroy_workqueue() does drain queue and does clear variable, but
> it does not set workqueue variable to NULL. This can cause kernel/module
> panic if code attempts to clear workqueue that was not initialized.

Is there no risk that we'll try to queue_work() on the uninitialized
workqueue? I wonder if we should also set __FPE_REMOVING when the queue
allocation fails, just to make sure we never try to queue?

Please repost with a Fixes tag added (pointing to oldest commit where
the problem may happen), and with [PATCH net v2] as the subject tag.
On Wed, 21 Feb 2024 15:32:33 +0100 Jakub Raczynski wrote:
>> Currently when suspending driver and stopping workqueue it is checked
>> whether workqueue is not NULL and if so, it is destroyed.
>> Function destroy_workqueue() does drain queue and does clear variable,
>> but it does not set workqueue variable to NULL. This can cause
>> kernel/module panic if code attempts to clear workqueue that was not
>> initialized.
>
> Adding __FPE_REMOVING for allocation, it would be something, but failure
> here is less than likely. DMA engine start failure can happen since some
> Synopsys IP have specific clock timing requirements for DMA, which
> sometimes must be provided by another driver (if for example PHY is
> driven by GPIO or PHY uses low-power mode during suspend).
>
> As for queued work, you are right, additional check for __FPE_REMOVING
> and NULL check should be added to stmmac_service_event_schedule(), as is
> in stmmac_fpe_event_status().
>
> Will re-test that and resend patch as requested.

Scratch that, confused main workqueue with fpe_workqueue in that message.
Proposed commit should not introduce problem with fpe_workqueue, since in
stmmac_fpe_event_status() there is check for both NULL and __FPE_REMOVING
before queueing work.

Will re-check if there are additional calls before submitting commit.
As for the addition of __FPE_REMOVING to initialization fail, would prefer
that in different commit.
On Mon, 26 Feb 2024 12:06:02 +0100 Jakub Raczynski wrote:
> Scratch that, confused main workqueue with fpe_workqueue in that message.
> Proposed commit should not introduce problem with fpe_workqueue, since in
> stmmac_fpe_event_status() there is check for both NULL and __FPE_REMOVING
> before queueing work.

You're right, I missed the NULL check.. if there's no other use of
fpe_wq - v2 just needs the Fixes tag.
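The guard discussed in this thread (check for both NULL and __FPE_REMOVING before queueing work) can be sketched in userspace C as below. This is only an illustration under stated assumptions: `guard_wq`, `guard_priv`, `GUARD_FPE_REMOVING` and `guard_fpe_try_queue()` are hypothetical names, not the driver's code, and a plain flag word stands in for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the __FPE_REMOVING state bit. */
enum { GUARD_FPE_REMOVING = 1u << 0 };

/* Hypothetical stand-in for a workqueue; it only counts queued items. */
struct guard_wq {
	int queued;
};

/* Hypothetical stand-in for the driver's private data. */
struct guard_priv {
	struct guard_wq *fpe_wq;     /* NULL when no workqueue exists */
	unsigned int fpe_task_state; /* holds GUARD_FPE_REMOVING when stopping */
};

/*
 * Queue work only if the workqueue exists and is not being removed.
 * Returns true when work was queued, false when the request was dropped.
 */
static bool guard_fpe_try_queue(struct guard_priv *priv)
{
	assert(priv != NULL);

	if (!priv->fpe_wq || (priv->fpe_task_state & GUARD_FPE_REMOVING))
		return false;

	priv->fpe_wq->queued++;
	return true;
}
```

With a guard of this shape, a pointer that is NULL (never allocated, or cleared on stop) simply causes the request to be dropped instead of being dereferenced.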
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 75d029704503..0681029a2489 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4005,8 +4005,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-	if (priv->fpe_wq)
+	if (priv->fpe_wq) {
 		destroy_workqueue(priv->fpe_wq);
+		priv->fpe_wq = NULL;
+	}
 
 	netdev_info(priv->dev, "FPE workqueue stop");
 }
Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.

This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.

This commit sets workqueue variable to NULL when destroying workqueue,
which secures against that possible driver crash.

Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
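The effect of clearing the pointer can be modelled in plain userspace C. The sketch below is a minimal illustration, not the driver code: `model_wq`, `model_priv`, `model_alloc_wq()` and `model_fpe_stop_wq()` are hypothetical stand-ins, with `free()` playing the role of destroy_workqueue(). It shows that once the pointer is reset to NULL, a second stop call (for example from a reset path after a resume that skipped workqueue setup) becomes a no-op instead of touching freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a workqueue object. */
struct model_wq {
	int pending;
};

/* Hypothetical stand-in for the driver's private data. */
struct model_priv {
	struct model_wq *fpe_wq; /* NULL when no workqueue exists */
	int destroy_calls;       /* how many times the queue was really freed */
};

/* Allocate a zeroed model workqueue; returns NULL on allocation failure. */
static struct model_wq *model_alloc_wq(void)
{
	return calloc(1, sizeof(struct model_wq));
}

/*
 * Models the patched stop path: destroy the queue only if it exists,
 * then clear the pointer so a later call sees NULL and does nothing.
 */
static void model_fpe_stop_wq(struct model_priv *priv)
{
	assert(priv != NULL);

	if (priv->fpe_wq) {
		free(priv->fpe_wq);    /* stands in for destroy_workqueue() */
		priv->fpe_wq = NULL;   /* the one-line fix from the patch */
		priv->destroy_calls++;
	}
}
```

Without the `priv->fpe_wq = NULL` assignment, the second call in this model would pass the NULL check with a stale pointer and free it again, which is the userspace analogue of the panic described in the commit message.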