Message ID | 20231117091655.872426-3-u.kleine-koenig@pengutronix.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: ethernet: Convert to platform remove callback | expand |
On 17/11/2023 11:16, Uwe Kleine-König wrote: > Returning early from .remove() with an error code still results in the > driver unbinding the device. So the driver core ignores the returned error > code and the resources that were not freed are never catched up. In > combination with devm this also often results in use-after-free bugs. > > If runtime resume fails, it's still important to free all resources, so > don't return with an error code, but emit an error message and continue > freeing acquired stuff. > > This prepares changing cpsw_remove() to return void. > > Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove") > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> > --- > drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c > index ca4d4548f85e..db5a2ba8a6d4 100644 > --- a/drivers/net/ethernet/ti/cpsw.c > +++ b/drivers/net/ethernet/ti/cpsw.c > @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev) > struct cpsw_common *cpsw = platform_get_drvdata(pdev); > int i, ret; > > - ret = pm_runtime_resume_and_get(&pdev->dev); > + ret = pm_runtime_get_sync(&pdev->dev); > if (ret < 0) > - return ret; > + /* There is no need to do something about that. The important > + * thing is to not exit early, but do all cleanup that doesn't > + * require register access. > + */ > + dev_err(&pdev->dev, "runtime resume failed (%pe)\n", > + ERR_PTR(ret)); > > for (i = 0; i < cpsw->data.slaves; i++) > if (cpsw->slaves[i].ndev) > unregister_netdev(cpsw->slaves[i].ndev); > > - cpts_release(cpsw->cpts); > - cpdma_ctlr_destroy(cpsw->dma); > + if (ret >= 0) { > + cpts_release(cpsw->cpts); cpts_release() only does clk_unprepare(). Why not do that in the error path as well? > + cpdma_ctlr_destroy(cpsw->dma); cpdma_ctrl_destroy() not only stops the DMA controller but also frees up the channel and calls dma_free_coherent? We still want to free up the channel and dma_free_coherent in the error path? > + } > + > cpsw_remove_dt(pdev); > pm_runtime_put_sync(&pdev->dev); > pm_runtime_disable(&pdev->dev);
On 18/11/2023 12:00, Roger Quadros wrote: > > > On 17/11/2023 11:16, Uwe Kleine-König wrote: >> Returning early from .remove() with an error code still results in the >> driver unbinding the device. So the driver core ignores the returned error >> code and the resources that were not freed are never catched up. In >> combination with devm this also often results in use-after-free bugs. >> >> If runtime resume fails, it's still important to free all resources, so >> don't return with an error code, but emit an error message and continue >> freeing acquired stuff. >> >> This prepares changing cpsw_remove() to return void. >> >> Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove") >> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> >> --- >> drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++---- >> 1 file changed, 12 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c >> index ca4d4548f85e..db5a2ba8a6d4 100644 >> --- a/drivers/net/ethernet/ti/cpsw.c >> +++ b/drivers/net/ethernet/ti/cpsw.c >> @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev) >> struct cpsw_common *cpsw = platform_get_drvdata(pdev); >> int i, ret; >> >> - ret = pm_runtime_resume_and_get(&pdev->dev); >> + ret = pm_runtime_get_sync(&pdev->dev); >> if (ret < 0) >> - return ret; >> + /* There is no need to do something about that. The important >> + * thing is to not exit early, but do all cleanup that doesn't >> + * require register access. >> + */ >> + dev_err(&pdev->dev, "runtime resume failed (%pe)\n", >> + ERR_PTR(ret)); >> >> for (i = 0; i < cpsw->data.slaves; i++) >> if (cpsw->slaves[i].ndev) >> unregister_netdev(cpsw->slaves[i].ndev); >> >> - cpts_release(cpsw->cpts); >> - cpdma_ctlr_destroy(cpsw->dma); >> + if (ret >= 0) { >> + cpts_release(cpsw->cpts); > > cpts_release() only does clk_unprepare(). > Why not do that in the error path as well? > >> + cpdma_ctlr_destroy(cpsw->dma); > > cpdma_ctrl_destroy() not only stops the DMA controller > but also frees up the channel and calls dma_free_coherent? > > We still want to free up the channel and dma_free_coherent in the > error path? cpdma_chan_destroy() does a cpdma_chan_stop() which does register accesses so I suppose it cannot be called in the error path. which leaves only the cpdma_desc_pool_destroy() call. > >> + } >> + >> cpsw_remove_dt(pdev); >> pm_runtime_put_sync(&pdev->dev); >> pm_runtime_disable(&pdev->dev); >
Hello Roger, [dropping Mugunthan V N from Cc, their address bounces, and adding the net maintainers that I failed to do for the original submission] On Sat, Nov 18, 2023 at 12:00:08PM +0200, Roger Quadros wrote: > > - cpts_release(cpsw->cpts); > > - cpdma_ctlr_destroy(cpsw->dma); > > + if (ret >= 0) { > > + cpts_release(cpsw->cpts); > > cpts_release() only does clk_unprepare(). > Why not do that in the error path as well? I thought I saw the pm suspend do a clk_unprepare, but I don't find that. Indeed this step shouldn't be skipped. > > + cpdma_ctlr_destroy(cpsw->dma); > > cpdma_ctrl_destroy() not only stops the DMA controller > but also frees up the channel and calls dma_free_coherent? > > We still want to free up the channel and dma_free_coherent in the > error path? Then we need to split the function and only do the software part. I don't have hardware to test this change and getting it right without testing seems to be hard. May I suggest that only do the conversion to remove_new (that doesn't modify the resource leak behaviour) and you care for fixing the leaks? My change would then just look as follows: diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index ca4d4548f85e..1251160b563b 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1722,14 +1722,16 @@ static int cpsw_probe(struct platform_device *pdev) return ret; } -static int cpsw_remove(struct platform_device *pdev) +static void cpsw_remove(struct platform_device *pdev) { struct cpsw_common *cpsw = platform_get_drvdata(pdev); int i, ret; ret = pm_runtime_resume_and_get(&pdev->dev); - if (ret < 0) - return ret; + if (ret < 0) { + dev_err(&pdev->dev, "Failed to resume device (%pe)\n", + ERR_PTR(ret)) + } for (i = 0; i < cpsw->data.slaves; i++) if (cpsw->slaves[i].ndev) @@ -1740,7 +1742,6 @@ static int cpsw_remove(struct platform_device *pdev) cpsw_remove_dt(pdev); pm_runtime_put_sync(&pdev->dev); pm_runtime_disable(&pdev->dev); - return 0; } #ifdef CONFIG_PM_SLEEP @@ -1795,7 +1796,7 @@ static struct platform_driver cpsw_driver = { .of_match_table = cpsw_of_mtable, }, .probe = cpsw_probe, - .remove = cpsw_remove, + .remove_new = cpsw_remove, }; module_platform_driver(cpsw_driver); A similar question for the other two cpsw drivers. Best regards Uwe
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index ca4d4548f85e..db5a2ba8a6d4 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev) struct cpsw_common *cpsw = platform_get_drvdata(pdev); int i, ret; - ret = pm_runtime_resume_and_get(&pdev->dev); + ret = pm_runtime_get_sync(&pdev->dev); if (ret < 0) - return ret; + /* There is no need to do something about that. The important + * thing is to not exit early, but do all cleanup that doesn't + * require register access. + */ + dev_err(&pdev->dev, "runtime resume failed (%pe)\n", + ERR_PTR(ret)); for (i = 0; i < cpsw->data.slaves; i++) if (cpsw->slaves[i].ndev) unregister_netdev(cpsw->slaves[i].ndev); - cpts_release(cpsw->cpts); - cpdma_ctlr_destroy(cpsw->dma); + if (ret >= 0) { + cpts_release(cpsw->cpts); + cpdma_ctlr_destroy(cpsw->dma); + } + cpsw_remove_dt(pdev); pm_runtime_put_sync(&pdev->dev); pm_runtime_disable(&pdev->dev);
Returning early from .remove() with an error code still results in the driver unbinding the device. So the driver core ignores the returned error code and the resources that were not freed are never catched up. In combination with devm this also often results in use-after-free bugs. If runtime resume fails, it's still important to free all resources, so don't return with an error code, but emit an error message and continue freeing acquired stuff. This prepares changing cpsw_remove() to return void. Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> --- drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)