diff mbox series

[2/7] net: ethernet: ti: cpsw: Don't error out in .remove()

Message ID 20231117091655.872426-3-u.kleine-koenig@pengutronix.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: ethernet: Convert to platform remove callback | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1127 this patch: 1127
netdev/cc_maintainers warning 6 maintainers not CCed: lorenzo@kernel.org aleksander.lobakin@intel.com pabeni@redhat.com edumazet@google.com kuba@kernel.org alexanderduyck@fb.com
netdev/build_clang success Errors and warnings before: 1154 this patch: 1154
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1154 this patch: 1154
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Uwe Kleine-König Nov. 17, 2023, 9:16 a.m. UTC
Returning early from .remove() with an error code still results in the
driver unbinding the device. So the driver core ignores the returned error
code and the resources that were not freed are never catched up. In
combination with devm this also often results in use-after-free bugs.

If runtime resume fails, it's still important to free all resources, so
don't return with an error code, but emit an error message and continue
freeing acquired stuff.

This prepares changing cpsw_remove() to return void.

Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
 drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Roger Quadros Nov. 18, 2023, 10 a.m. UTC | #1
On 17/11/2023 11:16, Uwe Kleine-König wrote:
> Returning early from .remove() with an error code still results in the
> driver unbinding the device. So the driver core ignores the returned error
> code and the resources that were not freed are never catched up. In
> combination with devm this also often results in use-after-free bugs.
> 
> If runtime resume fails, it's still important to free all resources, so
> don't return with an error code, but emit an error message and continue
> freeing acquired stuff.
> 
> This prepares changing cpsw_remove() to return void.
> 
> Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove")
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index ca4d4548f85e..db5a2ba8a6d4 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev)
>  	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
>  	int i, ret;
>  
> -	ret = pm_runtime_resume_and_get(&pdev->dev);
> +	ret = pm_runtime_get_sync(&pdev->dev);
>  	if (ret < 0)
> -		return ret;
> +		/* There is no need to do something about that. The important
> +		 * thing is to not exit early, but do all cleanup that doesn't
> +		 * require register access.
> +		 */
> +		dev_err(&pdev->dev, "runtime resume failed (%pe)\n",
> +			ERR_PTR(ret));
>  
>  	for (i = 0; i < cpsw->data.slaves; i++)
>  		if (cpsw->slaves[i].ndev)
>  			unregister_netdev(cpsw->slaves[i].ndev);
>  
> -	cpts_release(cpsw->cpts);
> -	cpdma_ctlr_destroy(cpsw->dma);
> +	if (ret >= 0) {
> +		cpts_release(cpsw->cpts);

cpts_release() only does clk_unprepare().
Why not do that in the error path as well?

> +		cpdma_ctlr_destroy(cpsw->dma);

cpdma_ctrl_destroy() not only stops the DMA controller
but also frees up the channel and calls dma_free_coherent?

We still want to free up the channel and dma_free_coherent in the
error path?

> +	}
> +
>  	cpsw_remove_dt(pdev);
>  	pm_runtime_put_sync(&pdev->dev);
>  	pm_runtime_disable(&pdev->dev);
Roger Quadros Nov. 18, 2023, 10:05 a.m. UTC | #2
On 18/11/2023 12:00, Roger Quadros wrote:
> 
> 
> On 17/11/2023 11:16, Uwe Kleine-König wrote:
>> Returning early from .remove() with an error code still results in the
>> driver unbinding the device. So the driver core ignores the returned error
>> code and the resources that were not freed are never catched up. In
>> combination with devm this also often results in use-after-free bugs.
>>
>> If runtime resume fails, it's still important to free all resources, so
>> don't return with an error code, but emit an error message and continue
>> freeing acquired stuff.
>>
>> This prepares changing cpsw_remove() to return void.
>>
>> Fixes: 8a0b6dc958fd ("drivers: net: cpsw: fix wrong regs access in cpsw_remove")
>> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>> ---
>>  drivers/net/ethernet/ti/cpsw.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
>> index ca4d4548f85e..db5a2ba8a6d4 100644
>> --- a/drivers/net/ethernet/ti/cpsw.c
>> +++ b/drivers/net/ethernet/ti/cpsw.c
>> @@ -1727,16 +1727,24 @@ static int cpsw_remove(struct platform_device *pdev)
>>  	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
>>  	int i, ret;
>>  
>> -	ret = pm_runtime_resume_and_get(&pdev->dev);
>> +	ret = pm_runtime_get_sync(&pdev->dev);
>>  	if (ret < 0)
>> -		return ret;
>> +		/* There is no need to do something about that. The important
>> +		 * thing is to not exit early, but do all cleanup that doesn't
>> +		 * require register access.
>> +		 */
>> +		dev_err(&pdev->dev, "runtime resume failed (%pe)\n",
>> +			ERR_PTR(ret));
>>  
>>  	for (i = 0; i < cpsw->data.slaves; i++)
>>  		if (cpsw->slaves[i].ndev)
>>  			unregister_netdev(cpsw->slaves[i].ndev);
>>  
>> -	cpts_release(cpsw->cpts);
>> -	cpdma_ctlr_destroy(cpsw->dma);
>> +	if (ret >= 0) {
>> +		cpts_release(cpsw->cpts);
> 
> cpts_release() only does clk_unprepare().
> Why not do that in the error path as well?
> 
>> +		cpdma_ctlr_destroy(cpsw->dma);
> 
> cpdma_ctrl_destroy() not only stops the DMA controller
> but also frees up the channel and calls dma_free_coherent?
> 
> We still want to free up the channel and dma_free_coherent in the
> error path?

cpdma_chan_destroy() does a cpdma_chan_stop() which does register
accesses so I suppose it cannot be called in the error path.

which leaves only the cpdma_desc_pool_destroy() call.

> 
>> +	}
>> +
>>  	cpsw_remove_dt(pdev);
>>  	pm_runtime_put_sync(&pdev->dev);
>>  	pm_runtime_disable(&pdev->dev);
>
Uwe Kleine-König Nov. 19, 2023, 9:40 a.m. UTC | #3
Hello Roger,

[dropping Mugunthan V N from Cc, their address bounces, and adding the
net maintainers that I failed to do for the original submission]

On Sat, Nov 18, 2023 at 12:00:08PM +0200, Roger Quadros wrote:
> > -	cpts_release(cpsw->cpts);
> > -	cpdma_ctlr_destroy(cpsw->dma);
> > +	if (ret >= 0) {
> > +		cpts_release(cpsw->cpts);
> 
> cpts_release() only does clk_unprepare().
> Why not do that in the error path as well?

I thought I saw the pm suspend do a clk_unprepare, but I don't find
that. Indeed this step shouldn't be skipped.

> > +		cpdma_ctlr_destroy(cpsw->dma);
> 
> cpdma_ctrl_destroy() not only stops the DMA controller
> but also frees up the channel and calls dma_free_coherent?
> 
> We still want to free up the channel and dma_free_coherent in the
> error path?

Then we need to split the function and only do the software part. I
don't have hardware to test this change and getting it right without
testing seems to be hard.

May I suggest that only do the conversion to remove_new (that doesn't
modify the resource leak behaviour) and you care for fixing the leaks?

My change would then just look as follows:

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ca4d4548f85e..1251160b563b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1722,14 +1722,16 @@ static int cpsw_probe(struct platform_device *pdev)
 	return ret;
 }
 
-static int cpsw_remove(struct platform_device *pdev)
+static void cpsw_remove(struct platform_device *pdev)
 {
 	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
 	int i, ret;
 
 	ret = pm_runtime_resume_and_get(&pdev->dev);
-	if (ret < 0)
-		return ret;
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Failed to resume device (%pe)\n",
+			ERR_PTR(ret))
+	}
 
 	for (i = 0; i < cpsw->data.slaves; i++)
 		if (cpsw->slaves[i].ndev)
@@ -1740,7 +1742,6 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	return 0;
 }
 
 #ifdef CONFIG_PM_SLEEP
@@ -1795,7 +1796,7 @@ static struct platform_driver cpsw_driver = {
 		.of_match_table = cpsw_of_mtable,
 	},
 	.probe = cpsw_probe,
-	.remove = cpsw_remove,
+	.remove_new = cpsw_remove,
 };
 
 module_platform_driver(cpsw_driver);

A similar question for the other two cpsw drivers.

Best regards
Uwe
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ca4d4548f85e..db5a2ba8a6d4 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1727,16 +1727,24 @@  static int cpsw_remove(struct platform_device *pdev)
 	struct cpsw_common *cpsw = platform_get_drvdata(pdev);
 	int i, ret;
 
-	ret = pm_runtime_resume_and_get(&pdev->dev);
+	ret = pm_runtime_get_sync(&pdev->dev);
 	if (ret < 0)
-		return ret;
+		/* There is no need to do something about that. The important
+		 * thing is to not exit early, but do all cleanup that doesn't
+		 * require register access.
+		 */
+		dev_err(&pdev->dev, "runtime resume failed (%pe)\n",
+			ERR_PTR(ret));
 
 	for (i = 0; i < cpsw->data.slaves; i++)
 		if (cpsw->slaves[i].ndev)
 			unregister_netdev(cpsw->slaves[i].ndev);
 
-	cpts_release(cpsw->cpts);
-	cpdma_ctlr_destroy(cpsw->dma);
+	if (ret >= 0) {
+		cpts_release(cpsw->cpts);
+		cpdma_ctlr_destroy(cpsw->dma);
+	}
+
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);