Message ID | 20250114-am65-cpsw-fix-tx-irq-free-v1-1-b2069e6ed185@kernel.org (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: ethernet: ti: am65-cpsw: fix freeing IRQ in am65_cpsw_nuss_remove_tx_chns() | expand |
On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > negative error value on error. So not NULL check is not sufficient > to deteremine if IRQ is valid. Check that IRQ is greater then zero > to ensure it is valid. > > Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") > Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org>
On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: Hello Roger, > When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); Additionally, following the above section we have: if (tx_chn->irq < 0) { dev_err(dev, "Failed to get tx dma irq %d\n", tx_chn->irq); ret = tx_chn->irq; goto err; } Could you please provide details on the code-path which will lead to a negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: 1. am65_cpsw_nuss_update_tx_rx_chns() 2. am65_cpsw_nuss_suspend() Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it appears to me that "tx_chn->irq" will never be negative within am65_cpsw_nuss_remove_tx_chns() Please let me know if I have overlooked something. Regards, Siddharth. > negative error value on error. So not NULL check is not sufficient > to deteremine if IRQ is valid. Check that IRQ is greater then zero > to ensure it is valid. > > Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") > Signed-off-by: Roger Quadros <rogerq@kernel.org> > --- > drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > index 5465bf872734..e1de45fb18ae 100644 > --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c > +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > @@ -2248,7 +2248,7 @@ static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common) > for (i = 0; i < common->tx_ch_num; i++) { > struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[i]; > > - if (tx_chn->irq) > + if (tx_chn->irq > 0) > devm_free_irq(dev, tx_chn->irq, tx_chn); > > netif_napi_del(&tx_chn->napi_tx); > > --- > base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532 > change-id: 20250114-am65-cpsw-fix-tx-irq-free-846ac55ee6e1 > > Best regards, > -- > Roger Quadros <rogerq@kernel.org> >
Hi Siddharth, On 15/01/2025 07:18, Siddharth Vadapalli wrote: > On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > > Hello Roger, > >> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > > You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: Yes I meant tx instead of rx. > > tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > > Additionally, following the above section we have: > > if (tx_chn->irq < 0) { > dev_err(dev, "Failed to get tx dma irq %d\n", > tx_chn->irq); > ret = tx_chn->irq; > goto err; > } > > Could you please provide details on the code-path which will lead to a > negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > > There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > 1. am65_cpsw_nuss_update_tx_rx_chns() > 2. am65_cpsw_nuss_suspend() > Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > appears to me that "tx_chn->irq" will never be negative within > am65_cpsw_nuss_remove_tx_chns() > > Please let me know if I have overlooked something. The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called repeatedly (by user changing number of TX queues) even if previous call to am65_cpsw_nuss_init_tx_chns() failed. Please try the below patch to simulate the error condition. Then do the following - bring down all network interfaces - change num TX queues to 2. IRQ for 2nd channel fails. - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. Also I think it is good practice to code for set value than to code for existing code paths as they can change in the future. --test patch starts-- diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 36c29d3db329..c22cadaaf3d3 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -155,7 +155,7 @@ NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) -#define AM65_CPSW_DEFAULT_TX_CHNS 8 +#define AM65_CPSW_DEFAULT_TX_CHNS 1 #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 /* CPPI streaming packet interface */ @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) tx_chn->dsize_log2 = __fls(hdesc_size_out); WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); + if (i == 1) + tx_chn->irq = -ENODEV; + else + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); if (tx_chn->irq < 0) { dev_err(dev, "Failed to get tx dma irq %d\n", tx_chn->irq); -- cheers, -roger
On Wed, Jan 15, 2025 at 12:04:17PM +0200, Roger Quadros wrote: > Hi Siddharth, > > On 15/01/2025 07:18, Siddharth Vadapalli wrote: > > On Tue, Jan 14, 2025 at 06:44:02PM +0200, Roger Quadros wrote: > > > > Hello Roger, > > > >> When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns > > > > You probably meant "k3_udma_glue_tx_get_irq()" instead? It is used to > > assign tx_chn->irq within am65_cpsw_nuss_init_tx_chns() as follows: > > Yes I meant tx instead of rx. > > > > > tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > > > > Additionally, following the above section we have: > > > > if (tx_chn->irq < 0) { > > dev_err(dev, "Failed to get tx dma irq %d\n", > > tx_chn->irq); > > ret = tx_chn->irq; > > goto err; > > } > > > > Could you please provide details on the code-path which will lead to a > > negative "tx_chn->irq" within "am65_cpsw_nuss_remove_tx_chns()"? > > > > There seem to be two callers of am65_cpsw_nuss_remove_tx_chns(), namely: > > 1. am65_cpsw_nuss_update_tx_rx_chns() > > 2. am65_cpsw_nuss_suspend() > > Since both of them seem to invoke am65_cpsw_nuss_remove_tx_chns() only > > in the case where am65_cpsw_nuss_init_tx_chns() *did not* error out, it > > appears to me that "tx_chn->irq" will never be negative within > > am65_cpsw_nuss_remove_tx_chns() > > > > Please let me know if I have overlooked something. > > The issue is with am65_cpsw_nuss_update_tx_rx_chns(). It can be called > repeatedly (by user changing number of TX queues) even if previous call > to am65_cpsw_nuss_init_tx_chns() failed. Thank you for clarifying. So the issue/bug was discovered since the implementation of am65_cpsw_nuss_update_tx_rx_chns(). The "Fixes" tag misled me. Maybe the "Fixes" tag should be updated? Though we should code to future-proof it as done in this patch, the "Fixes" tag pointing to the very first commit of the driver might not be accurate as the code-path associated with the bug cannot be exercised at that commit. Independent of the above change suggested for the "Fixes" tag, Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> There seems to be a different bug in am65_cpsw_nuss_update_tx_rx_chns() which I have described below. > > Please try the below patch to simulate the error condition. > > Then do the following > - bring down all network interfaces > - change num TX queues to 2. IRQ for 2nd channel fails. > - change num TX queues to 3. Now we try to free an invalid IRQ and we see warning. > > Also I think it is good practice to code for set value than to code > for existing code paths as they can change in the future. > > --test patch starts-- > > diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > index 36c29d3db329..c22cadaaf3d3 100644 > --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c > +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c > @@ -155,7 +155,7 @@ > NETIF_MSG_IFUP | NETIF_MSG_PROBE | NETIF_MSG_IFDOWN | \ > NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) > > -#define AM65_CPSW_DEFAULT_TX_CHNS 8 > +#define AM65_CPSW_DEFAULT_TX_CHNS 1 > #define AM65_CPSW_DEFAULT_RX_CHN_FLOWS 1 > > /* CPPI streaming packet interface */ > @@ -2346,7 +2348,10 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common) > tx_chn->dsize_log2 = __fls(hdesc_size_out); > WARN_ON(hdesc_size_out != (1 << tx_chn->dsize_log2)); > > - tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); > + if (i == 1) > + tx_chn->irq = -ENODEV; > + else > + tx_chn->irq = k3_udma_glue_tx_get_irq(tx_chn->tx_chn); The pair - am65_cpsw_nuss_init_tx_chns()/am65_cpsw_nuss_remove_tx_chns() seem to be written under the assumption that failure will result in the driver's probe failing. With am65_cpsw_nuss_update_tx_rx_chns(), that assumption no longer holds true. Please consider the following sequence: 1. am65_cpsw_nuss_probe() am65_cpsw_nuss_register_ndevs() am65_cpsw_nuss_init_tx_chns() => Succeeds 2. Probe is successful 3. am65_cpsw_nuss_update_tx_rx_chns() => Invoked by user am65_cpsw_nuss_remove_tx_chns() => Succeeds am65_cpsw_nuss_init_tx_chns() => Partially fails devm_add_action(dev, am65_cpsw_nuss_free_tx_chns, common); ^ DEVM Action is added, but since the driver isn't removed, the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. Only when the user re-invokes am65_cpsw_nuss_update_tx_rx_chns(), the cleanup will be performed. This might have to be fixed in the following manner: @@ -3416,10 +3416,17 @@ int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common, common->tx_ch_num = num_tx; common->rx_ch_num_flows = num_rx; ret = am65_cpsw_nuss_init_tx_chns(common); - if (ret) + if (ret) { + devm_remove_action(dev, am65_cpsw_nuss_free_tx_chns, common); + am65_cpsw_nuss_free_tx_chns(common); return ret; + } ret = am65_cpsw_nuss_init_rx_chns(common); + if (ret) { + devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common); + am65_cpsw_nuss_free_rx_chns(common); + } return ret; } Please let me know what you think. Regards, Siddharth.
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 5465bf872734..e1de45fb18ae 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -2248,7 +2248,7 @@ static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common) for (i = 0; i < common->tx_ch_num; i++) { struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[i]; - if (tx_chn->irq) + if (tx_chn->irq > 0) devm_free_irq(dev, tx_chn->irq, tx_chn); netif_napi_del(&tx_chn->napi_tx);
When getting the IRQ we use k3_udma_glue_rx_get_irq() which returns negative error value on error. So not NULL check is not sufficient to deteremine if IRQ is valid. Check that IRQ is greater then zero to ensure it is valid. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Roger Quadros <rogerq@kernel.org> --- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532 change-id: 20250114-am65-cpsw-fix-tx-irq-free-846ac55ee6e1 Best regards,