[net,2/3] amd-xgbe: handle the corner-case during tx completion

Message ID: 20231121191435.4049995-3-Raju.Rangoju@amd.com
State: Accepted
Commit: 7121205d5330c6a3cb3379348886d47c77b78d06
Delegated to: Netdev Maintainers
Series: amd-xgbe: fixes to handle corner-cases

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/codegen                  success  Generated files up to date
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1115 this patch: 1115
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 1142 this patch: 1142
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1143 this patch: 1143
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Raju Rangoju Nov. 21, 2023, 7:14 p.m. UTC
The existing implementation uses software logic to accumulate tx
completions until the specified time (1ms) elapses and then polls
them. However, there is a small window in which resetting and checking
the tx_timer_active flag can race. When that happens, the tx
completions are not reported to the upper layer, and the tx queue
timeout kicks in and restarts the device.

To address this, introduce a tx cleanup mechanism as part of the
periodic maintenance process.

Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
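
The race described above can be illustrated with a small userspace model. This is only a sketch under assumptions: the thread does not spell out the exact interleaving, so the ordering below (the transmit path samples the flag, the timer handler then resets it, and the transmit path acts on the stale value) is one plausible reading. All names here (`race_channel`, `xmit_sample_flag`, `xmit_finish`, `timer_expire`, `service_tick`, `run_race_model`) are hypothetical and are not driver functions; only the flag name `tx_timer_active` comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one tx channel; not driver code. */
struct race_channel {
	bool tx_timer_active;  /* flag name taken from the patch */
	bool timer_armed;      /* assumed: a completion timer is pending */
	unsigned int pending;  /* assumed: completions not yet reported */
};

/* Transmit path, step 1: sample the flag. */
static bool xmit_sample_flag(const struct race_channel *ch)
{
	return ch->tx_timer_active;
}

/* Transmit path, step 2: queue work, arm the timer only if the
 * previously sampled flag was clear. */
static void xmit_finish(struct race_channel *ch, bool saw_active)
{
	ch->pending++;
	if (!saw_active) {
		ch->tx_timer_active = true;
		ch->timer_armed = true;
	}
}

/* Timer expiry: report pending completions, then reset the flag. */
static void timer_expire(struct race_channel *ch)
{
	ch->pending = 0;
	ch->timer_armed = false;
	ch->tx_timer_active = false;
}

/* Periodic maintenance in the spirit of the patch: re-arm an idle timer. */
static void service_tick(struct race_channel *ch)
{
	if (!ch->tx_timer_active) {
		ch->tx_timer_active = true;
		ch->timer_armed = true;
	}
}

/* Returns 1 if the modelled interleaving strands a completion and the
 * periodic tick then recovers it, 0 otherwise. */
int run_race_model(void)
{
	struct race_channel ch = {
		.tx_timer_active = true, .timer_armed = true, .pending = 1,
	};
	bool saw = xmit_sample_flag(&ch);  /* sees the flag still set */
	bool stranded;

	timer_expire(&ch);                 /* flag reset inside the window */
	xmit_finish(&ch, saw);             /* stale value: timer not armed */
	stranded = ch.pending > 0 && !ch.timer_armed;

	service_tick(&ch);                 /* periodic re-arm */
	timer_expire(&ch);                 /* stranded completion reported */
	return stranded && ch.pending == 0;
}
```

In this model the completion stays stranded until `service_tick()` runs, which matches the commit message's description of completions going unreported until the tx queue timeout fires. The model is single-threaded and deterministic, so it shows the ordering only and says nothing about how often the window is hit on real hardware.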

Comments

Wojciech Drewek Nov. 22, 2023, 10:50 a.m. UTC | #1
On 21.11.2023 20:14, Raju Rangoju wrote:
> The existing implementation uses software logic to accumulate tx
> completions until the specified time (1ms) is met and then poll them.
> However, there exists a tiny gap which leads to a race between
> resetting and checking the tx_activate flag. Due to this the tx
> completions are not reported to upper layer and tx queue timeout
> kicks-in restarting the device.
> 
> To address this, introduce a tx cleanup mechanism as part of the
> periodic maintenance process.
> 
> Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
> ---

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>

>  drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 614c0278419b..6b73648b3779 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -682,10 +682,24 @@ static void xgbe_service(struct work_struct *work)
>  static void xgbe_service_timer(struct timer_list *t)
>  {
>  	struct xgbe_prv_data *pdata = from_timer(pdata, t, service_timer);
> +	struct xgbe_channel *channel;
> +	unsigned int i;
>  
>  	queue_work(pdata->dev_workqueue, &pdata->service_work);
>  
>  	mod_timer(&pdata->service_timer, jiffies + HZ);
> +
> +	if (!pdata->tx_usecs)
> +		return;
> +
> +	for (i = 0; i < pdata->channel_count; i++) {
> +		channel = pdata->channel[i];
> +		if (!channel->tx_ring || channel->tx_timer_active)
> +			break;
> +		channel->tx_timer_active = 1;
> +		mod_timer(&channel->tx_timer,
> +			  jiffies + usecs_to_jiffies(pdata->tx_usecs));
> +	}
>  }
>  
>  static void xgbe_init_timers(struct xgbe_prv_data *pdata)

Tom Lendacky Nov. 25, 2023, 2:53 p.m. UTC | #2
On 11/21/23 13:14, Raju Rangoju wrote:
> The existing implementation uses software logic to accumulate tx
> completions until the specified time (1ms) is met and then poll them.
> However, there exists a tiny gap which leads to a race between
> resetting and checking the tx_activate flag. Due to this the tx
> completions are not reported to upper layer and tx queue timeout
> kicks-in restarting the device.
> 
> To address this, introduce a tx cleanup mechanism as part of the
> periodic maintenance process.

This looks to just be a work-around that happens to work (for now) and the 
actual race condition should be fixed.

Thanks,
Tom

> 
> Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver")
> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
> ---
>   drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index 614c0278419b..6b73648b3779 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -682,10 +682,24 @@ static void xgbe_service(struct work_struct *work)
>   static void xgbe_service_timer(struct timer_list *t)
>   {
>   	struct xgbe_prv_data *pdata = from_timer(pdata, t, service_timer);
> +	struct xgbe_channel *channel;
> +	unsigned int i;
>   
>   	queue_work(pdata->dev_workqueue, &pdata->service_work);
>   
>   	mod_timer(&pdata->service_timer, jiffies + HZ);
> +
> +	if (!pdata->tx_usecs)
> +		return;
> +
> +	for (i = 0; i < pdata->channel_count; i++) {
> +		channel = pdata->channel[i];
> +		if (!channel->tx_ring || channel->tx_timer_active)
> +			break;
> +		channel->tx_timer_active = 1;
> +		mod_timer(&channel->tx_timer,
> +			  jiffies + usecs_to_jiffies(pdata->tx_usecs));
> +	}
>   }
>   
>   static void xgbe_init_timers(struct xgbe_prv_data *pdata)

Patch

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 614c0278419b..6b73648b3779 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -682,10 +682,24 @@  static void xgbe_service(struct work_struct *work)
 static void xgbe_service_timer(struct timer_list *t)
 {
 	struct xgbe_prv_data *pdata = from_timer(pdata, t, service_timer);
+	struct xgbe_channel *channel;
+	unsigned int i;
 
 	queue_work(pdata->dev_workqueue, &pdata->service_work);
 
 	mod_timer(&pdata->service_timer, jiffies + HZ);
+
+	if (!pdata->tx_usecs)
+		return;
+
+	for (i = 0; i < pdata->channel_count; i++) {
+		channel = pdata->channel[i];
+		if (!channel->tx_ring || channel->tx_timer_active)
+			break;
+		channel->tx_timer_active = 1;
+		mod_timer(&channel->tx_timer,
+			  jiffies + usecs_to_jiffies(pdata->tx_usecs));
+	}
 }
 
 static void xgbe_init_timers(struct xgbe_prv_data *pdata)
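
As a reading aid, the control flow of the added loop can be modelled in userspace. This is a sketch, not driver code: `sketch_channel`, `rearm_idle_channels`, and `demo_rearm` are hypothetical names, and the `mod_timer()` call is replaced by simply setting the flag. It reproduces the loop condition from the hunk above, including the `break`, which ends the walk at the first channel that has no tx ring or whose timer is already active.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the loop inspects. */
struct sketch_channel {
	bool has_tx_ring;      /* models channel->tx_ring != NULL */
	bool tx_timer_active;  /* models channel->tx_timer_active */
};

/* Mirrors the loop in the patch; returns how many channels were armed. */
static unsigned int rearm_idle_channels(struct sketch_channel *ch, size_t count)
{
	unsigned int armed = 0;

	for (size_t i = 0; i < count; i++) {
		/* Same condition and same break as in the patch. */
		if (!ch[i].has_tx_ring || ch[i].tx_timer_active)
			break;
		ch[i].tx_timer_active = true;  /* stands in for mod_timer() */
		armed++;
	}
	return armed;
}

/* Three channels with tx rings; only the middle one has an active timer. */
unsigned int demo_rearm(bool *last_channel_armed)
{
	struct sketch_channel ch[3] = {
		{ .has_tx_ring = true, .tx_timer_active = false },
		{ .has_tx_ring = true, .tx_timer_active = true },
		{ .has_tx_ring = true, .tx_timer_active = false },
	};
	unsigned int armed = rearm_idle_channels(ch, 3);

	*last_channel_armed = ch[2].tx_timer_active;
	return armed;
}
```

In this model only the first channel is armed and the third is left untouched on that pass, because the walk stops at the second channel. Whether later channels are meant to wait for a subsequent service tick is not discussed in the thread.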