diff mbox series

[v3] net/sched: adjust device watchdog timer to detect stopped queue at right time

Message ID 20240508133617.4424-1-praveen.kannoju@oracle.com (mailing list archive)
State Accepted
Commit 33fb988b67050d9bb512f77f08453fa00088943c
Delegated to: Netdev Maintainers
Headers show
Series [v3] net/sched: adjust device watchdog timer to detect stopped queue at right time | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 934 this patch: 934
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 945 this patch: 945
netdev/checkpatch warning WARNING: line length of 93 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-05-10--00-00 (tests: 1009)

Commit Message

Praveen Kumar Kannoju May 8, 2024, 1:36 p.m. UTC
Applications are sensitive to long network latency, particularly
heartbeat monitoring ones. Longer the tx timeout recovery higher the
risk with such applications on a production machines. This patch
remedies, yet honoring device set tx timeout.

Modify watchdog next timeout to be shorter than the device specified.
Compute the next timeout be equal to device watchdog timeout less the
how long ago queue stop had been done. At next watchdog timeout tx
timeout handler is called into if still in stopped state. Either called
or not called, restore the watchdog timeout back to device specified.

Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
---
v3:
  - Address redundant braces.
v2: https://lore.kernel.org/linux-kernel/20240506135944.7753-1-praveen.kannoju@oracle.com/
  - Identify the oldest trans_start from all the queues and use it.
v1: https://lore.kernel.org/netdev/20240430140010.5005-1-praveen.kannoju@oracle.com/
---
 net/sched/sch_generic.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org May 10, 2024, 3:50 a.m. UTC | #1
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  8 May 2024 19:06:17 +0530 you wrote:
> Applications are sensitive to long network latency, particularly
> heartbeat monitoring ones. Longer the tx timeout recovery higher the
> risk with such applications on a production machines. This patch
> remedies, yet honoring device set tx timeout.
> 
> Modify watchdog next timeout to be shorter than the device specified.
> Compute the next timeout be equal to device watchdog timeout less the
> how long ago queue stop had been done. At next watchdog timeout tx
> timeout handler is called into if still in stopped state. Either called
> or not called, restore the watchdog timeout back to device specified.
> 
> [...]

Here is the summary with links:
  - [v3] net/sched: adjust device watchdog timer to detect stopped queue at right time
    https://git.kernel.org/netdev/net-next/c/33fb988b6705

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 4a2c763e2d11..c671b261252a 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -506,19 +506,22 @@  static void dev_watchdog(struct timer_list *t)
 			unsigned int timedout_ms = 0;
 			unsigned int i;
 			unsigned long trans_start;
+			unsigned long oldest_start = jiffies;
 
 			for (i = 0; i < dev->num_tx_queues; i++) {
 				struct netdev_queue *txq;
 
 				txq = netdev_get_tx_queue(dev, i);
 				trans_start = READ_ONCE(txq->trans_start);
-				if (netif_xmit_stopped(txq) &&
-				    time_after(jiffies, (trans_start +
-							 dev->watchdog_timeo))) {
+				if (!netif_xmit_stopped(txq))
+					continue;
+				if (time_after(jiffies, trans_start + dev->watchdog_timeo)) {
 					timedout_ms = jiffies_to_msecs(jiffies - trans_start);
 					atomic_long_inc(&txq->trans_timeout);
 					break;
 				}
+				if (time_after(oldest_start, trans_start))
+					oldest_start = trans_start;
 			}
 
 			if (unlikely(timedout_ms)) {
@@ -531,7 +534,7 @@  static void dev_watchdog(struct timer_list *t)
 				netif_unfreeze_queues(dev);
 			}
 			if (!mod_timer(&dev->watchdog_timer,
-				       round_jiffies(jiffies +
+				       round_jiffies(oldest_start +
 						     dev->watchdog_timeo)))
 				release = false;
 		}