Message ID | 20240430140010.5005-1-praveen.kannoju@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time | expand |
On Tue, 30 Apr 2024 19:30:10 +0530 Praveen Kumar Kannoju wrote: > Applications are sensitive to long network latency, particularly > heartbeat monitoring ones. Longer the tx timeout recovery higher the > risk with such applications on a production machines. This patch > remedies, yet honoring device set tx timeout. > > Modify watchdog next timeout to be shorter than the device specified. > Compute the next timeout be equal to device watchdog timeout less the > how long ago queue stop had been done. At next watchdog timeout tx > timeout handler is called into if still in stopped state. Either called > or not called, restore the watchdog timeout back to device specified. Idea makes sense, some comments on the code below. > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 4a2c763e2d11..64e31f8b4ac1 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t) > unsigned int timedout_ms = 0; > unsigned int i; > unsigned long trans_start; > + unsigned long next_check = 0; > + unsigned long current_jiffies; > > for (i = 0; i < dev->num_tx_queues; i++) { > struct netdev_queue *txq; > + current_jiffies = jiffies; Not sure why you save current jiffies. > txq = netdev_get_tx_queue(dev, i); > trans_start = READ_ONCE(txq->trans_start); > - if (netif_xmit_stopped(txq) && > - time_after(jiffies, (trans_start + > - dev->watchdog_timeo))) { > - timedout_ms = jiffies_to_msecs(jiffies - trans_start); > - atomic_long_inc(&txq->trans_timeout); > - break; > + if (netif_xmit_stopped(txq)) { please use continue instead of adding another indentation level > + if (time_after(current_jiffies, (trans_start + wrap at 80 characters > + dev->watchdog_timeo))) { > + timedout_ms = jiffies_to_msecs(current_jiffies - > + trans_start); > + atomic_long_inc(&txq->trans_timeout); > + break; > + } > + next_check = trans_start + dev->watchdog_timeo - > + current_jiffies; this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do: unsigned long oldest_start = jiffies; then in the loop: oldest_start = min(...) > } > } > > @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t) > dev->netdev_ops->ndo_tx_timeout(dev, i); > netif_unfreeze_queues(dev); > } > + if (!next_check) > + next_check = dev->watchdog_timeo; > if (!mod_timer(&dev->watchdog_timer, > round_jiffies(jiffies + > - dev->watchdog_timeo))) > + next_check))) then here you just need to swap jiffies for oldest_start > release = false; > } > }
Hi Jakub, Thank you very much for the review and positive feedback. We've incorporated your review comments except, using "continue" instead of adding another indentation level. Request you to elaborate your comment on it. - Praveen. > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: 02 May 2024 03:43 AM > To: Praveen Kannoju <praveen.kannoju@oracle.com> > Cc: jhs@mojatatu.com; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>; Rama Nichanamatlu > <rama.nichanamatlu@oracle.com>; Manjunath Patil <manjunath.b.patil@oracle.com> > Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time > > On Tue, 30 Apr 2024 19:30:10 +0530 Praveen Kumar Kannoju wrote: > > Applications are sensitive to long network latency, particularly > > heartbeat monitoring ones. Longer the tx timeout recovery higher the > > risk with such applications on a production machines. This patch > > remedies, yet honoring device set tx timeout. > > > > Modify watchdog next timeout to be shorter than the device specified. > > Compute the next timeout be equal to device watchdog timeout less the > > how long ago queue stop had been done. At next watchdog timeout tx > > timeout handler is called into if still in stopped state. Either > > called or not called, restore the watchdog timeout back to device specified. > > Idea makes sense, some comments on the code below. > > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index > > 4a2c763e2d11..64e31f8b4ac1 100644 > > --- a/net/sched/sch_generic.c > > +++ b/net/sched/sch_generic.c > > @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t) > > unsigned int timedout_ms = 0; > > unsigned int i; > > unsigned long trans_start; > > + unsigned long next_check = 0; > > + unsigned long current_jiffies; > > > > for (i = 0; i < dev->num_tx_queues; i++) { > > struct netdev_queue *txq; > > + current_jiffies = jiffies; > > Not sure why you save current jiffies. > > > txq = netdev_get_tx_queue(dev, i); > > trans_start = READ_ONCE(txq->trans_start); > > - if (netif_xmit_stopped(txq) && > > - time_after(jiffies, (trans_start + > > - dev->watchdog_timeo))) { > > - timedout_ms = jiffies_to_msecs(jiffies - trans_start); > > - atomic_long_inc(&txq->trans_timeout); > > - break; > > + if (netif_xmit_stopped(txq)) { > > please use continue instead of adding another indentation level We need to take decision on whether to break out of loop or modify "oldest_start" only when Queue is stopped. Hence one more level of indentation is needed. Can you please elaborate on using "continue" in existing condition instead of adding a new indentation level. > > > + if (time_after(current_jiffies, (trans_start + > > wrap at 80 characters > > > + dev->watchdog_timeo))) { > > + timedout_ms = jiffies_to_msecs(current_jiffies - > > + trans_start); > > + atomic_long_inc(&txq->trans_timeout); > > + break; > > + } > > + next_check = trans_start + dev->watchdog_timeo - > > + current_jiffies; > > this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do: > > unsigned long oldest_start = jiffies; > > then in the loop: > > oldest_start = min(...) > > > } > > } > > > > @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t) > > dev->netdev_ops->ndo_tx_timeout(dev, i); > > netif_unfreeze_queues(dev); > > } > > + if (!next_check) > > + next_check = dev->watchdog_timeo; > > if (!mod_timer(&dev->watchdog_timer, > > round_jiffies(jiffies + > > - dev->watchdog_timeo))) > > + next_check))) > > then here you just need to swap jiffies for oldest_start > > > release = false; > > } > > }
On Fri, 3 May 2024 14:28:13 +0000 Praveen Kannoju wrote: > > > txq = netdev_get_tx_queue(dev, i); > > > trans_start = READ_ONCE(txq->trans_start); > > > - if (netif_xmit_stopped(txq) && > > > - time_after(jiffies, (trans_start + > > > - dev->watchdog_timeo))) { > > > - timedout_ms = jiffies_to_msecs(jiffies - trans_start); > > > - atomic_long_inc(&txq->trans_timeout); > > > - break; > > > + if (netif_xmit_stopped(txq)) { > > > > please use continue instead of adding another indentation level > > We need to take decision on whether to break out of loop or modify "oldest_start" only when > Queue is stopped. Hence one more level of indentation is needed. Can you please elaborate > on using "continue" in existing condition instead of adding a new indentation level. If the queue is not stopped, continue. Split the condition into multiple ifs. > > > + dev->watchdog_timeo))) { > > > + timedout_ms = jiffies_to_msecs(current_jiffies - > > > + trans_start); > > > + atomic_long_inc(&txq->trans_timeout); > > > + break; > > > + } > > > + next_check = trans_start + dev->watchdog_timeo - > > > + current_jiffies; > > > > this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do: > > > > unsigned long oldest_start = jiffies; > > > > then in the loop: > > > > oldest_start = min(...) BTW, the min() I suggested here needs to be a if (time_after(...)), we can't use bare min() to compare jiffies, because they may wrap.
Thank you, Jakub. Your comments have been addressed in the v2 patch. We've tested it internally and the patch works as expected. Please review and let us know If any additional changes are needed. - Praveen. > -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: 04 May 2024 01:00 AM > To: Praveen Kannoju <praveen.kannoju@oracle.com> > Cc: jhs@mojatatu.com; xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@oracle.com>; Rama Nichanamatlu > <rama.nichanamatlu@oracle.com>; Manjunath Patil <manjunath.b.patil@oracle.com> > Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time > > On Fri, 3 May 2024 14:28:13 +0000 Praveen Kannoju wrote: > > > > txq = netdev_get_tx_queue(dev, i); > > > > trans_start = READ_ONCE(txq->trans_start); > > > > - if (netif_xmit_stopped(txq) && > > > > - time_after(jiffies, (trans_start + > > > > - dev->watchdog_timeo))) { > > > > - timedout_ms = jiffies_to_msecs(jiffies - trans_start); > > > > - atomic_long_inc(&txq->trans_timeout); > > > > - break; > > > > + if (netif_xmit_stopped(txq)) { > > > > > > please use continue instead of adding another indentation level > > > > We need to take decision on whether to break out of loop or modify > > "oldest_start" only when Queue is stopped. Hence one more level of > > indentation is needed. Can you please elaborate on using "continue" in existing condition instead of adding a new indentation level. > > If the queue is not stopped, continue. Split the condition into multiple ifs. > > > > > + dev->watchdog_timeo))) { > > > > + timedout_ms = jiffies_to_msecs(current_jiffies - > > > > + trans_start); > > > > + atomic_long_inc(&txq->trans_timeout); > > > > + break; > > > > + } > > > > + next_check = trans_start + dev->watchdog_timeo - > > > > + current_jiffies; > > > > > > this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do: > > > > > > unsigned long oldest_start = jiffies; > > > > > > then in the loop: > > > > > > oldest_start = min(...) > > BTW, the min() I suggested here needs to be a if (time_after(...)), we can't use bare min() to compare jiffies, because they may wrap.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 4a2c763e2d11..64e31f8b4ac1 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t) unsigned int timedout_ms = 0; unsigned int i; unsigned long trans_start; + unsigned long next_check = 0; + unsigned long current_jiffies; for (i = 0; i < dev->num_tx_queues; i++) { struct netdev_queue *txq; + current_jiffies = jiffies; txq = netdev_get_tx_queue(dev, i); trans_start = READ_ONCE(txq->trans_start); - if (netif_xmit_stopped(txq) && - time_after(jiffies, (trans_start + - dev->watchdog_timeo))) { - timedout_ms = jiffies_to_msecs(jiffies - trans_start); - atomic_long_inc(&txq->trans_timeout); - break; + if (netif_xmit_stopped(txq)) { + if (time_after(current_jiffies, (trans_start + + dev->watchdog_timeo))) { + timedout_ms = jiffies_to_msecs(current_jiffies - + trans_start); + atomic_long_inc(&txq->trans_timeout); + break; + } + next_check = trans_start + dev->watchdog_timeo - + current_jiffies; } } @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t) dev->netdev_ops->ndo_tx_timeout(dev, i); netif_unfreeze_queues(dev); } + if (!next_check) + next_check = dev->watchdog_timeo; if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + - dev->watchdog_timeo))) + next_check))) release = false; } }
Applications are sensitive to long network latency, particularly heartbeat monitoring ones. Longer the tx timeout recovery higher the risk with such applications on a production machines. This patch remedies, yet honoring device set tx timeout. Modify watchdog next timeout to be shorter than the device specified. Compute the next timeout be equal to device watchdog timeout less the how long ago queue stop had been done. At next watchdog timeout tx timeout handler is called into if still in stopped state. Either called or not called, restore the watchdog timeout back to device specified. For example, mellanox driver with 15 sec watchdog timeout on its interfaces will be called for handling timeout at random times as shown below: mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 40, SQ: 0x13106, CQ: 0x5b5, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 28559000 mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 59, SQ: 0x132a4, CQ: 0x4f3, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 21424000 mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 42, SQ: 0x12688, CQ: 0x5b5, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 15919000 whereas with the proposed fix, timeout handler is always called at appropriate time set into the watchdog by the driver as shown below: mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 42, SQ: 0x122d6, CQ: 0x4af, SQ Cons: 0x4 SQ Prod: 0x4, usecs since last trans: 15137000 mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 33, SQ: 0x1410f, CQ: 0x48d, SQ Cons: 0xb SQ Prod: 0xb, usecs since last trans: 15568000 mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 33, SQ: 0x1424a, CQ: 0x599, SQ Cons: 0x8 SQ Prod: 0x8, usecs since last trans: 15539000 Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com> --- net/sched/sch_generic.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)