
[RFC,v2] netpoll: Remove 4s sleep during carrier detection

Message ID 20230119180008.2156048-1-leitao@debian.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [RFC,v2] netpoll: Remove 4s sleep during carrier detection

Checks

Context                        Check    Description
netdev/tree_selection          success  Guessed tree name to be net-next
netdev/fixes_present           success  Fixes tag not required for -next series
netdev/subject_prefix          warning  Target tree name not specified in the subject
netdev/cover_letter            success  Single patches do not need cover letters
netdev/patch_count             success  Link
netdev/header_inline           success  No static functions without inline keyword in header files
netdev/build_32bit             success  Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers          warning  1 maintainers not CCed: wsa+renesas@sang-engineering.com
netdev/build_clang             success  Errors and warnings before: 1 this patch: 1
netdev/module_param            success  Was 0 now: 0
netdev/verify_signedoff        success  Signed-off-by tag matches author and committer
netdev/check_selftest          success  No net selftest shell script
netdev/verify_fixes            success  No Fixes tag
netdev/build_allmodconfig_warn success  Errors and warnings before: 3 this patch: 3
netdev/checkpatch              success  total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc                    success  Errors and warnings before: 0 this patch: 0
netdev/source_inline           success  Was 0 now: 0

Commit Message

Breno Leitao Jan. 19, 2023, 6 p.m. UTC
This patch proposes to remove the msleep(4s) during netpoll_setup() if
the carrier appears instantly.

Modern NICs do not seem to have this bouncing problem anymore, and this
sleep slows down the machine boot unnecessarily.

Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/core/netpoll.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

Comments

Jakub Kicinski Jan. 19, 2023, 7:04 p.m. UTC | #1
On Thu, 19 Jan 2023 10:00:08 -0800 Breno Leitao wrote:
> This patch proposes to remove the msleep(4s) during netpoll_setup() if
> the carrier appears instantly.
> 
> Modern NICs do not seem to have this bouncing problem anymore, and this
> sleep slows down the machine boot unnecessarily

We should mention in the message that the wait is counter-productive on
servers which have BMC communicating over NC-SI via the same NIC as gets
used for netconsole. BMC will keep the PHY up, hence the carrier
appearing instantly.

We could add a smaller delay, but really having instant carrier and
then losing it seems like a driver bug, so let's try to rip the
band-aid off and ask for forgiveness instead.


A few extra process rules:
 - don't repost another version within 24h,
 - keep a changelog under ---,
 - add tree name to the tag - [PATCH net-next].

Also, I'd just go for PATCH, no need to RFC this.
If someone wants to object they can object to a PATCH.

Andrew Lunn Jan. 23, 2023, 1:56 p.m. UTC | #2
On Thu, Jan 19, 2023 at 11:04:21AM -0800, Jakub Kicinski wrote:
> On Thu, 19 Jan 2023 10:00:08 -0800 Breno Leitao wrote:
> > This patch proposes to remove the msleep(4s) during netpoll_setup() if
> > the carrier appears instantly.
> > 
> > Modern NICs do not seem to have this bouncing problem anymore, and this
> > sleep slows down the machine boot unnecessarily

I'm not sure 'bouncing' is the correct word here. That would imply up,
down, up, down and then stable up. I guess the real issue here was that
the MAC driver reported the link as up while autoneg was still in
progress, which takes around 1.5 seconds.

> We should mention in the message that the wait is counter-productive on
> servers which have BMC communicating over NC-SI via the same NIC as gets
> used for netconsole. BMC will keep the PHY up, hence the carrier
> appearing instantly.
> 
> We could add a smaller delay, but really having instant carrier and
> then losing it seems like a driver bug, so let's try to rip the
> band-aid off and ask for forgiveness instead.

It would be good to put some of this into the commit message. Explain
the case where you see it go wrong.

The other scenarios I can think of are:

The bootloader configured the interface up, and used the interface,
e.g. to tftp boot. The PHY was left up when transitioning into
Linux. Hence there is no need to wait around 1.5 seconds for autoneg
to complete.

The link is fibre, and SERDES sync could complete within HZ/10 jiffies
(0.1 s), so the carrier appears to come up instantaneously.

This workaround does seem very old, from pre-git times, so I also doubt
there are many systems which are truly broken like this.

      Andrew

Patch

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9be762e1d042..a089b704b986 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -682,7 +682,7 @@  int netpoll_setup(struct netpoll *np)
 	}
 
 	if (!netif_running(ndev)) {
-		unsigned long atmost, atleast;
+		unsigned long atmost;
 
 		np_info(np, "device %s not up yet, forcing it\n", np->dev_name);
 
@@ -694,7 +694,6 @@  int netpoll_setup(struct netpoll *np)
 		}
 
 		rtnl_unlock();
-		atleast = jiffies + HZ/10;
 		atmost = jiffies + carrier_timeout * HZ;
 		while (!netif_carrier_ok(ndev)) {
 			if (time_after(jiffies, atmost)) {
@@ -704,15 +703,6 @@  int netpoll_setup(struct netpoll *np)
 			msleep(1);
 		}
 
-		/* If carrier appears to come up instantly, we don't
-		 * trust it and pause so that we don't pump all our
-		 * queued console messages into the bitbucket.
-		 */
-
-		if (time_before(jiffies, atleast)) {
-			np_notice(np, "carrier detect appears untrustworthy, waiting 4 seconds\n");
-			msleep(4000);
-		}
 		rtnl_lock();
 	}
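
To make the effect of the patch easier to see, here is a rough userspace sketch of the carrier wait, not the kernel code itself. It assumes HZ = 1000, treats msleep(1) as exactly one tick, and replaces netif_carrier_ok() with a simple "carrier comes up after N ticks" parameter. The function name carrier_wait_ticks() and its parameters are hypothetical; time_after() and time_before() are redefined locally as the usual wraparound-safe signed-difference comparisons.

```c
#include <stdbool.h>

/* Userspace stand-ins for kernel definitions; HZ = 1000 is an assumption. */
#define HZ 1000UL
#define time_after(a, b)  ((long)((b) - (a)) < 0)
#define time_before(a, b) time_after(b, a)

/*
 * Hypothetical model of the carrier wait in netpoll_setup().
 *
 * start:      simulated jiffies value when the wait begins
 * carrier_at: ticks after start at which the carrier is reported up
 * timeout_s:  stand-in for carrier_timeout, in seconds
 * legacy:     true models the code before the patch (extra 4 s pause)
 *
 * Returns the number of simulated ticks spent waiting.
 */
static unsigned long carrier_wait_ticks(unsigned long start,
					unsigned long carrier_at,
					unsigned long timeout_s,
					bool legacy)
{
	unsigned long jiffies = start;
	unsigned long atleast = jiffies + HZ / 10;
	unsigned long atmost = jiffies + timeout_s * HZ;

	/* Poll until the carrier is up or the timeout has expired. */
	while (jiffies - start < carrier_at) {
		if (time_after(jiffies, atmost))
			break;		/* timed out waiting for carrier */
		jiffies++;		/* models msleep(1) at HZ = 1000 */
	}

	/* Removed by the patch: distrust a carrier that came up "instantly". */
	if (legacy && time_before(jiffies, atleast))
		jiffies += 4 * HZ;	/* models msleep(4000) */

	return jiffies - start;
}
```

Under these assumptions, a carrier that is already up (carrier_at = 0, as in the NC-SI or bootloader cases discussed in the thread) costs 4000 ticks with legacy = true and 0 ticks with legacy = false, while a 1.5 second autoneg (carrier_at = 1500) costs 1500 ticks either way, since it is already past the HZ/10 threshold.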