[05/14] lnet: Router ping timeout with discovery disabled

Message ID 1620087016-17857-6-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series Update to OpenSFS tree as of May 3, 2021

Commit Message

James Simmons May 4, 2021, 12:10 a.m. UTC
From: Chris Horn <chris.horn@hpe.com>

Discovery pings are used to determine the health of gateways and
associated routes. Ping replies from gateways with dynamic discovery
(DD) disabled (or if DD is disabled locally) are handled in
a special routine, lnet_router_discovery_ping_reply(), but this
function and related code do not handle the case where a discovery
ping hits the response tracker timeout and is unlinked by the
monitor thread. In this case, an UNLINK event is generated and
lnet_router_discovery_ping_reply() is never called. For gateways
with DD enabled (and DD enabled locally), we handle this case
in lnet_router_discovery_complete(). If discovery failed then
lp_dc_error is set and we mark all routes down for the gateway. We
can simply extend this logic to the case of gateways with DD disabled
(or DD disabled locally).

Fixes: dc80207e3a ("lnet: fix asym routing with multi-hop")
HPE-bug-id: LUS-9612
WC-bug-id: https://jira.whamcloud.com/browse/LU-14206
Lustre-commit: 173d86c6e9a704a8 ("LU-14206 lnet: Router ping timeout with discovery disabled")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/40923
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
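
Note: the control flow described in the commit message can be sketched
as a small userspace model. This is not the kernel code. The struct
model_gateway type, its fields, and model_discovery_complete() are
hypothetical stand-ins chosen for illustration, and the error value in
the usage note is an arbitrary non-zero number. The sketch only shows
the ordering the patch establishes: the discovery-disabled early return
applies on success only, so a failed or timed-out discovery ping always
reaches the code that marks the routes down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ROUTES 4

/* Hypothetical, simplified stand-in for a gateway peer.
 * It is NOT struct lnet_peer; only the fields needed here are modelled.
 */
struct model_gateway {
	int    dc_error;           /* models lp_dc_error; non-zero on failure */
	bool   discovery_disabled; /* models lnet_is_discovery_disabled_locked() */
	bool   alive;              /* models lp_alive */
	size_t nroutes;
	bool   route_alive[MODEL_MAX_ROUTES];  /* per-route up/down state */
	bool   hop_type_set[MODEL_MAX_ROUTES]; /* set by the success path */
};

/* Models the post-patch ordering of checks when discovery completes. */
void model_discovery_complete(struct model_gateway *gw)
{
	size_t i;

	assert(gw != NULL && gw->nroutes <= MODEL_MAX_ROUTES);

	gw->alive = (gw->dc_error == 0);

	if (!gw->dc_error) {
		/* Success with DD disabled: the ping reply handler is
		 * assumed to own the route state, so do nothing here.
		 */
		if (gw->discovery_disabled)
			return;

		/* Success with DD enabled: classify each route. */
		for (i = 0; i < gw->nroutes; i++)
			gw->hop_type_set[i] = true;
		return;
	}

	/* Failure, including a response tracker timeout: mark every
	 * route down whether or not DD is disabled.
	 */
	for (i = 0; i < gw->nroutes; i++)
		gw->route_alive[i] = false;
}
```

In this model, a gateway with discovery_disabled set and a non-zero
dc_error ends up with alive cleared and every route_alive entry false.
With the early return placed before the error check, as in the removed
lines of the diff, the same input would have returned with the routes
still marked up.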
Patch

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ae7582ca..e179997 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -495,11 +495,11 @@  bool lnet_is_route_alive(struct lnet_route *route)
 	lp->lp_alive = lp->lp_dc_error == 0;
 	spin_unlock(&lp->lp_lock);
 
-	/* ping replies are being handled when discovery is disabled */
-	if (lnet_is_discovery_disabled_locked(lp))
-		return;
-
 	if (!lp->lp_dc_error) {
+		/* ping replies are being handled when discovery is disabled */
+		if (lnet_is_discovery_disabled_locked(lp))
+			return;
+
 		/* mark single-hop routes. If the remote net is not configured
 		 * on the gateway we assume this is intentional and we mark the
 		 * gateway as multi-hop