
[17/29] lnet: Use lr_hops for avoid_asym_router_failure

Message ID 1619381316-7719-18-git-send-email-jsimmons@infradead.org
State New, archived
Series lustre: Update to OpenSFS tree as of April 25, 2020

Commit Message

James Simmons April 25, 2021, 8:08 p.m. UTC
From: Chris Horn <chris.horn@hpe.com>

For the asymmetric route failure avoidance feature to work properly,
it needs to know what the hop count of a route should be. That value
is given by the lr_hops field of struct lnet_route. The lr_single_hop
field, by contrast, records what discovery determined the hop count
to actually be (single or multi), based on the last ping reply.
If a remote interface on a router goes missing, discovery may
classify the route as multi-hop, but it should still be considered
single-hop for the purposes of avoiding asymmetric route failure.
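
The check described above can be sketched as a standalone predicate.
This is only an illustration, not code from the patch: struct
route_sketch and route_is_configured_single_hop() are hypothetical
names, and the value used for LNET_UNDEFINED_HOPS is an assumption
(an all-ones 32-bit sentinel meaning "no hop count configured").

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sentinel meaning "no hop count was configured". */
#define LNET_UNDEFINED_HOPS ((uint32_t)-1)

/* Hypothetical, trimmed-down stand-in for struct lnet_route. */
struct route_sketch {
	uint32_t lr_hops;	/* configured (expected) hop count */
	bool lr_single_hop;	/* what discovery last observed */
};

/*
 * Decide from configuration alone whether the route should be treated
 * as single-hop. lr_single_hop is deliberately ignored here, because
 * discovery can misreport it when a router loses a remote interface.
 */
static bool route_is_configured_single_hop(const struct route_sketch *r)
{
	return r->lr_hops == 1 || r->lr_hops == LNET_UNDEFINED_HOPS;
}
```

In this sketch a route with lr_hops of 1, or with no hop count set,
is still subject to the remote-net check even if discovery last
reported it as multi-hop.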

HPE-bug-id: LUS-9099
WC-bug-id: https://jira.whamcloud.com/browse/LU-13785
Lustre-commit: 2e07619477684f28 ("LU-13785 lnet: Use lr_hops for avoid_asym_router_failure")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39362
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Patch

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index ee3c15f..af16263 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -317,7 +317,8 @@  bool lnet_is_route_alive(struct lnet_route *route)
 	 * that the remote net must exist on the gateway. For multi-hop
 	 * routes the next-hop will not have the remote net.
 	 */
-	if (avoid_asym_router_failure && route->lr_single_hop) {
+	if (avoid_asym_router_failure &&
+	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -367,7 +368,8 @@  bool lnet_is_route_alive(struct lnet_route *route)
 static inline void
 lnet_check_route_inconsistency(struct lnet_route *route)
 {
-	if (!route->lr_single_hop && (int)route->lr_hops <= 1) {
+	if (!route->lr_single_hop &&
+	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
 		CWARN("route %s->%s is detected to be multi-hop but hop count is set to %d\n",
 		      libcfs_net2str(route->lr_net),
 		      libcfs_nid2str(route->lr_gateway->lp_primary_nid),
@@ -482,7 +484,9 @@  bool lnet_is_route_alive(struct lnet_route *route)
 		}
 
 		route->lr_single_hop = single_hop;
-		if (avoid_asym_router_failure && single_hop)
+		if (avoid_asym_router_failure &&
+		    (route->lr_hops == 1 ||
+		     route->lr_hops == LNET_UNDEFINED_HOPS))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);