[18/24] lnet: Correct net selection for router ping

Message ID 1662429337-18737-19-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: update to OpenSFS tree Sept 5, 2022

Commit Message

James Simmons Sept. 6, 2022, 1:55 a.m. UTC
From: Chris Horn <chris.horn@hpe.com>

lnet_find_best_ni_on_local_net() contains logic for restricting
the NI selection to a net specified by lnet_peer::lp_disc_net_id. The
purpose of this is to ensure that LNet peers ping every interface on
a router at a regular interval as part of the LNet router health
feature. However, this logic is flawed because lnet_msg_discovery()
is used to determine whether the message being sent is a discovery
message, but that function actually determines whether a given message
can _trigger_ discovery.

Introduce a new function, lnet_msg_is_ping(), which determines whether
a given lnet_msg is a GET on the LNET_RESERVED_PORTAL.
Modify lnet_find_best_ni_on_local_net() to restrict NI selection to
lp_disc_net_id iff:
1. lp_disc_net_id is non-zero.
2. The peer has the LNET_PEER_RTR_DISCOVERY flag set.
3. lnet_msg_is_ping() returns true.
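
The three conditions above can be sketched as a single standalone
predicate. This is only an illustration, not the code in lib-move.c:
the sketch_* types, constants, and helper names are hypothetical
stand-ins so that the example compiles outside the kernel tree, and
the numeric values are assumptions rather than the real LNet
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values and types, invented for this sketch only. The real
 * definitions live in the LNet headers and may differ from these.
 */
#define SKETCH_MSG_GET			1
#define SKETCH_MSG_PUT			2
#define SKETCH_RESERVED_PORTAL		0
#define SKETCH_PEER_RTR_DISCOVERY	(1u << 0)

struct sketch_msg {
	int msg_type;		/* stand-in for lnet_msg::msg_type */
	uint32_t ptl_index;	/* stand-in for the GET portal index */
};

struct sketch_peer {
	uint32_t lp_state;	/* peer state flags */
	uint32_t lp_disc_net_id; /* zero means "not set" */
};

/* Mirrors the intent of lnet_msg_is_ping(): a GET on the reserved portal. */
static bool sketch_msg_is_ping(const struct sketch_msg *msg)
{
	return msg->msg_type == SKETCH_MSG_GET &&
	       msg->ptl_index == SKETCH_RESERVED_PORTAL;
}

/* All three conditions must hold before NI selection is restricted
 * to the net named by lp_disc_net_id.
 */
static bool sketch_restrict_to_disc_net(const struct sketch_peer *peer,
					const struct sketch_msg *msg)
{
	return peer->lp_disc_net_id != 0 &&
	       (peer->lp_state & SKETCH_PEER_RTR_DISCOVERY) != 0 &&
	       sketch_msg_is_ping(msg);
}
```

In this sketch a PUT on the reserved portal, or a GET sent while
lp_disc_net_id is zero, does not restrict the selection, so such
messages would fall through to the normal NI selection path.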

HPE-bug-id: LUS-11017
WC-bug-id: https://jira.whamcloud.com/browse/LU-15929
Lustre-commit: 2431e099b143a4c7e ("LU-15929 lnet: Correct net selection for router ping")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/47527
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-move.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
Patch

diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index ec8be8f..3c9602e 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1577,7 +1577,8 @@  void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return false;
 }
 
-/*
+/* Can the specified message trigger peer discovery?
+ *
  * Traffic to the LNET_RESERVED_PORTAL may not trigger peer discovery,
  * because such traffic is required to perform discovery. We therefore
  * exclude all GET and PUT on that portal. We also exclude all ACK and
@@ -1591,6 +1592,18 @@  void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return !(lnet_reserved_msg(msg) || lnet_msg_is_response(msg));
 }
 
+/* Is the specified message an LNet ping?
+ */
+static bool
+lnet_msg_is_ping(struct lnet_msg *msg)
+{
+	if (msg->msg_type == LNET_MSG_GET &&
+	    msg->msg_hdr.msg.get.ptl_index == LNET_RESERVED_PORTAL)
+		return true;
+
+	return false;
+}
+
 #define SRC_SPEC	0x0001
 #define SRC_ANY		0x0002
 #define LOCAL_DST	0x0004
@@ -2228,10 +2241,14 @@  struct lnet_ni *
 	u32 best_net_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 	u32 net_sel_prio;
 
-	/* if this is a discovery message and lp_disc_net_id is
-	 * specified then use that net to send the discovery on.
+	/* If lp_disc_net_id is set, this peer is a router undergoing
+	 * discovery, and this message is an LNet ping, then this may be a
+	 * discovery message and we need to select an NI on the peer net
+	 * specified by lp_disc_net_id
 	 */
-	if (discovery && peer->lp_disc_net_id) {
+	if (peer->lp_disc_net_id &&
+	    (peer->lp_state & LNET_PEER_RTR_DISCOVERY) &&
+	    lnet_msg_is_ping(msg)) {
 		best_lpn = lnet_peer_get_net_locked(peer, peer->lp_disc_net_id);
 		if (best_lpn && lnet_get_net_locked(best_lpn->lpn_net_id))
 			goto select_best_ni;