diff mbox series

[07/49] lnet: Transfer disc src NID when merging peers

Message ID 1618459361-17909-8-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Headers show
Series lustre: sync to OpenSFS as of March 30 2021 | expand

Commit Message

James Simmons April 15, 2021, 4:01 a.m. UTC
From: Chris Horn <chris.horn@hpe.com>

If we're merging two peers in lnet_peer_data_present() then we need
to transfer the src NID stored in the peer whose ping buffer we are
processing to the peer that actually owns the NIDs in the ping
buffer. Otherwise it is possible that the subsequent push to the peer
that is being discovered will go out over an interface that the peer
does not know about and it will be dropped.

HPE-bug-id: LUS-9193
WC-bug-id: https://jira.whamcloud.com/browse/LU-13894
Lustre-commit: e65d8ba583858ae1 ("LU-13894 lnet: Transfer disc src NID when merging peers")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39607
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
diff mbox series

Patch

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 34153a8..1b240f1 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -3116,7 +3116,7 @@  static int lnet_peer_data_present(struct lnet_peer *lp)
 		rc = lnet_peer_merge_data(lp, pbuf);
 	} else {
 		lpni = lnet_find_peer_ni_locked(nid);
-		if (!lpni) {
+		if (!lpni || lp == lpni->lpni_peer_net->lpn_peer) {
 			rc = lnet_peer_set_primary_nid(lp, nid, flags);
 			if (rc) {
 				CERROR("Primary NID error %s versus %s: %d\n",
@@ -3125,6 +3125,8 @@  static int lnet_peer_data_present(struct lnet_peer *lp)
 			} else {
 				rc = lnet_peer_merge_data(lp, pbuf);
 			}
+			if (lpni)
+				lnet_peer_ni_decref_locked(lpni);
 		} else {
 			struct lnet_peer *new_lp;
 
@@ -3133,10 +3135,22 @@  static int lnet_peer_data_present(struct lnet_peer *lp)
 			 * should have discovery/MR enabled as well, since
 			 * it's the same peer, which we're about to merge
 			 */
+			spin_lock(&lp->lp_lock);
+			spin_lock(&new_lp->lp_lock);
 			if (!(lp->lp_state & LNET_PEER_NO_DISCOVERY))
 				new_lp->lp_state &= ~LNET_PEER_NO_DISCOVERY;
 			if (lp->lp_state & LNET_PEER_MULTI_RAIL)
 				new_lp->lp_state |= LNET_PEER_MULTI_RAIL;
+			/* If we're processing a ping reply then we may be
+			 * about to send a push to the peer that we ping'd.
+			 * Since the ping reply that we're processing was
+			 * received by lp, we need to set the discovery source
+			 * NID for new_lp to the NID stored in lp.
+			 */
+			if (lp->lp_disc_src_nid != LNET_NID_ANY)
+				new_lp->lp_disc_src_nid = lp->lp_disc_src_nid;
+			spin_unlock(&new_lp->lp_lock);
+			spin_unlock(&lp->lp_lock);
 
 			rc = lnet_peer_set_primary_data(new_lp, pbuf);
 			lnet_consolidate_routes_locked(lp, new_lp);