diff mbox series

[05/39] lnet: Correct handling of NETWORK_TIMEOUT status

Message ID 1611249422-556-6-git-send-email-jsimmons@infradead.org (mailing list archive)
State New
Headers show
Series lustre: update to latest OpenSFS version as of Jan 21 2021 | expand

Commit Message

James Simmons Jan. 21, 2021, 5:16 p.m. UTC
From: Chris Horn <chris.horn@hpe.com>

The original intent of the LNET_MSG_STATUS_NETWORK_TIMEOUT health
status was to handle cases where the LND was unsure whether the
failure was due to the local or remote NI. In this case, we'll want
to decrement both the local and remote NI health and allow recovery
to ascertain which interface is actually healthy.

HPE-bug-id: LUS-9342
WC-bug-id: https://jira.whamcloud.com/browse/LU-13751
Lustre-commit: ffd4523f2d50ef ("LU-13571 lnet: Correct handling of NETWORK_TIMEOUT status")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/39898
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-msg.c | 5 +++++
 1 file changed, 5 insertions(+)
diff mbox series

Patch

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index e84cf02..d888090 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -925,9 +925,14 @@ 
 
 	case LNET_MSG_STATUS_REMOTE_ERROR:
 	case LNET_MSG_STATUS_REMOTE_TIMEOUT:
+		if (handle_remote_health)
+			lnet_handle_remote_failure(lpni);
+		return -1;
 	case LNET_MSG_STATUS_NETWORK_TIMEOUT:
 		if (handle_remote_health)
 			lnet_handle_remote_failure(lpni);
+		if (handle_local_health)
+			lnet_handle_local_failure(ni);
 		return -1;
 	default:
 		LBUG();