From patchwork Wed Jul 15 20:44:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 11666213
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 69531618
 for ; Wed, 15 Jul 2020 20:45:40 +0000 (UTC)
Received: from pdx1-mailman02.dreamhost.com (pdx1-mailman02.dreamhost.com [64.90.62.194])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id 5296920672
 for ; Wed, 15 Jul 2020 20:45:40 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5296920672
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lustre-devel-bounces@lists.lustre.org
Received: from pdx1-mailman02.dreamhost.com (localhost [IPv6:::1])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 1A0A221F868;
 Wed, 15 Jul 2020 13:45:35 -0700 (PDT)
X-Original-To: lustre-devel@lists.lustre.org
Delivered-To: lustre-devel-lustre.org@pdx1-mailman02.dreamhost.com
Received: from smtp3.ccs.ornl.gov (smtp3.ccs.ornl.gov [160.91.203.39])
 by pdx1-mailman02.dreamhost.com (Postfix) with ESMTP id 5B1E121F6E3
 for ; Wed, 15 Jul 2020 13:45:25 -0700 (PDT)
Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134])
 by smtp3.ccs.ornl.gov (Postfix) with ESMTP id 7E2F1478;
 Wed, 15 Jul 2020 16:45:20 -0400 (EDT)
Received: by star.ccs.ornl.gov (Postfix, from userid 2004)
 id 779AA8D; Wed, 15 Jul 2020 16:45:20 -0400 (EDT)
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Wed, 15 Jul 2020 16:44:49 -0400
Message-Id: <1594845918-29027-9-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
References: <1594845918-29027-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 08/37] lnet: socklnd: fix local interface
 binding
X-BeenThere: lustre-devel@lists.lustre.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "For discussing Lustre software development."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Amir Shehata, Lustre Development List
MIME-Version: 1.0
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Amir Shehata

When a node is configured with multiple interfaces in a Multi-Rail
config, socklnd was not utilizing the local interface requested by
LNet. In essence, LNet was using all the NIDs in round robin, but the
socklnd module was not binding to the correct interface. Traffic was
thus sent on only a subset of the interfaces.

The reason is that the route interface number was not being set. In
most cases lnet_connect() is called to create a socket. The socket is
bound to the interface provided and then ksocknal_create_conn() is
called to create the socklnd connection. ksocknal_create_conn() calls
ksocknal_associate_route_conn_locked(), at which point the route's
local interface is assigned. However, this is too late, as the socket
has already been created and bound to a local interface. Therefore,
it's important to assign the route's interface before calling
lnet_connect() to ensure the socket is bound to the correct local
interface.

To address this issue, the route's interface index is initialized to
the NI's interface index when the route is added to the peer_ni.

Another bug fixed: the interface index was not being initialized in
the startup routine.

Note: We're strictly assuming that there is one interface for each
NI. This is because TCP bonding will be removed from socklnd, as it
has been deprecated by LNet Multi-Rail.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13566
Lustre-commit: a7c9aba5eb96d ("LU-13566 socklnd: fix local interface binding")
Signed-off-by: Amir Shehata
Reviewed-on: https://review.whamcloud.com/38743
Reviewed-by: Neil Brown
Reviewed-by: Serguei Smirnov
Reviewed-by: James Simmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 net/lnet/klnds/socklnd/socklnd.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index 444b90b..2b8fd3d 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -409,12 +409,14 @@ struct ksock_peer_ni *
 {
 	struct ksock_conn *conn;
 	struct ksock_route *route2;
+	struct ksock_net *net = peer_ni->ksnp_ni->ni_data;
 
 	LASSERT(!peer_ni->ksnp_closing);
 	LASSERT(!route->ksnr_peer);
 	LASSERT(!route->ksnr_scheduled);
 	LASSERT(!route->ksnr_connecting);
 	LASSERT(!route->ksnr_connected);
+	LASSERT(net->ksnn_ninterfaces > 0);
 
 	/* LASSERT(unique) */
 	list_for_each_entry(route2, &peer_ni->ksnp_routes, ksnr_list) {
@@ -428,6 +430,11 @@ struct ksock_peer_ni *
 
 	route->ksnr_peer = peer_ni;
 	ksocknal_peer_addref(peer_ni);
+
+	/* set the route's interface to the current net's interface */
+	route->ksnr_myiface = net->ksnn_interfaces[0].ksni_index;
+	net->ksnn_interfaces[0].ksni_nroutes++;
+
 	/* peer_ni's routelist takes over my ref on 'route' */
 	list_add_tail(&route->ksnr_list, &peer_ni->ksnp_routes);
 
@@ -2667,6 +2674,7 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id)
 		net->ksnn_ninterfaces = 1;
 		ni->ni_dev_cpt = ifaces[0].li_cpt;
 		ksi->ksni_ipaddr = ifaces[0].li_ipaddr;
+		ksi->ksni_index = ksocknal_ip2index(ksi->ksni_ipaddr, ni);
 		ksi->ksni_netmask = ifaces[0].li_netmask;
 		strlcpy(ksi->ksni_name, ifaces[0].li_name,
 			sizeof(ksi->ksni_name));
@@ -2706,6 +2714,8 @@ static int ksocknal_push(struct lnet_ni *ni, struct lnet_process_id id)
 				ksi = &net->ksnn_interfaces[j];
 				ni->ni_dev_cpt = ifaces[j].li_cpt;
 				ksi->ksni_ipaddr = ifaces[j].li_ipaddr;
+				ksi->ksni_index =
+					ksocknal_ip2index(ksi->ksni_ipaddr, ni);
 				ksi->ksni_netmask = ifaces[j].li_netmask;
 				strlcpy(ksi->ksni_name, ifaces[j].li_name,
 					sizeof(ksi->ksni_name));
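
The ordering problem described in the commit message can be illustrated
with a small userspace sketch. This is not the socklnd code: every type
and function below (toy_iface, toy_route, toy_sock, toy_add_route(),
toy_connect(), toy_associate_route_conn(), toy_check_orderings()) is a
hypothetical stand-in, and -1 is an assumed marker for "no interface
chosen yet". It only shows why the interface has to be recorded on the
route before the socket is bound.

```c
#include <assert.h>

/* Hypothetical stand-ins for the socklnd structures; not the kernel types. */
struct toy_iface {
	int index;	/* interface index of the NI's single interface */
	int nroutes;	/* number of routes using this interface */
};

struct toy_route {
	int myiface;	/* -1 is assumed to mean "not chosen yet" */
};

struct toy_sock {
	int bound_iface;	/* interface the socket was bound to */
};

/* Idea of the fix: record the interface when the route is added. */
static void toy_add_route(struct toy_route *route, struct toy_iface *iface)
{
	route->myiface = iface->index;
	iface->nroutes++;
}

/* Models connect: the socket binds to whatever the route holds right now. */
static struct toy_sock toy_connect(const struct toy_route *route)
{
	struct toy_sock sock = { .bound_iface = route->myiface };

	return sock;
}

/* Models the late assignment made after the connection already exists. */
static void toy_associate_route_conn(struct toy_route *route,
				     const struct toy_iface *iface)
{
	route->myiface = iface->index;
}

/* Runs both orderings and checks which interface each socket got. */
static void toy_check_orderings(void)
{
	struct toy_iface iface = { .index = 3, .nroutes = 0 };
	struct toy_route late = { .myiface = -1 };
	struct toy_route early = { .myiface = -1 };
	struct toy_sock sock;

	/* Old ordering: connect first, assign the interface afterwards. */
	sock = toy_connect(&late);
	toy_associate_route_conn(&late, &iface);
	assert(sock.bound_iface == -1);	/* socket missed the interface */

	/* New ordering: assign when the route is added, then connect. */
	toy_add_route(&early, &iface);
	sock = toy_connect(&early);
	assert(sock.bound_iface == iface.index);
}
```

Calling toy_check_orderings() from a test program exercises both
orderings. In the first one the route ends up with the right index, but
only after the socket has already been bound without it.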