From patchwork Fri Dec 15 16:36:39 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Morey-Chaisemartin
X-Patchwork-Id: 10115495
X-Patchwork-Delegate: leon@leon.nu
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 0C8DD60231
	for ; Fri, 15 Dec 2017 16:36:47 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F36212876B
	for ; Fri, 15 Dec 2017 16:36:46 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id E85F129F1E; Fri, 15 Dec 2017 16:36:46 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6AE442876B
	for ; Fri, 15 Dec 2017 16:36:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932364AbdLOQgo (ORCPT ); Fri, 15 Dec 2017 11:36:44 -0500
Received: from mx2.suse.de ([195.135.220.15]:36763 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756717AbdLOQgl (ORCPT ); Fri, 15 Dec 2017 11:36:41 -0500
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])
	by mx2.suse.de (Postfix) with ESMTP id 56891AE26;
	Fri, 15 Dec 2017 16:36:40 +0000 (UTC)
From: Nicolas Morey-Chaisemartin
Subject: [PATCHv3 rdma-core 1/2] srp_daemon: handle SM lid change
To: linux-rdma@vger.kernel.org
Cc: hal@dev.mellanox.co.il, stable@linux-rdma.org, bvanassche@acm.org
References: <001cdcfd-8ace-261f-ab86-a09ae3582dd8@suse.com>
Message-ID: <6ea765f9-e770-74a3-bbe7-19b7ebc76ebe@suse.com>
Date: Fri, 15 Dec 2017 17:36:39 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0
MIME-Version: 1.0
In-Reply-To: <001cdcfd-8ace-261f-ab86-a09ae3582dd8@suse.com>
Content-Language: fr-xx-classique+reforme1990
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-rdma@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

When srp_daemon is running and the master SM host changes, srp_daemon
outputs these errors at every scan:

srp_daemon[25394]: No response to inform info registration
srp_daemon[25394]: Fail to register to traps, maybe there is no opensm running on fabric or IB port is down

This was introduced by commit 4952e5f ("Fix a memory leak").

A side effect of that commit was that create_ah was only called when
the port lid changed, which meant register_to_traps used an older,
obsolete sm_lid and failed to connect to it.

This patch fixes this behaviour by checking for both local lid changes
and SM lid changes, and calling create_ah on either of these events.

Fixes: 4952e5f7df0c (Fix a memory leak)
Signed-off-by: Nicolas Morey-Chaisemartin
Cc: stable@linux-rdma.org # v14, v15, v16
---
Since v2, expand abbrev sha1 of Fixes:...
to 12B

 srp_daemon/srp_daemon.c       | 10 ++++++----
 srp_daemon/srp_daemon.h       |  2 +-
 srp_daemon/srp_handle_traps.c | 14 +++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index cec36db2e0f1..38501886110a 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -1103,7 +1103,7 @@ static int get_shared_pkeys(struct resources *res,
 	int i, num_pkeys = 0;
 	uint16_t pkey;
 	uint16_t local_port_lid = get_port_lid(res->ud_res->ib_ctx,
-					       config->port_num);
+					       config->port_num, NULL);
 
 	in_mad_buf = malloc(sizeof(struct ib_user_mad) +
 			    node_table_response_size);
@@ -2092,7 +2092,7 @@ int main(int argc, char *argv[])
 {
 	int ret;
 	struct resources *res;
-	uint16_t lid;
+	uint16_t lid, sm_lid;
 	uint16_t pkey;
 	union umad_gid gid;
 	struct target_details *target;
@@ -2196,8 +2196,10 @@ catas_start:
 
 			pr_debug("Starting a recalculation\n");
 			port_lid = get_port_lid(res->ud_res->ib_ctx,
-						config->port_num);
-			if (port_lid != res->ud_res->port_attr.lid) {
+						config->port_num, &sm_lid);
+			if (port_lid != res->ud_res->port_attr.lid ||
+				sm_lid != res->ud_res->port_attr.sm_lid) {
+
 				if (res->ud_res->ah) {
 					ibv_destroy_ah(res->ud_res->ah);
 					res->ud_res->ah = NULL;
diff --git a/srp_daemon/srp_daemon.h b/srp_daemon/srp_daemon.h
index 5d268ed395e1..864b3d42fb46 100644
--- a/srp_daemon/srp_daemon.h
+++ b/srp_daemon/srp_daemon.h
@@ -299,7 +299,7 @@ void *run_thread_listen_to_events(void *res_in);
 int get_node(struct umad_resources *umad_res, uint16_t dlid, uint64_t *guid);
 int create_trap_resources(struct ud_resources *ud_res);
 int register_to_traps(struct resources *res, int subscribe);
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num);
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid);
 int create_ah(struct ud_resources *ud_res);
 void push_gid_to_list(struct sync_resources *res, union umad_gid *gid,
 		      uint16_t pkey);
diff --git a/srp_daemon/srp_handle_traps.c b/srp_daemon/srp_handle_traps.c
index 6b36b15cc84c..8c428756a379 100644
--- a/srp_daemon/srp_handle_traps.c
+++ b/srp_daemon/srp_handle_traps.c
@@ -340,12 +340,20 @@ int ud_resources_create(struct ud_resources *res)
 	return 0;
 }
 
-uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num)
+uint16_t get_port_lid(struct ibv_context *ib_ctx, int port_num, uint16_t *sm_lid)
 {
 	struct ibv_port_attr port_attr;
+	int ret;
+
+	ret = ibv_query_port(ib_ctx, port_num, &port_attr);
 
-	return ibv_query_port(ib_ctx, port_num, &port_attr) == 0 ?
-		port_attr.lid : 0;
+	if (!ret) {
+		if (sm_lid)
+			*sm_lid = port_attr.sm_lid;
+		return port_attr.lid;
+	}
+
+	return 0;
 }
 
 int create_ah(struct ud_resources *ud_res)
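
For readers following the patch, the change-detection logic can be
sketched in isolation. This is a minimal, self-contained illustration
and not the daemon's actual code: struct lid_pair, query_lids() and
ah_needs_refresh() are hypothetical names introduced here, and the
query_ok flag only simulates the return status of ibv_query_port() so
that the example builds without libibverbs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two struct ibv_port_attr fields used here. */
struct lid_pair {
	uint16_t lid;    /* local port LID */
	uint16_t sm_lid; /* LID of the master subnet manager */
};

/*
 * Same shape as the patched get_port_lid(): return the port LID and,
 * when sm_lid is non-NULL, also report the SM LID. query_ok simulates
 * a successful ibv_query_port() call (assumption for this sketch).
 */
static uint16_t query_lids(const struct lid_pair *current, bool query_ok,
			   uint16_t *sm_lid)
{
	assert(current != NULL);

	if (!query_ok)
		return 0; /* *sm_lid is left untouched on failure */

	if (sm_lid)
		*sm_lid = current->sm_lid;
	return current->lid;
}

/* True when either the local LID or the SM LID differs from the cached one. */
static bool ah_needs_refresh(const struct lid_pair *cached,
			     uint16_t port_lid, uint16_t sm_lid)
{
	assert(cached != NULL);

	return port_lid != cached->lid || sm_lid != cached->sm_lid;
}
```

A caller would rebuild the address handle (the ibv_destroy_ah/create_ah
sequence in main()) whenever ah_needs_refresh() returns true. Since
*sm_lid is left untouched when the query fails, as in the patched
get_port_lid(), a caller may want to initialise that variable before
the call.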