From patchwork Fri Jan 4 13:17:20 2019
X-Patchwork-Submitter: Haakon Bugge
X-Patchwork-Id: 10748357
From: Håkon Bugge
To: sean.hefty@intel.com, hal@dev.mellanox.co.il
Cc: dledford@redhat.com, leon@kernel.org, jgg@mellanox.com,
    mark.haywood@oracle.com, aron.silverton@oracle.com,
    linux-rdma@vger.kernel.org
Subject: [PATCH] ibacm: Unable to resurrect an interface
Date: Fri, 4 Jan 2019 14:17:20 +0100
Message-Id: <20190104131720.466386-1-haakon.bugge@oracle.com>
X-Mailer: git-send-email 2.19.2
X-Mailing-List: linux-rdma@vger.kernel.org

When an IB port is brought back to the Active state after being down,
ibacm receives an event about it. It then (re)enumerates the devices
by executing an ioctl with SIOCGIFCONF. This particular ioctl only
returns interfaces that are "running".

There may be a delay between the IB port becoming Active and its
address being provisioned, at which point the interface becomes
"running". If ibacm attempts to associate IPoIB interfaces with the
port during this interval, it will not see the interface because it
is not yet "running".

Later, when ibacm is asked for a Path Record (PR) using the IP address
of the resurrected IPoIB interface, it cannot find the associated
end point (EP), and the following is printed in the log:

  acm_svr_resolve_path: notice - unknown local end point address

The bug can be provoked by the following script. We have a single HCA
with two ports, the IPoIB interfaces are named stib{0,1}, the IP
address of the first interface is 192.168.200.200, and the remote IP
address is 192.168.200.202. The LID of the IB switch is 1 and the
switch port number connected to port 1 of the HCA is 22.

The fix is in acm_add_ep_ip(). When acm_find_ep() fails, an attempt is
made to bring the EP up by calling acm_ep_up(), after which the lookup
is retried.
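For readers unfamiliar with the code, the "look up, bring up on a miss,
look up again" pattern used by the fix can be sketched in isolation.
This is a toy model under stated assumptions, not ibacm code: the
toy_port and toy_ep types and all toy_* functions are hypothetical
stand-ins for struct acmc_port, struct acmc_ep, acm_find_ep() and
acm_ep_up(), and the real functions do considerably more work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ibacm's port and end point structures. */
#define TOY_MAX_EPS 4

struct toy_ep {
	uint16_t pkey;
	int in_use;
};

struct toy_port {
	int active;			/* models the IB port state */
	struct toy_ep eps[TOY_MAX_EPS];
};

/* Look up an end point by pkey; returns NULL when none exists. */
static struct toy_ep *toy_find_ep(struct toy_port *port, uint16_t pkey)
{
	for (size_t i = 0; i < TOY_MAX_EPS; i++)
		if (port->eps[i].in_use && port->eps[i].pkey == pkey)
			return &port->eps[i];
	return NULL;
}

/*
 * Try to create an end point for pkey. Like a void "bring up" call,
 * it reports nothing: it does nothing if the port is down or the
 * table is full, so the caller must look the end point up again.
 */
static void toy_ep_up(struct toy_port *port, uint16_t pkey)
{
	if (!port->active)
		return;
	for (size_t i = 0; i < TOY_MAX_EPS; i++) {
		if (!port->eps[i].in_use) {
			port->eps[i].pkey = pkey;
			port->eps[i].in_use = 1;
			return;
		}
	}
}

/* Find the end point; on a miss, try to bring it up and retry once. */
static struct toy_ep *toy_find_or_create_ep(struct toy_port *port,
					    uint16_t pkey)
{
	struct toy_ep *ep;

	assert(port != NULL);
	ep = toy_find_ep(port, pkey);
	if (!ep) {
		toy_ep_up(port, pkey);
		ep = toy_find_ep(port, pkey);	/* may still be NULL */
	}
	return ep;
}
```

Because the bring-up step returns void in this sketch, the second
lookup is the only way to learn whether it succeeded, and the caller
still has to handle a NULL result.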
Signed-off-by: Håkon Bugge
---
 ibacm/src/acm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/ibacm/src/acm.c b/ibacm/src/acm.c
index 6453c5f0..0887d0c6 100644
--- a/ibacm/src/acm.c
+++ b/ibacm/src/acm.c
@@ -200,6 +200,7 @@ static int acm_ep_insert_addr(struct acmc_ep *ep, const char *name, uint8_t *add
 			      uint8_t addr_type);
 static void acm_event_handler(struct acmc_device *dev);
 static int acm_nl_send(int sock, struct acm_msg *msg);
+static void acm_ep_up(struct acmc_port *port, uint16_t pkey);
 
 static struct sa_data {
 	int timeout;
@@ -1321,9 +1322,17 @@ static void acm_add_ep_ip(char *ifname, struct acm_ep_addr_data *data, char *ip_
 	if (acm_if_get_pkey(ifname, &pkey))
 		return;
 
-	acm_log(0, " %s\n", ip_str);
+	acm_log(0, " %s pkey: %04x port: %d\n", ip_str, pkey, port_num);
 
 	ep = acm_find_ep(&dev->port[port_num - 1], pkey);
+
+	if (!ep) {
+		acm_log(2, "no EP found, attempt adding it\n");
+		acm_ep_up(&dev->port[port_num - 1], pkey);
+		ep = acm_find_ep(&dev->port[port_num - 1], pkey);
+		acm_log(2, "EP was %s\n", ep ? "found" : "still not found");
+	}
+
 	if (ep) {
 		if (acm_ep_insert_addr(ep, ip_str, data->info.addr,
 				       data->type))