From patchwork Tue Mar 28 16:02:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shamir Rabinovitch
X-Patchwork-Id: 9650127
Date: Tue, 28 Mar 2017 19:02:52 +0300
From: Shamir Rabinovitch
To: Mark Bloch
Cc: linux-rdma@vger.kernel.org, dledford@redhat.com, vijay.ac.kumar@oracle.com
Subject: Re: [PATCH v3] IB/IPoIB: ibX: failed to create mcg debug file
Message-ID: <20170328160251.GA26781@srabinov-linux.uk.oracle.com>
References: <1490599139-12665-1-git-send-email-shamir.rabinovitch@oracle.com>
 <4058624b-a947-9635-76ca-482fd6a6bd95@mellanox.com>
 <20170327201156.GA29831@srabinov-linux.uk.oracle.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-rdma@vger.kernel.org

On Tue, Mar 28, 2017 at 06:45:44PM +0300, Mark Bloch wrote:
> >
> > Hi Mark,
> >
> > v3 of this patch works fine on a system that has a CX3 with 2 ports and
> > the below udev rules:
> >
> > # InfiniBand: Mellanox Technologies MT27500 Family [ConnectX-3]
> > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlx4_core", BUS=="pci",
> > ID=="0002:01:00.0", ATTR{dev_id}=="0x0", KERNEL=="ib*", NAME="ib1"
> > SUBSYSTEM=="net", ACTION=="add", DRIVERS=="mlx4_core", BUS=="pci",
> > ID=="0002:01:00.0", ATTR{dev_id}=="0x1", KERNEL=="ib*", NAME="ib0"
> >
> > On this system, the udev rules rename ib0->ib1 & ib1->ib0 causing small
> > chaos in the ipoib device names.
> >
> > The attached logs include the information collected when the openibd
> > service was started and when it was stopped.
> > You can have a look in the files and see what the network events are
> > and how they are processed by the ipoib devices.
> >
> > I think it will answer your concerns.
> >
> > BR, Shamir
> >
>
> I'm not saying it doesn't work, I'm saying works != works correctly.
> We are calling ipoib_delete_debug_file too many times, it works by luck/chance.
>
> While testing the patch, I've encountered another issue, running:
>
> modprobe ib_ipoib
> echo "0x0043" > /sys/class/net/ib0/create_child
> modprobe -r ib_ipoib
>
> and then looking at the debugfs dir:
> [root@dev-r-vrt-175 ~]# ls /sys/kernel/debug/ipoib/
> ib0.8043_mcg ib0.8043_pat1
>
> As you can see, the debugfs entries for the ib0 child weren't removed.
> Also notice that after that, I can't load ib_ipoib:
> [root@dev-r-vrt-175 ~]# modprobe ib_ipoib
> modprobe: ERROR: could not insert 'ib_ipoib': Cannot allocate memory
>
> The more interesting issue is, dmesg output has this:
> [ 467.185609] ib0.8043: failed to create mcg debug file
> [ 467.192551] ib0.8043: failed to create path debug file
>
> so maybe this is a debugfs bug?
>
> Sorry I can't look into it, I have some internal stuff I need to work on :/
>
> Mark.
>

Hi Mark,

I am confused. Have you used v3 of the patch?
If yes, please add this print after you apply the patch and send me the
output when you stop the openibd service:

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index c84b8ee..a2f43ff 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -118,12 +118,17 @@ static int ipoib_netdev_event(struct notifier_block *this,
 	if (dev->netdev_ops->ndo_open != ipoib_open)
 		return NOTIFY_DONE;
 
+	pr_err("%s: dev %p name %s event 0x%lx\n",
+	       __func__, dev, dev->name, event);
+
 	switch (event) {
 	case NETDEV_REGISTER:
 		ipoib_create_debug_files(dev);
 		break;

My output shows this:

ipoib_netdev_event: dev fff8001f59984000 name ib1 event 0x9
ipoib_netdev_event: dev fff8001f59984000 name ib1 event 0x2
ipoib_netdev_event: dev fff8001f568b4000 name ib0 event 0x9
ipoib_netdev_event: dev fff8001f568b4000 name ib0 event 0x2
ipoib_netdev_event: dev fff8001f57b4a000 name ib2 event 0x9
ipoib_netdev_event: dev fff8001f57b4a000 name ib2 event 0x2
ipoib_netdev_event: dev fff8001f54dda000 name ib3 event 0x9
ipoib_netdev_event: dev fff8001f54dda000 name ib3 event 0x2
ipoib_netdev_event: dev fff8001f59984000 name ib1 event 0x6 <-- NETDEV_UNREGISTER { here we delete the debugfs entries }
ipoib_netdev_event: dev fff8001f568b4000 name ib0 event 0x6
ipoib_netdev_event: dev fff8001f57b4a000 name ib2 event 0x6
ipoib_netdev_event: dev fff8001f54dda000 name ib3 event 0x6

So the 4 ports I have are closed only once. Hence no double free.
I am not sure why you see the double free. Please double check your
findings.

I am using the 4.9.9 upstream kernel because the commit "Merge tag
'for-next-dma_ops' of
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma" causes a MAD
DMA mapping kernel panic on SPARC T7.

BR, Shamir
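
P.S. For checking a captured log of this kind, here is a minimal user-space
sketch (not part of the patch) that counts the 0x6 (NETDEV_UNREGISTER)
events per device name. A count above 1 for any device would point at the
debug files being deleted more than once. It assumes each line starts with
"ipoib_netdev_event:" exactly as printed by the pr_err() in the debug diff,
with any dmesg timestamp prefix already stripped. The names
count_unregister, struct dev_count and MAX_DEVS are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NETDEV_UNREGISTER 0x6UL	/* value as annotated in the log */
#define MAX_DEVS 16		/* arbitrary bound for this sketch */

struct dev_count {
	char name[32];
	int unregistered;	/* number of NETDEV_UNREGISTER events seen */
};

/*
 * Scan a log made of lines such as
 *   ipoib_netdev_event: dev fff8001f59984000 name ib1 event 0x6
 * and count NETDEV_UNREGISTER events per device name.
 * Returns the number of distinct devices that saw the event.
 */
static int count_unregister(const char *log, struct dev_count *out, int max)
{
	int ndevs = 0;

	assert(log != NULL && out != NULL);

	while (*log) {
		char name[32];
		unsigned long event;
		const char *eol = strchr(log, '\n');

		/* %*s skips the pointer value, which is not needed here */
		if (sscanf(log, "ipoib_netdev_event: dev %*s name %31s event %lx",
			   name, &event) == 2 && event == NETDEV_UNREGISTER) {
			int i;

			/* look for an existing entry for this device */
			for (i = 0; i < ndevs; i++)
				if (strcmp(out[i].name, name) == 0)
					break;
			/* not found: add one if there is room */
			if (i == ndevs && ndevs < max) {
				snprintf(out[i].name, sizeof(out[i].name),
					 "%s", name);
				out[i].unregistered = 0;
				ndevs++;
			}
			if (i < ndevs)
				out[i].unregistered++;
		}
		if (!eol)
			break;
		log = eol + 1;
	}
	return ndevs;
}
```

To use it, read the captured output into a buffer, call
count_unregister(buf, devs, MAX_DEVS) and print every entry whose
unregistered field is greater than 1.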