From patchwork Tue Jul 3 10:52:25 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Oza Pawandeep X-Patchwork-Id: 10503795 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1F6C06035E for ; Tue, 3 Jul 2018 10:52:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0BCB52886F for ; Tue, 3 Jul 2018 10:52:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F25522887E; Tue, 3 Jul 2018 10:52:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI autolearn=unavailable version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 99A942886F for ; Tue, 3 Jul 2018 10:52:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender:Content-Type: Content-Transfer-Encoding:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:Message-ID:References:In-Reply-To:Subject:To:From: Date:MIME-Version:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=nNhGHgEmCrNGpNAl+hMaBWehsxoMZyWnWUEmzDE8swQ=; b=Z9iggUUgPgoTX1TXOVO38icQB ghqdlb5UMM4E8dZXYwOC35UCPpLieYi8Etnpuq38zMREZ46vhVx0z5u6nFxsNz4ZvWRzWeCm87qNG eA21f+MerjjhDbDaEuGVi8lR1wchT2CN5XLcBnszzbPJRRJYKlTm9H2VHrx+duzBH5UumvoVuWzCR yItWUElEVqAap+4V7ztQDsP0O/4yJ9rY3T9LA7Su5TD7N1G/Hq0y2n11riw6h9zdGkPnxATODceFg vrHCjYhw1UG4SfRf82m3wKZLYT7ZBhr27htQloFnzMQkNhBSMslEZFl0fsgnHn/w+cYk042nxcNJa hCSbNQ90A==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1faIvH-0003Ne-Fl; Tue, 03 Jul 2018 10:52:39 +0000 Received: from smtp.codeaurora.org ([198.145.29.96]) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1faIvE-0003M5-J3 for linux-arm-kernel@lists.infradead.org; Tue, 03 Jul 2018 10:52:38 +0000 Received: by smtp.codeaurora.org (Postfix, from userid 1000) id 9795E60B14; Tue, 3 Jul 2018 10:52:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1530615145; bh=pdhEUd6zCmB40iuhXUGlHf+DWPrIwzBRuW8mP4Cv8Bw=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=NgT5ckm8kqX3bsIfYkMYPvGYuEOf5uL0ZWL4hF9gP8wsU2OFbgcBIRV6X7wVu1KV3 2BI+Wrpu8s2wzeyzI5bphBlJUrzKnFpGO3B/swjOWgcBD2r08lZI60LFV/CTrCg6r6 G01qtruXvF2rWnd3bhf9fLdBCAtcuzTnBlfvb/CA= Received: from mail.codeaurora.org (localhost.localdomain [127.0.0.1]) by smtp.codeaurora.org (Postfix) with ESMTP id 39CF16028D; Tue, 3 Jul 2018 10:52:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=codeaurora.org; s=default; t=1530615145; bh=pdhEUd6zCmB40iuhXUGlHf+DWPrIwzBRuW8mP4Cv8Bw=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=NgT5ckm8kqX3bsIfYkMYPvGYuEOf5uL0ZWL4hF9gP8wsU2OFbgcBIRV6X7wVu1KV3 2BI+Wrpu8s2wzeyzI5bphBlJUrzKnFpGO3B/swjOWgcBD2r08lZI60LFV/CTrCg6r6 G01qtruXvF2rWnd3bhf9fLdBCAtcuzTnBlfvb/CA= MIME-Version: 1.0 Date: Tue, 03 Jul 2018 16:22:25 +0530 From: poza@codeaurora.org To: Lukas Wunner Subject: Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset In-Reply-To: <20180703083447.GA2689@wunner.de> References: <1530571967-19099-1-git-send-email-okaya@codeaurora.org> <1530571967-19099-4-git-send-email-okaya@codeaurora.org> <20180703083447.GA2689@wunner.de> Message-ID: <324f8cf2fe6f7bdc43ca8a646eea908d@codeaurora.org> X-Sender: poza@codeaurora.org User-Agent: Roundcube Webmail/1.2.5 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180703_035236_691413_ED04891C X-CRM114-Status: GOOD ( 20.18 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Keith Busch , linux-pci@vger.kernel.org, open list , Sinan Kaya , linux-arm-msm@vger.kernel.org, Bjorn Helgaas , linux-pci-owner@vger.kernel.org, linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP On 2018-07-03 14:04, Lukas Wunner wrote: > On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote: >> If a bridge supports hotplug and observes a PCIe fatal error, the >> following >> events happen: >> >> 1. AER driver removes the devices from PCI tree on fatal error >> 2. AER driver brings down the link by issuing a secondary bus reset >> waits >> for the link to come up. >> 3. Hotplug driver observes a link down interrupt >> 4. Hotplug driver tries to remove the devices waiting for the rescan >> lock >> but devices are already removed by the AER driver and AER driver is >> waiting >> for the link to come back up. >> 5. AER driver tries to re-enumerate devices after polling for the link >> state to go up. >> 6. Hotplug driver obtains the lock and tries to remove the devices >> again. >> >> If a bridge is a hotplug capable bridge, mask hotplug interrupts >> before the >> reset and unmask afterwards. > > Would it work for you if you just amended the AER driver to skip > removal and re-enumeration of devices if the port is a hotplug bridge? > Just check for is_hotplug_bridge in struct pci_dev. > I tend to agree with you Lukas. on this line I already have follow up patches although I am waiting for Bjorn to review some patch-series before that. [PATCH v2 0/6] Fix issues and cleanup for ERR_FATAL and ERR_NONFATAL It doesn't look to me a an entirely a race condition since its guarded by pci_lock_rescan_remove()) I observed that both hotplug and aer/dpc comes out of it in a quiet sane state. My thinking is: Disabling hotplug interrupts during ERR_FATAL, is something little away from natural course of link_down event handling, which is handled by pciehp more maturely. so it would be just easy not to take any action e.g. removal and re-enumeration of devices from ERR_FATAL handling point of view. I leave it to Bjorn. follwing is the patch wich I am trying to set it right and under test. so till now I am in an opinion to handle this by checking in err.c } result = reset_link(udev, service); @@ -318,7 +320,7 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service) } if (result == PCI_ERS_RESULT_RECOVERED) { - if (pcie_wait_for_link(udev, true)) + if (pcie_wait_for_link(udev, true) && !udev->is_hotplug_bridge) pci_rescan_bus(udev->bus); pci_info(dev, "Device recovery from fatal error successful\n"); dev->error_state = pci_channel_io_normal; > That would seem like a much simpler solution, given that it is known > that the link will flap on reset, causing the hotplug driver to remove > and re-enumerate devices. That would also cover cases where hotplug is > handled by a different driver than pciehp, or by the platform firmware. > > Thanks, > > Lukas diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 410c35c..607a234 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -292,15 +292,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service) parent = udev->subordinate; pci_lock_rescan_remove(); - list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, - bus_list) { - pci_dev_get(pdev); - pci_dev_set_disconnected(pdev, NULL); - if (pci_has_subordinate(pdev)) - pci_walk_bus(pdev->subordinate, - pci_dev_set_disconnected, NULL); - pci_stop_and_remove_bus_device(pdev); - pci_dev_put(pdev); + if (!udev->is_hotplug_bridge) { + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, + bus_list) { + pci_dev_get(pdev); + pci_dev_set_disconnected(pdev, NULL); + if (pci_has_subordinate(pdev)) + pci_walk_bus(pdev->subordinate, + pci_dev_set_disconnected, NULL); + pci_stop_and_remove_bus_device(pdev); + pci_dev_put(pdev); + }