From patchwork Tue Nov 26 18:37:50 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Khalid Aziz X-Patchwork-Id: 3240621 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: X-Original-To: patchwork-linux-pci@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id DE9E99F3AE for ; Tue, 26 Nov 2013 18:38:42 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id AC7A32045A for ; Tue, 26 Nov 2013 18:38:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5811120454 for ; Tue, 26 Nov 2013 18:38:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752333Ab3KZSij (ORCPT ); Tue, 26 Nov 2013 13:38:39 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:21571 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750879Ab3KZSii (ORCPT ); Tue, 26 Nov 2013 13:38:38 -0500 Received: from acsinet22.oracle.com (acsinet22.oracle.com [141.146.126.238]) by userp1040.oracle.com (Sentrion-MTA-4.3.1/Sentrion-MTA-4.3.1) with ESMTP id rAQIc0uH031238 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 26 Nov 2013 18:38:01 GMT Received: from aserz7021.oracle.com (aserz7021.oracle.com [141.146.126.230]) by acsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id rAQIbvbc008860 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 26 Nov 2013 18:37:58 GMT Received: from abhmp0017.oracle.com (abhmp0017.oracle.com [141.146.116.23]) by aserz7021.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id rAQIbv2r002495; Tue, 26 Nov 2013 18:37:57 GMT Received: from concerto (/10.159.70.85) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 26 Nov 2013 10:37:57 -0800 Date: Tue, 26 Nov 2013 11:37:50 -0700 From: Khalid Aziz To: Matthew Garrett Cc: Bjorn Helgaas , Chang Liu , "linux-pci@vger.kernel.org" , Lan Tianyu , Konstantin Khlebnikov , Alan Cox , Takao Indoh , Jility , Florian Otti , "linux-kernel@vger.kernel.org" , "Eric W. Biederman" Subject: Re: [PATCH] PCI: add a quirk for keeping Bus Master bit on shutdown Message-ID: <20131126183750.GA29265@concerto> References: <1384285203-642-1-git-send-email-cl91tp@gmail.com> <5294CF1A.6080707@oracle.com> <20131126174806.GA14789@srcf.ucam.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20131126174806.GA14789@srcf.ucam.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Tue, Nov 26, 2013 at 05:48:06PM +0000, Matthew Garrett wrote: > On Tue, Nov 26, 2013 at 10:35:26AM -0700, Bjorn Helgaas wrote: > > On Tue, Nov 26, 2013 at 9:40 AM, Khalid Aziz wrote: > > > Disabling Bus Master bit is effectively a brute force and not an elegant way > > > to stop unwanted DMA. It can have side effects as Alan and others pointed > > > out in the original discussion, and we are now seeing one with Lynx Point on > > > Acer. > > > > I'm getting more queasy all the time about disabling Bus Master. I > > don't think RHEL does it, and that's probably where most kexec use is. > > So I doubt we really have much experience with it yet. > > Does Windows disable the BM bit on shutdown? If not, it's likely that > there are platforms where the SMM code assumes it's still enabled. We > also know that there are devices that hang if BM is disabled while their > DMA engines are still running. > > Unless we verify that Windows does this, I think there's no way we can > guarantee that firmware won't make assumptions about the state of PCI. > The easiest compromise would probably be to set a flag that disables > busmastering purely when we're performing a kexec. > This is the approach that is most appealing to me. If we confine clearing Bus Master bit to just the kexec path, we can side step all the land mines in normal shutdown path. Here is an informal patch just for comments and discussion. It has been tested lightly. If this approach is acceptable, I will create a more formal patch. --- Khalid -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 9042fdb..c605be0 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -400,10 +400,11 @@ static void pci_device_shutdown(struct device *dev) pci_msix_shutdown(pci_dev); /* - * Turn off Bus Master bit on the device to tell it to not - * continue to do DMA. Don't touch devices in D3cold or unknown states. + * If this is a kexec reboot, turn off Bus Master bit on the + * device to tell it to not continue to do DMA. Don't touch + * devices in D3cold or unknown states. */ - if (pci_dev->current_state <= PCI_D3hot) + if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) pci_clear_master(pci_dev); } diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 9c91ecc..7d85733 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -9,6 +9,9 @@ extern const unsigned char pcix_bus_speed[]; extern const unsigned char pcie_link_speed[]; +/* flag to track if kexec reboot is in progress */ +extern unsigned long kexec_in_progress; + /* Functions internal to the PCI core code */ int pci_create_sysfs_dev_files(struct pci_dev *pdev); diff --git a/kernel/kexec.c b/kernel/kexec.c index 490afc0..fd2d63e 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -47,6 +47,9 @@ u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4]; size_t vmcoreinfo_size; size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data); +/* Flag to indicate we are going to kexec a new kernel */ +unsigned long kexec_in_progress = 0; + /* Location of the reserved area for the crash kernel */ struct resource crashk_res = { .name = "Crash kernel", @@ -1675,6 +1678,7 @@ int kernel_kexec(void) } else #endif { + kexec_in_progress = 1; kernel_restart_prepare(NULL); printk(KERN_EMERG "Starting new kernel\n"); machine_shutdown();