Message ID | 20200124144248.11719-1-yu.c.chen@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [RFC] PCI: Add "pci=blacklist_dev=" parameter to blacklist specific devices | expand |
On Fri, Jan 24, 2020 at 10:42:48PM +0800, Chen Yu wrote: > It was found that on some platforms the bogus pci device might bring > troubles to the system. For example, on a MacBookPro the system could > not be power off or suspended due to internal pci resource confliction > between bogus pci device and [io 0x1804]. Another case is that, once > resumed from hibernation on a VM, the pci config space of a pci device > is corrupt. > > To narrow down and benefit future debugging for such kind of issues, > introduce the command line blacklist_dev=<vendor:device_id>> to blacklist > such pci devices thus they will not be scanned thus not visible after > bootup. For example, > > pci.blacklist_dev=8086:293e > > forbid the audio device to be exposed to the OS. This feels really unsafe to me. Just because Linux ignores the device doesn't mean the device will ignore I/O requests. I think we should call this pci.disable_dev and clear the device's I/O Space Enable, Memory Space Enable and Bus Master Enable bits (in the Command register, config space offset 4).
On Fri, Jan 24, 2020 at 10:42:48PM +0800, Chen Yu wrote: > It was found that on some platforms the bogus pci device might bring > troubles to the system. For example, on a MacBookPro the system could > not be power off or suspended due to internal pci resource confliction > between bogus pci device and [io 0x1804]. Another case is that, once > resumed from hibernation on a VM, the pci config space of a pci device > is corrupt. > > To narrow down and benefit future debugging for such kind of issues, > introduce the command line blacklist_dev=<vendor:device_id>> to blacklist > such pci devices thus they will not be scanned thus not visible after > bootup. For example, > > pci.blacklist_dev=8086:293e > > forbid the audio device to be exposed to the OS. I'm not really a fan of this. I'd rather see some details about what the problem is so we can actually fix it. Ignoring the device doesn't mean the device is removed or even inactive. It may still be consuming address space that we need to avoid. Can you point us to bug reports about the issues you mentioned? > Signed-off-by: Chen Yu <yu.c.chen@intel.com> > --- > Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ > drivers/pci/pci.c | 17 +++++++++++++++++ > drivers/pci/pci.h | 1 + > drivers/pci/probe.c | 3 +++ > 4 files changed, 27 insertions(+) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index ade4e6ec23e0..cd4a47e236aa 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -3583,6 +3583,12 @@ > may put more devices in an IOMMU group. > force_floating [S390] Force usage of floating interrupts. > nomio [S390] Do not use MIO instructions. > + blacklist_dev=<vendor:device_id>[; ...] > + Specify one or more PCI devices (in the format > + specified above) separated by semicolons. > + Each device specified will not be scanned thus > + will be invisible after boot up. This can be > + used for debugging purpose. > > pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power > Management. > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e87196cc1a7f..0e3626a401f4 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -6393,6 +6393,19 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus) > } > EXPORT_SYMBOL(pci_fixup_cardbus); > > +static const char *pci_blacklist_devs_param; > + > +bool pci_is_blacklist_dev(unsigned short vendor, unsigned short device) > +{ > + char search[10]; > + > + if (!pci_blacklist_devs_param) > + return false; > + sprintf(search, "%x:%x", vendor, device); > + > + return strstr(pci_blacklist_devs_param, search) ? true : false; > +} > + > static int __init pci_setup(char *str) > { > while (str) { > @@ -6451,6 +6464,8 @@ static int __init pci_setup(char *str) > pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS); > } else if (!strncmp(str, "disable_acs_redir=", 18)) { > disable_acs_redir_param = str + 18; > + } else if (!strncmp(str, "blacklist_dev=", 14)) { > + pci_blacklist_devs_param = str + 14; > } else { > pr_err("PCI: Unknown option `%s'\n", str); > } > @@ -6476,6 +6491,8 @@ static int __init pci_realloc_setup_params(void) > GFP_KERNEL); > disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL); > > + pci_blacklist_devs_param = kstrdup(pci_blacklist_devs_param, GFP_KERNEL); > + > return 0; > } > pure_initcall(pci_realloc_setup_params); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index a0a53bd05a0b..01b8ab2da065 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -669,4 +669,5 @@ static inline int pci_acpi_program_hp_params(struct pci_dev *dev) > extern const struct attribute_group aspm_ctrl_attr_group; > #endif > > +bool pci_is_blacklist_dev(unsigned short vendor, unsigned short device); > #endif /* DRIVERS_PCI_H */ > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 512cb4312ddd..812ef901ecea 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2271,6 +2271,9 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) > if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000)) > return NULL; > > + if (pci_is_blacklist_dev(l & 0xffff, (l >> 16) & 0xffff)) > + return NULL; > + > dev = pci_alloc_dev(bus); > if (!dev) > return NULL; > -- > 2.17.1 >
Hi Matthew, Thanks for repy. On Sat, Jan 25, 2020 at 5:01 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Jan 24, 2020 at 10:42:48PM +0800, Chen Yu wrote: > > It was found that on some platforms the bogus pci device might bring > > troubles to the system. For example, on a MacBookPro the system could > > not be power off or suspended due to internal pci resource confliction > > between bogus pci device and [io 0x1804]. Another case is that, once > > resumed from hibernation on a VM, the pci config space of a pci device > > is corrupt. > > > > To narrow down and benefit future debugging for such kind of issues, > > introduce the command line blacklist_dev=<vendor:device_id>> to blacklist > > such pci devices thus they will not be scanned thus not visible after > > bootup. For example, > > > > pci.blacklist_dev=8086:293e > > > > forbid the audio device to be exposed to the OS. > > This feels really unsafe to me. Just because Linux ignores the device > doesn't mean the device will ignore I/O requests. I think we should > call this pci.disable_dev and clear the device's I/O Space Enable, > Memory Space Enable and Bus Master Enable bits (in the Command register, > config space offset 4). Oh right, the BIOS might already has enabled Memory/IO space in the config during boot up and thus has already claimed the resource range for this pci device. I'll summarize the problem I found currently in Bjorn's reply and let's discuss it there. Thanks, chenyu
On Fri, Jan 24, 2020 at 1:01 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, Jan 24, 2020 at 10:42:48PM +0800, Chen Yu wrote: > > It was found that on some platforms the bogus pci device might bring > > troubles to the system. For example, on a MacBookPro the system could > > not be power off or suspended due to internal pci resource confliction > > between bogus pci device and [io 0x1804]. Another case is that, once > > resumed from hibernation on a VM, the pci config space of a pci device > > is corrupt. > > > > To narrow down and benefit future debugging for such kind of issues, > > introduce the command line blacklist_dev=<vendor:device_id>> to blacklist > > such pci devices thus they will not be scanned thus not visible after > > bootup. For example, > > > > pci.blacklist_dev=8086:293e > > > > forbid the audio device to be exposed to the OS. > > I'm not really a fan of this. I'd rather see some details about what > the problem is so we can actually fix it. > > Ignoring the device doesn't mean the device is removed or even > inactive. It may still be consuming address space that we need to > avoid. > > Can you point us to bug reports about the issues you mentioned? I'm not sure which issues Chen Yu is referring to, but a proposal like has come up before [1], and didn't go anywhere. I think this is useful to people doing new / pre-release hardware bring up, but it's unlikely that such hardware makes it into a production to make this feature useful upstream. Hardware bring-up efforts can just use local hacks to workaround problems, if broken hardware actually makes it into production it needs precise quirks to be developed/applied. [1]: http://lore.kernel.org/r/1506544822-2632-2-git-send-email-jonathan.derrick@intel.com
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index ade4e6ec23e0..cd4a47e236aa 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3583,6 +3583,12 @@ may put more devices in an IOMMU group. force_floating [S390] Force usage of floating interrupts. nomio [S390] Do not use MIO instructions. + blacklist_dev=<vendor:device_id>[; ...] + Specify one or more PCI devices (in the format + specified above) separated by semicolons. + Each device specified will not be scanned thus + will be invisible after boot up. This can be + used for debugging purpose. pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power Management. diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e87196cc1a7f..0e3626a401f4 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6393,6 +6393,19 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus) } EXPORT_SYMBOL(pci_fixup_cardbus); +static const char *pci_blacklist_devs_param; + +bool pci_is_blacklist_dev(unsigned short vendor, unsigned short device) +{ + char search[10]; + + if (!pci_blacklist_devs_param) + return false; + sprintf(search, "%x:%x", vendor, device); + + return strstr(pci_blacklist_devs_param, search) ? true : false; +} + static int __init pci_setup(char *str) { while (str) { @@ -6451,6 +6464,8 @@ static int __init pci_setup(char *str) pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS); } else if (!strncmp(str, "disable_acs_redir=", 18)) { disable_acs_redir_param = str + 18; + } else if (!strncmp(str, "blacklist_dev=", 14)) { + pci_blacklist_devs_param = str + 14; } else { pr_err("PCI: Unknown option `%s'\n", str); } @@ -6476,6 +6491,8 @@ static int __init pci_realloc_setup_params(void) GFP_KERNEL); disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL); + pci_blacklist_devs_param = kstrdup(pci_blacklist_devs_param, GFP_KERNEL); + return 0; } pure_initcall(pci_realloc_setup_params); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index a0a53bd05a0b..01b8ab2da065 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -669,4 +669,5 @@ static inline int pci_acpi_program_hp_params(struct pci_dev *dev) extern const struct attribute_group aspm_ctrl_attr_group; #endif +bool pci_is_blacklist_dev(unsigned short vendor, unsigned short device); #endif /* DRIVERS_PCI_H */ diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 512cb4312ddd..812ef901ecea 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2271,6 +2271,9 @@ static struct pci_dev *pci_scan_device(struct pci_bus *bus, int devfn) if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000)) return NULL; + if (pci_is_blacklist_dev(l & 0xffff, (l >> 16) & 0xffff)) + return NULL; + dev = pci_alloc_dev(bus); if (!dev) return NULL;
It was found that on some platforms the bogus pci device might bring troubles to the system. For example, on a MacBookPro the system could not be power off or suspended due to internal pci resource confliction between bogus pci device and [io 0x1804]. Another case is that, once resumed from hibernation on a VM, the pci config space of a pci device is corrupt. To narrow down and benefit future debugging for such kind of issues, introduce the command line blacklist_dev=<vendor:device_id>> to blacklist such pci devices thus they will not be scanned thus not visible after bootup. For example, pci.blacklist_dev=8086:293e forbid the audio device to be exposed to the OS. Signed-off-by: Chen Yu <yu.c.chen@intel.com> --- Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ drivers/pci/pci.c | 17 +++++++++++++++++ drivers/pci/pci.h | 1 + drivers/pci/probe.c | 3 +++ 4 files changed, 27 insertions(+)