x86/PCI: Scan all functions during probing

Message ID alpine.DEB.2.20.1608091318530.5388@nanos (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Thomas Gleixner Aug. 9, 2016, 11:22 a.m. UTC
From: Benedikt Spranger <b.spranger@linutronix.de>

PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
Subdevices (e.g. multiqueue) have function numbers which are not a
multiple of 8.

Simple hypervisors (e.g. Jailhouse) pass subdevices directly w/o providing
virtual PCI mappings like KVM. As a consequence a simple PCI passthrough from
Jailhouse to a linux guest is not able to detect such devices.

Changing the probe functions to scan all function numbers makes it work. This
has no side effects and there is no reason to force the 0/8/16... probing
scheme.

Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/pci/legacy.c |    2 +-
 drivers/pci/probe.c   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Lukas Wunner Aug. 9, 2016, 12:13 p.m. UTC | #1
On Tue, Aug 09, 2016 at 01:22:30PM +0200, Thomas Gleixner wrote:
> From: Benedikt Spranger <b.spranger@linutronix.de>
> 
> PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
> Subdevices (e.g. multiqueue) have function numbers which are not a
> multiple of 8.
> 
> Simple hypervisors (e.g. Jailhouse) pass subdevices directly w/o providing
> virtual PCI mappings like KVM. As a consequence a simple PCI passthrough from
> Jailhouse to a linux guest is not able to detect such devices.
> 
> Changing the probe functions to scan all function numbers makes it work. This
> has no side effects and there is no reason to force the 0/8/16... probing
> scheme.

It does have the side effect that probing (and thus booting) is prolonged.
Depending on how much that is, it may be worth pondering if usage of the
smaller stride should be constrained to platforms that really need it
(assuming they can be detected/quirked).

Just claiming "has no side effects" is going out on a limb I think.

Thanks,

Lukas

> 
> Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/pci/legacy.c |    2 +-
>  drivers/pci/probe.c   |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -42,7 +42,7 @@ void pcibios_scan_specific_bus(int busn)
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += 8) {
> +	for (devfn = 0; devfn < 256; devfn++) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2063,7 +2063,7 @@ unsigned int pci_scan_child_bus(struct p
>  	dev_dbg(&bus->dev, "scanning bus\n");
>  
>  	/* Go find them, Rover! */
> -	for (devfn = 0; devfn < 0x100; devfn += 8)
> +	for (devfn = 0; devfn < 0x100; devfn++)
>  		pci_scan_slot(bus, devfn);
>  
>  	/* Reserve buses for SR-IOV capability. */
Bjorn Helgaas Aug. 9, 2016, 1:44 p.m. UTC | #2
[+cc Lukas]

On Tue, Aug 09, 2016 at 01:22:30PM +0200, Thomas Gleixner wrote:
> From: Benedikt Spranger <b.spranger@linutronix.de>
> 
> PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
> Subdevices (e.g. multiqueue) have function numbers which are not a
> multiple of 8.
> 
> Simple hypervisors (e.g. Jailhouse) pass subdevices directly w/o providing
> virtual PCI mappings like KVM. As a consequence a simple PCI passthrough from
> Jailhouse to a linux guest is not able to detect such devices.
> 
> Changing the probe functions to scan all function numbers makes it work. This
> has no side effects and there is no reason to force the 0/8/16... probing
> scheme.

"devfn" here is an 8-bit field (5 bits of device number and 3 bits of
function number), so incrementing by 8 is really a way of looking at
function 0 of each device number.  I'm pretty sure this is based on
something in the spec that says a multi-function device must implement
function 0.  Please look that up and include a reference in the
changelog so we have a more complete story here.
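
To make the devfn layout described above concrete, here is a minimal
sketch. The DEVFN* macros are local copies written for this example (the
kernel provides equivalent PCI_DEVFN, PCI_SLOT and PCI_FUNC helpers), and
stride_visits_only_function0() is a hypothetical helper, not kernel code.

```c
#include <stdint.h>

/* Local illustration of the devfn layout: bits 7..3 = device, bits 2..0 = function. */
#define DEVFN(slot, func)  ((uint8_t)((((slot) & 0x1f) << 3) | ((func) & 0x07)))
#define DEVFN_SLOT(devfn)  (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn)  ((devfn) & 0x07)

/*
 * Hypothetical helper: returns 1 if walking devfn 0..255 with the given
 * stride only ever lands on function 0, and 0 otherwise.
 */
static int stride_visits_only_function0(unsigned int stride)
{
	unsigned int devfn;

	if (stride == 0)
		return 0;	/* a zero stride would never terminate */

	for (devfn = 0; devfn < 0x100; devfn += stride)
		if (DEVFN_FUNC(devfn) != 0)
			return 0;

	return 1;
}
```

With a stride of 8 the walk touches function 0 of each of the 32 device
numbers; with a stride of 1 it also touches functions 1-7.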

It's possible there are other assumptions in the code about
multi-function devices always having a function 0.  It would take a
little more research to be certain that this wouldn't break anything.

As Lukas pointed out, it does increase the number of probe attempts by
a factor of 8.  I don't know how much that will affect boot time, but
it's certainly something to consider and hopefully quantify.

> Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/pci/legacy.c |    2 +-
>  drivers/pci/probe.c   |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -42,7 +42,7 @@ void pcibios_scan_specific_bus(int busn)
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += 8) {
> +	for (devfn = 0; devfn < 256; devfn++) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2063,7 +2063,7 @@ unsigned int pci_scan_child_bus(struct p
>  	dev_dbg(&bus->dev, "scanning bus\n");
>  
>  	/* Go find them, Rover! */
> -	for (devfn = 0; devfn < 0x100; devfn += 8)
> +	for (devfn = 0; devfn < 0x100; devfn++)
>  		pci_scan_slot(bus, devfn);
>  
>  	/* Reserve buses for SR-IOV capability. */
Bjorn Helgaas Aug. 18, 2016, 8:33 p.m. UTC | #3
On Tue, Aug 09, 2016 at 08:44:53AM -0500, Bjorn Helgaas wrote:
> [+cc Lukas]
> 
> On Tue, Aug 09, 2016 at 01:22:30PM +0200, Thomas Gleixner wrote:
> > From: Benedikt Spranger <b.spranger@linutronix.de>
> > 
> > PCI and PCIBIOS probing only scans devices at function number 0/8/16/...
> > Subdevices (e.g. multiqueue) have function numbers which are not a
> > multiple of 8.
> > 
> > Simple hypervisors (e.g. Jailhouse) pass subdevices directly w/o providing
> > virtual PCI mappings like KVM. As a consequence a simple PCI passthrough from
> > Jailhouse to a linux guest is not able to detect such devices.
> > 
> > Changing the probe functions to scan all function numbers makes it work. This
> > has no side effects and there is no reason to force the 0/8/16... probing
> > scheme.
> 
> "devfn" here is an 8-bit field (5 bits of device number and 3 bits of
> function number), so incrementing by 8 is really a way of looking at
> function 0 of each device number.  I'm pretty sure this is based on
> something in the spec that says a multi-function device must implement
> function 0.  Please look that up and include a reference in the
> changelog so we have a more complete story here.
> 
> It's possible there are other assumptions in the code about
> multi-function devices always having a function 0.  It would take a
> little more research to be certain that this wouldn't break anything.
> 
> As Lukas pointed out, it does increase the number of probe attempts by
> a factor of 8.  I don't know how much that will affect boot time, but
> it's certainly something to consider and hopefully quantify.

Any comments on this?  I'm waiting for at least the spec reference
and hopefully some warm fuzzies about boot time and safety.

I looked up the spec: PCI (not PCIe) r3.0, sec 3.2.2.3.4, says:

  A single-function device may optionally respond to all function
  numbers as the same function or may ... respond only to function 0
  and not respond to the other function numbers.

I'm concerned that a single-function device that responds to all
function numbers might break with this patch.

  [multi-function devices] are also required to always implement
  function 0 in the device.

Here's the reason we can advance by 8 in the "Go find them" loop.

  If a single function device is detected (i.e., bit 7 in the Header
  Type register of function 0 is 0), no more functions for that Device
  Number will be checked.  If a multi-function device is detected
  (i.e., bit 7 in the Header Type register of function 0 is 1), then
  all remaining Function Numbers will be checked.

This patch does the opposite of what the first sentence recommends.
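
The concern can be illustrated with a small, self-contained model. This is
only a sketch under stated assumptions: struct mock_fn and all function
names below are hypothetical, the bus is an in-memory table rather than
real config space, and scan_every_devfn() models a plain "probe every
devfn" loop, not the exact behaviour of pci_scan_slot().

```c
#include <stdint.h>

#define NUM_DEVFN     0x100
#define VENDOR_NONE   0xffff	/* value read back when nothing responds */
#define HDR_MULTIFUNC 0x80	/* bit 7 of the Header Type register */

/* Hypothetical mock: what each devfn on one bus returns when probed. */
struct mock_fn {
	uint16_t vendor;	/* VENDOR_NONE if the function does not respond */
	uint8_t header_type;
};

static int fn_present(const struct mock_fn *bus, unsigned int devfn)
{
	return bus[devfn].vendor != VENDOR_NONE && bus[devfn].vendor != 0x0000;
}

/*
 * Scan as the quoted spec text describes: probe function 0 of each device
 * and check functions 1-7 only if function 0 sets the multi-function bit.
 */
static int scan_per_spec(const struct mock_fn *bus)
{
	unsigned int devfn, fn;
	int found = 0;

	for (devfn = 0; devfn < NUM_DEVFN; devfn += 8) {
		if (!fn_present(bus, devfn))
			continue;
		found++;
		if (!(bus[devfn].header_type & HDR_MULTIFUNC))
			continue;	/* single-function: skip functions 1-7 */
		for (fn = 1; fn < 8; fn++)
			if (fn_present(bus, devfn + fn))
				found++;
	}
	return found;
}

/* Simplified model of probing every devfn regardless of the Header Type. */
static int scan_every_devfn(const struct mock_fn *bus)
{
	unsigned int devfn;
	int found = 0;

	for (devfn = 0; devfn < NUM_DEVFN; devfn++)
		if (fn_present(bus, devfn))
			found++;
	return found;
}

/*
 * Build a bus holding one single-function device at device number 3 that
 * answers on all eight function numbers, which the quoted text permits.
 * The vendor ID 0x1234 is made up.
 */
static void make_aliasing_bus(struct mock_fn *bus)
{
	unsigned int i;

	for (i = 0; i < NUM_DEVFN; i++) {
		bus[i].vendor = VENDOR_NONE;
		bus[i].header_type = 0x00;
	}
	for (i = 0; i < 8; i++)
		bus[3 * 8 + i].vendor = 0x1234;
}
```

In this model the spec-style scan reports the aliasing device once, while
the unconditional scan reports it eight times.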

> > Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  arch/x86/pci/legacy.c |    2 +-
> >  drivers/pci/probe.c   |    2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > --- a/arch/x86/pci/legacy.c
> > +++ b/arch/x86/pci/legacy.c
> > @@ -42,7 +42,7 @@ void pcibios_scan_specific_bus(int busn)
> >  	if (pci_find_bus(0, busn))
> >  		return;
> >  
> > -	for (devfn = 0; devfn < 256; devfn += 8) {
> > +	for (devfn = 0; devfn < 256; devfn++) {
> >  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> >  		    l != 0x0000 && l != 0xffff) {
> >  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2063,7 +2063,7 @@ unsigned int pci_scan_child_bus(struct p
> >  	dev_dbg(&bus->dev, "scanning bus\n");
> >  
> >  	/* Go find them, Rover! */
> > -	for (devfn = 0; devfn < 0x100; devfn += 8)
> > +	for (devfn = 0; devfn < 0x100; devfn++)
> >  		pci_scan_slot(bus, devfn);
> >  
> >  	/* Reserve buses for SR-IOV capability. */
Thomas Gleixner Aug. 24, 2016, 8:39 a.m. UTC | #4
On Thu, 18 Aug 2016, Bjorn Helgaas wrote:
> I looked up the spec: PCI (not PCIe) r3.0, sec 3.2.2.3.4, says:
> 
>   A single-function device may optionally respond to all function
>   numbers as the same function or may ... respond only to function 0
>   and not respond to the other function numbers.
> 
> I'm concerned that a single-function device that responds to all
> function numbers might break with this patch.
> 
>   [multi-function devices] are also required to always implement
>   function 0 in the device.
> 
> Here's the reason we can advance by 8 in the "Go find them" loop.
> 
>   If a single function device is detected (i.e., bit 7 in the Header
>   Type register of function 0 is 0), no more functions for that Device
>   Number will be checked.  If a multi-function device is detected
>   (i.e., bit 7 in the Header Type register of function 0 is 1), then
>   all remaining Function Numbers will be checked.
> 
> This patch does the opposite of what the first sentence recommends.

Fair enough. We'll need to find a way to deal with that in jailhouse then.

Thanks,

	tglx
Jan Kiszka Aug. 24, 2016, 11:13 a.m. UTC | #5
On 2016-08-24 04:39, Thomas Gleixner wrote:
> On Thu, 18 Aug 2016, Bjorn Helgaas wrote:
>> I looked up the spec: PCI (not PCIe) r3.0, sec 3.2.2.3.4, says:
>>
>>   A single-function device may optionally respond to all function
>>   numbers as the same function or may ... respond only to function 0
>>   and not respond to the other function numbers.
>>
>> I'm concerned that a single-function device that responds to all
>> function numbers might break with this patch.
>>
>>   [multi-function devices] are also required to always implement
>>   function 0 in the device.
>>
>> Here's the reason we can advance by 8 in the "Go find them" loop.
>>
>>   If a single function device is detected (i.e., bit 7 in the Header
>>   Type register of function 0 is 0), no more functions for that Device
>>   Number will be checked.  If a multi-function device is detected
>>   (i.e., bit 7 in the Header Type register of function 0 is 1), then
>>   all remaining Function Numbers will be checked.
>>
>> This patch does the opposite of what the first sentence recommends.
> 
> Fair enough. We'll need to find a way to deal with that in jailhouse then.

Wouldn't it also be an option to activate this fine-grained scanning only
if we detect that we are running on Jailhouse (which we have to do anyway)?
Such code hasn't been proposed for upstream yet, but we will propose it
eventually.

Jan
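
A minimal sketch of the option raised above, assuming a boolean that says
whether fine-grained scanning is wanted. How that boolean would be derived
(for example from Jailhouse detection) is not shown, since the thread
states such code had not been proposed upstream; scan_bus(), probe_fn and
count_probe() are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical callback invoked once per probed devfn. */
typedef void (*probe_fn)(unsigned int devfn, void *ctx);

/*
 * Walk devfn 0..255 with stride 8 by default, and with stride 1 only when
 * the caller asks for fine-grained scanning.
 */
static void scan_bus(bool fine_grained, probe_fn probe, void *ctx)
{
	unsigned int stride = fine_grained ? 1 : 8;
	unsigned int devfn;

	for (devfn = 0; devfn < 0x100; devfn += stride)
		probe(devfn, ctx);
}

/* Example callback: counts probe attempts in the unsigned int behind ctx. */
static void count_probe(unsigned int devfn, void *ctx)
{
	(void)devfn;
	(*(unsigned int *)ctx)++;
}
```

Keeping the default stride at 8 leaves the probe count unchanged for
everyone else, and the factor-of-8 increase applies only when the flag is
set.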
Thomas Gleixner Aug. 24, 2016, 5:23 p.m. UTC | #6
On Wed, 24 Aug 2016, Jan Kiszka wrote:
> On 2016-08-24 04:39, Thomas Gleixner wrote:
> > On Thu, 18 Aug 2016, Bjorn Helgaas wrote:
> >> I looked up the spec: PCI (not PCIe) r3.0, sec 3.2.2.3.4, says:
> >>
> >>   A single-function device may optionally respond to all function
> >>   numbers as the same function or may ... respond only to function 0
> >>   and not respond to the other function numbers.
> >>
> >> I'm concerned that a single-function device that responds to all
> >> function numbers might break with this patch.
> >>
> >>   [multi-function devices] are also required to always implement
> >>   function 0 in the device.
> >>
> >> Here's the reason we can advance by 8 in the "Go find them" loop.
> >>
> >>   If a single function device is detected (i.e., bit 7 in the Header
> >>   Type register of function 0 is 0), no more functions for that Device
> >>   Number will be checked.  If a multi-function device is detected
> >>   (i.e., bit 7 in the Header Type register of function 0 is 1), then
> >>   all remaining Function Numbers will be checked.
> >>
> >> This patch does the opposite of what the first sentence recommends.
> > 
> > Fair enough. We'll need to find a way to deal with that in jailhouse then.
> 
> Wouldn't it also be an option to activate this fine-grained scanning only
> if we detect that we are running on Jailhouse (which we have to do anyway)?
> Such code hasn't been proposed for upstream yet, but we will propose it
> eventually.

That might be an option.

Thanks,

	tglx

Patch

--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -42,7 +42,7 @@  void pcibios_scan_specific_bus(int busn)
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += 8) {
+	for (devfn = 0; devfn < 256; devfn++) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2063,7 +2063,7 @@  unsigned int pci_scan_child_bus(struct p
 	dev_dbg(&bus->dev, "scanning bus\n");
 
 	/* Go find them, Rover! */
-	for (devfn = 0; devfn < 0x100; devfn += 8)
+	for (devfn = 0; devfn < 0x100; devfn++)
 		pci_scan_slot(bus, devfn);
 
 	/* Reserve buses for SR-IOV capability. */