Message ID | 20180516013333.135259-1-rajatja@google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Tue, May 15, 2018 at 06:33:33PM -0700, Rajat Jain wrote: > If the OS and device are reporting different BAR(s), currently > the lspci will happily show the OS version with no indication what > so ever that there is a problem. This is not correct in my opinion > because lspci is often used to debug devices, so should either show > the BAR reported by the device, or atleast an indication of > out-of-sync. > > I spent a lot of time debugging a PCI device that would unexpectedly > clear out the BAR register due to some bug, and it was quite later > I realized that the lspci is showing me the OS version. So fix that. I assume (hope) that "lspci -b" would show you the hardware values. Of course, who would think to use that when you're debugging a problem. So that's not really a solution to the problem you faced. > On a system that is in problem state: > > localhost ~ # setpci -s 1:0.0 0x10.l > 00000004 <=== BAR is zeroed out > localhost ~ # > > Before: > > localhost ~ # lspci -v -s 1:0.0 > 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) > Subsystem: Intel Corporation Dual Band Wireless-AC 7265 > Flags: bus master, fast devsel, latency 0, IRQ 275 > Memory at d1400000 (64-bit, non-prefetchable) [size=8K] > Capabilities: [c8] Power Management version 3 > Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > Capabilities: [40] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d > Capabilities: [14c] Latency Tolerance Reporting > Capabilities: [154] L1 PM Substates > Kernel driver in use: iwlwifi > Kernel modules: iwlwifi > > After: > > localhost ~ # lspci -v -s 1:0.0 > 01:00.0 Network controller: Device 8086:095a (rev 59) > Subsystem: Device 8086:5010 > Flags: bus master, fast devsel, latency 0, IRQ 275 > [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] > Capabilities: [c8] Power Management version 3 > Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > Capabilities: [40] Express Endpoint, MSI 00 > Capabilities: [100] Advanced Error Reporting > Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d > Capabilities: [14c] Latency Tolerance Reporting > Capabilities: [154] L1 PM Substates > Kernel driver in use: iwlwifi > Kernel modules: iwlwifi I've never been quite clear on the path from hardware BAR value to lspci output. On my system, $ strace -e trace=open lspci -vs00:14.0 2>&1 | grep 00:14.0 suggests that lspci reads these: /sys/bus/pci/.../resource and /sys/bus/pci/.../config "resource" uses resource_show(), where we use the result of pci_resource_to_user() on the resources in the struct pci_dev, so these are basically cached and may be out of date with respect to the hardware. "config" uses pci_read_config(), where we ultimately call the config accessors, which should give us the actual values from the hardware BARs. > Signed-off-by: Rajat Jain <rajatja@google.com> > --- > lspci.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/lspci.c b/lspci.c > index 748452c..81c65f3 100644 > --- a/lspci.c > +++ b/lspci.c > @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) > putchar('\n'); > } > > +static pciaddr_t > +bar_value(struct device *d, int i, u32 flg) > +{ > + pciaddr_t val = 0; > + > + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ > + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && > + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) > + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); > + > + val = (val << 32) | flg; > + return val; > +} > + > static void > show_bases(struct device *d, int cnt) > { > @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) > flg = pos; > virtual = 1; > } > + else if (pos != bar_value(d, i, flg)) > + { > + printf("[out-of-sync] "); If the host bridge does any address translation, e.g., if it uses ACPI _TRA, isn't this going to print "out-of-sync" for *every* BAR, even if it isn't really out of sync? > + } > if (flg & PCI_BASE_ADDRESS_SPACE_IO) > { > pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK; > -- > 2.17.0.441.gb46fe60e1d-goog >
On Thu, May 24, 2018 at 3:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Tue, May 15, 2018 at 06:33:33PM -0700, Rajat Jain wrote: >> If the OS and device are reporting different BAR(s), currently >> the lspci will happily show the OS version with no indication what >> so ever that there is a problem. This is not correct in my opinion >> because lspci is often used to debug devices, so should either show >> the BAR reported by the device, or atleast an indication of >> out-of-sync. >> >> I spent a lot of time debugging a PCI device that would unexpectedly >> clear out the BAR register due to some bug, and it was quite later >> I realized that the lspci is showing me the OS version. So fix that. > > I assume (hope) that "lspci -b" would show you the hardware values. > Of course, who would think to use that when you're debugging a > problem. So that's not really a solution to the problem you faced. > >> On a system that is in problem state: >> >> localhost ~ # setpci -s 1:0.0 0x10.l >> 00000004 <=== BAR is zeroed out >> localhost ~ # >> >> Before: >> >> localhost ~ # lspci -v -s 1:0.0 >> 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) >> Subsystem: Intel Corporation Dual Band Wireless-AC 7265 >> Flags: bus master, fast devsel, latency 0, IRQ 275 >> Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >> Capabilities: [c8] Power Management version 3 >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >> Capabilities: [40] Express Endpoint, MSI 00 >> Capabilities: [100] Advanced Error Reporting >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >> Capabilities: [14c] Latency Tolerance Reporting >> Capabilities: [154] L1 PM Substates >> Kernel driver in use: iwlwifi >> Kernel modules: iwlwifi >> >> After: >> >> localhost ~ # lspci -v -s 1:0.0 >> 01:00.0 Network controller: Device 8086:095a (rev 59) >> Subsystem: Device 8086:5010 >> Flags: bus master, fast devsel, latency 0, IRQ 275 >> [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >> Capabilities: [c8] Power Management version 3 >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >> Capabilities: [40] Express Endpoint, MSI 00 >> Capabilities: [100] Advanced Error Reporting >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >> Capabilities: [14c] Latency Tolerance Reporting >> Capabilities: [154] L1 PM Substates >> Kernel driver in use: iwlwifi >> Kernel modules: iwlwifi > > I've never been quite clear on the path from hardware BAR value to > lspci output. On my system, > > $ strace -e trace=open lspci -vs00:14.0 2>&1 | grep 00:14.0 > > suggests that lspci reads these: > > /sys/bus/pci/.../resource and > /sys/bus/pci/.../config > > "resource" uses resource_show(), where we use the result of > pci_resource_to_user() on the resources in the struct pci_dev, so > these are basically cached and may be out of date with respect to the > hardware. > > "config" uses pci_read_config(), where we ultimately call the config > accessors, which should give us the actual values from the hardware > BARs. > >> Signed-off-by: Rajat Jain <rajatja@google.com> >> --- >> lspci.c | 18 ++++++++++++++++++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/lspci.c b/lspci.c >> index 748452c..81c65f3 100644 >> --- a/lspci.c >> +++ b/lspci.c >> @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) >> putchar('\n'); >> } >> >> +static pciaddr_t >> +bar_value(struct device *d, int i, u32 flg) >> +{ >> + pciaddr_t val = 0; >> + >> + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ >> + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && >> + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) >> + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); >> + >> + val = (val << 32) | flg; >> + return val; >> +} >> + >> static void >> show_bases(struct device *d, int cnt) >> { >> @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) >> flg = pos; >> virtual = 1; >> } >> + else if (pos != bar_value(d, i, flg)) >> + { >> + printf("[out-of-sync] "); > > If the host bridge does any address translation, e.g., if it uses ACPI > _TRA, isn't this going to print "out-of-sync" for *every* BAR, even if > it isn't really out of sync? Need a clarification here to ensure I understand you right. When you refer to address translation above, I assume you are referring to a situation where the CPU issues reads to an address X (in CPU address space), which goes through a translation at the pcie root port / controller, and appears as a read from address Y (as seen on the PCIe bus side)? In such a situation, my understanding (may be wrong) was that the lspci is supposed to be showing the PCI address Y for the bar (not address X in physical address space). If this understanding is wrong (and the lspci is supposed to show address CPU X for the bar), then yes, this patch is likely not correct. But I do think lspci should have a way to indicate when the bars in /sys/bus/pci/.../resource and /sys/bus/pci/.../config get out of sync. Thanks, Rajat > >> + } >> if (flg & PCI_BASE_ADDRESS_SPACE_IO) >> { >> pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK; >> -- >> 2.17.0.441.gb46fe60e1d-goog >>
On Fri, May 25, 2018 at 09:57:59AM -0700, Rajat Jain wrote: > On Thu, May 24, 2018 at 3:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Tue, May 15, 2018 at 06:33:33PM -0700, Rajat Jain wrote: > >> If the OS and device are reporting different BAR(s), currently > >> the lspci will happily show the OS version with no indication what > >> so ever that there is a problem. This is not correct in my opinion > >> because lspci is often used to debug devices, so should either show > >> the BAR reported by the device, or atleast an indication of > >> out-of-sync. > >> > >> I spent a lot of time debugging a PCI device that would unexpectedly > >> clear out the BAR register due to some bug, and it was quite later > >> I realized that the lspci is showing me the OS version. So fix that. > > > > I assume (hope) that "lspci -b" would show you the hardware values. > > Of course, who would think to use that when you're debugging a > > problem. So that's not really a solution to the problem you faced. > > > >> On a system that is in problem state: > >> > >> localhost ~ # setpci -s 1:0.0 0x10.l > >> 00000004 <=== BAR is zeroed out > >> localhost ~ # > >> > >> Before: > >> > >> localhost ~ # lspci -v -s 1:0.0 > >> 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) > >> Subsystem: Intel Corporation Dual Band Wireless-AC 7265 > >> Flags: bus master, fast devsel, latency 0, IRQ 275 > >> Memory at d1400000 (64-bit, non-prefetchable) [size=8K] > >> Capabilities: [c8] Power Management version 3 > >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > >> Capabilities: [40] Express Endpoint, MSI 00 > >> Capabilities: [100] Advanced Error Reporting > >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d > >> Capabilities: [14c] Latency Tolerance Reporting > >> Capabilities: [154] L1 PM Substates > >> Kernel driver in use: iwlwifi > >> Kernel modules: iwlwifi > >> > >> After: > >> > >> localhost ~ # lspci -v -s 1:0.0 > >> 01:00.0 Network controller: Device 8086:095a (rev 59) > >> Subsystem: Device 8086:5010 > >> Flags: bus master, fast devsel, latency 0, IRQ 275 > >> [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] > >> Capabilities: [c8] Power Management version 3 > >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > >> Capabilities: [40] Express Endpoint, MSI 00 > >> Capabilities: [100] Advanced Error Reporting > >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d > >> Capabilities: [14c] Latency Tolerance Reporting > >> Capabilities: [154] L1 PM Substates > >> Kernel driver in use: iwlwifi > >> Kernel modules: iwlwifi > > > > I've never been quite clear on the path from hardware BAR value to > > lspci output. On my system, > > > > $ strace -e trace=open lspci -vs00:14.0 2>&1 | grep 00:14.0 > > > > suggests that lspci reads these: > > > > /sys/bus/pci/.../resource and > > /sys/bus/pci/.../config > > > > "resource" uses resource_show(), where we use the result of > > pci_resource_to_user() on the resources in the struct pci_dev, so > > these are basically cached and may be out of date with respect to the > > hardware. > > > > "config" uses pci_read_config(), where we ultimately call the config > > accessors, which should give us the actual values from the hardware > > BARs. > > > >> Signed-off-by: Rajat Jain <rajatja@google.com> > >> --- > >> lspci.c | 18 ++++++++++++++++++ > >> 1 file changed, 18 insertions(+) > >> > >> diff --git a/lspci.c b/lspci.c > >> index 748452c..81c65f3 100644 > >> --- a/lspci.c > >> +++ b/lspci.c > >> @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) > >> putchar('\n'); > >> } > >> > >> +static pciaddr_t > >> +bar_value(struct device *d, int i, u32 flg) > >> +{ > >> + pciaddr_t val = 0; > >> + > >> + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ > >> + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && > >> + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) > >> + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); > >> + > >> + val = (val << 32) | flg; > >> + return val; > >> +} > >> + > >> static void > >> show_bases(struct device *d, int cnt) > >> { > >> @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) > >> flg = pos; > >> virtual = 1; > >> } > >> + else if (pos != bar_value(d, i, flg)) > >> + { > >> + printf("[out-of-sync] "); > > > > If the host bridge does any address translation, e.g., if it uses ACPI > > _TRA, isn't this going to print "out-of-sync" for *every* BAR, even if > > it isn't really out of sync? > > Need a clarification here to ensure I understand you right. When you > refer to address translation above, I assume you are referring to a > situation where the CPU issues reads to an address X (in CPU address > space), which goes through a translation at the pcie root port / > controller, and appears as a read from address Y (as seen on the PCIe > bus side)? It appears as a read *to* address Y on PCI (I'm sure that's what you had in mind). Linux shows it like this when it enumerates the host bridge: pci_bus 0000:00: root bus resource [mem 0xf0000000000-0xf007edfffff] (bus address [0x80000000-0xfedfffff]) A CPU read to physical address 0xf0000000000 will appear on PCI with the bus address 0x80000000. The PCI BAR would contain 0x80000000. > In such a situation, my understanding (may be wrong) was that the > lspci is supposed to be showing the PCI address Y for the bar (not > address X in physical address space). If this understanding is wrong > (and the lspci is supposed to show address CPU X for the bar), then > yes, this patch is likely not correct. My impression, based on the man page text about the "-b" option was the opposite: -b Bus-centric view. Show all IRQ numbers and addresses as seen by the cards on the PCI bus instead of as seen by the kernel. That suggests to me that by default lspci shows addresses as seen by the kernel, i.e., using CPU physical addresses. These match what /proc/iomem shows (at least if you look at /proc/iomem as root). > But I do think lspci should have a way to indicate when the bars in > /sys/bus/pci/.../resource and > /sys/bus/pci/.../config > get out of sync. I don't disagree; that does sound like it could be useful. I just don't know the best way to accomplish it. Seems like you'd have to do something in resource_show() to re-read the BAR and validate the cache. I don't know what you do if you find that it's invalid. > > > >> + } > >> if (flg & PCI_BASE_ADDRESS_SPACE_IO) > >> { > >> pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK; > >> -- > >> 2.17.0.441.gb46fe60e1d-goog > >>
On Fri, May 25, 2018 at 11:30 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Fri, May 25, 2018 at 09:57:59AM -0700, Rajat Jain wrote: >> On Thu, May 24, 2018 at 3:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >> > On Tue, May 15, 2018 at 06:33:33PM -0700, Rajat Jain wrote: >> >> If the OS and device are reporting different BAR(s), currently >> >> the lspci will happily show the OS version with no indication what >> >> so ever that there is a problem. This is not correct in my opinion >> >> because lspci is often used to debug devices, so should either show >> >> the BAR reported by the device, or atleast an indication of >> >> out-of-sync. >> >> >> >> I spent a lot of time debugging a PCI device that would unexpectedly >> >> clear out the BAR register due to some bug, and it was quite later >> >> I realized that the lspci is showing me the OS version. So fix that. >> > >> > I assume (hope) that "lspci -b" would show you the hardware values. >> > Of course, who would think to use that when you're debugging a >> > problem. So that's not really a solution to the problem you faced. >> > >> >> On a system that is in problem state: >> >> >> >> localhost ~ # setpci -s 1:0.0 0x10.l >> >> 00000004 <=== BAR is zeroed out >> >> localhost ~ # >> >> >> >> Before: >> >> >> >> localhost ~ # lspci -v -s 1:0.0 >> >> 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) >> >> Subsystem: Intel Corporation Dual Band Wireless-AC 7265 >> >> Flags: bus master, fast devsel, latency 0, IRQ 275 >> >> Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >> >> Capabilities: [c8] Power Management version 3 >> >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >> >> Capabilities: [40] Express Endpoint, MSI 00 >> >> Capabilities: [100] Advanced Error Reporting >> >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >> >> Capabilities: [14c] Latency Tolerance Reporting >> >> Capabilities: [154] L1 PM Substates >> >> Kernel driver in use: iwlwifi >> >> Kernel modules: iwlwifi >> >> >> >> After: >> >> >> >> localhost ~ # lspci -v -s 1:0.0 >> >> 01:00.0 Network controller: Device 8086:095a (rev 59) >> >> Subsystem: Device 8086:5010 >> >> Flags: bus master, fast devsel, latency 0, IRQ 275 >> >> [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >> >> Capabilities: [c8] Power Management version 3 >> >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >> >> Capabilities: [40] Express Endpoint, MSI 00 >> >> Capabilities: [100] Advanced Error Reporting >> >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >> >> Capabilities: [14c] Latency Tolerance Reporting >> >> Capabilities: [154] L1 PM Substates >> >> Kernel driver in use: iwlwifi >> >> Kernel modules: iwlwifi >> > >> > I've never been quite clear on the path from hardware BAR value to >> > lspci output. On my system, >> > >> > $ strace -e trace=open lspci -vs00:14.0 2>&1 | grep 00:14.0 >> > >> > suggests that lspci reads these: >> > >> > /sys/bus/pci/.../resource and >> > /sys/bus/pci/.../config >> > >> > "resource" uses resource_show(), where we use the result of >> > pci_resource_to_user() on the resources in the struct pci_dev, so >> > these are basically cached and may be out of date with respect to the >> > hardware. >> > >> > "config" uses pci_read_config(), where we ultimately call the config >> > accessors, which should give us the actual values from the hardware >> > BARs. >> > >> >> Signed-off-by: Rajat Jain <rajatja@google.com> >> >> --- >> >> lspci.c | 18 ++++++++++++++++++ >> >> 1 file changed, 18 insertions(+) >> >> >> >> diff --git a/lspci.c b/lspci.c >> >> index 748452c..81c65f3 100644 >> >> --- a/lspci.c >> >> +++ b/lspci.c >> >> @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) >> >> putchar('\n'); >> >> } >> >> >> >> +static pciaddr_t >> >> +bar_value(struct device *d, int i, u32 flg) >> >> +{ >> >> + pciaddr_t val = 0; >> >> + >> >> + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ >> >> + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && >> >> + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) >> >> + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); >> >> + >> >> + val = (val << 32) | flg; >> >> + return val; >> >> +} >> >> + >> >> static void >> >> show_bases(struct device *d, int cnt) >> >> { >> >> @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) >> >> flg = pos; >> >> virtual = 1; >> >> } >> >> + else if (pos != bar_value(d, i, flg)) >> >> + { >> >> + printf("[out-of-sync] "); >> > >> > If the host bridge does any address translation, e.g., if it uses ACPI >> > _TRA, isn't this going to print "out-of-sync" for *every* BAR, even if >> > it isn't really out of sync? >> >> Need a clarification here to ensure I understand you right. When you >> refer to address translation above, I assume you are referring to a >> situation where the CPU issues reads to an address X (in CPU address >> space), which goes through a translation at the pcie root port / >> controller, and appears as a read from address Y (as seen on the PCIe >> bus side)? > > It appears as a read *to* address Y on PCI (I'm sure that's what you had > in mind). Linux shows it like this when it enumerates the host bridge: > > pci_bus 0000:00: root bus resource [mem 0xf0000000000-0xf007edfffff] (bus address [0x80000000-0xfedfffff]) > > A CPU read to physical address 0xf0000000000 will appear on PCI with the > bus address 0x80000000. The PCI BAR would contain 0x80000000. > >> In such a situation, my understanding (may be wrong) was that the >> lspci is supposed to be showing the PCI address Y for the bar (not >> address X in physical address space). If this understanding is wrong >> (and the lspci is supposed to show address CPU X for the bar), then >> yes, this patch is likely not correct. > > My impression, based on the man page text about the "-b" option was the > opposite: > > -b Bus-centric view. Show all IRQ numbers and addresses as > seen by the cards on the PCI bus instead of as seen by the > kernel. Oh, sorry I was wrong. Thanks for correcting me. > > That suggests to me that by default lspci shows addresses as seen by > the kernel, i.e., using CPU physical addresses. These match what > /proc/iomem shows (at least if you look at /proc/iomem as root). > >> But I do think lspci should have a way to indicate when the bars in >> /sys/bus/pci/.../resource and >> /sys/bus/pci/.../config >> get out of sync. > > I don't disagree; that does sound like it could be useful. I just > don't know the best way to accomplish it. Lets leave it for some other time I think. For now, the lspci -b seems to accomplish what I want (so I'll change my scripts to use it as needed). Thanks! > Seems like you'd have to do > something in resource_show() to re-read the BAR and validate the > cache. I don't know what you do if you find that it's invalid. > >> > >> >> + } >> >> if (flg & PCI_BASE_ADDRESS_SPACE_IO) >> >> { >> >> pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK; >> >> -- >> >> 2.17.0.441.gb46fe60e1d-goog >> >>
Sorry, if it was not obvious, I'd like to self NAK this patch. Thanks. On Fri, May 25, 2018 at 11:38 AM, Rajat Jain <rajatja@google.com> wrote: > On Fri, May 25, 2018 at 11:30 AM, Bjorn Helgaas <helgaas@kernel.org> wrote: >> On Fri, May 25, 2018 at 09:57:59AM -0700, Rajat Jain wrote: >>> On Thu, May 24, 2018 at 3:37 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >>> > On Tue, May 15, 2018 at 06:33:33PM -0700, Rajat Jain wrote: >>> >> If the OS and device are reporting different BAR(s), currently >>> >> the lspci will happily show the OS version with no indication what >>> >> so ever that there is a problem. This is not correct in my opinion >>> >> because lspci is often used to debug devices, so should either show >>> >> the BAR reported by the device, or atleast an indication of >>> >> out-of-sync. >>> >> >>> >> I spent a lot of time debugging a PCI device that would unexpectedly >>> >> clear out the BAR register due to some bug, and it was quite later >>> >> I realized that the lspci is showing me the OS version. So fix that. >>> > >>> > I assume (hope) that "lspci -b" would show you the hardware values. >>> > Of course, who would think to use that when you're debugging a >>> > problem. So that's not really a solution to the problem you faced. >>> > >>> >> On a system that is in problem state: >>> >> >>> >> localhost ~ # setpci -s 1:0.0 0x10.l >>> >> 00000004 <=== BAR is zeroed out >>> >> localhost ~ # >>> >> >>> >> Before: >>> >> >>> >> localhost ~ # lspci -v -s 1:0.0 >>> >> 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) >>> >> Subsystem: Intel Corporation Dual Band Wireless-AC 7265 >>> >> Flags: bus master, fast devsel, latency 0, IRQ 275 >>> >> Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >>> >> Capabilities: [c8] Power Management version 3 >>> >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >>> >> Capabilities: [40] Express Endpoint, MSI 00 >>> >> Capabilities: [100] Advanced Error Reporting >>> >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >>> >> Capabilities: [14c] Latency Tolerance Reporting >>> >> Capabilities: [154] L1 PM Substates >>> >> Kernel driver in use: iwlwifi >>> >> Kernel modules: iwlwifi >>> >> >>> >> After: >>> >> >>> >> localhost ~ # lspci -v -s 1:0.0 >>> >> 01:00.0 Network controller: Device 8086:095a (rev 59) >>> >> Subsystem: Device 8086:5010 >>> >> Flags: bus master, fast devsel, latency 0, IRQ 275 >>> >> [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] >>> >> Capabilities: [c8] Power Management version 3 >>> >> Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >>> >> Capabilities: [40] Express Endpoint, MSI 00 >>> >> Capabilities: [100] Advanced Error Reporting >>> >> Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d >>> >> Capabilities: [14c] Latency Tolerance Reporting >>> >> Capabilities: [154] L1 PM Substates >>> >> Kernel driver in use: iwlwifi >>> >> Kernel modules: iwlwifi >>> > >>> > I've never been quite clear on the path from hardware BAR value to >>> > lspci output. On my system, >>> > >>> > $ strace -e trace=open lspci -vs00:14.0 2>&1 | grep 00:14.0 >>> > >>> > suggests that lspci reads these: >>> > >>> > /sys/bus/pci/.../resource and >>> > /sys/bus/pci/.../config >>> > >>> > "resource" uses resource_show(), where we use the result of >>> > pci_resource_to_user() on the resources in the struct pci_dev, so >>> > these are basically cached and may be out of date with respect to the >>> > hardware. >>> > >>> > "config" uses pci_read_config(), where we ultimately call the config >>> > accessors, which should give us the actual values from the hardware >>> > BARs. >>> > >>> >> Signed-off-by: Rajat Jain <rajatja@google.com> >>> >> --- >>> >> lspci.c | 18 ++++++++++++++++++ >>> >> 1 file changed, 18 insertions(+) >>> >> >>> >> diff --git a/lspci.c b/lspci.c >>> >> index 748452c..81c65f3 100644 >>> >> --- a/lspci.c >>> >> +++ b/lspci.c >>> >> @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) >>> >> putchar('\n'); >>> >> } >>> >> >>> >> +static pciaddr_t >>> >> +bar_value(struct device *d, int i, u32 flg) >>> >> +{ >>> >> + pciaddr_t val = 0; >>> >> + >>> >> + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ >>> >> + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && >>> >> + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) >>> >> + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); >>> >> + >>> >> + val = (val << 32) | flg; >>> >> + return val; >>> >> +} >>> >> + >>> >> static void >>> >> show_bases(struct device *d, int cnt) >>> >> { >>> >> @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) >>> >> flg = pos; >>> >> virtual = 1; >>> >> } >>> >> + else if (pos != bar_value(d, i, flg)) >>> >> + { >>> >> + printf("[out-of-sync] "); >>> > >>> > If the host bridge does any address translation, e.g., if it uses ACPI >>> > _TRA, isn't this going to print "out-of-sync" for *every* BAR, even if >>> > it isn't really out of sync? >>> >>> Need a clarification here to ensure I understand you right. When you >>> refer to address translation above, I assume you are referring to a >>> situation where the CPU issues reads to an address X (in CPU address >>> space), which goes through a translation at the pcie root port / >>> controller, and appears as a read from address Y (as seen on the PCIe >>> bus side)? >> >> It appears as a read *to* address Y on PCI (I'm sure that's what you had >> in mind). Linux shows it like this when it enumerates the host bridge: >> >> pci_bus 0000:00: root bus resource [mem 0xf0000000000-0xf007edfffff] (bus address [0x80000000-0xfedfffff]) >> >> A CPU read to physical address 0xf0000000000 will appear on PCI with the >> bus address 0x80000000. The PCI BAR would contain 0x80000000. >> >>> In such a situation, my understanding (may be wrong) was that the >>> lspci is supposed to be showing the PCI address Y for the bar (not >>> address X in physical address space). If this understanding is wrong >>> (and the lspci is supposed to show address CPU X for the bar), then >>> yes, this patch is likely not correct. >> >> My impression, based on the man page text about the "-b" option was the >> opposite: >> >> -b Bus-centric view. Show all IRQ numbers and addresses as >> seen by the cards on the PCI bus instead of as seen by the >> kernel. > > Oh, sorry I was wrong. Thanks for correcting me. > >> >> That suggests to me that by default lspci shows addresses as seen by >> the kernel, i.e., using CPU physical addresses. These match what >> /proc/iomem shows (at least if you look at /proc/iomem as root). >> >>> But I do think lspci should have a way to indicate when the bars in >>> /sys/bus/pci/.../resource and >>> /sys/bus/pci/.../config >>> get out of sync. >> >> I don't disagree; that does sound like it could be useful. I just >> don't know the best way to accomplish it. > > Lets leave it for some other time I think. For now, the lspci -b seems > to accomplish what I want (so I'll change my scripts to use it as > needed). Thanks! > >> Seems like you'd have to do >> something in resource_show() to re-read the BAR and validate the >> cache. I don't know what you do if you find that it's invalid. >> >>> > >>> >> + } >>> >> if (flg & PCI_BASE_ADDRESS_SPACE_IO) >>> >> { >>> >> pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK; >>> >> -- >>> >> 2.17.0.441.gb46fe60e1d-goog >>> >>
Hello! > I don't disagree; that does sound like it could be useful. I just > don't know the best way to accomplish it. Seems like you'd have to do > something in resource_show() to re-read the BAR and validate the > cache. I don't know what you do if you find that it's invalid. I also do not see how to do it in a reasonably simple way. I checked lspci and it turned out that it already add a "[virtual]" marker in cases when the BAR is zero (which was this case -- am I right?) and the kernel reports a non-zero address. However, it turned out that the check is never triggered on I/O regions, because the BAR contains a non-zero type bit. I have modified the check to take into account just the address part of the BAR. You can check it in the master branch. Martin
diff --git a/lspci.c b/lspci.c index 748452c..81c65f3 100644 --- a/lspci.c +++ b/lspci.c @@ -371,6 +371,20 @@ show_range(char *prefix, u64 base, u64 limit, int is_64bit) putchar('\n'); } +static pciaddr_t +bar_value(struct device *d, int i, u32 flg) +{ + pciaddr_t val = 0; + + /* Read higher order 32 bits if it's a 64 bit bar in memory space */ + if (i < 5 && !(flg & PCI_BASE_ADDRESS_SPACE_IO) && + (flg & PCI_BASE_ADDRESS_MEM_TYPE_MASK == PCI_BASE_ADDRESS_MEM_TYPE_64)) + val = get_conf_long(d, PCI_BASE_ADDRESS_0 + 4 * (i + 1)); + + val = (val << 32) | flg; + return val; +} + static void show_bases(struct device *d, int cnt) { @@ -401,6 +415,10 @@ show_bases(struct device *d, int cnt) flg = pos; virtual = 1; } + else if (pos != bar_value(d, i, flg)) + { + printf("[out-of-sync] "); + } if (flg & PCI_BASE_ADDRESS_SPACE_IO) { pciaddr_t a = pos & PCI_BASE_ADDRESS_IO_MASK;
If the OS and device are reporting different BAR(s), currently the lspci will happily show the OS version with no indication what so ever that there is a problem. This is not correct in my opinion because lspci is often used to debug devices, so should either show the BAR reported by the device, or atleast an indication of out-of-sync. I spent a lot of time debugging a PCI device that would unexpectedly clear out the BAR register due to some bug, and it was quite later I realized that the lspci is showing me the OS version. So fix that. On a system that is in problem state: localhost ~ # setpci -s 1:0.0 0x10.l 00000004 <=== BAR is zeroed out localhost ~ # Before: localhost ~ # lspci -v -s 1:0.0 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) Subsystem: Intel Corporation Dual Band Wireless-AC 7265 Flags: bus master, fast devsel, latency 0, IRQ 275 Memory at d1400000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [40] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d Capabilities: [14c] Latency Tolerance Reporting Capabilities: [154] L1 PM Substates Kernel driver in use: iwlwifi Kernel modules: iwlwifi After: localhost ~ # lspci -v -s 1:0.0 01:00.0 Network controller: Device 8086:095a (rev 59) Subsystem: Device 8086:5010 Flags: bus master, fast devsel, latency 0, IRQ 275 [out-of-sync] Memory at d1400000 (64-bit, non-prefetchable) [size=8K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [40] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [140] Device Serial Number b4-d5-bd-ff-ff-c8-a1-6d Capabilities: [14c] Latency Tolerance Reporting Capabilities: [154] L1 PM Substates Kernel driver in use: iwlwifi Kernel modules: iwlwifi Signed-off-by: Rajat Jain <rajatja@google.com> --- lspci.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)