Message ID | 20131206001954.27659.78163.stgit@bhelgaas-glaptop.roam.corp.google.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Thu, Dec 05, 2013 at 05:19:55PM -0700, Bjorn Helgaas wrote: > pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword > (32-bit) reads and writes, which also access the Secondary Status register. > Since the Secondary Status register is in the upper 16 bits of the dword, > and we preserved those upper 16 bits, this had the effect of clearing any > of the write-1-to-clear bits that happened to be set in the Secondary > Status register. This is a good catch! > - pci_write_config_dword(bridge, PCI_IO_BASE, l); > + pci_write_config_word(bridge, PCI_IO_BASE, l); But this is a problem :( tegra and mvebu at least do not have HW to do non-32 bit writes, so their implementation of pci_write_config_word does the RMW internally and will still have this same bug. I think you have to keep the 32 bit write here, but zero the write-one-to-clear bits :( Regards, Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 9, 2013 at 12:33 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Thu, Dec 05, 2013 at 05:19:55PM -0700, Bjorn Helgaas wrote: >> pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword >> (32-bit) reads and writes, which also access the Secondary Status register. >> Since the Secondary Status register is in the upper 16 bits of the dword, >> and we preserved those upper 16 bits, this had the effect of clearing any >> of the write-1-to-clear bits that happened to be set in the Secondary >> Status register. > > This is a good catch! > >> - pci_write_config_dword(bridge, PCI_IO_BASE, l); >> + pci_write_config_word(bridge, PCI_IO_BASE, l); > > But this is a problem :( > > tegra and mvebu at least do not have HW to do non-32 bit writes, so > their implementation of pci_write_config_word does the RMW internally > and will still have this same bug. > > I think you have to keep the 32 bit write here, but zero the > write-one-to-clear bits :( Is there actually some spec requirement about the access sizes that must be supported by the hardware? If there is something, I'll gladly keep the 32-bit access, but if it's only a tegra/mvebu-specific restriction, then I think it needs to be handled in the PCI config accessors for that hardware. This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it sound like tegra/mvebu is non-conforming and should deal with this specially: "A function must not restrict the size of the access it supports in Configuration Space. The configuration commands, like other commands, allow data to be accessed using any combination of bytes (including a byte, word, DWORD, or non-contiguous bytes) and multiple data phases in a burst. The target is required to handle any combination of byte enables." I think the tegra/mvebu accessors should be able to use the address and access size to figure out what we're trying to do. If we're doing a WORD write to PCI_IO_BASE, it can turn that into: 32-bit read, clear the upper 16 bits (secondary status), insert the PCI_IO_BASE part, 32-bit write. If we're doing a 16-bit write to PCI_SEC_STATUS, it can turn that into: 32-bit read, insert upper 16 bits (secondary status), leave lower 16 bits alone, 32-bit write. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 09, 2013 at 01:15:52PM -0700, Bjorn Helgaas wrote: > Is there actually some spec requirement about the access sizes that > must be supported by the hardware? If there is something, I'll gladly > keep the 32-bit access, but if it's only a tegra/mvebu-specific > restriction, then I think it needs to be handled in the PCI config > accessors for that hardware. The host register interface is not specified by any standard. I looked a for a bit and found other drivers with the same problem: arch/arm/mach-cns3xxx/pcie.c arch/arm/mach-iop13xx/pci.c arch/arm/mach-ks8695/pci.c arch/arm/mach-sa1100/pci-nanoengine.c [..] I stopped looking after that point.. > This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it > sound like tegra/mvebu is non-conforming and should deal with this > specially: > > "A function must not restrict the size of the access it supports in > Configuration Space. The configuration commands, like other commands, > allow data to be accessed using any combination of bytes (including a > byte, word, DWORD, or non-contiguous bytes) and multiple data phases > in a burst. The target is required to handle any combination of byte > enables." This verbage is talking about the target/responder, not the host. Any target must respond to all possible sizes for all possible config registers, including combinations that even x86 can't issue (like a BE pattern of b0110) > I think the tegra/mvebu accessors should be able to use the address > and access size to figure out what we're trying to do. If we're doing > a WORD write to PCI_IO_BASE, it can turn that into: 32-bit read, clear > the upper 16 bits (secondary status), insert the PCI_IO_BASE part, > 32-bit write. If we're doing a 16-bit write to PCI_SEC_STATUS, it can > turn that into: 32-bit read, insert upper 16 bits (secondary status), > leave lower 16 bits alone, 32-bit write. Seems reasonable, but this obviously shouldn't be open coded in every driver.. Maybe pci_ops should gain a read32/write32 and HW that can only do 32 bit access should not define read/write. The core code could provide read/write functions to do the bit mangling for all possibilities? That would be very clear what is going on. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 9, 2013 at 2:27 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 01:15:52PM -0700, Bjorn Helgaas wrote: > >> Is there actually some spec requirement about the access sizes that >> must be supported by the hardware? If there is something, I'll gladly >> keep the 32-bit access, but if it's only a tegra/mvebu-specific >> restriction, then I think it needs to be handled in the PCI config >> accessors for that hardware. > > The host register interface is not specified by any standard. > > I looked a for a bit and found other drivers with the same problem: > arch/arm/mach-cns3xxx/pcie.c > arch/arm/mach-iop13xx/pci.c > arch/arm/mach-ks8695/pci.c > arch/arm/mach-sa1100/pci-nanoengine.c > [..] > > I stopped looking after that point.. > >> This spec statement (PCI (not PCIe) r3.0, sec 3.2.2.3.4) makes it >> sound like tegra/mvebu is non-conforming and should deal with this >> specially: >> >> "A function must not restrict the size of the access it supports in >> Configuration Space. The configuration commands, like other commands, >> allow data to be accessed using any combination of bytes (including a >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases >> in a burst. The target is required to handle any combination of byte >> enables." > > This verbage is talking about the target/responder, not the host. Any > target must respond to all possible sizes for all possible config > registers, including combinations that even x86 can't issue (like a BE > pattern of b0110) Now I'm confused. From the PCI core's point of view, pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI bridge. PCI config space accessors are also for talking to targets. I think you're talking about host bridges, and I agree that they are not covered by this spec statement and can have arbitrary register layout and access size restrictions. Some arches set up their host bridges so they appear in PCI config space and have register layouts that look like PCI-to-PCI bridges. In my opinion, those arches then have the responsibility of following all the PCI-to-PCI bridge rules, including access size restrictions, either directly in hardware or in their config accessors. I don't think it makes sense to try to make the core know about this stuff. There are many places that touch bridges, and it would be pretty hard to manage exceptions like this. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 09, 2013 at 02:58:06PM -0700, Bjorn Helgaas wrote: > >> "A function must not restrict the size of the access it supports in > >> Configuration Space. The configuration commands, like other commands, > >> allow data to be accessed using any combination of bytes (including a > >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases > >> in a burst. The target is required to handle any combination of byte > >> enables." > > > > This verbage is talking about the target/responder, not the host. Any > > target must respond to all possible sizes for all possible config > > registers, including combinations that even x86 can't issue (like a BE > > pattern of b0110) > > Now I'm confused. From the PCI core's point of view, > pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI > bridge. PCI config space accessors are also for talking to targets. The fundamental issue is that the PCI host drivers I pointed out have hardware that cannot construct a PCI-E ConfigWr TLP (or PCI/PCI-X eqiuv) with any content other than a 32 bit DWORD write. So even though pci_setup_bridge_io is talking to a target, and the target is required to handle all possible ConfigWr TLPs, the host driver (struct pci_ops) can only generate one possibility. The problematic host drivers are all emulating non-32 bit ConfigWr support by doing RMW, and you have observed that RMW is not 100% correct when dealing with write 1 to clear bits. Basically, your fix to pci_setup_bridge_io is fine - but your observation led me to realize that the HW drivers implementing RMW for 8 and 16 bit ops under their struct pci_ops.read function have exactly the same flaw you are fixing here - they will silently wipe out write 1 to clear bits. > Some arches set up their host bridges so they appear in PCI config > space and have register layouts that look like PCI-to-PCI bridges. In > my opinion, those arches then have the responsibility of following all > the PCI-to-PCI bridge rules, including access size restrictions, > either directly in hardware or in their config accessors. Sure, but this isn't the problem :) IMHO, the pci core needs to help host drivers implement a correct pci_ops.write function for the case where the host hardware can only produce 32 bit ConfigWR TLPs. That seems to be common enough and subtle enough it shouldn't be inlined into every driver. Just to be clear, I am talking about implementations like this: static int cns3xxx_pci_write_config(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val) { u32 v; void __iomem *base; u32 mask = (0x1ull << (size * 8)) - 1; int shift = (where % 4) * 8; base = cns3xxx_pci_cfg_base(bus, devfn, where); if (!base) return PCIBIOS_SUCCESSFUL; v = __raw_readl(base); v &= ~(mask << shift); v |= (val & mask) << shift; __raw_writel(v, base); return PCIBIOS_SUCCESSFUL; } static struct pci_ops cns3xxx_pcie_ops = { .read = cns3xxx_pci_read_config, .write = cns3xxx_pci_write_config, }; Which we now know is subtly broken as it will clear write 1 to clear bits when doing sub dword operations. I found 6 copies of this pattern in 5 mins of searching :( Regards, Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 9, 2013 at 5:10 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 02:58:06PM -0700, Bjorn Helgaas wrote: > >> >> "A function must not restrict the size of the access it supports in >> >> Configuration Space. The configuration commands, like other commands, >> >> allow data to be accessed using any combination of bytes (including a >> >> byte, word, DWORD, or non-contiguous bytes) and multiple data phases >> >> in a burst. The target is required to handle any combination of byte >> >> enables." >> > >> > This verbage is talking about the target/responder, not the host. Any >> > target must respond to all possible sizes for all possible config >> > registers, including combinations that even x86 can't issue (like a BE >> > pattern of b0110) >> >> Now I'm confused. From the PCI core's point of view, >> pci_setup_bridge_io() is talking to a target, namely a PCI-to-PCI >> bridge. PCI config space accessors are also for talking to targets. > > The fundamental issue is that the PCI host drivers I pointed out have > hardware that cannot construct a PCI-E ConfigWr TLP (or PCI/PCI-X > eqiuv) with any content other than a 32 bit DWORD write. Yep. > So even though pci_setup_bridge_io is talking to a target, and the > target is required to handle all possible ConfigWr TLPs, the host > driver (struct pci_ops) can only generate one possibility. Yep. > The problematic host drivers are all emulating non-32 bit ConfigWr > support by doing RMW, and you have observed that RMW is not 100% > correct when dealing with write 1 to clear bits. Yep. > Basically, your fix to pci_setup_bridge_io is fine - but your > observation led me to realize that the HW drivers implementing RMW for > 8 and 16 bit ops under their struct pci_ops.read function have exactly > the same flaw you are fixing here - they will silently wipe out write > 1 to clear bits. Ah, OK. I thought you were saying that we couldn't change pci_setup_bridge_io() to use 16-bit accesses because of this problem. >> Some arches set up their host bridges so they appear in PCI config >> space and have register layouts that look like PCI-to-PCI bridges. In >> my opinion, those arches then have the responsibility of following all >> the PCI-to-PCI bridge rules, including access size restrictions, >> either directly in hardware or in their config accessors. > > Sure, but this isn't the problem :) > > IMHO, the pci core needs to help host drivers implement a correct > pci_ops.write function for the case where the host hardware can only > produce 32 bit ConfigWR TLPs. That seems to be common enough and > subtle enough it shouldn't be inlined into every driver. Sure, if somebody can come up with a reasonable way to share this sort of code, that sounds like a good thing. Maybe extending struct pci_ops is the way to do this, but I hope not, because it seems like it will require quite a bit of specific knowledge about which registers have RW1C bits. They're not all at constant offsets because they can be in capabilities, too. > Just to be clear, I am talking about implementations like this: > > static int cns3xxx_pci_write_config(struct pci_bus *bus, unsigned int devfn, > int where, int size, u32 val) > { > u32 v; > void __iomem *base; > u32 mask = (0x1ull << (size * 8)) - 1; > int shift = (where % 4) * 8; > > base = cns3xxx_pci_cfg_base(bus, devfn, where); > if (!base) > return PCIBIOS_SUCCESSFUL; > > v = __raw_readl(base); > > v &= ~(mask << shift); > v |= (val & mask) << shift; > > __raw_writel(v, base); > > return PCIBIOS_SUCCESSFUL; > } > > static struct pci_ops cns3xxx_pcie_ops = { > .read = cns3xxx_pci_read_config, > .write = cns3xxx_pci_write_config, > }; > > Which we now know is subtly broken as it will clear write 1 to clear > bits when doing sub dword operations. > > I found 6 copies of this pattern in 5 mins of searching :( Yes, I bet there are a bunch of them. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 09, 2013 at 05:18:30PM -0700, Bjorn Helgaas wrote: > > Basically, your fix to pci_setup_bridge_io is fine - but your > > observation led me to realize that the HW drivers implementing RMW for > > 8 and 16 bit ops under their struct pci_ops.read function have exactly > > the same flaw you are fixing here - they will silently wipe out write > > 1 to clear bits. > > Ah, OK. I thought you were saying that we couldn't change > pci_setup_bridge_io() to use 16-bit accesses because of this problem. No, not really - just that this bug you discovered is broader than just that one place :) > Sure, if somebody can come up with a reasonable way to share this sort > of code, that sounds like a good thing. Maybe extending struct > pci_ops is the way to do this, but I hope not, because it seems like Well, sharing the code is no problem, it is exactly the same for every 32 bit only host driver. But supporting RW1C bits in capability lists seems fairly hard once the context at the call site is lost. Which makes me wonder if supporting sub-dword writes to dwords with RW1C bits makes sense at all :( That was why my initial reaction was to not do sub dword writes if you know it will conflict with a RW1C bit. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Dec 9, 2013 at 6:31 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Mon, Dec 09, 2013 at 05:18:30PM -0700, Bjorn Helgaas wrote: > >> > Basically, your fix to pci_setup_bridge_io is fine - but your >> > observation led me to realize that the HW drivers implementing RMW for >> > 8 and 16 bit ops under their struct pci_ops.read function have exactly >> > the same flaw you are fixing here - they will silently wipe out write >> > 1 to clear bits. >> >> Ah, OK. I thought you were saying that we couldn't change >> pci_setup_bridge_io() to use 16-bit accesses because of this problem. > > No, not really - just that this bug you discovered is broader than just > that one place :) > >> Sure, if somebody can come up with a reasonable way to share this sort >> of code, that sounds like a good thing. Maybe extending struct >> pci_ops is the way to do this, but I hope not, because it seems like > > Well, sharing the code is no problem, it is exactly the same for every > 32 bit only host driver. > > But supporting RW1C bits in capability lists seems fairly hard once > the context at the call site is lost. Strictly speaking, fixing this only requires knowledge of the hardware register layout, so at least in principle, it can be done without the call site context. It does sound like it might be messy, though. QEMU deals with this issue somehow; maybe there's something there we can copy. > Which makes me wonder if supporting sub-dword writes to dwords with > RW1C bits makes sense at all :( > > That was why my initial reaction was to not do sub dword writes if you > know it will conflict with a RW1C bit. Keeping the dword read/write and clearing the upper 16 bits before the write is an option. But I'm inclined to do the 16-bit accesses, i.e., use the patch as posted, because that makes the issue more visible. If we keep the dword access, we can fix this particular issue, but it does nothing to help fix other similar cases. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 80350299a6ea..2e344a5581ae 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -538,7 +538,8 @@ static void pci_setup_bridge_io(struct pci_bus *bus) struct pci_bus_region region; unsigned long io_mask; u8 io_base_lo, io_limit_lo; - u32 l, io_upper16; + u16 l; + u32 io_upper16; io_mask = PCI_IO_RANGE_MASK; if (bridge->io_window_1k) @@ -548,11 +549,10 @@ static void pci_setup_bridge_io(struct pci_bus *bus) res = bus->resource[0]; pcibios_resource_to_bus(bridge, ®ion, res); if (res->flags & IORESOURCE_IO) { - pci_read_config_dword(bridge, PCI_IO_BASE, &l); - l &= 0xffff0000; + pci_read_config_word(bridge, PCI_IO_BASE, &l); io_base_lo = (region.start >> 8) & io_mask; io_limit_lo = (region.end >> 8) & io_mask; - l |= ((u32) io_limit_lo << 8) | io_base_lo; + l = ((u16) io_limit_lo << 8) | io_base_lo; /* Set up upper 16 bits of I/O base/limit. */ io_upper16 = (region.end & 0xffff0000) | (region.start >> 16); dev_info(&bridge->dev, " bridge window %pR\n", res); @@ -564,7 +564,7 @@ static void pci_setup_bridge_io(struct pci_bus *bus) /* Temporarily disable the I/O range before updating PCI_IO_BASE. */ pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff); /* Update lower 16 bits of I/O base/limit. */ - pci_write_config_dword(bridge, PCI_IO_BASE, l); + pci_write_config_word(bridge, PCI_IO_BASE, l); /* Update upper 16 bits of I/O base/limit. */ pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16); }
pci_setup_bridge_io() accessed PCI_IO_BASE and PCI_IO_LIMIT using dword (32-bit) reads and writes, which also access the Secondary Status register. Since the Secondary Status register is in the upper 16 bits of the dword, and we preserved those upper 16 bits, this had the effect of clearing any of the write-1-to-clear bits that happened to be set in the Secondary Status register. That's not what we want, so use word (16-bit) accesses to update only PCI_IO_BASE and PCI_IO_LIMIT. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/setup-bus.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html