Message ID | 20161029111229.26875-1-zajec5@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Rafal, On 16-10-29 04:12 AM, Rafał Miłecki wrote: > From: Rafał Miłecki <rafal@milecki.pl> > > Since early BCM5301X days we got abort handler that was removed by > commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise abort > fault handler"). It assumed we need to deal only with pending aborts > left by the bootloader. Unfortunately this isn't true for BCM5301X. > > When probing PCI config space (device enumeration) it is expected to > have master aborts on the PCI bus. Most bridges don't forward (or they > allow disabling it) these errors onto the AXI/AMBA bus but not the > Northstar (BCM5301X) one. Should we only add this workaround code if CONFIG_PCI is on then? > > iProc PCIe controller on Northstar seems to be some older one, without > a control register for errors forwarding. It means we need to workaround > this at platform level. All newer platforms are not affected by this > issue. > > Signed-off-by: Rafał Miłecki <rafal@milecki.pl> > --- > arch/arm/mach-bcm/bcm_5301x.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/arch/arm/mach-bcm/bcm_5301x.c b/arch/arm/mach-bcm/bcm_5301x.c > index c8830a2..fe067f6 100644 > --- a/arch/arm/mach-bcm/bcm_5301x.c > +++ b/arch/arm/mach-bcm/bcm_5301x.c > @@ -9,14 +9,42 @@ > #include <asm/hardware/cache-l2x0.h> > > #include <asm/mach/arch.h> > +#include <asm/siginfo.h> > +#include <asm/signal.h> > + > +#define FSR_EXTERNAL (1 << 12) > +#define FSR_READ (0 << 10) > +#define FSR_IMPRECISE 0x0406 > > static const char *const bcm5301x_dt_compat[] __initconst = { > "brcm,bcm4708", > NULL, > }; > > +static int bcm5301x_abort_handler(unsigned long addr, unsigned int fsr, > + struct pt_regs *regs) > +{ > + /* > + * We want to ignore aborts forwarded from the PCIe bus that are > + * expected and shouldn't really be passed by the PCIe controller. > + * The biggest disadvantage is the same FSR code may be reported when > + * reading non-existing APB register and we shouldn't ignore that. > + */ > + if (fsr == (FSR_EXTERNAL | FSR_READ | FSR_IMPRECISE)) > + return 0; > + > + return 1; > +} > + > +static void __init bcm5301x_init_early(void) > +{ > + hook_fault_code(16 + 6, bcm5301x_abort_handler, SIGBUS, BUS_OBJERR, > + "imprecise external abort"); > +} > + > DT_MACHINE_START(BCM5301X, "BCM5301X") > .l2c_aux_val = 0, > .l2c_aux_mask = ~0, > .dt_compat = bcm5301x_dt_compat, > + .init_early = bcm5301x_init_early, > MACHINE_END > Regards, Scott
On 10/31/2016 07:08 PM, Scott Branden wrote: > Hi Rafal, > > > On 16-10-29 04:12 AM, Rafał Miłecki wrote: >> From: Rafał Miłecki <rafal@milecki.pl> >> >> Since early BCM5301X days we got abort handler that was removed by >> commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise abort >> fault handler"). It assumed we need to deal only with pending aborts >> left by the bootloader. Unfortunately this isn't true for BCM5301X. >> >> When probing PCI config space (device enumeration) it is expected to >> have master aborts on the PCI bus. Most bridges don't forward (or they >> allow disabling it) these errors onto the AXI/AMBA bus but not the >> Northstar (BCM5301X) one. > Should we only add this workaround code if CONFIG_PCI is on then? I think all the supported northstar devices have a PCIe controller. We could add such a CONFIG_PCI check, but I do not see a big advantage. >> iProc PCIe controller on Northstar seems to be some older one, without >> a control register for errors forwarding. It means we need to workaround >> this at platform level. All newer platforms are not affected by this >> issue. >> >> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> >> --- >> arch/arm/mach-bcm/bcm_5301x.c | 28 ++++++++++++++++++++++++++++ >> 1 file changed, 28 insertions(+) >> >> diff --git a/arch/arm/mach-bcm/bcm_5301x.c >> b/arch/arm/mach-bcm/bcm_5301x.c >> index c8830a2..fe067f6 100644 >> --- a/arch/arm/mach-bcm/bcm_5301x.c >> +++ b/arch/arm/mach-bcm/bcm_5301x.c >> @@ -9,14 +9,42 @@ >> #include <asm/hardware/cache-l2x0.h> >> >> #include <asm/mach/arch.h> >> +#include <asm/siginfo.h> >> +#include <asm/signal.h> >> + >> +#define FSR_EXTERNAL (1 << 12) >> +#define FSR_READ (0 << 10) >> +#define FSR_IMPRECISE 0x0406 >> >> static const char *const bcm5301x_dt_compat[] __initconst = { >> "brcm,bcm4708", >> NULL, >> }; >> >> +static int bcm5301x_abort_handler(unsigned long addr, unsigned int fsr, >> + struct pt_regs *regs) >> +{ >> + /* >> + * We want to ignore aborts forwarded from the PCIe bus that are >> + * expected and shouldn't really be passed by the PCIe controller. >> + * The biggest disadvantage is the same FSR code may be reported >> when >> + * reading non-existing APB register and we shouldn't ignore that. >> + */ >> + if (fsr == (FSR_EXTERNAL | FSR_READ | FSR_IMPRECISE)) >> + return 0; How often does this happen? Would it be useful to add a log message here? >> + >> + return 1; >> +} >> + >> +static void __init bcm5301x_init_early(void) >> +{ >> + hook_fault_code(16 + 6, bcm5301x_abort_handler, SIGBUS, BUS_OBJERR, >> + "imprecise external abort"); >> +} >> + >> DT_MACHINE_START(BCM5301X, "BCM5301X") >> .l2c_aux_val = 0, >> .l2c_aux_mask = ~0, >> .dt_compat = bcm5301x_dt_compat, >> + .init_early = bcm5301x_init_early, >> MACHINE_END >> > > Regards, > Scott
On 10/31/2016 01:59 PM, Hauke Mehrtens wrote: > > > On 10/31/2016 07:08 PM, Scott Branden wrote: >> Hi Rafal, >> >> >> On 16-10-29 04:12 AM, Rafał Miłecki wrote: >>> From: Rafał Miłecki <rafal@milecki.pl> >>> >>> Since early BCM5301X days we got abort handler that was removed by >>> commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise abort >>> fault handler"). It assumed we need to deal only with pending aborts >>> left by the bootloader. Unfortunately this isn't true for BCM5301X. >>> >>> When probing PCI config space (device enumeration) it is expected to >>> have master aborts on the PCI bus. Most bridges don't forward (or they >>> allow disabling it) these errors onto the AXI/AMBA bus but not the >>> Northstar (BCM5301X) one. >> Should we only add this workaround code if CONFIG_PCI is on then? > > I think all the supported northstar devices have a PCIe controller. We > could add such a CONFIG_PCI check, but I do not see a big advantage. Actually, I do see a couple disadvantages if we gate this with CONFIG_PCI: if this problem shows up irrespective of your kernel configuration, you want the error handler to clear it, not rely on CONFIG_PCI to be enabled for the error to go away and also, without an additional ifdef, additional compiler coverage.
On 16-10-31 02:01 PM, Florian Fainelli wrote: > On 10/31/2016 01:59 PM, Hauke Mehrtens wrote: >> >> >> On 10/31/2016 07:08 PM, Scott Branden wrote: >>> Hi Rafal, >>> >>> >>> On 16-10-29 04:12 AM, Rafał Miłecki wrote: >>>> From: Rafał Miłecki <rafal@milecki.pl> >>>> >>>> Since early BCM5301X days we got abort handler that was removed by >>>> commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise abort >>>> fault handler"). It assumed we need to deal only with pending aborts >>>> left by the bootloader. Unfortunately this isn't true for BCM5301X. >>>> >>>> When probing PCI config space (device enumeration) it is expected to >>>> have master aborts on the PCI bus. Most bridges don't forward (or they >>>> allow disabling it) these errors onto the AXI/AMBA bus but not the >>>> Northstar (BCM5301X) one. >>> Should we only add this workaround code if CONFIG_PCI is on then? >> >> I think all the supported northstar devices have a PCIe controller. We >> could add such a CONFIG_PCI check, but I do not see a big advantage. > > Actually, I do see a couple disadvantages if we gate this with > CONFIG_PCI: if this problem shows up irrespective of your kernel > configuration, you want the error handler to clear it, not rely on > CONFIG_PCI to be enabled for the error to go away and also, without an > additional ifdef, additional compiler coverage. > A problem with reintroducing this change is that all imprecise data aborts are ignored, not just PCI. So if you don't actually use PCI in your system and want to debug other aborts you are unable to. I don't know if we care about such situation.
On 10/31/2016 02:46 PM, Scott Branden wrote: > > > On 16-10-31 02:01 PM, Florian Fainelli wrote: >> On 10/31/2016 01:59 PM, Hauke Mehrtens wrote: >>> >>> >>> On 10/31/2016 07:08 PM, Scott Branden wrote: >>>> Hi Rafal, >>>> >>>> >>>> On 16-10-29 04:12 AM, Rafał Miłecki wrote: >>>>> From: Rafał Miłecki <rafal@milecki.pl> >>>>> >>>>> Since early BCM5301X days we got abort handler that was removed by >>>>> commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise >>>>> abort >>>>> fault handler"). It assumed we need to deal only with pending aborts >>>>> left by the bootloader. Unfortunately this isn't true for BCM5301X. >>>>> >>>>> When probing PCI config space (device enumeration) it is expected to >>>>> have master aborts on the PCI bus. Most bridges don't forward (or they >>>>> allow disabling it) these errors onto the AXI/AMBA bus but not the >>>>> Northstar (BCM5301X) one. >>>> Should we only add this workaround code if CONFIG_PCI is on then? >>> >>> I think all the supported northstar devices have a PCIe controller. We >>> could add such a CONFIG_PCI check, but I do not see a big advantage. >> >> Actually, I do see a couple disadvantages if we gate this with >> CONFIG_PCI: if this problem shows up irrespective of your kernel >> configuration, you want the error handler to clear it, not rely on >> CONFIG_PCI to be enabled for the error to go away and also, without an >> additional ifdef, additional compiler coverage. >> > A problem with reintroducing this change is that all imprecise data > aborts are ignored, not just PCI. So if you don't actually use PCI in > your system and want to debug other aborts you are unable to. I don't > know if we care about such situation. Considering that any abort is pretty much fatal, the options are: - update the freaking bootloader to a version where there are no such aborts generated, not an option on consumer devices, unclear which version (if any) fixes that - fixups the aborts externally, via a boot wrapper, which is going to take some time to develop, causes additional burden on the distributors to provide instructions/build images on how to do it - fixups the aborts in the kernel, irrespective of where they come from, simple and easy - fixups the aborts in the kernel, look where they come from, by using some bit of magic, looking at PCIe registers and whatnot (provided that is even possible), not too hard, but can take a while, and is error prone I can certainly advocate that option 3 is the one that gives a working device, and this is what matters from a distribution perspective like LEDE/OpenWrt.
On 31 October 2016 at 22:56, Florian Fainelli <f.fainelli@gmail.com> wrote: > On 10/31/2016 02:46 PM, Scott Branden wrote: >> A problem with reintroducing this change is that all imprecise data >> aborts are ignored, not just PCI. So if you don't actually use PCI in >> your system and want to debug other aborts you are unable to. I don't >> know if we care about such situation. > > Considering that any abort is pretty much fatal, the options are: > > - update the freaking bootloader to a version where there are no such > aborts generated, not an option on consumer devices, unclear which > version (if any) fixes that > > - fixups the aborts externally, via a boot wrapper, which is going to > take some time to develop, causes additional burden on the distributors > to provide instructions/build images on how to do it In practice updating bootloader is not possible (as you noticed) but I don't think it's even possible to fix this problem at bootloader / extra loader level. If this was a matter of hardware setup, we could handle it in kernel as well I believe. Ray actually verified it's controller limitation on Northstar platform. It sounds a bit like you're thinking about MMU and Dcache Northstar problem when writing that e-mail. > - fixups the aborts in the kernel, irrespective of where they come from, > simple and easy > > - fixups the aborts in the kernel, look where they come from, by using > some bit of magic, looking at PCIe registers and whatnot (provided that > is even possible), not too hard, but can take a while, and is error prone > > I can certainly advocate that option 3 is the one that gives a working > device, and this is what matters from a distribution perspective like > LEDE/OpenWrt.
On 16-10-31 02:56 PM, Florian Fainelli wrote: > On 10/31/2016 02:46 PM, Scott Branden wrote: >> >> >> On 16-10-31 02:01 PM, Florian Fainelli wrote: >>> On 10/31/2016 01:59 PM, Hauke Mehrtens wrote: >>>> >>>> >>>> On 10/31/2016 07:08 PM, Scott Branden wrote: >>>>> Hi Rafal, >>>>> >>>>> >>>>> On 16-10-29 04:12 AM, Rafał Miłecki wrote: >>>>>> From: Rafał Miłecki <rafal@milecki.pl> >>>>>> >>>>>> Since early BCM5301X days we got abort handler that was removed by >>>>>> commit 937b12306ea79 ("ARM: BCM5301X: remove workaround imprecise >>>>>> abort >>>>>> fault handler"). It assumed we need to deal only with pending aborts >>>>>> left by the bootloader. Unfortunately this isn't true for BCM5301X. >>>>>> >>>>>> When probing PCI config space (device enumeration) it is expected to >>>>>> have master aborts on the PCI bus. Most bridges don't forward (or they >>>>>> allow disabling it) these errors onto the AXI/AMBA bus but not the >>>>>> Northstar (BCM5301X) one. >>>>> Should we only add this workaround code if CONFIG_PCI is on then? >>>> >>>> I think all the supported northstar devices have a PCIe controller. We >>>> could add such a CONFIG_PCI check, but I do not see a big advantage. >>> >>> Actually, I do see a couple disadvantages if we gate this with >>> CONFIG_PCI: if this problem shows up irrespective of your kernel >>> configuration, you want the error handler to clear it, not rely on >>> CONFIG_PCI to be enabled for the error to go away and also, without an >>> additional ifdef, additional compiler coverage. >>> >> A problem with reintroducing this change is that all imprecise data >> aborts are ignored, not just PCI. So if you don't actually use PCI in >> your system and want to debug other aborts you are unable to. I don't >> know if we care about such situation. > > Considering that any abort is pretty much fatal, the options are: > > - update the freaking bootloader to a version where there are no such > aborts generated, not an option on consumer devices, unclear which > version (if any) fixes that > > - fixups the aborts externally, via a boot wrapper, which is going to > take some time to develop, causes additional burden on the distributors > to provide instructions/build images on how to do it I think the abort is already fixed in the kernel now on boot so option 1 and 2 were not needed - and abort handler was removed as detailed by Rafal in the commit. Only outstanding issue is now this new PCI enumeration issue. > > - fixups the aborts in the kernel, irrespective of where they come from, > simple and easy > > - fixups the aborts in the kernel, look where they come from, by using > some bit of magic, looking at PCIe registers and whatnot (provided that > is even possible), not too hard, but can take a while, and is error prone Option 4 sounds like the proper solution - check the range the abort is due to the PCI device enumeration and only ignore those aborts. > > I can certainly advocate that option 3 is the one that gives a working > device, and this is what matters from a distribution perspective like > LEDE/OpenWrt. > Since I don't use BCM5301x option 3 is fine by me.
On 31 October 2016 at 23:05, Scott Branden <scott.branden@broadcom.com> wrote: > On 16-10-31 02:56 PM, Florian Fainelli wrote: >> - fixups the aborts in the kernel, look where they come from, by using >> some bit of magic, looking at PCIe registers and whatnot (provided that >> is even possible), not too hard, but can take a while, and is error prone > > Option 4 sounds like the proper solution - check the range the abort is due > to the PCI device enumeration and only ignore those aborts. This was already suggested by Arnd: http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/422873.html and Ray was checking some internal datasheets, but I'm afraid he never found info on checking if error was caused by PCIe controller. Maybe we could think about some BCM5301X API (extra symbol exported by arch code) for starting ignoring aborts and stopping that. We could ignore aborts during scanning only but honestly it sounds like a bit hacky solution to me. >> I can certainly advocate that option 3 is the one that gives a working >> device, and this is what matters from a distribution perspective like >> LEDE/OpenWrt. >> > Since I don't use BCM5301x option 3 is fine by me. So for it seems like the best solution to me, but I'm open for changes if someone points another that is clean & better one.
Hi Rafal, On 10/31/2016 3:16 PM, Rafał Miłecki wrote: > On 31 October 2016 at 23:05, Scott Branden <scott.branden@broadcom.com> wrote: >> On 16-10-31 02:56 PM, Florian Fainelli wrote: >>> - fixups the aborts in the kernel, look where they come from, by using >>> some bit of magic, looking at PCIe registers and whatnot (provided that >>> is even possible), not too hard, but can take a while, and is error prone >> >> Option 4 sounds like the proper solution - check the range the abort is due >> to the PCI device enumeration and only ignore those aborts. > > This was already suggested by Arnd: > http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/422873.html > and Ray was checking some internal datasheets, but I'm afraid he never > found info on checking if error was caused by PCIe controller. > Correct. Based on response from our ASIC team, on Northstar, 1) we cannot distinguish between various external imprecise aborts; 2) nor can we use the internal PCIe controller register to disable unsupported request from being forwarded as APB bus error (and therefore causes the abort). All subsequent iProc SoCs can disable this error forwarding from the iProc PCIe controller. > Maybe we could think about some BCM5301X API (extra symbol exported by > arch code) for starting ignoring aborts and stopping that. We could > ignore aborts during scanning only but honestly it sounds like a bit > hacky solution to me. > > Correct, one possible solution is to only ignoring the aborts during condiguration space register access of an endpoint device. Below is the logic how I only disable the APB bus error forwarding during these accesses (for all other iProc SoCs): +/** + * APB error forwarding can be disabled during access of configuration + * registers of the endpoint device, to prevent unsupported requests + * (typically seen during enumeration with multi-function devices) from + * triggering a system exception + */ +static inline void iproc_pcie_apb_err_disable(struct pci_bus *bus, + bool disable) +{ + struct iproc_pcie *pcie = iproc_data(bus); + u32 val; + + if (bus->number && pcie->has_apb_err_disable) { + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_APB_ERR_EN); + if (disable) + val &= ~APB_ERR_EN; + else + val |= APB_ERR_EN; + iproc_pcie_write_reg(pcie, IPROC_PCIE_APB_ERR_EN, val); + } +} + [...] +static int iproc_pcie_config_read32(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 *val) +{ + int ret; + + iproc_pcie_apb_err_disable(bus, true); + ret = pci_generic_config_read32(bus, devfn, where, size, val); + iproc_pcie_apb_err_disable(bus, false); + + return ret; +} + +static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn, + int where, int size, u32 val) +{ + int ret; + + iproc_pcie_apb_err_disable(bus, true); + ret = pci_generic_config_write32(bus, devfn, where, size, val); + iproc_pcie_apb_err_disable(bus, false); + + return ret; +} + static struct pci_ops iproc_pcie_ops = { .map_bus = iproc_pcie_map_cfg_bus, - .read = pci_generic_config_read32, - .write = pci_generic_config_write32, + .read = iproc_pcie_config_read32, + .write = iproc_pcie_config_write32, }; >>> I can certainly advocate that option 3 is the one that gives a working >>> device, and this is what matters from a distribution perspective like >>> LEDE/OpenWrt. >>> >> Since I don't use BCM5301x option 3 is fine by me. > > So for it seems like the best solution to me, but I'm open for changes > if someone points another that is clean & better one. > I agree with that option 3 is probably the best solution for now, from a distribution's perspective. Thanks, Ray
diff --git a/arch/arm/mach-bcm/bcm_5301x.c b/arch/arm/mach-bcm/bcm_5301x.c index c8830a2..fe067f6 100644 --- a/arch/arm/mach-bcm/bcm_5301x.c +++ b/arch/arm/mach-bcm/bcm_5301x.c @@ -9,14 +9,42 @@ #include <asm/hardware/cache-l2x0.h> #include <asm/mach/arch.h> +#include <asm/siginfo.h> +#include <asm/signal.h> + +#define FSR_EXTERNAL (1 << 12) +#define FSR_READ (0 << 10) +#define FSR_IMPRECISE 0x0406 static const char *const bcm5301x_dt_compat[] __initconst = { "brcm,bcm4708", NULL, }; +static int bcm5301x_abort_handler(unsigned long addr, unsigned int fsr, + struct pt_regs *regs) +{ + /* + * We want to ignore aborts forwarded from the PCIe bus that are + * expected and shouldn't really be passed by the PCIe controller. + * The biggest disadvantage is the same FSR code may be reported when + * reading non-existing APB register and we shouldn't ignore that. + */ + if (fsr == (FSR_EXTERNAL | FSR_READ | FSR_IMPRECISE)) + return 0; + + return 1; +} + +static void __init bcm5301x_init_early(void) +{ + hook_fault_code(16 + 6, bcm5301x_abort_handler, SIGBUS, BUS_OBJERR, + "imprecise external abort"); +} + DT_MACHINE_START(BCM5301X, "BCM5301X") .l2c_aux_val = 0, .l2c_aux_mask = ~0, .dt_compat = bcm5301x_dt_compat, + .init_early = bcm5301x_init_early, MACHINE_END