Message ID | 20220725163905.2024437-1-jeremy.linton@arm.com
---|---
Series | PCI SMC conduit, now with DT support
On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> This is a rebase of the later revisions of [1], but refactored
> slightly to add a DT method as well. It has all the same advantages of
> the ACPI method (putting HW quirks in the firmware rather than the
> kernel) but now applied to a 'pci-host-smc-generic' compatible
> property which extends the pci-host-generic logic to handle cases
> where the PCI Config region isn't ECAM compliant. With this in place,
> and firmware managed clock/phy/etc it's possible to run the generic
> driver on hardware that isn't what one would consider standards
> compliant PCI root ports.

I still think that hiding the code in firmware because the hardware is
broken is absolutely the wrong way to tackle this problem and I thought
the general idea from last time was that we were going to teach Linux
about the broken hardware instead [1]. I'd rather have the junk where we
can see it, reason about it and modify it.

What's changed?

In my mind, the main thing that's happened since we last discussed this
is that Apple shipped arm64 client hardware with working ECAM. *Apple*
for goodness sake: a company with basically no incentive to follow
standards for their vertically integrated devices! Perhaps others need
to raise their game instead of wasting everybody's time on firmware
hacks; getting the hardware right obviously isn't as difficult as folks
would lead us to believe.

Will

[1] https://lore.kernel.org/r/20210325131231.GA18590@e121166-lin.cambridge.arm.com
Hi,

On 7/26/22 06:40, Will Deacon wrote:
> On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
>> This is a rebase of the later revisions of [1], but refactored
>> slightly to add a DT method as well. It has all the same advantages of
>> the ACPI method (putting HW quirks in the firmware rather than the
>> kernel) but now applied to a 'pci-host-smc-generic' compatible
>> property which extends the pci-host-generic logic to handle cases
>> where the PCI Config region isn't ECAM compliant. With this in place,
>> and firmware managed clock/phy/etc it's possible to run the generic
>> driver on hardware that isn't what one would consider standards
>> compliant PCI root ports.
>
> I still think that hiding the code in firmware because the hardware is
> broken is absolutely the wrong way to tackle this problem and I thought
> the general idea from last time was that we were going to teach Linux
> about the broken hardware instead [1]. I'd rather have the junk where we
> can see it, reason about it and modify it.

Well, the CM4/ACPI/PCIe quirk still hasn't landed, but that's not the
point. I would like to understand why you think this patch is any
different from the dozens of other firmware traps, quite a number of them
merged in the last year, either for "broken" hardware or simply as generic
platform interfaces.

Without rehashing the entire discussion in the previous thread, I'm going
to repeat that this is an official Arm standard, the same as the firmware
traps that handle speculative execution mitigations or standardize
platform functionality, e.g. PSCI or the recent TRNG code. It also has
uses beyond fixing broken hardware. And, as with those examples, I think
everyone here understands that the kernel is a poor place for this kind
of logic, and that handling it there may not even be technically feasible
without also supplying EL3 code, management processor code, or traps to
said code.

Is it the official position of the Linux kernel maintainers that they will
refuse to support future Arm standards in order to gate keep specific
hardware platforms?

>
> What's changed?

Well, the code to support this interface is upstream in TF-A, edk2, and
various other OSes. So now Linux is trailing.

>
> In my mind, the main thing that's happened since we last discussed this
> is that Apple shipped arm64 client hardware with working ECAM. *Apple*
> for goodness sake: a company with basically no incentive to follow
> standards for their vertically integrated devices! Perhaps others need
> to raise their game instead of wasting everybody's time on firmware
> hacks; getting the hardware right obviously isn't as difficult as folks
> would lead us to believe.

I find it interesting that you hold up the M1 as an example of good
hardware. That hardware is one of the worst violators of platform
standards, and it also has a lot of "broken" hardware requiring kernel
changes of the kind that were previously rejected as too far out of line.
Never mind that, as you point out, it has basically zero vendor support
and exists only thanks to a large reverse-engineering effort.

Thanks for looking at this,
Hi Jeremy,

On Thu, Jul 28, 2022 at 12:20:55PM -0500, Jeremy Linton wrote:
> On 7/26/22 06:40, Will Deacon wrote:
> > On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> > > This is a rebase of the later revisions of [1], but refactored
> > > slightly to add a DT method as well. It has all the same advantages of
> > > the ACPI method (putting HW quirks in the firmware rather than the
> > > kernel) but now applied to a 'pci-host-smc-generic' compatible
> > > property which extends the pci-host-generic logic to handle cases
> > > where the PCI Config region isn't ECAM compliant. With this in place,
> > > and firmware managed clock/phy/etc it's possible to run the generic
> > > driver on hardware that isn't what one would consider standards
> > > compliant PCI root ports.
> >
> > I still think that hiding the code in firmware because the hardware is
> > broken is absolutely the wrong way to tackle this problem and I thought
> > the general idea from last time was that we were going to teach Linux
> > about the broken hardware instead [1]. I'd rather have the junk where we
> > can see it, reason about it and modify it.
[...]
> Is it the official position of the Linux kernel maintainers that they will
> refuse to support future Arm standards in order to gate keep specific
> hardware platforms?

(just back from holiday; well, briefly, going away for a few days soon)

We shouldn't generalise what maintainers would accept or not. We decide
on a case by case basis. With speculative execution mitigations, for
example, we try to do as much as we can in the kernel but sometimes
that's just not possible, hence an EL3 call and we'd rather have this
standardised (e.g. custom branch loops to flush the branch predictor if
possible from the normal world, secure call if not).

You mention PSCI but that's not working around broken hardware, it was a
conscious decision from the start to standardise the booting protocol and
CPU power management.

Now this PCI SMC protocol was simply created because hardware did not
comply with another PCI standard that has been around for a long time.
As with the speculative execution mitigations, we'd rather work around
broken hardware in the kernel first and, if it's not possible, we can
look at a firmware interface (and ideally standardised). Do you have an
example where we cannot work around the PCI hardware bugs in the kernel
and EL3 firmware involvement is necessary?

So, in summary, Arm Ltd proposing a new standard because hardware
companies can't be bothered with an existing one is not an argument for
accepting its support in the Linux kernel. This PCI SMC conduit is not
presented as a hardware bug workaround interface but rather as an
alternative to ECAM (and, yes, the kernel maintainers can choose not to
support specific "standards" in Linux).
On Tuesday 16 August 2022 08:59:05 Catalin Marinas wrote:
> Hi Jeremy,
>
> On Thu, Jul 28, 2022 at 12:20:55PM -0500, Jeremy Linton wrote:
> > On 7/26/22 06:40, Will Deacon wrote:
> > > On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> > > > This is a rebase of the later revisions of [1], but refactored
> > > > slightly to add a DT method as well. It has all the same advantages of
> > > > the ACPI method (putting HW quirks in the firmware rather than the
> > > > kernel) but now applied to a 'pci-host-smc-generic' compatible
> > > > property which extends the pci-host-generic logic to handle cases
> > > > where the PCI Config region isn't ECAM compliant. With this in place,
> > > > and firmware managed clock/phy/etc it's possible to run the generic
> > > > driver on hardware that isn't what one would consider standards
> > > > compliant PCI root ports.
> > >
> > > I still think that hiding the code in firmware because the hardware is
> > > broken is absolutely the wrong way to tackle this problem and I thought
> > > the general idea from last time was that we were going to teach Linux
> > > about the broken hardware instead [1]. I'd rather have the junk where we
> > > can see it, reason about it and modify it.
> [...]
> > Is it the official position of the Linux kernel maintainers that they will
> > refuse to support future Arm standards in order to gate keep specific
> > hardware platforms?
>
> (just back from holiday; well, briefly, going away for a few days soon)
>
> We shouldn't generalise what maintainers would accept or not. We decide
> on a case by case basis. With speculative execution mitigations, for
> example, we try to do as much as we can in the kernel but sometimes
> that's just not possible, hence an EL3 call and we'd rather have this
> standardised (e.g. custom branch loops to flush the branch predictor if
> possible from the normal world, secure call if not).
>
> You mention PSCI but that's not working around broken hardware, it was a
> conscious decision from the start to standardise the booting protocol and
> CPU power management.
>
> Now this PCI SMC protocol was simply created because hardware did not
> comply with another PCI standard that has been around for a long time.
> As with the speculative execution mitigations, we'd rather work around
> broken hardware in the kernel first and, if it's not possible, we can
> look at a firmware interface (and ideally standardised). Do you have an
> example where we cannot work around the PCI hardware bugs in the kernel
> and EL3 firmware involvement is necessary?
>
> So, in summary, Arm Ltd proposing a new standard because hardware
> companies can't be bothered with an existing one is not an argument for
> accepting its support in the Linux kernel. This PCI SMC conduit is not
> presented as a hardware bug workaround interface but rather as an
> alternative to ECAM (and, yes, the kernel maintainers can choose not to
> support specific "standards" in Linux).

Hello!

I think that this PCI SMC could already be marked as deprecated, as Linux
can use "native" drivers to access PCIe config space without needing any
kind of RPC mechanism such as ARM SMC.

Note that, for example, the kernel driver phy-mvebu-a3700-comphy.c was
converted from the ARM SMC API to a true "native" Linux driver which
touches the hardware directly (and does not use an RPC API). And this is
the right direction: stop using RPC APIs in the kernel and configure
hardware directly, without depending on firmware, SMC, or any other
software running on the CPU. Depending on firmware, or on firmware
functionality that accesses the same hardware as the kernel itself, is
always a nightmare. x86 developers have plenty of experience with the BIOS
and its poor implementations, and for a long time the direction there was
not to use the x86 BIOS but rather to communicate with hardware directly.

And if PCIe hardware is broken? Well, PCIe controller drivers should be
extended to handle or work around it. I have already sent a lot of patches
for Marvell PCIe controllers to work around HW design issues, and the same
should be done for other (known broken) vendor HW.

So in my opinion, instead of using PCI SMC, kernel PCIe controller drivers
should be fixed to access PCIe config space correctly, and PCI SMC should
be completely deprecated/removed from the kernel. And if PCI SMC has not
landed in the kernel yet, even better, because the deprecation step can be
skipped.