[0/4] PCI SMC conduit, now with DT support

Message ID: 20220725163905.2024437-1-jeremy.linton@arm.com

Message

Jeremy Linton July 25, 2022, 4:39 p.m. UTC
This is a rebase of the later revisions of [1], but refactored
slightly to add a DT method as well. It has all the same advantages of
the ACPI method (putting HW quirks in the firmware rather than the
kernel) but now applied to a 'pci-host-smc-generic' compatible
property which extends the pci-host-generic logic to handle cases
where the PCI Config region isn't ECAM compliant. With this in place,
and firmware-managed clock/phy/etc., it's possible to run the generic
driver on hardware that isn't what one would consider standards
compliant PCI root ports.

The DT code was tested on the RPi4, where the ACPI/SMC is upstream in
TF-A and EDK2. On that platform the PCIe works as expected utilizing
the generic host driver rather than the pcie-brcmstb driver.

[1] https://lkml.org/lkml/2021/1/4/1255

Jeremy Linton (4):
  arm64: smccc: Add PCI SMCCCs
  arm64: PCI: Enable SMC conduit
  PCI: host-generic: Add firmware managed config ops
  dt-bindings: PCI: Note the use of pci-host-smc-generic

 .../bindings/pci/host-generic-pci.yaml        |  24 +++-
 arch/arm64/kernel/pci.c                       | 109 ++++++++++++++++++
 drivers/pci/controller/pci-host-common.c      |  34 ++++--
 drivers/pci/controller/pci-host-generic.c     |   6 +
 include/linux/arm-smccc.h                     |  29 +++++
 5 files changed, 186 insertions(+), 16 deletions(-)

Comments

Will Deacon July 26, 2022, 11:40 a.m. UTC | #1
On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> This is a rebase of the later revisions of [1], but refactored
> slightly to add a DT method as well. It has all the same advantages of
> the ACPI method (putting HW quirks in the firmware rather than the
> kernel) but now applied to a 'pci-host-smc-generic' compatible
> property which extends the pci-host-generic logic to handle cases
> where the PCI Config region isn't ECAM compliant. With this in place,
> and firmware-managed clock/phy/etc., it's possible to run the generic
> driver on hardware that isn't what one would consider standards
> compliant PCI root ports.

I still think that hiding the code in firmware because the hardware is
broken is absolutely the wrong way to tackle this problem and I thought
the general idea from last time was that we were going to teach Linux
about the broken hardware instead [1]. I'd rather have the junk where we
can see it, reason about it and modify it.

What's changed?

In my mind, the main thing that's happened since we last discussed this
is that Apple shipped arm64 client hardware with working ECAM. *Apple*
for goodness sake: a company with basically no incentive to follow
standards for their vertically integrated devices! Perhaps others need
to raise their game instead of wasting everybody's time on firmware
hacks; getting the hardware right obviously isn't as difficult as folks
would lead us to believe.

Will

[1] https://lore.kernel.org/r/20210325131231.GA18590@e121166-lin.cambridge.arm.com

Jeremy Linton July 28, 2022, 5:20 p.m. UTC | #2
Hi,

On 7/26/22 06:40, Will Deacon wrote:
> On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
>> This is a rebase of the later revisions of [1], but refactored
>> slightly to add a DT method as well. It has all the same advantages of
>> the ACPI method (putting HW quirks in the firmware rather than the
>> kernel) but now applied to a 'pci-host-smc-generic' compatible
>> property which extends the pci-host-generic logic to handle cases
>> where the PCI Config region isn't ECAM compliant. With this in place,
>> and firmware-managed clock/phy/etc., it's possible to run the generic
>> driver on hardware that isn't what one would consider standards
>> compliant PCI root ports.
> 
> I still think that hiding the code in firmware because the hardware is
> broken is absolutely the wrong way to tackle this problem and I thought
> the general idea from last time was that we were going to teach Linux
> about the broken hardware instead [1]. I'd rather have the junk where we
> can see it, reason about it and modify it.

Well, the CM4/ACPI/PCIe quirk still hasn't landed, but that's not the point.

I would like to understand why you think this patch is any different 
than the dozens of other firmware traps, quite a number merged in the 
last year, for "broken" hardware or simply as generic platform interfaces?

Without rehashing the entire discussion in the previous thread, I'm
going to repeat that this is an official Arm standard, the same as the
firmware traps to handle speculative execution mitigations or to
standardize platform functionality, e.g. PSCI or the recent TRNG code. It
also has uses beyond fixing broken hardware.

But similar to those examples, I think everyone here understands that
the kernel is a poor place for this kind of logic and that, at the same
time, doing it there may not be technically feasible without supplying
EL3, management processor code, or traps to said code.

Is it the official position of the Linux kernel maintainers that they 
will refuse to support future Arm standards in order to gate keep 
specific hardware platforms?

> 
> What's changed?

Well, the code to support this interface is upstream in TF-A, edk2,
and various other OSes. So now Linux is trailing.

> 
> In my mind, the main thing that's happened since we last discussed this
> is that Apple shipped arm64 client hardware with working ECAM. *Apple*
> for goodness sake: a company with basically no incentive to follow
> standards for their vertically integrated devices! Perhaps others need
> to raise their game instead of wasting everybody's time on firmware
> hacks; getting the hardware right obviously isn't as difficult as folks
> would lead us to believe.

I find it interesting that you hold up the M1 as an example of good
hardware. That hardware is one of the worst violators of platform
standards, as well as having a lot of "broken" hardware requiring
changes to the kernel that previously were rejected as too far out of
line. Never mind that, as you point out, it has basically zero vendor
support and exists only due to a large reverse engineering effort.


Thanks for looking at this,

Catalin Marinas Aug. 16, 2022, 7:59 a.m. UTC | #3
Hi Jeremy,

On Thu, Jul 28, 2022 at 12:20:55PM -0500, Jeremy Linton wrote:
> On 7/26/22 06:40, Will Deacon wrote:
> > On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> > > This is a rebase of the later revisions of [1], but refactored
> > > slightly to add a DT method as well. It has all the same advantages of
> > > the ACPI method (putting HW quirks in the firmware rather than the
> > > kernel) but now applied to a 'pci-host-smc-generic' compatible
> > > property which extends the pci-host-generic logic to handle cases
> > > where the PCI Config region isn't ECAM compliant. With this in place,
> > > and firmware-managed clock/phy/etc., it's possible to run the generic
> > > driver on hardware that isn't what one would consider standards
> > > compliant PCI root ports.
> > 
> > I still think that hiding the code in firmware because the hardware is
> > broken is absolutely the wrong way to tackle this problem and I thought
> > the general idea from last time was that we were going to teach Linux
> > about the broken hardware instead [1]. I'd rather have the junk where we
> > can see it, reason about it and modify it.
[...]
> Is it the official position of the Linux kernel maintainers that they will
> refuse to support future Arm standards in order to gate keep specific
> hardware platforms?

(just back from holiday; well, briefly, going away for a few days soon)

We shouldn't generalise what maintainers would accept or not. We decide
on a case by case basis. With speculative execution mitigations, for
example, we try to do as much as we can in the kernel but sometimes
that's just not possible, hence an EL3 call and we'd rather have this
standardised (e.g. custom branch loops to flush the branch predictor if
possible from the normal world, secure call if not).

You mention PSCI but that's not working around broken hardware, it was a
conscious decision from the start to standardise the booting protocol and
CPU power management.

Now this PCI SMC protocol was simply created because hardware did not
comply with another PCI standard that has been around for a long time.
As with the speculative execution mitigations, we'd rather work around
broken hardware in the kernel first and, if it's not possible, we can
look at a firmware interface (and ideally standardised). Do you have an
example where we cannot work around the PCI hardware bugs in the kernel
and EL3 firmware involvement is necessary?

So, in summary, Arm Ltd proposing a new standard because hardware
companies can't be bothered with an existing one is not an argument for
accepting its support in the Linux kernel. This PCI SMC conduit is not
presented as a hardware bug workaround interface but rather as an
alternative to ECAM (and, yes, the kernel maintainers can choose not to
support specific "standards" in Linux).

Pali Rohár Aug. 18, 2022, 9:55 p.m. UTC | #4
On Tuesday 16 August 2022 08:59:05 Catalin Marinas wrote:
> Hi Jeremy,
> 
> On Thu, Jul 28, 2022 at 12:20:55PM -0500, Jeremy Linton wrote:
> > On 7/26/22 06:40, Will Deacon wrote:
> > > On Mon, Jul 25, 2022 at 11:39:01AM -0500, Jeremy Linton wrote:
> > > > This is a rebase of the later revisions of [1], but refactored
> > > > slightly to add a DT method as well. It has all the same advantages of
> > > > the ACPI method (putting HW quirks in the firmware rather than the
> > > > kernel) but now applied to a 'pci-host-smc-generic' compatible
> > > > property which extends the pci-host-generic logic to handle cases
> > > > where the PCI Config region isn't ECAM compliant. With this in place,
> > > > and firmware-managed clock/phy/etc., it's possible to run the generic
> > > > driver on hardware that isn't what one would consider standards
> > > > compliant PCI root ports.
> > > 
> > > I still think that hiding the code in firmware because the hardware is
> > > broken is absolutely the wrong way to tackle this problem and I thought
> > > the general idea from last time was that we were going to teach Linux
> > > about the broken hardware instead [1]. I'd rather have the junk where we
> > > can see it, reason about it and modify it.
> [...]
> > Is it the official position of the Linux kernel maintainers that they will
> > refuse to support future Arm standards in order to gate keep specific
> > hardware platforms?
> 
> (just back from holiday; well, briefly, going away for a few days soon)
> 
> We shouldn't generalise what maintainers would accept or not. We decide
> on a case by case basis. With speculative execution mitigations, for
> example, we try to do as much as we can in the kernel but sometimes
> that's just not possible, hence an EL3 call and we'd rather have this
> standardised (e.g. custom branch loops to flush the branch predictor if
> possible from the normal world, secure call if not).
> 
> You mention PSCI but that's not working around broken hardware, it was a
> conscious decision from the start to standardise the booting protocol and
> CPU power management.
> 
> Now this PCI SMC protocol was simply created because hardware did not
> comply with another PCI standard that has been around for a long time.
> As with the speculative execution mitigations, we'd rather work around
> broken hardware in the kernel first and, if it's not possible, we can
> look at a firmware interface (and ideally standardised). Do you have an
> example where we cannot work around the PCI hardware bugs in the kernel
> and EL3 firmware involvement is necessary?
> 
> So, in summary, Arm Ltd proposing a new standard because hardware
> companies can't be bothered with an existing one is not an argument for
> accepting its support in the Linux kernel. This PCI SMC conduit is not
> presented as a hardware bug workaround interface but rather as an
> alternative to ECAM (and, yes, the kernel maintainers can choose not to
> support specific "standards" in Linux).

Hello! I think that this PCI SMC could already be marked as deprecated,
as Linux can use "native" drivers to access PCIe config space without
needing any kind of RPC mechanism such as ARM SMC.

Note that, for example, the kernel driver phy-mvebu-a3700-comphy.c was
converted from the ARM SMC API to a true "native" Linux driver which
touches hardware directly (and does not use an RPC API). And this is the
right direction: stop using RPC APIs in the kernel and configure
hardware directly, without depending on firmware, SMC or any other SW
which is running on the CPU. Depending on firmware, or on firmware
functionality which accesses the same HW as the kernel itself, is always
a nightmare. x86 developers have enough experience with the BIOS and its
poor implementations, and for a long time the direction there has been
to not use the x86 BIOS and rather communicate with hardware directly.
And if PCIe hardware is broken? Well, PCIe controller drivers should be
extended to handle or work around it. I have already sent a lot of
patches for Marvell PCIe controllers to work around HW design issues, so
the same should be done for other (known broken) vendor HW.

So in my opinion, instead of PCI SMC, kernel PCIe controller drivers
should be fixed to correctly access PCIe config space, and this PCI SMC
should be completely deprecated/removed from the kernel. And if PCI SMC
has not landed in the kernel yet, even better, because the deprecation
step can be skipped.