mbox series

[0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects

Message ID 20230330165128.3237939-1-james.morse@arm.com (mailing list archive)
Headers show
Series arm64: errata: Disable FWB on parts with non-ARM interconnects | expand

Message

James Morse March 30, 2023, 4:51 p.m. UTC
Hello!

Changes since the RFC?:
 * Added DT support, in a way that means we don't end up with per-erratum
   strings, or bloat in the calling code to check for those strings.
 * Added a commandline argument. (boo)
 * Changes to support errata affecting features on big-little systems properly.

~

When stage1 translation is disabled, the SCTRL_E1.I bit controls the
attributes used for instruction fetch, one of the options results in a
non-cacheable access. A whole host of CPUs missed the FWB override
in this case, meaning a KVM guest could fetch stale/junk data instead of
instructions.

The workaround is to disable FWB, and do the required cache maintenance
instead.

The good news is, this isn't a problem for systems using Arm's
interconnect IP. The bad news is: linux can't know this. Arm knows of
at least one platform that is affected by this erratum.


This series adds support for the 'Errata Management Firmware Interface', [0]
and queries that to determine if the CPU is affected or not. DT support is
added so that the firmware interface values can be queried directly from the
DT. This can be used as a fallback for platforms that don't yet support the
interface.

Unfortunately, no-one has firmware that supports this new interface yet,
and the least surprising thing to do is to enable the workaround by default,
meaning FWB is disabled on all these cores, even for unaffected platforms.
ACPI Platforms that are not-affected can either take a firmware-update to
support the interface, or if the kernel they run will only run on hardware
that is unaffected, disable the workaround at build time.

The trusted firmware series to implement the interface is here:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/19835
the trusted firmware code to advertise a value for this erratum is here:
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/20076

The SDEN documents that describe this are:
Cortex-A78:
https://developer.arm.com/documentation/SDEN1401784/1800/?lang=en
Cortex-A78C:
https://developer.arm.com/documentation/SDEN1707916/1300/?lang=en
https://developer.arm.com/documentation/SDEN2004089/0700/?lang=en
(yes, there are two!)
Cortex-A710:
https://developer.arm.com/documentation/SDEN1775101/1500/?lang=en
Cortex-X1:
https://developer.arm.com/documentation/SDEN1401782/1800/?lang=en
Cortex-X2:
https://developer.arm.com/documentation/SDEN1775100/1500/?lang=en
Cortex-X3:
https://developer.arm.com/documentation/SDEN2055130/1000/?lang=en
Cortex-V1:
https://developer.arm.com/documentation/SDEN1401781/1600/?lang=en
Cortex-V2:
https://developer.arm.com/documentation/SDEN2332927/0500/?lang=en
Cortex-N2:
https://developer.arm.com/documentation/SDEN1982442/1200/?lang=en

Thanks,

James

[0] https://developer.arm.com/documentation/den0100/1-0/?lang=en
[RFC] https://lore.kernel.org/linux-arm-kernel/20230216182201.1705406-1-james.morse@arm.com/

James Morse (6):
  dt-bindings: firmware: Add arm,errata-management
  firmware: smccc: Add support for erratum discovery API
  arm64: cputype: Add new part numbers for Cortex-X3, and Neoverse-V2
  arm64: errata: Disable FWB on parts with non-ARM interconnects
  firmware: smccc: Allow errata management to be overridden by device
    tree
  arm64: errata: Add a commandline option to enable/disable #2701951

 .../admin-guide/kernel-parameters.txt         |   4 +
 Documentation/arm64/silicon-errata.rst        |  18 ++
 .../devicetree/bindings/arm/cpus.yaml         |   5 +
 .../firmware/arm,errata-management.yaml       |  72 +++++
 arch/arm64/Kconfig                            |  27 ++
 arch/arm64/include/asm/cpufeature.h           |   1 +
 arch/arm64/include/asm/cputype.h              |   4 +
 arch/arm64/kernel/cpu_errata.c                | 123 ++++++++
 arch/arm64/kernel/cpufeature.c                |  23 +-
 arch/arm64/tools/cpucaps                      |   1 +
 drivers/firmware/smccc/Kconfig                |   8 +
 drivers/firmware/smccc/Makefile               |   1 +
 drivers/firmware/smccc/em.c                   | 284 ++++++++++++++++++
 include/linux/arm-smccc.h                     |  28 ++
 include/linux/arm_smccc_em.h                  |  11 +
 15 files changed, 609 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
 create mode 100644 drivers/firmware/smccc/em.c
 create mode 100644 include/linux/arm_smccc_em.h

Comments

Rob Herring March 31, 2023, 12:57 p.m. UTC | #1
On Thu, Mar 30, 2023 at 11:51 AM James Morse <james.morse@arm.com> wrote:
>
> Hello!
>
> Changes since the RFC?:
>  * Added DT support, in a way that means we don't end up with per-erratum
>    strings, or bloat in the calling code to check for those strings.
>  * Added a commandline argument. (boo)
>  * Changes to support errata affecting features on big-little systems properly.
>
> ~
>
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
>
> The workaround is to disable FWB, and do the required cache maintenance
> instead.

What's FWB? I don't see it defined anywhere in the series.

Rob
Rob Herring March 31, 2023, 1:03 p.m. UTC | #2
On Fri, Mar 31, 2023 at 7:57 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Thu, Mar 30, 2023 at 11:51 AM James Morse <james.morse@arm.com> wrote:
> >
> > Hello!
> >
> > Changes since the RFC?:
> >  * Added DT support, in a way that means we don't end up with per-erratum
> >    strings, or bloat in the calling code to check for those strings.
> >  * Added a commandline argument. (boo)
> >  * Changes to support errata affecting features on big-little systems properly.
> >
> > ~
> >
> > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > attributes used for instruction fetch, one of the options results in a
> > non-cacheable access. A whole host of CPUs missed the FWB override
> > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > instructions.
> >
> > The workaround is to disable FWB, and do the required cache maintenance
> > instead.
>
> What's FWB? I don't see it defined anywhere in the series.

Ah, there it is in patch 1. It wasn't in patch 6, so naturally I went
searching in the cover letter.

Rob
James Morse May 16, 2023, 4:29 p.m. UTC | #3
Hi Catalin, Marc,

On 11/05/2023 22:13, Catalin Marinas wrote:
> On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
>> On Thu, 11 May 2023 18:15:15 +0100,
>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
>>>> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
>>>> attributes used for instruction fetch, one of the options results in a
>>>> non-cacheable access. A whole host of CPUs missed the FWB override
>>>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>>>> instructions.
>>>>
>>>> The workaround is to disable FWB, and do the required cache maintenance
>>>> instead.

>>> I think the workaround can be to only do the required cache maintenance
>>> without disabling FWB. Having FWB on doesn't bring any performance
>>> benefits if we do the cache maintenance anyway but keeping it around may
>>> be useful for other reasons (e.g. KVM device pass-through using
>>> cacheable mappings, though not something KVM supports currently).
>>
>> But you'd also rely on the guest doing its own cache maintenance for
>> instructions it writes, right?
> 
> Ah, you are right. It looks like I only considered the host writing
> instructions. If the guest disabled stage 1 and wrote some instructions
> with FWB on, they'd not necessarily reach the PoC while the instructions
> are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
> is supposed to do an IC IVAU if it wrote instructions but that's not
> sufficient (hint to the micro-architects, add a chicken bit to upgrade
> IC IVAU to also do a DC CVAC ;))
> 
>> Which probably means exposing a different CLIDR_EL1 so that
>> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
>> if keeping FWB set has the potential to change the semantics of the
>> CMOs (the spec seems silent on that front).
> 
> Not sure about CMOs, I'd expect them to behave in the same way. However,
> I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
> when its MMU is off.

I think the request is to keep the FWB feature, but to disable it for all host memory
the guest can execute from. I presume this 'device pass-through using cacheable mappings'
would mark that address range as XN at stage2, ( ... it's special right?).

If this is for something like CXL: it can't set XN, and the guest would still be exposed
to the erratum if it executes from theses addresses with the MMU off.

Does this need doing now? It wouldn't need backporting to older kernels...


>>> That said, maybe we can reduce the risk further by doing the
>>> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
>>> stage 2 exec fault (together with the I-cache invalidation). We can then
>>> ignore any other cache maintenance on S2 faults until someone shouts (we
>>> can maybe recommend forcing FWB off on the command line through the
>>> cpuid override).
>>
>> You lost me here with your vcpu_has_run_once().
> 
> Most likely I lost myself in the code. So the tricks we used in the past
> tracking the guest MMU off/on was only for the D side. If (we hope that)
> the guest only wrote instructions to a page once before executing them
> (and never writing instructions again), we could trap a subsequent exec
> fault and do the D-cache clean to PoC again.
> 
>> Keeping the CMOs on exec fault is definitely manageable. But is that
>> enough?
> 
> Yeah, not sure it's enough if the guest keeps writing instructions to
> the same page with the MMU off.

The difference between FWB and IDC/DIC still does my head in: My reading is FWB implies
IDC, (but the CTR_EL0.IDC bit might not be set). This doesn't help if the wrong attributes
are being used for instruction fetch.
This is cache-maintenance that wasn't needed before, so there are no tricks with the id
registers we can pull to make the guest do it.


v2 of this will flip the polarity, and also detect based on an 'arm,interconnect'
compatible, or the existing compatible the PMU driver uses.



Thanks,

James
Will Deacon May 23, 2023, 10:48 a.m. UTC | #4
Hi James,

On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> Changes since the RFC?:
>  * Added DT support, in a way that means we don't end up with per-erratum
>    strings, or bloat in the calling code to check for those strings.
>  * Added a commandline argument. (boo)
>  * Changes to support errata affecting features on big-little systems properly.
> 
> ~
> 
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
> 
> The workaround is to disable FWB, and do the required cache maintenance
> instead.
> 
> The good news is, this isn't a problem for systems using Arm's
> interconnect IP. The bad news is: linux can't know this. Arm knows of
> at least one platform that is affected by this erratum.
> 
> 
> This series adds support for the 'Errata Management Firmware Interface', [0]
> and queries that to determine if the CPU is affected or not. DT support is
> added so that the firmware interface values can be queried directly from the
> DT. This can be used as a fallback for platforms that don't yet support the
> interface.

It occurs to me that, when a device is assigned to a VM, there are a
whole bunch of non-probeable reasons why FWB cannot be used and perhaps
we should be looking to advertise that from firmware without pulling in
a reliance on this errata management interface.

Right now, I don't think KVM or VFIO do anything to prevent the assignment
of a device capable of non-coherent DMA traffic and FWB is used if the CPUs
support it. This, in theory, allows the guest to read stale data from host
memory pages using non-coherent DMA and I think we should gate usage of FWB
for a given VM on whether or not that VM has such a device assigned.

The problem is that I don't think you can probe this property reliably. It's
not enough to check for "dma-coherent"; rather we also need to know details
about the IOMMU and device-specific properties such as the ability to
generate NoSnoop transactions. I think firmware is probably the only option
here, so since you're proposing something similar, please can we make it
general enough to be used outside of errata scenarios?

Will
Robin Murphy May 23, 2023, 12:24 p.m. UTC | #5
On 2023-05-16 17:29, James Morse wrote:
> Hi Catalin, Marc,
> 
> On 11/05/2023 22:13, Catalin Marinas wrote:
>> On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
>>> On Thu, 11 May 2023 18:15:15 +0100,
>>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
>>>>> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
>>>>> attributes used for instruction fetch, one of the options results in a
>>>>> non-cacheable access. A whole host of CPUs missed the FWB override
>>>>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>>>>> instructions.
>>>>>
>>>>> The workaround is to disable FWB, and do the required cache maintenance
>>>>> instead.
> 
>>>> I think the workaround can be to only do the required cache maintenance
>>>> without disabling FWB. Having FWB on doesn't bring any performance
>>>> benefits if we do the cache maintenance anyway but keeping it around may
>>>> be useful for other reasons (e.g. KVM device pass-through using
>>>> cacheable mappings, though not something KVM supports currently).
>>>
>>> But you'd also rely on the guest doing its own cache maintenance for
>>> instructions it writes, right?
>>
>> Ah, you are right. It looks like I only considered the host writing
>> instructions. If the guest disabled stage 1 and wrote some instructions
>> with FWB on, they'd not necessarily reach the PoC while the instructions
>> are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
>> is supposed to do an IC IVAU if it wrote instructions but that's not
>> sufficient (hint to the micro-architects, add a chicken bit to upgrade
>> IC IVAU to also do a DC CVAC ;))
>>
>>> Which probably means exposing a different CLIDR_EL1 so that
>>> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
>>> if keeping FWB set has the potential to change the semantics of the
>>> CMOs (the spec seems silent on that front).
>>
>> Not sure about CMOs, I'd expect them to behave in the same way. However,
>> I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
>> when its MMU is off.
> 
> I think the request is to keep the FWB feature, but to disable it for all host memory
> the guest can execute from. I presume this 'device pass-through using cacheable mappings'
> would mark that address range as XN at stage2, ( ... it's special right?).
> 
> If this is for something like CXL: it can't set XN, and the guest would still be exposed
> to the erratum if it executes from theses addresses with the MMU off.
> 
> Does this need doing now? It wouldn't need backporting to older kernels...
> 
> 
>>>> That said, maybe we can reduce the risk further by doing the
>>>> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
>>>> stage 2 exec fault (together with the I-cache invalidation). We can then
>>>> ignore any other cache maintenance on S2 faults until someone shouts (we
>>>> can maybe recommend forcing FWB off on the command line through the
>>>> cpuid override).
>>>
>>> You lost me here with your vcpu_has_run_once().
>>
>> Most likely I lost myself in the code. So the tricks we used in the past
>> tracking the guest MMU off/on was only for the D side. If (we hope that)
>> the guest only wrote instructions to a page once before executing them
>> (and never writing instructions again), we could trap a subsequent exec
>> fault and do the D-cache clean to PoC again.
>>
>>> Keeping the CMOs on exec fault is definitely manageable. But is that
>>> enough?
>>
>> Yeah, not sure it's enough if the guest keeps writing instructions to
>> the same page with the MMU off.
> 
> The difference between FWB and IDC/DIC still does my head in: My reading is FWB implies
> IDC, (but the CTR_EL0.IDC bit might not be set). This doesn't help if the wrong attributes
> are being used for instruction fetch.
> This is cache-maintenance that wasn't needed before, so there are no tricks with the id
> registers we can pull to make the guest do it.
> 
> 
> v2 of this will flip the polarity, and also detect based on an 'arm,interconnect'
> compatible, or the existing compatible the PMU driver uses.

Unfortunately I don't think PMUs are going to be a meaningful indicator 
in general since they don't imply anything about topology - you may 
infer that, say, an Arm CMN exists *somewhere* in the system, but it 
could conceivably be on some I/O chiplet connected via CCIX/CXL to a 
different interconnect on the CPU die which *does* still need the 
mitigation.

Thanks,
Robin.