[0/3] KVM: arm64: Work around Ampere1 erratum AC03_CPU_38

Message ID 20230609220104.1836988-1-oliver.upton@linux.dev (mailing list archive)

Message

Oliver Upton June 9, 2023, 10:01 p.m. UTC
Hi folks,

Small series to work around a CPU erratum on AmpereOne. While the
implementation does not advertise support for FEAT_HAFDBS (due to
another erratum), the associated control bits do not have RES0 behavior
as required by the architecture.

Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
enabled on implementations that advertise the feature. However, KVM
relies on HA having RES0 semantics if the feature isn't implemented. The
end result is that KVM enables a broken hardware access flag
implementation that could lead to correctness issues.
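
To make the reasoning above concrete, here is a minimal user-space sketch of
the stage-2 policy change, not the kernel code itself. It assumes that
VTCR_EL2.HA is bit 21 and models "set HA and rely on RES0" against "leave HA
clear on affected parts". The struct cpu_model type and both function names
are hypothetical stand-ins for the kernel's capability checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* VTCR_EL2.HA: hardware Access flag management at stage-2 (assumed bit 21). */
#define VTCR_EL2_HA	(UINT64_C(1) << 21)

/* Hypothetical CPU description, standing in for kernel capability checks. */
struct cpu_model {
	bool advertises_hafdbs;	/* ID registers report FEAT_HAFDBS */
	bool has_ac03_cpu_38;	/* affected by erratum AC03_CPU_38 */
};

/*
 * Policy described as the status quo: set HA everywhere and rely on the
 * bit being RES0 (ignored) on CPUs that do not implement the feature.
 */
static uint64_t vtcr_before_fix(uint64_t vtcr, const struct cpu_model *cpu)
{
	(void)cpu;	/* the CPU model is deliberately not consulted */
	return vtcr | VTCR_EL2_HA;
}

/*
 * Workaround policy: never set HA on an affected CPU, so stage-2 keeps
 * taking Access Flag faults and software manages the flag.
 */
static uint64_t vtcr_with_workaround(uint64_t vtcr, const struct cpu_model *cpu)
{
	if (!cpu->has_ac03_cpu_38)
		vtcr |= VTCR_EL2_HA;
	return vtcr;
}
```

With the workaround in place, an affected CPU keeps HA clear, which is
consistent with the test observation that KVM takes Access Flag faults.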

Applies to 6.4-rc1. Tested with access_tracking_perf_test, verifying
that KVM is indeed taking Access Flag faults.

Oliver Upton (3):
  arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
  KVM: arm64: Refactor HFGxTR configuration into separate helpers
  KVM: arm64: Prevent guests from enabling HA/HD on Ampere1

 Documentation/arm64/silicon-errata.rst  |  3 +
 arch/arm64/Kconfig                      | 17 +++++
 arch/arm64/kernel/cpu_errata.c          |  7 ++
 arch/arm64/kvm/hyp/include/hyp/switch.h | 99 ++++++++++++++++++++-----
 arch/arm64/kvm/hyp/pgtable.c            | 14 +++-
 arch/arm64/tools/cpucaps                |  1 +
 6 files changed, 120 insertions(+), 21 deletions(-)


base-commit: ac9a78681b921877518763ba0e89202254349d1b

Comments

Catalin Marinas June 14, 2023, 4:57 p.m. UTC | #1
On Fri, Jun 09, 2023 at 10:01:01PM +0000, Oliver Upton wrote:
> Small series to work around a CPU erratum on AmpereOne. While the
> implementation does not advertise support for FEAT_HAFDBS (due to
> another erratum), the associated control bits do not have RES0 behavior
> as required by the architecture.
> 
> Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
> enabled on implementations that advertise the feature. However, KVM
> relies on HA having RES0 semantics if the feature isn't implemented. The
> end result is that KVM enables a broken hardware access flag
> implementation that could lead to correctness issues.

Just curious, what's the correctness issue here? The access flag is
mostly indicative of which pages are old for swapping out/discarding.
It's not like the dirty state which would be dangerous if we get wrong.

Oliver Upton June 14, 2023, 11:06 p.m. UTC | #2
Hey Catalin,

On Wed, Jun 14, 2023 at 05:57:55PM +0100, Catalin Marinas wrote:
> On Fri, Jun 09, 2023 at 10:01:01PM +0000, Oliver Upton wrote:
> > Small series to work around a CPU erratum on AmpereOne. While the
> > implementation does not advertise support for FEAT_HAFDBS (due to
> > another erratum), the associated control bits do not have RES0 behavior
> > as required by the architecture.
> > 
> > Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
> > enabled on implementations that advertise the feature. However, KVM
> > relies on HA having RES0 semantics if the feature isn't implemented. The
> > end result is that KVM enables a broken hardware access flag
> > implementation that could lead to correctness issues.
> 
> Just curious, what's the correctness issue here? The access flag is
> mostly indicative of which pages are old for swapping out/discarding.
> It's not like the dirty state which would be dangerous if we get wrong.

I probably could have helped out by giving the full context.

The software-observable behavior on this system is that the A or D
updates could arrive after a PTE has been marked as invalid, which could
corrupt software metadata stuffed into the page tables. We do exactly
that at stage-2 in KVM for parallel fault handling, where a magic value
indicates a PTE is being updated by another thread.
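
The hazard described above can be sketched in user space as follows. This is
an illustrative model only: the marker value SW_LOCKED_MARKER, the helper
names, and the choice to model the late update as a blind OR of the Access
flag are all assumptions, not the actual KVM encoding or the documented
hardware mechanism. The well-behaved update is modelled as a compare-and-swap
against the descriptor the walker originally read.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID		(UINT64_C(1) << 0)
#define PTE_AF			(UINT64_C(1) << 10)	/* Access flag */
/* Hypothetical software marker stored in an invalid PTE (valid bit clear). */
#define SW_LOCKED_MARKER	(UINT64_C(1) << 9)

/* Software check: is another thread currently updating this PTE? */
static bool pte_is_locked(uint64_t pte)
{
	return pte == SW_LOCKED_MARKER;
}

/*
 * Well-behaved hardware update, modelled as a compare-and-swap: AF is
 * only written if the descriptor still matches what the walker read.
 */
static uint64_t hw_af_update_atomic(uint64_t current, uint64_t snapshot)
{
	return current == snapshot ? (current | PTE_AF) : current;
}

/*
 * Assumed model of the erratum: the AF write lands late and is applied
 * to whatever the PTE now contains, including an invalid entry.
 */
static uint64_t hw_af_update_late(uint64_t current)
{
	return current | PTE_AF;
}
```

In this model, if software replaces a valid mapping with SW_LOCKED_MARKER
between the walker's read and its write, the atomic update leaves the marker
intact, while the late update turns it into a value that pte_is_locked() no
longer recognises.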

Catalin Marinas June 15, 2023, 8:36 a.m. UTC | #3
On Wed, Jun 14, 2023 at 11:06:40PM +0000, Oliver Upton wrote:
> Hey Catalin,
> 
> On Wed, Jun 14, 2023 at 05:57:55PM +0100, Catalin Marinas wrote:
> > On Fri, Jun 09, 2023 at 10:01:01PM +0000, Oliver Upton wrote:
> > > Small series to work around a CPU erratum on AmpereOne. While the
> > > implementation does not advertise support for FEAT_HAFDBS (due to
> > > another erratum), the associated control bits do not have RES0 behavior
> > > as required by the architecture.
> > > 
> > > Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
> > > enabled on implementations that advertise the feature. However, KVM
> > > relies on HA having RES0 semantics if the feature isn't implemented. The
> > > end result is that KVM enables a broken hardware access flag
> > > implementation that could lead to correctness issues.
> > 
> > Just curious, what's the correctness issue here? The access flag is
> > mostly indicative of which pages are old for swapping out/discarding.
> > It's not like the dirty state which would be dangerous if we get wrong.
> 
> I probably could have helped out by giving the full context.
> 
> The software-observable behavior on this system is that the A or D
> updates could arrive after a PTE has been marked as invalid, which could
> corrupt software metadata stuffed into the page tables. We do exactly
> that at stage-2 in KVM for parallel fault handling, where a magic value
> indicates a PTE is being updated by another thread.

Ah, ok, that's dangerous indeed. Thanks for the details (you may want to
add them in the patch description or the erratum kconfig entry).

Marc Zyngier June 15, 2023, 9:51 a.m. UTC | #4
On Fri, 09 Jun 2023 23:01:01 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi folks,
> 
> Small series to work around a CPU erratum on AmpereOne. While the
> implementation does not advertise support for FEAT_HAFDBS (due to
> another erratum), the associated control bits do not have RES0 behavior
> as required by the architecture.
> 
> Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
> enabled on implementations that advertise the feature. However, KVM
> relies on HA having RES0 semantics if the feature isn't implemented. The
> end result is that KVM enables a broken hardware access flag
> implementation that could lead to correctness issues.
> 
> Applies to 6.4-rc1. Tested with access_tracking_perf_test, verifying
> that KVM is indeed taking Access Flag faults.

For the series:

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

Oliver Upton June 20, 2023, 1:15 p.m. UTC | #5
On Fri, 9 Jun 2023 22:01:01 +0000, Oliver Upton wrote:
> Small series to work around a CPU erratum on AmpereOne. While the
> implementation does not advertise support for FEAT_HAFDBS (due to
> another erratum), the associated control bits do not have RES0 behavior
> as required by the architecture.
> 
> Usage of HAFDBS at stage-1 is unaffected, since HA and HD are only
> enabled on implementations that advertise the feature. However, KVM
> relies on HA having RES0 semantics if the feature isn't implemented. The
> end result is that KVM enables a broken hardware access flag
> implementation that could lead to correctness issues.
> 
> [...]

Applied with an expanded description of what's wrong with the unadvertised
HAFDBS implementation, per Catalin's suggestion.

[1/3] arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
      https://git.kernel.org/kvmarm/kvmarm/c/6df696cd9bc1
[2/3] KVM: arm64: Refactor HFGxTR configuration into separate helpers
      https://git.kernel.org/kvmarm/kvmarm/c/ce4a36225753
[3/3] KVM: arm64: Prevent guests from enabling HA/HD on Ampere1
      https://git.kernel.org/kvmarm/kvmarm/c/082fdfd13841

--
Best,
Oliver