[v2,00/10] arm64: Add initial support for FEAT_WFxT

Message ID 20220419182755.601427-1-maz@kernel.org (mailing list archive)

Message

Marc Zyngier April 19, 2022, 6:27 p.m. UTC
The ARMv8.7 WFxT feature is a new take on the good old WFI/WFE
instructions as they behave the same way, only taking an extra timeout
parameter.

This small series aims at adding the minimal support for this feature,
enabling it for both the kernel and KVM.

A potential addition to this series would be to remove the event
generation from the counters, and rely on the timeout where it
matters (spinlocks?). Feedback welcome.

Patches on top of 5.18-rc2, tested on the FVP AEM.

* From v1 [1]:
  - Properly generate traces even if the deadline has already expired
  - Collect RBs, with thanks.

[1] https://lore.kernel.org/r/20220412131303.504690-1-maz@kernel.org

Marc Zyngier (10):
  arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition
  arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
  KVM: arm64: Simplify kvm_cpu_has_pending_timer()
  KVM: arm64: Introduce kvm_counter_compute_delta() helper
  KVM: arm64: Handle blocking WFIT instruction
  KVM: arm64: Offer early resume for non-blocking WFxT instructions
  KVM: arm64: Expose the WFXT feature to guests
  arm64: Add HWCAP advertising FEAT_WFXT
  arm64: Add wfet()/wfit() helpers
  arm64: Use WFxT for __delay() when possible

 Documentation/arm64/cpu-feature-registers.rst |  2 +
 Documentation/arm64/elf_hwcaps.rst            |  4 ++
 arch/arm64/include/asm/barrier.h              |  4 ++
 arch/arm64/include/asm/esr.h                  |  8 +++-
 arch/arm64/include/asm/hwcap.h                |  1 +
 arch/arm64/include/asm/kvm_host.h             |  1 +
 arch/arm64/include/uapi/asm/hwcap.h           |  1 +
 arch/arm64/kernel/cpufeature.c                | 13 +++++
 arch/arm64/kernel/cpuinfo.c                   |  1 +
 arch/arm64/kvm/arch_timer.c                   | 47 ++++++++++++-------
 arch/arm64/kvm/arm.c                          |  6 +--
 arch/arm64/kvm/handle_exit.c                  | 35 ++++++++++++--
 arch/arm64/kvm/sys_regs.c                     |  2 +
 arch/arm64/lib/delay.c                        | 12 ++++-
 arch/arm64/tools/cpucaps                      |  1 +
 include/kvm/arm_arch_timer.h                  |  2 -
 16 files changed, 110 insertions(+), 30 deletions(-)

Comments

Catalin Marinas April 20, 2022, 5:24 p.m. UTC | #1
On Tue, Apr 19, 2022 at 07:27:45PM +0100, Marc Zyngier wrote:
> A potential addition to this series would be to remove the event
> generation from the counters, and rely on the timeout where it
> matters (spinlocks?). Feedback welcome.

I think we still need to keep the event generation around, at least for
hardware bugs we don't know about. I don't think user-space rely on it
though, people tend to come up with weird delays like isb ;). But yes,
the WFET should be handy when it turns up in hardware.

Marc Zyngier April 20, 2022, 6:50 p.m. UTC | #2
On Wed, 20 Apr 2022 18:24:31 +0100,
Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> On Tue, Apr 19, 2022 at 07:27:45PM +0100, Marc Zyngier wrote:
> > A potential addition to this series would be to remove the event
> > generation from the counters, and rely on the timeout where it
> > matters (spinlocks?). Feedback welcome.
> 
> I think we still need to keep the event generation around, at least for
> hardware bugs we don't know about. I don't think user-space rely on it
> though, people tend to come up with weird delays like isb ;). But yes,
> the WFET should be handy when it turns up in hardware.

My hope was that the trick of using the event generation to work
around systems failing to broadcast events could become a thing of the
past when WFET is present in the HW. After all, they serve the same
purpose (generate a local event to un-wedge the CPU).

But the more I look at it, the more I hate the potential solution. One
of the issues is that WFxT takes an absolute deadline, rather than a
relative one. So you end up with things like:

	ISB
	MRS	x0, CNTVCT_EL0
	ADD	x0, x0, #some_small_value
	WFET	x0

which is really heavy handed for the slow path of an atomic operation.
Even if you have ECV and CNTVCTSS_EL0 (which allows you to get rid of
the ISB), it is a royal pain.

It would be much better if there was a *relative* version of WFET that
would directly take a timeout relative to the current virtual count,
but I can sense HW designers calling me names already, so I'll shut
up.

	M.
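
The absolute-versus-relative deadline point in the last message can be
illustrated with a small, portable C model. This is only a sketch under
stated assumptions, not code from the series: fake_counter, read_counter(),
wait_until() and wait_for() are hypothetical names, the counter is simulated
rather than read from CNTVCT_EL0, and wait_until() merely imitates a WFET
that can wake up before its deadline.

```c
#include <stdint.h>

/* Hypothetical stand-in for the virtual counter (CNTVCT_EL0 on real HW). */
static uint64_t fake_counter;

/* Assumption: real code would use ISB + MRS, or CNTVCTSS_EL0 with ECV. */
static uint64_t read_counter(void)
{
	return fake_counter;
}

/*
 * Hypothetical stand-in for WFET: waits until an ABSOLUTE deadline, but
 * may return early. Here an "event" wakes us after at most 16 ticks.
 */
static void wait_until(uint64_t deadline)
{
	uint64_t remaining = deadline - fake_counter;

	/* Deadline already reached or passed: return immediately. */
	if ((int64_t)remaining <= 0)
		return;

	fake_counter += remaining < 16 ? remaining : 16;
}

/*
 * Relative wrapper: the caller passes a delta, and the wrapper pays for
 * the counter read and the addition that an absolute-deadline
 * instruction requires. Returns the absolute deadline it computed.
 */
static uint64_t wait_for(uint64_t delta)
{
	uint64_t start = read_counter();
	uint64_t deadline = start + delta;

	/* Unsigned subtraction keeps the elapsed check wraparound-safe. */
	while ((read_counter() - start) < delta)
		wait_until(deadline);

	return deadline;
}
```

The counter read and the addition in wait_for() correspond to the MRS and
ADD of the sequence quoted above; a relative form of the instruction, as
wished for in the message, would fold both into the wait itself.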