[RFC,0/6] Cache coherency management subsystem

Message ID: 20250320174118.39173-1-Jonathan.Cameron@huawei.com

Message

Jonathan Cameron March 20, 2025, 5:41 p.m. UTC
Note that I've only a vague idea of who will care about this
so please do +CC others as needed.

On x86 there is the much loved WBINVD instruction that causes a write back
and invalidate of all caches in the system. It is expensive but it is
necessary in a few corner cases. These are cases where the contents of
Physical Memory may change without any writes from the host. Whilst there
are a few reasons this might happen, the one I care about here is when
we are adding or removing mappings on CXL. So typically going from
there being actual memory at a host Physical Address to nothing there
(reads as zero, writes dropped) or vice versa. That involves the
reprogramming of address decoders (HDM Decoders); in the near future
it may also include the device offering dynamic capacity extents. The
thing that makes it very hard to handle with CPU flushes is that the
instructions are normally VA based and not guaranteed to reach beyond
the Point of Coherence or similar. You might be able to (ab)use
various flush operations intended for persistent memory, but
in general they don't work either.

So on other architectures such as ARM64 we have no instruction similar to
WBINVD but we may have device interfaces in the system that provide a way
to ensure a PA range undergoes the write back and invalidate action. This
RFC is to find a way to support those cache maintenance device interfaces.
The ones I know about are much more flexible than WBINVD, allowing
invalidation of particular PA ranges, or a much richer set of flush types
(not supported yet as not needed for upstream use cases).
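
To give a rough idea of the shape such a class interface could take, here
is a minimal sketch. All of the names below (struct cache_coherency_device,
the wbinv/done ops, cache_coherency_device_register()) are assumptions for
illustration only, not the interface actually added in
include/linux/cache_coherency.h by this series.

/*
 * Minimal sketch only: names and signatures are illustrative assumptions,
 * not the actual contents of include/linux/cache_coherency.h.
 */
#include <linux/device.h>
#include <linux/types.h>

struct cache_coherency_device;

/* Parameters for one write back + invalidate request over a PA range */
struct cc_inval_params {
        phys_addr_t addr;
        size_t size;
};

struct cache_coherency_ops {
        /* Kick off write back + invalidate of the given PA range */
        int (*wbinv)(struct cache_coherency_device *ccd,
                     struct cc_inval_params *params);
        /* Wait for a previously issued operation to complete */
        int (*done)(struct cache_coherency_device *ccd);
};

struct cache_coherency_device {
        struct device dev;
        const struct cache_coherency_ops *ops;
};

/* Each cache control interface in the system registers one instance */
int cache_coherency_device_register(struct cache_coherency_device *ccd);
void cache_coherency_device_unregister(struct cache_coherency_device *ccd);

A driver for a specific cache control interface (such as the HiSilicon HHA
or the ACPI0019 proof of concept below) would then only need to fill in the
ops and register one instance per interface it discovers.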

To illustrate how a solution might work, I've taken both a HiSilicon
design (a slight quirk: its registers overlap with an existing PMU driver)
and, more controversially, a firmware interface proposal from ARM
(wrapped up in made-up ACPI) that was dropped from the released spec
but for which the alpha spec is still available.

Why drivers/cache?
- Mainly because it exists and smells like a reasonable place.
- Conor, you are the maintainer for this currently; do you mind us putting
  this stuff in there?

Why not just register a singleton function pointer?
- Systems may include multiple cache control devices, responsible
  for different parts of the physical address range (interleaving etc. makes
  this complex).  They may not all share a common hardware interface.
- A device class is more convenient than managing multiple homogeneous
  device instances within a driver.
- The disadvantage is that we need this small class (see the sketch after
  this list).
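
As mentioned above, here is a rough sketch of how the core might fan a
single invalidation out to every registered device, reusing the assumed
structures from the earlier sketch; the real drivers/cache/coherency_core.c
may well look different.

/*
 * Rough sketch only, building on the assumed structures above; not the
 * actual coherency_core.c implementation.
 */
#include <linux/device.h>

/* Registered with class_register() during subsystem init (not shown) */
static struct class cache_coherency_class = {
        .name = "cache_coherency",
};

static int ccd_invalidate_one(struct device *dev, void *data)
{
        struct cache_coherency_device *ccd =
                container_of(dev, struct cache_coherency_device, dev);
        struct cc_inval_params *params = data;
        int ret;

        /* Each device acts on whatever subset of the range it covers */
        ret = ccd->ops->wbinv(ccd, params);
        if (ret)
                return ret;

        return ccd->ops->done(ccd);
}

int cache_inval_range(phys_addr_t addr, size_t size)
{
        struct cc_inval_params params = { .addr = addr, .size = size };

        return class_for_each_device(&cache_coherency_class, NULL, &params,
                                     ccd_invalidate_one);
}

A singleton function pointer would not cope with several such devices
backed by different drivers, which is the point of the bullet list above.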

Generalizing to more architectures?
- I've started with ARM64, but if useful elsewhere the small amount
  of arch code could be moved to a generic location.

QEMU emulation code is at
http://gitlab.com/jic23/qemu (branch cxl-2025-03-20)

Why an RFC?
- I'm really just looking for feedback on whether the class approach
  is the way to go at this stage.  I'm not strongly attached to it but
  it feels like the right balance of complexity and flexibility to me.
- I made up the ACPI spec - it's not documented, unofficial and
  honestly needs work. I would however like feedback on whether
  it is something we want to try to get through the ACPI Working Group
  as a much-improved code-first proposal.  The potential justification
  is avoiding the need for lots of trivial drivers where maybe a bit
  of interpreted DSDT code does the job better.

Jonathan Cameron (3):
  cache: coherency device class
  acpi: PoC of Cache control via ACPI0019 and _DSM
  Hack: Pretend we have PSCI 1.2

Yicong Yang (3):
  memregion: Support fine grained invalidate by
    cpu_cache_invalidate_memregion()
  arm64: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
  cache: Support cache maintenance for HiSilicon SoC Hydra Home Agent

 arch/arm64/Kconfig                  |   1 +
 arch/arm64/include/asm/cacheflush.h |  14 ++
 arch/arm64/mm/flush.c               |  42 ++++++
 arch/x86/mm/pat/set_memory.c        |   2 +-
 drivers/acpi/Makefile               |   1 +
 drivers/cache/Kconfig               |  26 ++++
 drivers/cache/Makefile              |   4 +
 drivers/cache/acpi_cache_control.c  | 157 ++++++++++++++++++++++
 drivers/cache/coherency_core.c      | 130 +++++++++++++++++++
 drivers/cache/hisi_soc_hha.c        | 193 ++++++++++++++++++++++++++++
 drivers/cxl/core/region.c           |   6 +-
 drivers/firmware/psci/psci.c        |   2 +
 drivers/nvdimm/region.c             |   3 +-
 drivers/nvdimm/region_devs.c        |   3 +-
 include/linux/cache_coherency.h     |  60 +++++++++
 include/linux/memregion.h           |   8 +-
 16 files changed, 646 insertions(+), 6 deletions(-)
 create mode 100644 drivers/cache/acpi_cache_control.c
 create mode 100644 drivers/cache/coherency_core.c
 create mode 100644 drivers/cache/hisi_soc_hha.c
 create mode 100644 include/linux/cache_coherency.h
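
For context on the memregion patch listed above: upstream already provides
cpu_cache_has_invalidate_memregion() and
cpu_cache_invalidate_memregion(int res_desc) in include/linux/memregion.h,
used by the CXL and nvdimm code. The sketch below shows a caller of that
existing coarse interface; the range-based variant described in the comment
is an assumption about what the fine-grained patch adds, not its actual
signature.

/*
 * Sketch of a caller of the existing coarse-grained interface.  The
 * range-based variant mentioned in the comment is an assumption about
 * the "fine grained invalidate" patch, not its real signature.
 */
#include <linux/errno.h>
#include <linux/ioport.h>
#include <linux/memregion.h>

static int flush_caches_for_cxl(void)
{
        if (!cpu_cache_has_invalidate_memregion())
                return -ENXIO;

        /*
         * Today this writes back and invalidates caches for all ranges
         * carrying the IORES_DESC_CXL descriptor.  A fine-grained variant
         * would presumably also take a start address and size so that only
         * the affected PA range is flushed.
         */
        return cpu_cache_invalidate_memregion(IORES_DESC_CXL);
}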

Comments

Conor Dooley March 21, 2025, 10:32 p.m. UTC | #1
On Thu, Mar 20, 2025 at 05:41:12PM +0000, Jonathan Cameron wrote:
> [snip cover letter]
> 
> Why drivers/cache?
> - Mainly because it exists and smells like a reasonable place.
> - Conor, you are the maintainer for this currently; do you mind us putting
>   this stuff in there?

drivers/cache was just somewhere to put the cache controller drivers we
have on RISC-V that implement the various arch_dma*() callbacks in
non-standard ways; it made more sense than drivers/soc/<soc vendor>
since the controllers are IP provided by CPU vendors. There are only
two drivers here now, but I am aware of another two non-standard CMO
mechanisms, and silicon with them, so there'll likely be more in the
future :) I'm only really the maintainer of it to avoid it being another
thing for Palmer to look after :)

I've only skimmed this for now, but I think it is reasonable to put them
here. Maybe my skim is showing, but it would not surprise me to see a
driver providing both non-standard arch_dma*() callbacks as well as
dealing with CXL mappings via this new class on RISC-V in the future.
Either way, I think it'd probably be a good idea to add 'you' as a
co-maintainer if the directory is going to be used for your proposed
interface/drivers, for what I hope is an obvious reason!
Jonathan Cameron March 24, 2025, noon UTC | #2
On Fri, 21 Mar 2025 22:32:15 +0000
Conor Dooley <conor@kernel.org> wrote:

> On Thu, Mar 20, 2025 at 05:41:12PM +0000, Jonathan Cameron wrote:
> > [snip cover letter]
> > 
> > Why drivers/cache?
> > - Mainly because it exists and smells like a reasonable place.
> > - Conor, you are the maintainer for this currently; do you mind us putting
> >   this stuff in there?
> 
> drivers/cache was just somewhere to put the cache controller drivers we
> have on RISC-V that implement the various arch_dma*() callbacks in
> non-standard ways; it made more sense than drivers/soc/<soc vendor>
> since the controllers are IP provided by CPU vendors. There are only
> two drivers here now, but I am aware of another two non-standard CMO
> mechanisms, and silicon with them, so there'll likely be more in the
> future :) I'm only really the maintainer of it to avoid it being another
> thing for Palmer to look after :)

I suspected as much :)

> 
> I've only skimmed this for now, but I think it is reasonable to put them
> here. Maybe my skim is showing, but it would not surprise me to see a
> driver providing both non-standard arch_dma*() callbacks as well as
> dealing with CXL mappings via this new class on RISC-V in the future.

Absolutely.  The use of an ARM callback was just a placeholder for now
(Greg pointed that one out as well, as I forgot to mention it in the patch
description!)

I think this will turn out to be the approach for at least some subset of
implementations on other architectures, unless they decide to go the route
of an instruction (like x86).

> Either way, I think it'd probably be a good idea to add 'you' as a
> co-maintainer if the directory is going to be used for your proposed
> interface/drivers, for what I hope is an obvious reason!

Sure.  That would make sense.

Jonathan
>