[v4,00/24] AMD MCA Address Translation Updates

Message ID 20220127204115.384161-1-yazen.ghannam@amd.com (mailing list archive)

Yazen Ghannam Jan. 27, 2022, 8:40 p.m. UTC
This patchset refactors the AMD MCA Address Translation code and adds
support for newer systems.

The reference code was recently refactored in preparation for updates
for future systems. These patches try to follow the reference code as
closely as possible. I also tried to address comments from previous
patchset reviews.

Patches 1-23 do the refactor without adding new system support. The goal
is to break down the translation algorithm into smaller chunks. Code
that changes between Data Fabric versions or interleaving modes is moved
to a set of function pointers. The intention is that new system support
can be added without any major refactor.
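As a rough illustration of the function-pointer approach described above,
here is a minimal userspace sketch. It is not the code from the patches: the
struct members, the helper names (select_df_ops(), common_translate()) and
all of the bit arithmetic are placeholders invented for this example. Only
the general idea, a "df_ops" table chosen once and then called from common
code, comes from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of version-specific steps (member names are made up). */
struct df_ops {
	/* Return the interleave address bit decoded from a register value. */
	unsigned int (*intlv_addr_bit)(uint32_t reg);
	/* Undo address hashing; NULL if this version has no hashing step. */
	uint64_t (*dehash)(uint64_t addr);
};

/* Placeholder decoders: the field widths here are NOT real hardware layouts. */
static unsigned int df2_intlv_addr_bit(uint32_t reg)
{
	return 8 + (reg & 0x3);
}

static unsigned int df3_intlv_addr_bit(uint32_t reg)
{
	return 8 + (reg & 0x7);
}

/* Placeholder hash: XOR address bit 12 into bit 8 (invented for the demo). */
static uint64_t df3_dehash(uint64_t addr)
{
	return addr ^ (((addr >> 12) & 1) << 8);
}

static const struct df_ops df2_ops = {
	.intlv_addr_bit = df2_intlv_addr_bit,
	.dehash         = NULL,
};

static const struct df_ops df3_ops = {
	.intlv_addr_bit = df3_intlv_addr_bit,
	.dehash         = df3_dehash,
};

/* Pick the table once, e.g. at init time; unknown versions get NULL. */
static const struct df_ops *select_df_ops(unsigned int df_version)
{
	switch (df_version) {
	case 2:
		return &df2_ops;
	case 3:
		return &df3_ops;
	default:
		return NULL;
	}
}

/*
 * Common code: calls through the table and never checks the version itself.
 * Returns the interleave address bit, or -1 if no ops table is available.
 */
static int common_translate(const struct df_ops *ops, uint32_t reg,
			    uint64_t *addr)
{
	if (!ops)
		return -1;

	if (ops->dehash)
		*addr = ops->dehash(*addr);

	return (int)ops->intlv_addr_bit(reg);
}
```

With this shape, supporting another Data Fabric version would mean adding one
more table and one more case in the selector, while common_translate() stays
unchanged.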

I tried to make one patch for each logical change. The top-level function
was split first, then the next level of functions, and so on, in a roughly
breadth-first order.

Patch 24 adds support for systems with Data Fabric version 3 (Rome and
later).

Each patch was build-tested individually. The entire set was
functionally tested with the following modes:

Naples:
  No interleaving
  Channel interleaving
  Die interleaving
  Socket interleaving

Rome:
  No interleaving
  Nodes-per-Socket 0 (NPS0)
  Nodes-per-Socket 1 (NPS1)
  Nodes-per-Socket 2 (NPS2)
  Nodes-per-Socket 4 (NPS4)
  NPS2 w/o hashing
  NPS4 w/o hashing

Note to the maintainers:
If there are no major issues, I'd like for this set to be applied,
please. Comments are still welcome, of course. There will be a future
set to add support for Family 19h Models 10h, etc., and I'd like to
address comments there.

Thanks,
Yazen

Link:
https://lore.kernel.org/r/20211028175728.121452-1-yazen.ghannam@amd.com

v3->v4:
* Rebased on latest ras/edac-for-next.
* Dropped patches merged in v5.17 (Thanks Boris!)
* Dropped patches for CPU+GPU systems.
* Dropped patch that changed function parameters.
* Folded glossary patch into other patches.
* Left in pr_debug() statements that I found useful during development.

v2->v3:
* Drop "df_regs" use.
* Include patches needed for CPU+GPU systems.
* Set "df_ops" at module init based on family type.

v1->v2:
* Move address translation code to EDAC.
* Use function pointers to handle code differences between DF versions.
* Add glossary of acronyms.

Yazen Ghannam (24):
  EDAC/amd64: Define Data Fabric operations
  EDAC/amd64: Define functions for DramOffset
  EDAC/amd64: Define function to read DRAM address map registers
  EDAC/amd64: Define function to find interleaving mode
  EDAC/amd64: Define function to denormalize address
  EDAC/amd64: Define function to add DRAM base and hole
  EDAC/amd64: Define function to dehash address
  EDAC/amd64: Define function to check DRAM limit address
  EDAC/amd64: Remove goto statements
  EDAC/amd64: Define function to get Interleave Address Bit
  EDAC/amd64: Skip denormalization if no interleaving
  EDAC/amd64: Define function to get number of interleaved channels
  EDAC/amd64: Define function to get number of interleaved dies
  EDAC/amd64: Define function to get number of interleaved sockets
  EDAC/amd64: Remove unnecessary assert
  EDAC/amd64: Define function to make space for CS ID
  EDAC/amd64: Define function to calculate CS ID
  EDAC/amd64: Define function to insert CS ID into address
  EDAC/amd64: Define function to get CS Fabric ID
  EDAC/amd64: Define function to find shift and mask values
  EDAC/amd64: Update CS ID calculation to match reference code
  EDAC/amd64: Match hash function to reference code
  EDAC/amd64: Define function to get interleave address select bit
  EDAC/amd64: Add support for address translation on DF3 systems

 drivers/edac/amd64_edac.c | 710 ++++++++++++++++++++++++++++++--------
 1 file changed, 565 insertions(+), 145 deletions(-)