[0/7] Address Translation support for MI200 and MI300 models

Message ID 20231025073339.630093-1-muralimk@amd.com (mailing list archive)

Message

M K, Muralidhara Oct. 25, 2023, 7:33 a.m. UTC
From: Muralidhara M K <muralidhara.mk@amd.com>

This patchset adds heterogeneous address translation support for MI200 and
address translation support for MI300A, plus fixups for HBM3 memory address maps.

The patchset depends on Yazen's previously submitted series,
"AMD Address Translation Library":
https://lore.kernel.org/linux-edac/20231005173526.42831-1-yazen.ghannam@amd.com/T/#m4a9ddb63b334f367219ab0002a9a133e891f6aac

The patchset does the following:

Patch 1:
MI200 heterogeneous address translation support.

Patch 2:
MI300 heterogeneous address translation support.

Patch 3:
Convert the HBM3 MCA decoded address to a normalized address.

Patch 4:
Add a lookup table to get the correct CS instance ID for HBM3.

Patch 5:
Convert the physical CS ID to the logical CS ID using a static lookup
table.

Patch 6:
Read the correct bit fields to get cs_fabric_id.

Patch 7:
Identify all physical pages in a row so that all 8 columns of pages
are retired when an error is injected, avoiding future errors.
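
For illustration only (this is not code from the series), the static-table
conversion described under Patch 5 could be sketched roughly as below. The
table size, the mapping values, and the names phys_to_logical_cs_id and
get_logical_cs_id are all hypothetical; the real mapping is defined by the
patches themselves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mapping, for illustration only.
 * Index = physical CS ID, value = logical CS ID.
 * The real table contents are not stated in this cover letter.
 */
static const uint8_t phys_to_logical_cs_id[] = {
	3, 2, 1, 0, 7, 6, 5, 4,
};

/* Assumed number of CS IDs covered by the table (hypothetical). */
#define NUM_CS_IDS 8

/* Catch a table that was edited without updating NUM_CS_IDS. */
static_assert(sizeof(phys_to_logical_cs_id) /
	      sizeof(phys_to_logical_cs_id[0]) == NUM_CS_IDS,
	      "table must cover every physical CS ID");

/* Return the logical CS ID, or -1 if the physical ID is out of range. */
static int get_logical_cs_id(unsigned int phys_id)
{
	if (phys_id >= NUM_CS_IDS)
		return -1;

	return phys_to_logical_cs_id[phys_id];
}
```

A fixed table keeps the conversion a single bounds-checked array access, at
the cost of needing a separate table for each hardware layout.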


Muralidhara M K (7):
  RAS: Add Address Translation support for MI200
  RAS: Add Address Translation support for MI300
  RAS: Add MCA Error address conversion for UMC
  RAS: Add static lookup table to get CS physical ID
  RAS: Add fixed Physical to logical CS ID mapping table
  RAS: Get CS fabric ID register bit fields
  EDAC/amd64: RAS: platform/x86/amd: Identify all physical pages in row

 drivers/edac/amd64_edac.c         |   5 +
 drivers/ras/amd/atl/core.c        |   5 +-
 drivers/ras/amd/atl/dehash.c      | 149 ++++++++++++++
 drivers/ras/amd/atl/denormalize.c | 110 ++++++++++-
 drivers/ras/amd/atl/internal.h    |  27 ++-
 drivers/ras/amd/atl/map.c         | 158 ++++++++++++---
 drivers/ras/amd/atl/reg_fields.h  |  42 +++-
 drivers/ras/amd/atl/system.c      |   4 +
 drivers/ras/amd/atl/umc.c         | 309 +++++++++++++++++++++++++++++-
 include/linux/amd-atl.h           |   2 +
 10 files changed, 778 insertions(+), 33 deletions(-)

Comments

Yazen Ghannam Oct. 26, 2023, 1:21 p.m. UTC | #1
On 10/25/2023 3:33 AM, Muralidhara M K wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
> 
> This patchset adds support for MI200 heterogeneous address translation support
> and MI300A address translation support, fixups on HBM3 memory address maps.
> 
> The patch set depends on the Yazen's patches submitted
> "AMD Address Translation Library"
> https://lore.kernel.org/linux-edac/20231005173526.42831-1-yazen.ghannam@amd.com/T/#m4a9ddb63b334f367219ab0002a9a133e891f6aac
> 
> The patchset does following
> 
> Patch 1:
> MI200 heterogeneous address translation support.
> 
> Patch 2:
> MI300 heterogeneous address translation support.
> 
> Patch 3:
> Convert HBM3 MCA Decoded address to Normalized address.
> 
> Patch 4:
> lookup table to get the correct cs instance id for HBM3.
> 
> Patch 5:
> Convert physical cs id to logical cs id by static lookup
> table.
> 
> Patch 6:
> Reading correct bit fields to get cs_fabric_id.

I expect some of these patches to be functionally out of order.

> 
> Patch 7:
> Identify all physical pages in a row to retire all 8 columns
> of pages when the error is injected to avoid future errors.
> 
> 
> Muralidhara M K (7):
>    RAS: Add Address Translation support for MI200
>    RAS: Add Address Translation support for MI300
>    RAS: Add MCA Error address conversion for UMC
>    RAS: Add static lookup table to get CS physical ID
>    RAS: Add fixed Physical to logical CS ID mapping table
>    RAS: Get CS fabirc ID register bit fields
>    EDAC/amd64: RAS: platform/x86/amd: Identify all physical pages in row
> 
>   drivers/edac/amd64_edac.c         |   5 +
>   drivers/ras/amd/atl/core.c        |   5 +-
>   drivers/ras/amd/atl/dehash.c      | 149 ++++++++++++++
>   drivers/ras/amd/atl/denormalize.c | 110 ++++++++++-
>   drivers/ras/amd/atl/internal.h    |  27 ++-
>   drivers/ras/amd/atl/map.c         | 158 ++++++++++++---
>   drivers/ras/amd/atl/reg_fields.h  |  42 +++-
>   drivers/ras/amd/atl/system.c      |   4 +
>   drivers/ras/amd/atl/umc.c         | 309 +++++++++++++++++++++++++++++-
>   include/linux/amd-atl.h           |   2 +
>   10 files changed, 778 insertions(+), 33 deletions(-)
> 

This set doesn't have any changes in arch/x86, so it's not necessary to 
copy the x86 list.

Thanks,
Yazen