diff mbox series

[v2,8/8] x86/MCE/AMD Support new memory interleaving modes during address translation

Message ID 20200903200144.310991-9-Yazen.Ghannam@amd.com (mailing list archive)
State New, archived
Headers show
Series AMD MCA Address Translation Updates | expand

Commit Message

Yazen Ghannam Sept. 3, 2020, 8:01 p.m. UTC
From: Muralidhara M K <muralidhara.mk@amd.com>

Add support for new memory interleaving modes used in current AMD systems.

Check if the system is using a current Data Fabric version or a legacy
version as some bit and register definitions have changed.

Tested on AMD reference platforms with the following memory interleaving
options.

Naples
- None
- Channel
- Die
- Socket

Rome (NPS = Nodes per Socket)
- None
- NPS0
- NPS1
- NPS2
- NPS4

The fixes tag refers to the commit that allows amd64_edac_mod to load on
Rome systems. The module may report an incorrect system addresses on
Rome systems depending on the interleaving option used.

Fixes: 6e846239e548 ("EDAC/amd64: Add Family 17h Model 30h PCI IDs")
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chtradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chtradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20200814191449.183998-3-Yazen.Ghannam@amd.com

v1 -> v2:
* Rebased on cleanup patches.
* Save and use the Data Fabric version.
* Reorder code to execute non-legacy flows first. This change wasn't
  made to the section with the "hashed_bit" calculation, since the
  current flow reads easier IMHO.

 arch/x86/kernel/cpu/mce/amd.c | 222 ++++++++++++++++++++++++++--------
 1 file changed, 172 insertions(+), 50 deletions(-)

Comments

Borislav Petkov Sept. 23, 2020, 8:20 a.m. UTC | #1
On Thu, Sep 03, 2020 at 08:01:44PM +0000, Yazen Ghannam wrote:
> From: Muralidhara M K <muralidhara.mk@amd.com>
> 
> Add support for new memory interleaving modes used in current AMD systems.
>
> Check if the system is using a current Data Fabric version or a legacy
> version as some bit and register definitions have changed.
> 
> Tested on AMD reference platforms with the following memory interleaving
> options.
> 
> Naples
> - None
> - Channel
> - Die
> - Socket
> 
> Rome (NPS = Nodes per Socket)
> - None
> - NPS0
> - NPS1
> - NPS2
> - NPS4
> 
> The fixes tag refers to the commit that allows amd64_edac_mod to load on
> Rome systems.

Err, why? This is adding new stuff to an address translation function.
How does that fix amd64_edac loading on Rome?

> The module may report an incorrect system addresses on
> Rome systems depending on the interleaving option used.

That doesn't stop it from loading, sorry.

Now, before you guys do any new features, I'd like you to split this
humongous function umc_normaddr_to_sysaddr() logically into separate
helpers and each helper does exactly one thing and one thing only.

Then use a verb in its name: umc_translate_normaddr_to_sysaddr() or so.

Also, Yazen, remind me again pls why isn't this function in
drivers/edac/amd64_edac.c, where it is needed?

If the reason is not valid anymore, let's move it there before splitting
so that it doesn't bloat the core code.

Thx.
Yazen Ghannam Sept. 23, 2020, 4:25 p.m. UTC | #2
On Wed, Sep 23, 2020 at 10:20:39AM +0200, Borislav Petkov wrote:
> On Thu, Sep 03, 2020 at 08:01:44PM +0000, Yazen Ghannam wrote:
> > From: Muralidhara M K <muralidhara.mk@amd.com>
> > 
> > Add support for new memory interleaving modes used in current AMD systems.
> >
> > Check if the system is using a current Data Fabric version or a legacy
> > version as some bit and register definitions have changed.
> > 
> > Tested on AMD reference platforms with the following memory interleaving
> > options.
> > 
> > Naples
> > - None
> > - Channel
> > - Die
> > - Socket
> > 
> > Rome (NPS = Nodes per Socket)
> > - None
> > - NPS0
> > - NPS1
> > - NPS2
> > - NPS4
> > 
> > The fixes tag refers to the commit that allows amd64_edac_mod to load on
> > Rome systems.
> 
> Err, why? This is adding new stuff to an address translation function.
> How does that fix amd64_edac loading on Rome?
> 
> > The module may report an incorrect system addresses on
> > Rome systems depending on the interleaving option used.
> 
> That doesn't stop it from loading, sorry.
>

Okay, no problem.

> Now, before you guys do any new features, I'd like you to split this
> humongous function umc_normaddr_to_sysaddr() logically into separate
> helpers and each helper does exactly one thing and one thing only.
> 
> Then use a verb in its name: umc_translate_normaddr_to_sysaddr() or so.
>

Okay, will do.

> Also, Yazen, remind me again pls why isn't this function in
> drivers/edac/amd64_edac.c, where it is needed?
> 
> If the reason is not valid anymore, let's move it there before splitting
> so that it doesn't bloat the core code.
>

I don't remember the original reason, and I was recently asked about
this code living in a module. I did some looking after this ask, and I
found that we should be using this translation to get a proper value for
the memory error notifiers to use. So I think we still need to use this
function some way with the core code even if the EDAC interface isn't
used.

I think this set can be split up.

1) Set with patches 1-3 fixed up to use cpu_die_id.
2) Set with the address translation updates.
   a) Move umc_normaddr_to_sysaddr() into a new module under EDAC.
   b) Hook the new module into amd64_edac.c where it's used today.
   c) Refactor the code as you suggested above.
   d) Add the new features.
3) New set that sets up a proper notifier for the address translation.
   a) Unhook the new module from amd64_edac.c.
   b) Register a notifer that runs before any notifiers that operate on
      memory errors.
   c) Find a way to pass the translated address through the chain
      without losing the original value.

What do you think?

Thanks,
Yazen
Borislav Petkov Sept. 25, 2020, 7:22 a.m. UTC | #3
On Wed, Sep 23, 2020 at 11:25:10AM -0500, Yazen Ghannam wrote:
> I don't remember the original reason, and I was recently asked about
> this code living in a module. I did some looking after this ask, and I
> found that we should be using this translation to get a proper value for
> the memory error notifiers to use. So I think we still need to use this
> function some way with the core code even if the EDAC interface isn't
> used.

You'd need to be more specific here, you want to bypass amd64_edac to
decode errors? Judging by the current RAS activity coming from you guys,
I'm thinking firmware. But then wouldn't the firmware do the decoding
for us and then this function is not even needed?

> What do you think?

I think you should explain what the use case is first. :)

Thx.
Yazen Ghannam Sept. 25, 2020, 7:51 p.m. UTC | #4
On Fri, Sep 25, 2020 at 09:22:31AM +0200, Borislav Petkov wrote:
> On Wed, Sep 23, 2020 at 11:25:10AM -0500, Yazen Ghannam wrote:
> > I don't remember the original reason, and I was recently asked about
> > this code living in a module. I did some looking after this ask, and I
> > found that we should be using this translation to get a proper value for
> > the memory error notifiers to use. So I think we still need to use this
> > function some way with the core code even if the EDAC interface isn't
> > used.
> 
> You'd need to be more specific here, you want to bypass amd64_edac to
> decode errors? Judging by the current RAS activity coming from you guys,
> I'm thinking firmware. But then wouldn't the firmware do the decoding
> for us and then this function is not even needed?
>

The UC, NFIT, and CEC notifiers all operate on system physical
addresses. The address in the MCE record is checked by
mce_usable_address() to see if it can be used by the kernel, i.e. the
address is a system physical address. Right now, this check passes on
AMD systems if MCA_STATUS[AddrV] is set. This works for memory errors on
legacy AMD systems, since the NB MCA bank logs a physical address for
DRAM ECC errors. But this won't work on newer systems, because the UMC
MCA bank does not log a system physical address for DRAM ECC errors. So
the address provided by the hardware will need to be translated to a
physical address before the notifiers in the MCE chain can use it.

We can add support to get the physical address from firmware in some
cases. But it looks to me that we'll still need to keep updating the
translation code in the kernel to cover some platform/user
configurations. So it makes sense to me to move the functionality into a
module to make it easier to update.

The address translation needs to be done before the notfiers that need
it, and EDAC comes after all of them. There's also the case where the
EDAC interface isn't wanted, so amd64_edac will be unloaded. But the
functionality in the other notifiers are still expected to be available.
So it's more than just decoding the error like we do now with amd64_edac.
That's why I think the translation code can be in a separate module with
a notfier that runs before the others. This can do the translation once
then pass the result down to the CEC, UC, NFIT, and EDAC notifiers to
use as needed.

Thanks,
Yazen
Borislav Petkov Sept. 28, 2020, 9:47 a.m. UTC | #5
On Fri, Sep 25, 2020 at 02:51:27PM -0500, Yazen Ghannam wrote:
> We can add support to get the physical address from firmware in some
> cases. But it looks to me that we'll still need to keep updating the
> translation code in the kernel to cover some platform/user
> configurations.

Unfortunately, that is correct. :-\

> The address translation needs to be done before the notfiers that need
> it, and EDAC comes after all of them. There's also the case where the
> EDAC interface isn't wanted, so amd64_edac will be unloaded.

I'd be interested as to why. Because decoding addresses is amd64_edac
*core* functionality. We can stick it in drivers/edac/mce_amd.c but I'd
like to hear what those valid reasons are, not to use the driver which
is supposed to do that anyway.

Thanks for explaining - it is all clear now.
Yazen Ghannam Sept. 28, 2020, 3:53 p.m. UTC | #6
On Mon, Sep 28, 2020 at 11:47:59AM +0200, Borislav Petkov wrote:
> On Fri, Sep 25, 2020 at 02:51:27PM -0500, Yazen Ghannam wrote:
> 
> > The address translation needs to be done before the notfiers that need
> > it, and EDAC comes after all of them. There's also the case where the
> > EDAC interface isn't wanted, so amd64_edac will be unloaded.
> 
> I'd be interested as to why. Because decoding addresses is amd64_edac
> *core* functionality. We can stick it in drivers/edac/mce_amd.c but I'd
> like to hear what those valid reasons are, not to use the driver which
> is supposed to do that anyway.
>

I don't have any clear reasons. I just get vague use cases sometimes
about not using EDAC and relying on other things. But it shouldn't hurt
to have the module load anyway. The EDAC messages can be suppressed, and
the sysfs interface can be ignored. So, after a bit more thought, this
doesn't seem like a good reason.

I agree that the translation code is implementation-specific and applies
only to DRAM ECC errors, so it make sense to have it in amd64_edac. The
only issue is getting the address translation to earlier notifiers. I
think we can add a new one in amd64_edac to run before others. Maybe this
can be a new priority class like MCE_PRIO_PREPROCESS, or something like
that for notifiers that fixup the MCE data.

I can start by moving the address translation to amd64_edac and doing
the code cleanup.

Thanks,
Yazen
Borislav Petkov Sept. 28, 2020, 6:14 p.m. UTC | #7
On Mon, Sep 28, 2020 at 10:53:50AM -0500, Yazen Ghannam wrote:
> I don't have any clear reasons. I just get vague use cases sometimes
> about not using EDAC and relying on other things. But it shouldn't hurt
> to have the module load anyway. The EDAC messages can be suppressed, and
> the sysfs interface can be ignored. So, after a bit more thought, this
> doesn't seem like a good reason.

Ok. We can always carve it out if someone comes up with a valid reason
later.

> I agree that the translation code is implementation-specific and applies
> only to DRAM ECC errors, so it make sense to have it in amd64_edac. The
> only issue is getting the address translation to earlier notifiers. I
> think we can add a new one in amd64_edac to run before others. Maybe this
> can be a new priority class like MCE_PRIO_PREPROCESS, or something like
> that for notifiers that fixup the MCE data.

Well, I'm not sure you need notifiers here - you wanna call
mce_usable_address() and in it, it should do the address conversion
calculation to give you a physical address which you can feed to
memory_failure etc.

Now, mce_usable_address() is core code and we can make core code call
into a module but that is yucky. So *that* is your reason for keeping it
where it is.

Looking at its size:

$ readelf -s vmlinux | grep umc_normaddr_to
  2864: ffffffff817d8ae5   168 FUNC    LOCAL  DEFAULT    1 umc_normaddr_to_[...]
 91866: ffffffff81030e00  1127 FUNC    GLOBAL DEFAULT    1 umc_normaddr_to_[...]

that's something like ~1.3K and if you split it and do some
experimenting, you might get it even slimmer. Not that ~1.3K is that
huge for current standards but we should always aim at not bloating the
fat guy our kernel already is.

Thx.
Yazen Ghannam Sept. 29, 2020, 1:21 p.m. UTC | #8
On Mon, Sep 28, 2020 at 08:14:07PM +0200, Borislav Petkov wrote:
> On Mon, Sep 28, 2020 at 10:53:50AM -0500, Yazen Ghannam wrote:
> 
> > I agree that the translation code is implementation-specific and applies
> > only to DRAM ECC errors, so it make sense to have it in amd64_edac. The
> > only issue is getting the address translation to earlier notifiers. I
> > think we can add a new one in amd64_edac to run before others. Maybe this
> > can be a new priority class like MCE_PRIO_PREPROCESS, or something like
> > that for notifiers that fixup the MCE data.
> 
> Well, I'm not sure you need notifiers here - you wanna call
> mce_usable_address() and in it, it should do the address conversion
> calculation to give you a physical address which you can feed to
> memory_failure etc.
> 
> Now, mce_usable_address() is core code and we can make core code call
> into a module but that is yucky. So *that* is your reason for keeping it
> where it is.
>

Okay, we'll keep the code where it is. I'll work on another set to call
the address translation with mce_usable_address().

> Looking at its size:
> 
> $ readelf -s vmlinux | grep umc_normaddr_to
>   2864: ffffffff817d8ae5   168 FUNC    LOCAL  DEFAULT    1 umc_normaddr_to_[...]
>  91866: ffffffff81030e00  1127 FUNC    GLOBAL DEFAULT    1 umc_normaddr_to_[...]
> 
> that's something like ~1.3K and if you split it and do some
> experimenting, you might get it even slimmer. Not that ~1.3K is that
> huge for current standards but we should always aim at not bloating the
> fat guy our kernel already is.
>

Okay, I'll keep an eye on this and try to slim it down.

Thanks,
Yazen
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index f5440f8000e9..c14076bcabf2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -683,8 +683,10 @@  void mce_amd_feature_init(struct cpuinfo_x86 *c)
 #define DF_F0_DRAMBASEADDR	0x110
 #define DF_F0_DRAMLIMITADDR	0x114
 #define DF_F0_DRAMOFFSET	0x1B4
+#define DF_F0_DFGLOBALCTRL	0x3F8
 
 #define DF_F1_SYSFABRICID	0x208
+#define DF_F1_SYSFABRICID1	0x20C
 
 int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 {
@@ -695,22 +697,30 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 	u32 dram_hole_base;
 
 	u32 reg_dram_base_addr, reg_dram_limit_addr;
-	u32 reg_dram_offset;
+	u32 reg_dram_offset, reg_sys_fabric_id;
+
+	bool hash_enabled = false, split_normalized = false;
 
-	u8 die_id_shift, die_id_mask, socket_id_shift, socket_id_mask;
 	u8 intlv_num_dies, intlv_num_chan, intlv_num_sockets;
-	u8 intlv_addr_sel, intlv_addr_bit;
-	u8 num_intlv_bits, hashed_bit;
+	u8 intlv_addr_sel, intlv_addr_bit, num_intlv_bits;
+	u8 cs_mask, cs_id = 0, dst_fabric_id = 0;
 	u8 lgcy_mmio_hole_en, base = 0;
-	u8 cs_mask, cs_id = 0;
-	bool hash_enabled = false;
+	u8 df_version;
+
+	if (amd_df_indirect_read(nid, 1, DF_F1_SYSFABRICID, umc, &reg_sys_fabric_id))
+		goto out_err;
+
+	df_version = (reg_sys_fabric_id & 0xFF) ? 3 : 2;
 
 	if (amd_df_indirect_read(nid, 0, DF_F0_DRAMOFFSET, umc, &reg_dram_offset))
 		goto out_err;
 
 	/* Remove HiAddrOffset from normalized address, if enabled: */
 	if (reg_dram_offset & BIT(0)) {
-		u64 hi_addr_offset = get_bits(reg_dram_offset, 31, 20) << 28;
+		u8 hi_addr_offset_lsb = (df_version >= 3) ? 12 : 20;
+		u64 hi_addr_offset = get_bits(reg_dram_offset, 31, hi_addr_offset_lsb);
+
+		hi_addr_offset <<= 28;
 
 		/* Check if base 1 is used. */
 		if (norm_addr >= hi_addr_offset) {
@@ -733,19 +743,23 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 		goto out_err;
 
 	lgcy_mmio_hole_en = get_bit(reg_dram_base_addr, 1);
-	intlv_num_chan	  = get_bits(reg_dram_base_addr, 7, 4);
-	intlv_addr_sel	  = get_bits(reg_dram_base_addr, 10, 8);
 	dram_base_addr	  = get_bits(reg_dram_base_addr, 31, 12) << 28;
-
-	intlv_num_sockets = get_bit(reg_dram_limit_addr, 8);
-	intlv_num_dies	  = get_bits(reg_dram_limit_addr, 11, 10);
 	dram_limit_addr	  = (get_bits(reg_dram_limit_addr, 31, 12) << 28) | GENMASK_ULL(27, 0);
 
-	/* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */
-	if (intlv_addr_sel > 3) {
-		pr_err("%s: Invalid interleave address select %d.\n",
-			__func__, intlv_addr_sel);
-		goto out_err;
+	if (df_version >= 3) {
+		intlv_num_chan    = get_bits(reg_dram_base_addr, 5, 2);
+		intlv_num_dies    = get_bits(reg_dram_base_addr, 7, 6);
+		intlv_num_sockets = get_bit(reg_dram_base_addr, 8);
+		intlv_addr_sel    = get_bits(reg_dram_base_addr, 11, 9);
+
+		dst_fabric_id	  = get_bits(reg_dram_limit_addr, 9, 0);
+	} else {
+		intlv_num_chan	  = get_bits(reg_dram_base_addr, 7, 4);
+		intlv_addr_sel	  = get_bits(reg_dram_base_addr, 10, 8);
+
+		dst_fabric_id	  = get_bits(reg_dram_limit_addr, 7, 0);
+		intlv_num_sockets = get_bit(reg_dram_limit_addr, 8);
+		intlv_num_dies	  = get_bits(reg_dram_limit_addr, 11, 10);
 	}
 
 	intlv_addr_bit = intlv_addr_sel + 8;
@@ -758,27 +772,42 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 	case 5:	intlv_num_chan = 3; break;
 	case 7:	intlv_num_chan = 4; break;
 
-	case 8: intlv_num_chan = 1;
+	case 8:
+		if (df_version >= 3) {
+			intlv_num_chan = 5;
+		} else {
+			intlv_num_chan = 1;
+			hash_enabled = true;
+		}
+		break;
+	case 12:
+		intlv_num_chan = 1;
 		hash_enabled = true;
 		break;
+	case 13:
+		intlv_num_chan = 2;
+		hash_enabled = true;
+		split_normalized = true;
+		break;
+	case 14:
+		intlv_num_chan = 3;
+		hash_enabled = true;
+		split_normalized = true;
+		break;
 	default:
 		pr_err("%s: Invalid number of interleaved channels %d.\n",
 			__func__, intlv_num_chan);
 		goto out_err;
 	}
 
-	num_intlv_bits = intlv_num_chan;
-
-	if (intlv_num_dies > 2) {
-		pr_err("%s: Invalid number of interleaved nodes/dies %d.\n",
-			__func__, intlv_num_dies);
+	/* Assert interleave address bit is 8 or 9 for hashing cases. */
+	if (hash_enabled && intlv_addr_bit != 8 && intlv_addr_bit != 9) {
+		pr_err("%s: Invalid interleave address bit for hashing %d.\n",
+			__func__, intlv_addr_bit);
 		goto out_err;
 	}
 
-	num_intlv_bits += intlv_num_dies;
-
-	/* Add a bit if sockets are interleaved. */
-	num_intlv_bits += intlv_num_sockets;
+	num_intlv_bits = intlv_num_chan + intlv_num_dies + intlv_num_sockets;
 
 	/* Assert num_intlv_bits <= 4 */
 	if (num_intlv_bits > 4) {
@@ -788,9 +817,11 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 	}
 
 	if (num_intlv_bits > 0) {
-		u64 temp_addr_x, temp_addr_i, temp_addr_y;
-		u32 reg_sys_fabric_id, cs_fabric_id;
+		u8 cs_fabric_id_msb = (df_version >= 3) ? 13 : 15;
 		u8 die_id_bit, sock_id_bit;
+		u64 addr_x, addr_y, addr_z;
+		u8 node_id_shift = 0;
+		u32 cs_fabric_id;
 
 		/*
 		 * This is the fabric id for this coherent slave. Use
@@ -800,7 +831,7 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 		if (amd_df_indirect_read(nid, 0, DF_F0_FABRICINSTINFO3, umc, &cs_fabric_id))
 			goto out_err;
 
-		cs_fabric_id = get_bits(cs_fabric_id, 15, 8);
+		cs_fabric_id = get_bits(cs_fabric_id, cs_fabric_id_msb, 8);
 		die_id_bit   = 0;
 
 		/* If interleaved over more than 1 channel: */
@@ -808,43 +839,83 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 			die_id_bit = intlv_num_chan;
 			cs_mask	   = (1 << die_id_bit) - 1;
 			cs_id	   = cs_fabric_id & cs_mask;
+			cs_id	  -= dst_fabric_id & cs_mask;
 		}
 
 		sock_id_bit = die_id_bit;
 
-		if (intlv_num_dies || intlv_num_sockets)
-			if (amd_df_indirect_read(nid, 1, DF_F1_SYSFABRICID, umc, &reg_sys_fabric_id))
+		if ((intlv_num_dies || intlv_num_sockets) && df_version >= 3) {
+			if (amd_df_indirect_read(nid, 1, DF_F1_SYSFABRICID1, umc, &reg_sys_fabric_id))
 				goto out_err;
 
+			node_id_shift = get_bits(reg_sys_fabric_id, 3, 0);
+		}
+
 		/* If interleaved over more than 1 die. */
 		if (intlv_num_dies) {
+			u8 die_id_shift, die_id_mask;
+
 			sock_id_bit  = die_id_bit + intlv_num_dies;
-			die_id_shift = get_bits(reg_sys_fabric_id, 27, 24);
-			die_id_mask  = get_bits(reg_sys_fabric_id, 15, 8);
+
+			if (df_version >= 3) {
+				die_id_shift = get_bits(reg_sys_fabric_id, 3, 0) + node_id_shift;
+
+				die_id_mask  = get_bits(reg_sys_fabric_id, 18, 16);
+				die_id_mask <<= node_id_shift;
+			} else {
+				die_id_shift = get_bits(reg_sys_fabric_id, 27, 24);
+				die_id_mask  = get_bits(reg_sys_fabric_id, 15, 8);
+			}
 
 			cs_id |= ((cs_fabric_id & die_id_mask) >> die_id_shift) << die_id_bit;
 		}
 
 		/* If interleaved over more than 1 socket. */
 		if (intlv_num_sockets) {
-			socket_id_shift	= get_bits(reg_sys_fabric_id, 31, 28);
-			socket_id_mask	= get_bits(reg_sys_fabric_id, 23, 16);
+			u8 socket_id_shift, socket_id_mask;
+
+			if (df_version >= 3) {
+				socket_id_shift	= get_bits(reg_sys_fabric_id, 10, 8);
+				socket_id_shift += node_id_shift;
+
+				socket_id_mask	= get_bits(reg_sys_fabric_id, 26, 24);
+				socket_id_mask <<= node_id_shift;
+			} else {
+				socket_id_shift	= get_bits(reg_sys_fabric_id, 31, 28);
+				socket_id_mask	= get_bits(reg_sys_fabric_id, 23, 16);
+			}
 
 			cs_id |= ((cs_fabric_id & socket_id_mask) >> socket_id_shift) << sock_id_bit;
 		}
 
 		/*
 		 * The pre-interleaved address consists of XXXXXXIIIYYYYY
-		 * where III is the ID for this CS, and XXXXXXYYYYY are the
-		 * address bits from the post-interleaved address.
-		 * "num_intlv_bits" has been calculated to tell us how many "I"
-		 * bits there are. "intlv_addr_bit" tells us how many "Y" bits
-		 * there are (where "I" starts).
+		 * or XXXXXXIIZZZIYYY where III is the ID for this CS, and
+		 * XXXXXXZZZYYYYY are the address bits from the post-interleaved
+		 * address. "num_intlv_bits" has been calculated to tell us how
+		 * many "I" bits there are. "intlv_addr_bit" tells us how many
+		 * "Y" bits there are (where "I" starts).
+		 *
+		 * The "split" III is only used in the COD modes, where there
+		 * is one bit I at "intlv_addr_bit", and the remaining CS bits
+		 * are higher up starting at bit 12.
 		 */
-		temp_addr_y = get_bits(ret_addr, intlv_addr_bit-1, 0);
-		temp_addr_i = (cs_id << intlv_addr_bit);
-		temp_addr_x = (ret_addr & GENMASK_ULL(63, intlv_addr_bit)) << num_intlv_bits;
-		ret_addr    = temp_addr_x | temp_addr_i | temp_addr_y;
+		addr_y = get_bits(ret_addr, intlv_addr_bit - 1, 0);
+
+		if (split_normalized) {
+			addr_x = ret_addr & GENMASK_ULL(63, 11);
+			addr_x <<= num_intlv_bits;
+
+			addr_z = ret_addr & GENMASK_ULL(10, intlv_addr_bit);
+			addr_z <<= 1;
+		} else {
+			addr_x = ret_addr & GENMASK_ULL(63, intlv_addr_bit);
+			addr_x <<= num_intlv_bits;
+
+			addr_z = 0;
+		}
+
+		ret_addr = addr_x | addr_z | addr_y;
 	}
 
 	/* Add dram base address */
@@ -860,18 +931,69 @@  int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 			ret_addr += (BIT_ULL(32) - dram_hole_base);
 	}
 
-	if (hash_enabled) {
-		hashed_bit =	get_bit(ret_addr, 12) ^
+	/*
+	 * There are three cases for hashing:
+	 * 1) No Hashing
+	 * 2) Legacy Hashing
+	 * 3) Cluster-on-Die (COD) Hashing
+	 */
+	if (!hash_enabled) {
+		/* Fill in the interleave bit. */
+		if (intlv_num_chan)
+			ret_addr |= (cs_id << intlv_addr_bit);
+	} else if (df_version == 2) {
+		/* Legacy 2ch hash. */
+		u8 hashed_bit =	get_bit(ret_addr, 12) ^
 				get_bit(ret_addr, 18) ^
 				get_bit(ret_addr, 21) ^
 				get_bit(ret_addr, 30) ^
 				get_bit(cs_id, 0);
 
-		if (hashed_bit != get_bit(ret_addr, intlv_addr_bit))
-			ret_addr ^= BIT(intlv_addr_bit);
+		ret_addr ^= hashed_bit << intlv_addr_bit;
+	} else {
+		u8 hashed_bit, hash_ctl_64K, hash_ctl_2M, hash_ctl_1G;
+		u32 reg_df_global_ctrl;
+
+		if (amd_df_indirect_read(nid, 0, DF_F0_DFGLOBALCTRL, umc, &reg_df_global_ctrl))
+			goto out_err;
+
+		hash_ctl_64K = get_bit(reg_df_global_ctrl, 20);
+		hash_ctl_2M  = get_bit(reg_df_global_ctrl, 21);
+		hash_ctl_1G  = get_bit(reg_df_global_ctrl, 22);
+
+		/* COD with 2ch, 4ch, or 8ch hash. */
+		hashed_bit =	get_bit(ret_addr, 14) ^
+				(get_bit(ret_addr, 18) & hash_ctl_64K) ^
+				(get_bit(ret_addr, 23) & hash_ctl_2M) ^
+				(get_bit(ret_addr, 32) & hash_ctl_1G) ^
+				get_bit(cs_id, 0);
+
+		ret_addr ^= hashed_bit << intlv_addr_bit;
+
+		/* COD with 4ch or 8ch hash. */
+		if ((intlv_num_chan == 2) || (intlv_num_chan == 3)) {
+			hashed_bit =	get_bit(ret_addr, 12) ^
+					(get_bit(ret_addr, 16) & hash_ctl_64K) ^
+					(get_bit(ret_addr, 21) & hash_ctl_2M) ^
+					(get_bit(ret_addr, 30) & hash_ctl_1G) ^
+					get_bit(cs_id, 1);
+
+			ret_addr ^= hashed_bit << 12;
+		}
+
+		/* COD with 8ch hash. */
+		if (intlv_num_chan == 3) {
+			hashed_bit =	get_bit(ret_addr, 13) ^
+					(get_bit(ret_addr, 17) & hash_ctl_64K) ^
+					(get_bit(ret_addr, 22) & hash_ctl_2M) ^
+					(get_bit(ret_addr, 31) & hash_ctl_1G) ^
+					get_bit(cs_id, 2);
+
+			ret_addr ^= hashed_bit << 13;
+		}
 	}
 
-	/* Is calculated system address is above DRAM limit address? */
+	/* Is calculated system address above DRAM limit address? */
 	if (ret_addr > dram_limit_addr)
 		goto out_err;