
[v7,03/12] EDAC/mce_amd: Extract node id from MCA_IPID

Message ID 20220203174942.31630-4-nchatrad@amd.com (mailing list archive)
State New, archived
Series x86/edac/amd64: Add support for GPU nodes

Commit Message

Naveen Krishna Chatradhi Feb. 3, 2022, 5:49 p.m. UTC
On SMCA banks of the GPU nodes, the node id information is
available in register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux
where GPU nodes are sequentially after the CPU nodes.

Co-developed-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Muralidhara M K <muralimk@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
---
Link:
https://lkml.kernel.org/r/20211028130106.15701-3-nchatrad@amd.com

v6->v7:
* None

v5->v6:
* Called amd_get_gpu_node_id function to get node_id

v4->v5:
* None

v3->v4:
* Add reviewed by Yazen

v2->v3:
* Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count,
  which are required to map the hardware node id to the node id enumerated by Linux.

v1->v2:
* Modified subject and commit message
* Added Reviewed by Yazen Ghannam

v0->v1:
* Modified the commit message
* Rearranged the conditions before calling decode_dram_ecc()


 drivers/edac/mce_amd.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)
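
The conversion described in the commit message can be sketched in plain C as
follows. This is only an illustration, not the code from this series: the
helper names ipid_instance_id_hi() and gpu_hw_id_to_linux_node() are
hypothetical, and the arithmetic (subtract the first GPU hardware ID, then add
the CPU node count) is an assumption derived from the statement that GPU nodes
are numbered sequentially after the CPU nodes. The parameters
gpu_node_start_id and cpu_node_count borrow the names used in the v2->v3
changelog entry.

```c
#include <stdint.h>

/* MCA_IPID[47:44] holds InstanceIdHi, per the commit message. */
#define IPID_INSTANCE_ID_HI_SHIFT	44
#define IPID_INSTANCE_ID_HI_MASK	0xFULL

/* Extract the 4-bit hardware node ID of a GPU from a raw MCA_IPID value. */
static inline unsigned int ipid_instance_id_hi(uint64_t ipid)
{
	return (unsigned int)((ipid >> IPID_INSTANCE_ID_HI_SHIFT) &
			      IPID_INSTANCE_ID_HI_MASK);
}

/*
 * Hypothetical mapping (an assumption, not taken from the patch):
 * the first GPU hardware ID maps to the first node after the CPU nodes.
 * Returns -1 if the hardware ID is below gpu_node_start_id.
 */
static inline int gpu_hw_id_to_linux_node(uint64_t ipid,
					  unsigned int gpu_node_start_id,
					  unsigned int cpu_node_count)
{
	unsigned int hw_id = ipid_instance_id_hi(ipid);

	if (hw_id < gpu_node_start_id)
		return -1;

	return (int)(hw_id - gpu_node_start_id + cpu_node_count);
}
```

With illustrative values gpu_node_start_id = 8 and cpu_node_count = 2, an
MCA_IPID whose InstanceIdHi field is 9 would map to node 3 under this sketch.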

Comments

Yazen Ghannam Feb. 9, 2022, 11:31 p.m. UTC | #1
On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
> On SMCA banks of the GPU nodes, the node id information is
> available in register MCA_IPID[47:44](InstanceIdHi).
> 
> Convert the hardware node ID to a value used by Linux
> where GPU nodes are sequentially after the CPU nodes.
>

Terminology should be consistent. I see "node id" and "node ID" here.
 
...

> +		} else if (bank_type == SMCA_UMC_V2) {
> +			/*
> +			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
> +			 * from register MCA_IPID[47:44](InstanceIdHi).
> +			 * The InstanceIdHi field represents the instance ID of the GPU.
> +			 * Which needs to be mapped to a value used by Linux,
> +			 * where GPU nodes are simply numerically after the CPU nodes.
> +			 */
> +			node_id = amd_get_gpu_node_system_id(m->ipid);

As mentioned for the previous patch, why not define this function in EDAC?

Thanks,
Yazen
Naveen Krishna Chatradhi Feb. 14, 2022, 5:54 p.m. UTC | #2
Hi Yazen

On 2/10/2022 5:01 AM, Yazen Ghannam wrote:
> On Thu, Feb 03, 2022 at 11:49:33AM -0600, Naveen Krishna Chatradhi wrote:
>> On SMCA banks of the GPU nodes, the node id information is
>> available in register MCA_IPID[47:44](InstanceIdHi).
>>
>> Convert the hardware node ID to a value used by Linux
>> where GPU nodes are sequentially after the CPU nodes.
>>
> Terminology should be consistent. I see "node id" and "node ID" here.
Will keep it consistent.
>   
> ...
>
>> +		} else if (bank_type == SMCA_UMC_V2) {
>> +			/*
>> +			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
>> +			 * from register MCA_IPID[47:44](InstanceIdHi).
>> +			 * The InstanceIdHi field represents the instance ID of the GPU.
>> +			 * Which needs to be mapped to a value used by Linux,
>> +			 * where GPU nodes are simply numerically after the CPU nodes.
>> +			 */
>> +			node_id = amd_get_gpu_node_system_id(m->ipid);
> As mentioned for the previous patch, why not define this function in EDAC?

Sure, with the recent changes we can move this function to EDAC. I will wait
for comments on the other patches in the series and submit the next version
with the feedback addressed.

Regards,

Naveenk

>
> Thanks,
> Yazen

Patch

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index cc5c63feb26a..865a925ccef0 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@ 
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#include <asm/amd_nb.h>
 #include <asm/cpu.h>
 
 #include "mce_amd.h"
@@ -1186,8 +1187,26 @@  static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+			 * The InstanceIdHi field represents the instance ID of the GPU.
+			 * Which needs to be mapped to a value used by Linux,
+			 * where GPU nodes are simply numerically after the CPU nodes.
+			 */
+			node_id = amd_get_gpu_node_system_id(m->ipid);
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }
 
 static inline void amd_decode_err_code(u16 ec)