diff mbox series

[v2,1/2] EDAC/amd64: Check register values from all UMCs

Message ID 20211215155309.2711917-2-yazen.ghannam@amd.com (mailing list archive)
State New, archived
Headers show
Series AMD Family 19h Models 10h-1Fh Updates | expand

Commit Message

Yazen Ghannam Dec. 15, 2021, 3:53 p.m. UTC
The initial support for Unified Memory Controllers (UMCs) was added to
AMD64 EDAC for the first generation of Zen systems. These systems have
two UMCs per Die, and the code originally assumed two UMCs in various
places.

Later systems have more than two UMCs, and this assumption was fixed in
the following commits.

commit bdcee7747f5c ("EDAC/amd64: Support more than two Unified Memory Controllers")
commit d971e28e2ce4 ("EDAC/amd64: Support more than two controllers for chip selects handling")

However, the determine_memory_type() function was missed in these
changes, and two UMCs are still assumed.

Update determine_memory_type() to account for all UMCs when checking the
register values.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20211208174356.1997855-4-yazen.ghannam@amd.com

v1->v2:
* Was patch 3 in v1.
* Update commit message.

 drivers/edac/amd64_edac.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Borislav Petkov Dec. 15, 2021, 6:01 p.m. UTC | #1
On Wed, Dec 15, 2021 at 03:53:08PM +0000, Yazen Ghannam wrote:
> -		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> +		u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
> +
> +		for_each_umc(i) {
> +			umc_cfg  |= pvt->umc[i].umc_cfg;
> +			dimm_cfg |= pvt->umc[i].dimm_cfg;
> +		}
> +
> +		if (dimm_cfg & BIT(5))
>  			pvt->dram_type = MEM_LRDDR4;
> -		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> +		else if (dimm_cfg & BIT(4))

You're working here under the assumption that bit 4 and 5 will have the
same value on all those UMCs.

You're probably going to say that that is how the BIOS is programming
them so they should be all the same and any other configuration is
invalid but lemme still ask about it explicitly.

And if so, this would probably need a comment above it which I can add
when applying...

Hmm?
Yazen Ghannam Dec. 16, 2021, 4:08 p.m. UTC | #2
On Wed, Dec 15, 2021 at 07:01:22PM +0100, Borislav Petkov wrote:
> On Wed, Dec 15, 2021 at 03:53:08PM +0000, Yazen Ghannam wrote:
> > -		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
> > +		u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
> > +
> > +		for_each_umc(i) {
> > +			umc_cfg  |= pvt->umc[i].umc_cfg;
> > +			dimm_cfg |= pvt->umc[i].dimm_cfg;
> > +		}
> > +
> > +		if (dimm_cfg & BIT(5))
> >  			pvt->dram_type = MEM_LRDDR4;
> > -		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
> > +		else if (dimm_cfg & BIT(4))
> 
> You're working here under the assumption that bit 4 and 5 will have the
> same value on all those UMCs.
> 
> You're probably going to say that that is how the BIOS is programming
> them so they should be all the same and any other configuration is
> invalid but lemme still ask about it explicitly.
> 
> And if so, this would probably need a comment above it which I can add
> when applying...
> 
> Hmm?
>

No, that's a good question. And actually the assumption is incorrect. It is
allowed to have different DIMM types in a system though all DIMMs on a single
UMC must match.

Do you recommend a follow up patch or should this one be reworked?

Thanks,
Yazen
Borislav Petkov Dec. 30, 2021, 11:36 a.m. UTC | #3
On Thu, Dec 16, 2021 at 04:08:18PM +0000, Yazen Ghannam wrote:
> No, that's a good question. And actually the assumption is incorrect. It is
> allowed to have different DIMM types in a system though all DIMMs on a single
> UMC must match.

Oh fun, really?!

So a single system can have DDR4 *and* DDR5 on the same board?!

So then that

	pvt->dram_type

is insufficient to store the DIMM type for a pvt. If you have multiple
UMCs on a pvt and all have different type DIMMs, then you need the
relevant DIMM type when you dump it in sysfs...

Which then means, you need ->dram_type to be per UMC...

And also, I'm assuming the hw already enforces that DIMMs on a single
UMC must match - it simply won't boot if they don't so you don't have to
enforce that, at least.

> Do you recommend a follow up patch or should this one be reworked?

This one is insufficient, I'm afraid.

One way to address this is, you could use pvt->umc at the places where
dram_type is used and assign directly to the dimm->mtype thing. But then
you'd need a way to map each struct dimm_info *dimm to the UMC so that
you can determine the correct DIMM type.

Which would make pvt->dram_type redundant and can be removed.

Or you might have a better idea...

HTH.
Yazen Ghannam Jan. 5, 2022, 4:12 p.m. UTC | #4
On Thu, Dec 30, 2021 at 12:36:44PM +0100, Borislav Petkov wrote:
> On Thu, Dec 16, 2021 at 04:08:18PM +0000, Yazen Ghannam wrote:
> > No, that's a good question. And actually the assumption is incorrect. It is
> > allowed to have different DIMM types in a system though all DIMMs on a single
> > UMC must match.
> 
> Oh fun, really?!
> 
> So a single system can have DDR4 *and* DDR5 on the same board?!
>

Well, I don't know about that specifically. There are some restrictions, but
you could have UDIMMs and RDIMMs of the same generation, at least.
 
> So then that
> 
> 	pvt->dram_type
> 
> is insufficient to store the DIMM type for a pvt. If you have multiple
> UMCs on a pvt and all have different type DIMMs, then you need the
> relevant DIMM type when you dump it in sysfs...
> 
> Which then means, you need ->dram_type to be per UMC...
> 
> And also, I'm assuming the hw already enforces that DIMMs on a single
> UMC must match - it simply won't boot if they don't so you don't have to
> enforce that, at least.
> 
> > Do you recommend a follow up patch or should this one be reworked?
> 
> This one is insufficient, I'm afraid.
> 
> One way to address this is, you could use pvt->umc at the places where
> dram_type is used and assign directly to the dimm->mtype thing. But then
> you'd need a way to map each struct dimm_info *dimm to the UMC so that
> you can determine the correct DIMM type.
> 

I did send a patch that did something like this.
https://lkml.kernel.org/r/20211228200615.412999-2-yazen.ghannam@amd.com

Though this got a build warning report, so I need to follow up on that.

> Which would make pvt->dram_type redundant and can be removed.
>

I kept this so as to not break legacy systems. But I'll look at it again. I
think you may be right.

Thanks,
Yazen
diff mbox series

Patch

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index ff29267e46a6..1df763128483 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1621,9 +1621,16 @@  static void determine_memory_type(struct amd64_pvt *pvt)
 	u32 dram_ctrl, dcsm;
 
 	if (pvt->umc) {
-		if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(5))
+		u32 umc_cfg = 0, dimm_cfg = 0, i = 0;
+
+		for_each_umc(i) {
+			umc_cfg  |= pvt->umc[i].umc_cfg;
+			dimm_cfg |= pvt->umc[i].dimm_cfg;
+		}
+
+		if (dimm_cfg & BIT(5))
 			pvt->dram_type = MEM_LRDDR4;
-		else if ((pvt->umc[0].dimm_cfg | pvt->umc[1].dimm_cfg) & BIT(4))
+		else if (dimm_cfg & BIT(4))
 			pvt->dram_type = MEM_RDDR4;
 		else
 			pvt->dram_type = MEM_DDR4;