diff mbox series

ie31200_edac missing PCI ID for i3-4370

Message ID CAHq9+ShGiB_H6-E=L398zYR=ja16r2OuvJZfU4KLof=segyJbw@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

Paul Marks Feb. 1, 2021, 12:07 a.m. UTC
I have an ASRock C226M WS with an i3-4370 CPU.

# lspci -vnn
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
            DRAM Controller [8086:0c00] (rev 06)
        Subsystem: ASRock Incorporation 4th Gen Core Processor
            DRAM Controller [1849:0c00]
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>
        Kernel driver in use: hsw_uncore

But edac-util doesn't work:

# edac-util -v
edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs

I tried this ham-fisted patch:

# diff -u ./drivers/edac/ie31200_edac.c{.old,}
--- ./drivers/edac/ie31200_edac.c.old
+++ ./drivers/edac/ie31200_edac.c
@@ -58,7 +58,7 @@
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
-#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
+#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918

And it seems happy now:

# lspci -vnn
00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
            DRAM Controller [8086:0c00] (rev 06)
        Subsystem: ASRock Incorporation 4th Gen Core Processor
            DRAM Controller [1849:0c00]
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>
        Kernel driver in use: hsw_uncore
        Kernel modules: ie31200_edac

# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
edac-util: No errors to report.

I don't know if it's truly working because I can't overclock the RAM
to induce ECC errors, but still I think adding 8086:0c00 to this
driver could be useful.

Comments

Jason Baron Feb. 4, 2021, 10:59 p.m. UTC | #1
On 1/31/21 7:07 PM, Paul Marks wrote:
> I have an ASRock C226M WS with an i3-4370 CPU.
> 
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>             DRAM Controller [8086:0c00] (rev 06)
>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>             DRAM Controller [1849:0c00]
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>         Kernel driver in use: hsw_uncore
> 
> But edac-util doesn't work:
> 
> # edac-util -v
> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> 
> I tried this ham-fisted patch:
> 
> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> --- ./drivers/edac/ie31200_edac.c.old
> +++ ./drivers/edac/ie31200_edac.c
> @@ -58,7 +58,7 @@
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918

Just curious: why did you replace the existing ID here instead of just adding a new one?

> 
> And it seems happy now:
> 
> # lspci -vnn
> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>             DRAM Controller [8086:0c00] (rev 06)
>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>             DRAM Controller [1849:0c00]
>         Flags: bus master, fast devsel, latency 0
>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>         Kernel driver in use: hsw_uncore
>         Kernel modules: ie31200_edac
> 
> # edac-util -v
> mc0: 0 Uncorrected Errors with no DIMM info
> mc0: 0 Corrected Errors with no DIMM info
> mc0: csrow0: 0 Uncorrected Errors
> mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
> mc0: csrow1: 0 Uncorrected Errors
> mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
> edac-util: No errors to report.
> 
> I don't know if it's truly working because I can't overclock the RAM
> to induce ECC errors, but still I think adding 8086:0c00 to this
> driver could be useful.
> 

Cool yeah - I think it makes sense to add if you can confirm
that the Intel datasheet says that this cpu uses the same
registers to read errors from as the others. I can certainly
confirm that the other pci ids do increment ce counts...

Thanks,

-Jason

Paul Marks Feb. 4, 2021, 11:22 p.m. UTC | #2
On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>
> On 1/31/21 7:07 PM, Paul Marks wrote:
> > I have an ASRock C226M WS with an i3-4370 CPU.
> >
> > # lspci -vnn
> > 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >             DRAM Controller [8086:0c00] (rev 06)
> >         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >             DRAM Controller [1849:0c00]
> >         Flags: bus master, fast devsel, latency 0
> >         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >         Kernel driver in use: hsw_uncore
> >
> > But edac-util doesn't work:
> >
> > # edac-util -v
> > edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> >
> > I tried this ham-fisted patch:
> >
> > # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> > --- ./drivers/edac/ie31200_edac.c.old
> > +++ ./drivers/edac/ie31200_edac.c
> > @@ -58,7 +58,7 @@
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> > -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> > +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
> >  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>
> just curious why you removed here and didn't just add?

This is not a serious patch, just a one-liner to demonstrate the problem.

>
> >
> > And it seems happy now:
> >
> > # lspci -vnn
> > 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >             DRAM Controller [8086:0c00] (rev 06)
> >         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >             DRAM Controller [1849:0c00]
> >         Flags: bus master, fast devsel, latency 0
> >         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >         Kernel driver in use: hsw_uncore
> >         Kernel modules: ie31200_edac
> >
> > # edac-util -v
> > mc0: 0 Uncorrected Errors with no DIMM info
> > mc0: 0 Corrected Errors with no DIMM info
> > mc0: csrow0: 0 Uncorrected Errors
> > mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
> > mc0: csrow1: 0 Uncorrected Errors
> > mc0: csrow1: mc#0csrow#1channel#0: 0 Corrected Errors
> > edac-util: No errors to report.
> >
> > I don't know if it's truly working because I can't overclock the RAM
> > to induce ECC errors, but still I think adding 8086:0c00 to this
> > driver could be useful.
> >
>
> Cool yeah - I think it makes sense to add if can confirm
> that the Intel datasheet says that this cpu uses the same
> registers to read errors from as the others. I can certainly
> confirm that the other pci ids do increment ce counts...
>
> Thanks,
>
> -Jason

Jason Baron Feb. 9, 2021, 10:25 p.m. UTC | #3
On 2/4/21 6:22 PM, Paul Marks wrote:
> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>
>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>
>>> # lspci -vnn
>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>             DRAM Controller [8086:0c00] (rev 06)
>>>         Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>             DRAM Controller [1849:0c00]
>>>         Flags: bus master, fast devsel, latency 0
>>>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>         Kernel driver in use: hsw_uncore
>>>
>>> But edac-util doesn't work:
>>>
>>> # edac-util -v
>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>
>>> I tried this ham-fisted patch:
>>>
>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>> --- ./drivers/edac/ie31200_edac.c.old
>>> +++ ./drivers/edac/ie31200_edac.c
>>> @@ -58,7 +58,7 @@
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>
>> just curious why you removed here and didn't just add?
> 
> This is not a serious patch, just a one-liner to demonstrate the problem.

Ok. Any chance you can find the datasheet that shows that this
driver is using the appropriate registers for this hw? I didn't
find it on a quick look...

Thanks,

-Jason

Paul Marks Feb. 9, 2021, 11:58 p.m. UTC | #4
On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>
> On 2/4/21 6:22 PM, Paul Marks wrote:
> > On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
> >>
> >> On 1/31/21 7:07 PM, Paul Marks wrote:
> >>> I have an ASRock C226M WS with an i3-4370 CPU.
> >>>
> >>> # lspci -vnn
> >>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
> >>>             DRAM Controller [8086:0c00] (rev 06)
> >>>         Subsystem: ASRock Incorporation 4th Gen Core Processor
> >>>             DRAM Controller [1849:0c00]
> >>>         Flags: bus master, fast devsel, latency 0
> >>>         Capabilities: [e0] Vendor Specific Information: Len=0c <?>
> >>>         Kernel driver in use: hsw_uncore
> >>>
> >>> But edac-util doesn't work:
> >>>
> >>> # edac-util -v
> >>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
> >>>
> >>> I tried this ham-fisted patch:
> >>>
> >>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
> >>> --- ./drivers/edac/ie31200_edac.c.old
> >>> +++ ./drivers/edac/ie31200_edac.c
> >>> @@ -58,7 +58,7 @@
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
> >>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
> >>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
> >>>  #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
> >>
> >> just curious why you removed here and didn't just add?
> >
> > This is not a serious patch, just a one-liner to demonstrate the problem.
>
> Ok. Any chance you can find the datasheet that shows that this
> driver is using the appropriate registers for this hw? I didn't
> find it quickly looking...
>

I wouldn't know where to begin.  Do you have an example of a similar
datasheet from one of the known-good devices?

I left "memtester" running on this machine, because it might increase
the odds of generating an ECC error someday.

Jason Baron Feb. 10, 2021, 3:27 a.m. UTC | #5
On 2/9/21 6:58 PM, Paul Marks wrote:
> On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>> On 2/4/21 6:22 PM, Paul Marks wrote:
>>> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>>>
>>>>> # lspci -vnn
>>>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>>>              DRAM Controller [8086:0c00] (rev 06)
>>>>>          Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>>>              DRAM Controller [1849:0c00]
>>>>>          Flags: bus master, fast devsel, latency 0
>>>>>          Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>>>          Kernel driver in use: hsw_uncore
>>>>>
>>>>> But edac-util doesn't work:
>>>>>
>>>>> # edac-util -v
>>>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>>>
>>>>> I tried this ham-fisted patch:
>>>>>
>>>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>>>> --- ./drivers/edac/ie31200_edac.c.old
>>>>> +++ ./drivers/edac/ie31200_edac.c
>>>>> @@ -58,7 +58,7 @@
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>>> just curious why you removed here and didn't just add?
>>> This is not a serious patch, just a one-liner to demonstrate the problem.
>> Ok. Any chance you can find the datasheet that shows that this
>> driver is using the appropriate registers for this hw? I didn't
>> find it quickly looking...
>>
> I wouldn't know where to begin.  Do you have an example of a similar
> datasheet from one of the known-good devices?
>
> I left "memtester" running on this machine, because it might increase
> the odds of generating an ECC error someday.

Hi Paul,

I have a list of them at the top of:
drivers/edac/ie31200_edac.c

According to the following intel link it looks
like '0xc[0-f]' is valid (page 52):
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf

So I'm fine with this patch (assuming it just
becomes an addition).

Thanks,

-Jason

Jason Baron Feb. 10, 2021, 3:31 p.m. UTC | #6
On 2/9/21 10:27 PM, Jason Baron wrote:
> 
> 
> On 2/9/21 6:58 PM, Paul Marks wrote:
>> On Tue, Feb 9, 2021 at 2:25 PM Jason Baron <jbaron@akamai.com> wrote:
>>> On 2/4/21 6:22 PM, Paul Marks wrote:
>>>> On Thu, Feb 4, 2021 at 2:59 PM Jason Baron <jbaron@akamai.com> wrote:
>>>>> On 1/31/21 7:07 PM, Paul Marks wrote:
>>>>>> I have an ASRock C226M WS with an i3-4370 CPU.
>>>>>>
>>>>>> # lspci -vnn
>>>>>> 00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor
>>>>>>              DRAM Controller [8086:0c00] (rev 06)
>>>>>>          Subsystem: ASRock Incorporation 4th Gen Core Processor
>>>>>>              DRAM Controller [1849:0c00]
>>>>>>          Flags: bus master, fast devsel, latency 0
>>>>>>          Capabilities: [e0] Vendor Specific Information: Len=0c <?>
>>>>>>          Kernel driver in use: hsw_uncore
>>>>>>
>>>>>> But edac-util doesn't work:
>>>>>>
>>>>>> # edac-util -v
>>>>>> edac-util: Fatal: Unable to get EDAC data: Unable to find EDAC data in sysfs
>>>>>>
>>>>>> I tried this ham-fisted patch:
>>>>>>
>>>>>> # diff -u ./drivers/edac/ie31200_edac.c{.old,}
>>>>>> --- ./drivers/edac/ie31200_edac.c.old
>>>>>> +++ ./drivers/edac/ie31200_edac.c
>>>>>> @@ -58,7 +58,7 @@
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
>>>>>> -#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
>>>>>> +#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
>>>>>>   #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
>>>>> just curious why you removed here and didn't just add?
>>>> This is not a serious patch, just a one-liner to demonstrate the problem.
>>> Ok. Any chance you can find the datasheet that shows that this
>>> driver is using the appropriate registers for this hw? I didn't
>>> find it quickly looking...
>>>
>> I wouldn't know where to begin.  Do you have an example of a similar
>> datasheet from one of the known-good devices?
>>
>> I left "memtester" running on this machine, because it might increase
>> the odds of generating an ECC error someday.
> Hi Paul,
> 
> I have a list of them at the top of:
> drivers/edac/ie31200_edac.c
> 
> According to the following intel link it looks
> like '0xc[0-f]' is valid (page 52):

Sorry, I meant to write that as: '0x0c0[0-f]'.


> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e3-1200v3-vol-2-datasheet.pdf
> 
> So I'm fine with this patch (assuming it just
> becomes an addition).
> 
> Thanks,
> 
> -Jason
>

Patch

--- ./drivers/edac/ie31200_edac.c.old
+++ ./drivers/edac/ie31200_edac.c
@@ -58,7 +58,7 @@ 
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_3 0x0150
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_4 0x0158
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_5 0x015c
-#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c04
+#define PCI_DEVICE_ID_INTEL_IE31200_HB_6 0x0c00
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_7 0x0c08
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_8 0x1918
 #define PCI_DEVICE_ID_INTEL_IE31200_HB_9 0x5918
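
Editor's note: the review asked that 0x0c00 be added rather than substituted for 0x0c04, since the substitution drops support for the device that 0x0c04 identified. The following is a minimal user-space sketch of that idea, not the driver code. The names HB_HSW_I3, struct pci_id, supported_ids, id_is_supported and in_haswell_hb_range are hypothetical and chosen for illustration only; the device IDs are the ones shown in the patch context, and the 0x0c00 to 0x0c0f range is the one quoted in the thread from the Intel datasheet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_INTEL 0x8086

/* Hypothetical name for the i3-4370 host bridge ID reported by lspci. */
#define HB_HSW_I3 0x0c00

/* Illustrative stand-in for a PCI match entry (not the kernel's type). */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * IDs from the patch context, with 0x0c00 appended as a new entry
 * instead of overwriting 0x0c04.
 */
const struct pci_id supported_ids[] = {
	{ VENDOR_INTEL, 0x0150 },
	{ VENDOR_INTEL, 0x0158 },
	{ VENDOR_INTEL, 0x015c },
	{ VENDOR_INTEL, 0x0c04 },    /* kept: existing Haswell ID */
	{ VENDOR_INTEL, 0x0c08 },
	{ VENDOR_INTEL, 0x1918 },
	{ VENDOR_INTEL, 0x5918 },
	{ VENDOR_INTEL, HB_HSW_I3 }, /* added: ID seen on the i3-4370 */
};

/* Linear search of the table for an exact vendor:device match. */
bool id_is_supported(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(supported_ids) / sizeof(supported_ids[0]);

	for (size_t i = 0; i < n; i++) {
		if (supported_ids[i].vendor == vendor &&
		    supported_ids[i].device == device)
			return true;
	}
	return false;
}

/* True for 0x0c00..0x0c0f, the range the thread cites as valid. */
bool in_haswell_hb_range(uint16_t device)
{
	return (device & 0xfff0) == 0x0c00;
}
```

With the entry added rather than replaced, id_is_supported(0x8086, 0x0c00) and id_is_supported(0x8086, 0x0c04) both return true, so boards that already worked keep working.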