[v9,4/9] docs: misc-devices: (smpro-errmon) Add documentation

Message ID: 20220929094321.770125-5-quan@os.amperecomputing.com
State: Handled Elsewhere
Series: Add Ampere's Altra SMPro MFD and its child drivers

Commit Message

Quan Nguyen Sept. 29, 2022, 9:43 a.m. UTC
Adds documentation for Ampere(R)'s Altra(R) SMpro errmon driver.

Signed-off-by: Thu Nguyen <thu@os.amperecomputing.com>
Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
---
Changes in v9:
  + Fix issue when building htmldocs                      [Bagas]
  + Remove unnecessary channel info for VRD and DIMM event [Quan]
  + Update SPDX license info                               [Greg]
  + Update document to align with new changes in sysfs     [Quan]

Changes in v8:
  + Update to reflect single value per sysfs  [Quan]

Changes in v7:
  + None

Changes in v6:
  + First introduced in v6 [Quan]

 Documentation/misc-devices/index.rst        |   1 +
 Documentation/misc-devices/smpro-errmon.rst | 193 ++++++++++++++++++++
 2 files changed, 194 insertions(+)
 create mode 100644 Documentation/misc-devices/smpro-errmon.rst

Comments

Greg Kroah-Hartman Sept. 29, 2022, 9:56 a.m. UTC | #1
On Thu, Sep 29, 2022 at 04:43:16PM +0700, Quan Nguyen wrote:
> Adds documentation for Ampere(R)'s Altra(R) SMpro errmon driver.
> 
> Signed-off-by: Thu Nguyen <thu@os.amperecomputing.com>
> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
> ---
> Changes in v9:
>   + Fix issue when building htmldocs                      [Bagas]
>   + Remove unnecessary channel info for VRD and DIMM event [Quan]
>   + Update SPDX license info                               [Greg]
>   + Update document to align with new changes in sysfs     [Quan]
> 
> Changes in v8:
>   + Update to reflect single value per sysfs  [Quan]
> 
> Changes in v7:
>   + None
> 
> Changes in v6:
>   + First introduced in v6 [Quan]
> 
>  Documentation/misc-devices/index.rst        |   1 +
>  Documentation/misc-devices/smpro-errmon.rst | 193 ++++++++++++++++++++
>  2 files changed, 194 insertions(+)
>  create mode 100644 Documentation/misc-devices/smpro-errmon.rst
> 
> diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
> index 756be15a49a4..b74b3b34a235 100644
> --- a/Documentation/misc-devices/index.rst
> +++ b/Documentation/misc-devices/index.rst
> @@ -27,6 +27,7 @@ fit into other categories.
>     max6875
>     oxsemi-tornado
>     pci-endpoint-test
> +   smpro-errmon
>     spear-pcie-gadget
>     uacce
>     xilinx_sdfec
> diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
> new file mode 100644
> index 000000000000..b17f30a6cafd
> --- /dev/null
> +++ b/Documentation/misc-devices/smpro-errmon.rst
> @@ -0,0 +1,193 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +Kernel driver Ampere(R)'s Altra(R) SMpro errmon
> +===============================================
> +
> +Supported chips:
> +
> +  * Ampere(R) Altra(R)
> +
> +    Prefix: 'smpro'
> +
> +    Preference: Altra SoC BMC Interface Specification
> +
> +Author: Thu Nguyen <thu@os.amperecomputing.com>
> +
> +Description
> +-----------
> +
> +This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
> +SMpro co-processor (SMpro).
> +The following SoC alert/event types are supported by the errmon driver:
> +
> +* Core CE/UE error
> +* Memory CE/UE error
> +* PCIe CE/UE error
> +* Other CE/UE error
> +* Internal SMpro/PMpro error
> +* VRD hot
> +* VRD warn/fault
> +* DIMM Hot
> +
> +The SMpro interface provides the registers to query the status of the SoC alerts/events
> +and their data and export to userspace by this driver.
> +
> +The SoC alerts/events will be referenced as error below.
> +
> +Usage Notes
> +-----------
> +
> +SMpro errmon driver creates the sysfs files for each error type.
> +Example: ``error_core_ce`` to get Core CE error type.
> +
> +* If the error is absented, the sysfs file returns empty.
> +* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
> +
> +For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
> +Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
> +
> +The format of the error is depended on the error type.
> +
> +1) For Core/Memory/PCIe/Other CE/UE error types::
> +
> +The return 48-byte in hex format in table below:
> +
> +    =======   =============   ===========   ==========================================
> +    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
> +    =======   =============   ===========   ==========================================
> +    00        Error Type      1             See Table below for details
> +    01        Subtype         1             See Table below for details
> +    02        Instance        2             See Table below for details
> +    04        Error status    4             See ARM RAS specification for details
> +    08        Error Address   8             See ARM RAS specification for details
> +    16        Error Misc 0    8             See ARM RAS specification for details
> +    24        Error Misc 1    8             See ARM RAS specification for details
> +    32        Error Misc 2    8             See ARM RAS specification for details
> +    40        Error Misc 3    8             See ARM RAS specification for details
> +    =======   =============   ===========   ==========================================
> +
> +Below table defines the value of Error types, Sub Types, Sub component and instance:
> +
> +    ============    ==========    =========   ===============  ====================================
> +    Error Group     Error Type    Sub type    Sub component    Instance
> +    ============    ==========    =========   ===============  ====================================
> +    CPM (core)      0             0           Snoop-Logic      CPM #
> +    CPM (core)      0             2           Armv8 Core 1     CPM #
> +    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
> +    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
> +    MCU (mem)       1             3           ERR3             MCU #
> +    MCU (mem)       1             4           ERR4             MCU #
> +    MCU (mem)       1             5           ERR5             MCU #
> +    MCU (mem)       1             6           ERR6             MCU #
> +    MCU (mem)       1             7           Link Error       MCU #
> +    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
> +    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
> +    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
> +    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
> +    2P Link (other) 3             0           N/A              Altra 2P Link #
> +    GIC (other)     5             0           ERR0             0
> +    GIC (other)     5             1           ERR1             0
> +    GIC (other)     5             2           ERR2             0
> +    GIC (other)     5             3           ERR3             0
> +    GIC (other)     5             4           ERR4             0
> +    GIC (other)     5             5           ERR5             0
> +    GIC (other)     5             6           ERR6             0
> +    GIC (other)     5             7           ERR7             0
> +    GIC (other)     5             8           ERR8             0
> +    GIC (other)     5             9           ERR9             0
> +    GIC (other)     5             10          ERR10            0
> +    GIC (other)     5             11          ERR11            0
> +    GIC (other)     5             12          ERR12            0
> +    GIC (other)     5             13-21       ERR13            RC# + 1
> +    SMMU (other)    6             TCU         100              RC #
> +    SMMU (other)    6             TBU0        0                RC #
> +    SMMU (other)    6             TBU1        1                RC #
> +    SMMU (other)    6             TBU2        2                RC #
> +    SMMU (other)    6             TBU3        3                RC #
> +    SMMU (other)    6             TBU4        4                RC #
> +    SMMU (other)    6             TBU5        5                RC #
> +    SMMU (other)    6             TBU6        6                RC #
> +    SMMU (other)    6             TBU7        7                RC #
> +    SMMU (other)    6             TBU8        8                RC #
> +    SMMU (other)    6             TBU9        9                RC #
> +    PCIe AER (pcie) 7             Root        0                RC #
> +    PCIe AER (pcie) 7             Device      1                RC #
> +    PCIe RC (pcie)  8             RCA HB      0                RC #
> +    PCIe RC (pcie)  8             RCB HB      1                RC #
> +    PCIe RC (pcie)  8             RASDP       8                RC #
> +    OCM (other)     9             ERR0        0                0
> +    OCM (other)     9             ERR1        1                0
> +    OCM (other)     9             ERR2        2                0
> +    SMpro (other)   10            ERR0        0                0
> +    SMpro (other)   10            ERR1        1                0
> +    SMpro (other)   10            MPA_ERR     2                0
> +    PMpro (other)   11            ERR0        0                0
> +    PMpro (other)   11            ERR1        1                0
> +    PMpro (other)   11            MPA_ERR     2                0
> +    ============    ==========    =========   ===============  ====================================
> +
> +    For example:
> +    # cat error_other_ue
> +    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
> +
> +2) For the Internal SMpro/PMpro error types::
> +
> +The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
> +    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
> +
> +The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
> +    <4-byte hex value of Warning info>
> +
> +Reference to Altra SoC BMC Interface Specification for the details.
> +
> +3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
> +
> +The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
> +
> +    Example:
> +    #cat event_vrd_hot
> +    0000
> +
> +Sysfs entries
> +-------------
> +
> +The following sysfs files are supported:
> +
> +* Ampere(R) Altra(R):
> +
> +Alert Types:
> +
> +    ========================  =================  ==================================================
> +    Alert Type                Sysfs name         Description
> +    ========================  =================  ==================================================
> +    Core CE Error             error_core_ce      Trigger when Core has CE error
> +    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
> +    Core UE Error             error_core_ue      Trigger when Core has UE error
> +    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
> +    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
> +    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
> +    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
> +    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
> +    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
> +    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
> +    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
> +    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
> +    Other CE Error            error_other_ce     Trigger when any Others CE error
> +    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
> +    Other UE Error            error_other_ue     Trigger when any Others UE error
> +    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
> +    SMpro Error               error_smpro        Trigger when system have SMpro error
> +    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
> +    PMpro Error               error_pmpro        Trigger when system have PMpro error
> +    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
> +    ========================  =================  ==================================================
> +
> +Event Type:
> +
> +    ============================ ==========================
> +    Event Type                   Sysfs name
> +    ============================ ==========================
> +    VRD HOT                      event_vrd_hot
> +    VR Warn/Fault                event_vrd_warn_fault
> +    DIMM Hot                     event_dimm_hot
> +    ============================ ==========================
> -- 
> 2.35.1
> 

Why not just put this in the driver itself to be generated automatically
instead of living in a file that will never be noticed if anything ever
changes?

thanks,

greg k-h

kernel test robot Sept. 30, 2022, 6:07 a.m. UTC | #2
Hi Quan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on char-misc/char-misc-testing]
[also build test WARNING on groeck-staging/hwmon-next lee-mfd/for-mfd-next linus/master v6.0-rc7 next-20220929]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Quan-Nguyen/Add-Ampere-s-Altra-SMPro-MFD-and-its-child-drivers/20220929-174756
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git 3aa12610b481f99b5e4e3f801ff7f9b7629e4ecf
reproduce:
        # https://github.com/intel-lab-lkp/linux/commit/e0d572373099a5c32ca6240de7b418a4e2dd2471
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Quan-Nguyen/Add-Ampere-s-Altra-SMPro-MFD-and-its-child-drivers/20220929-174756
        git checkout e0d572373099a5c32ca6240de7b418a4e2dd2471
        make menuconfig
        # enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
        make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> Documentation/misc-devices/smpro-errmon.rst:53: WARNING: Literal block expected; none found.
>> Documentation/misc-devices/smpro-errmon.rst:87: WARNING: Malformed table.

vim +53 Documentation/misc-devices/smpro-errmon.rst

    52	
  > 53	The return 48-byte in hex format in table below:
    54	
    55	    =======   =============   ===========   ==========================================
    56	    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
    57	    =======   =============   ===========   ==========================================
    58	    00        Error Type      1             See Table below for details
    59	    01        Subtype         1             See Table below for details
    60	    02        Instance        2             See Table below for details
    61	    04        Error status    4             See ARM RAS specification for details
    62	    08        Error Address   8             See ARM RAS specification for details
    63	    16        Error Misc 0    8             See ARM RAS specification for details
    64	    24        Error Misc 1    8             See ARM RAS specification for details
    65	    32        Error Misc 2    8             See ARM RAS specification for details
    66	    40        Error Misc 3    8             See ARM RAS specification for details
    67	    =======   =============   ===========   ==========================================
    68	
    69	Below table defines the value of Error types, Sub Types, Sub component and instance:
    70	
    71	    ============    ==========    =========   ===============  ====================================
    72	    Error Group     Error Type    Sub type    Sub component    Instance
    73	    ============    ==========    =========   ===============  ====================================
    74	    CPM (core)      0             0           Snoop-Logic      CPM #
    75	    CPM (core)      0             2           Armv8 Core 1     CPM #
    76	    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
    77	    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
    78	    MCU (mem)       1             3           ERR3             MCU #
    79	    MCU (mem)       1             4           ERR4             MCU #
    80	    MCU (mem)       1             5           ERR5             MCU #
    81	    MCU (mem)       1             6           ERR6             MCU #
    82	    MCU (mem)       1             7           Link Error       MCU #
    83	    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
    84	    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
    85	    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
    86	    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
  > 87	    2P Link (other) 3             0           N/A              Altra 2P Link #
    88	    GIC (other)     5             0           ERR0             0
    89	    GIC (other)     5             1           ERR1             0
    90	    GIC (other)     5             2           ERR2             0
    91	    GIC (other)     5             3           ERR3             0
    92	    GIC (other)     5             4           ERR4             0
    93	    GIC (other)     5             5           ERR5             0
    94	    GIC (other)     5             6           ERR6             0
    95	    GIC (other)     5             7           ERR7             0
    96	    GIC (other)     5             8           ERR8             0
    97	    GIC (other)     5             9           ERR9             0
    98	    GIC (other)     5             10          ERR10            0
    99	    GIC (other)     5             11          ERR11            0
   100	    GIC (other)     5             12          ERR12            0
   101	    GIC (other)     5             13-21       ERR13            RC# + 1
   102	    SMMU (other)    6             TCU         100              RC #
   103	    SMMU (other)    6             TBU0        0                RC #
   104	    SMMU (other)    6             TBU1        1                RC #
   105	    SMMU (other)    6             TBU2        2                RC #
   106	    SMMU (other)    6             TBU3        3                RC #
   107	    SMMU (other)    6             TBU4        4                RC #
   108	    SMMU (other)    6             TBU5        5                RC #
   109	    SMMU (other)    6             TBU6        6                RC #
   110	    SMMU (other)    6             TBU7        7                RC #
   111	    SMMU (other)    6             TBU8        8                RC #
   112	    SMMU (other)    6             TBU9        9                RC #
   113	    PCIe AER (pcie) 7             Root        0                RC #
   114	    PCIe AER (pcie) 7             Device      1                RC #
   115	    PCIe RC (pcie)  8             RCA HB      0                RC #
   116	    PCIe RC (pcie)  8             RCB HB      1                RC #
   117	    PCIe RC (pcie)  8             RASDP       8                RC #
   118	    OCM (other)     9             ERR0        0                0
   119	    OCM (other)     9             ERR1        1                0
   120	    OCM (other)     9             ERR2        2                0
   121	    SMpro (other)   10            ERR0        0                0
   122	    SMpro (other)   10            ERR1        1                0
   123	    SMpro (other)   10            MPA_ERR     2                0
   124	    PMpro (other)   11            ERR0        0                0
   125	    PMpro (other)   11            ERR1        1                0
   126	    PMpro (other)   11            MPA_ERR     2                0
   127	    ============    ==========    =========   ===============  ====================================
   128	
   129	    For example:
   130	    # cat error_other_ue
   131	    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
   132
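
For anyone following the thread, the 48-byte record layout quoted above (offsets 0 through 40) maps onto a fixed-size struct. The Python sketch below is illustrative only and is not part of the patch: the names `RECORD`, `ErrorRecord`, `parse_error_record` and `drain` are made up here, and it assumes the hex string lists bytes in offset order with multi-byte fields little-endian, which the patch does not state. It follows the Usage Notes in treating an empty read as "no pending error".

```python
import struct
from typing import Callable, Iterator, NamedTuple, Optional, Tuple

# Field sizes from the offset table: 1 + 1 + 2 + 4 + 8 + 4 * 8 = 48 bytes.
# ASSUMPTION: bytes appear in offset order and fields are little-endian.
RECORD = struct.Struct("<BBHIQ4Q")


class ErrorRecord(NamedTuple):
    error_type: int
    subtype: int
    instance: int
    status: int
    address: int
    misc: Tuple[int, ...]  # Error Misc 0..3


def parse_error_record(text: str) -> Optional[ErrorRecord]:
    """Decode one read of an error_* file; None means nothing was pending."""
    text = text.strip()
    if not text:
        return None
    raw = bytes.fromhex(text)
    if len(raw) != RECORD.size:
        raise ValueError(f"expected {RECORD.size} bytes, got {len(raw)}")
    etype, subtype, instance, status, address, *misc = RECORD.unpack(raw)
    return ErrorRecord(etype, subtype, instance, status, address, tuple(misc))


def drain(read_once: Callable[[], str]) -> Iterator[ErrorRecord]:
    """Yield records, oldest first, until a read comes back empty."""
    while True:
        record = parse_error_record(read_once())
        if record is None:
            return
        yield record
```

Here `read_once` would wrap a single read of one sysfs file, wherever `error_core_ce` and its siblings live on the system in question. The matching `overflow_*` file can be checked separately to learn whether older records were dropped before they were read.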
Bagas Sanjaya Sept. 30, 2022, 1:13 p.m. UTC | #3
On Thu, Sep 29, 2022 at 04:43:16PM +0700, Quan Nguyen wrote:
> Adds documentation for Ampere(R)'s Altra(R) SMpro errmon driver.

s/Adds/Add/

> diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
> index 756be15a49a4..b74b3b34a235 100644
> --- a/Documentation/misc-devices/index.rst
> +++ b/Documentation/misc-devices/index.rst
> @@ -27,6 +27,7 @@ fit into other categories.
>     max6875
>     oxsemi-tornado
>     pci-endpoint-test
> +   smpro-errmon
>     spear-pcie-gadget
>     uacce
>     xilinx_sdfec
> diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
> new file mode 100644
> index 000000000000..b17f30a6cafd
> --- /dev/null
> +++ b/Documentation/misc-devices/smpro-errmon.rst
> @@ -0,0 +1,193 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +Kernel driver Ampere(R)'s Altra(R) SMpro errmon
> +===============================================
> +
> +Supported chips:
> +
> +  * Ampere(R) Altra(R)
> +
> +    Prefix: 'smpro'
> +
> +    Preference: Altra SoC BMC Interface Specification
> +
> +Author: Thu Nguyen <thu@os.amperecomputing.com>
> +
> +Description
> +-----------
> +
> +This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
> +SMpro co-processor (SMpro).
> +The following SoC alert/event types are supported by the errmon driver:
> +
> +* Core CE/UE error
> +* Memory CE/UE error
> +* PCIe CE/UE error
> +* Other CE/UE error
> +* Internal SMpro/PMpro error
> +* VRD hot
> +* VRD warn/fault
> +* DIMM Hot
> +
> +The SMpro interface provides the registers to query the status of the SoC alerts/events
> +and their data and export to userspace by this driver.
> +
> +The SoC alerts/events will be referenced as error below.
> +
> +Usage Notes
> +-----------
> +
> +SMpro errmon driver creates the sysfs files for each error type.
> +Example: ``error_core_ce`` to get Core CE error type.
> +
> +* If the error is absented, the sysfs file returns empty.
> +* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
> +
> +For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
> +Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
> +
> +The format of the error is depended on the error type.
> +
> +1) For Core/Memory/PCIe/Other CE/UE error types::
> +
> +The return 48-byte in hex format in table below:
> +
> +    =======   =============   ===========   ==========================================
> +    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
> +    =======   =============   ===========   ==========================================
> +    00        Error Type      1             See Table below for details
> +    01        Subtype         1             See Table below for details
> +    02        Instance        2             See Table below for details
> +    04        Error status    4             See ARM RAS specification for details
> +    08        Error Address   8             See ARM RAS specification for details
> +    16        Error Misc 0    8             See ARM RAS specification for details
> +    24        Error Misc 1    8             See ARM RAS specification for details
> +    32        Error Misc 2    8             See ARM RAS specification for details
> +    40        Error Misc 3    8             See ARM RAS specification for details
> +    =======   =============   ===========   ==========================================
> +
> +Below table defines the value of Error types, Sub Types, Sub component and instance:
> +
> +    ============    ==========    =========   ===============  ====================================
> +    Error Group     Error Type    Sub type    Sub component    Instance
> +    ============    ==========    =========   ===============  ====================================
> +    CPM (core)      0             0           Snoop-Logic      CPM #
> +    CPM (core)      0             2           Armv8 Core 1     CPM #
> +    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
> +    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
> +    MCU (mem)       1             3           ERR3             MCU #
> +    MCU (mem)       1             4           ERR4             MCU #
> +    MCU (mem)       1             5           ERR5             MCU #
> +    MCU (mem)       1             6           ERR6             MCU #
> +    MCU (mem)       1             7           Link Error       MCU #
> +    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
> +    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
> +    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
> +    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
> +    2P Link (other) 3             0           N/A              Altra 2P Link #
> +    GIC (other)     5             0           ERR0             0
> +    GIC (other)     5             1           ERR1             0
> +    GIC (other)     5             2           ERR2             0
> +    GIC (other)     5             3           ERR3             0
> +    GIC (other)     5             4           ERR4             0
> +    GIC (other)     5             5           ERR5             0
> +    GIC (other)     5             6           ERR6             0
> +    GIC (other)     5             7           ERR7             0
> +    GIC (other)     5             8           ERR8             0
> +    GIC (other)     5             9           ERR9             0
> +    GIC (other)     5             10          ERR10            0
> +    GIC (other)     5             11          ERR11            0
> +    GIC (other)     5             12          ERR12            0
> +    GIC (other)     5             13-21       ERR13            RC# + 1
> +    SMMU (other)    6             TCU         100              RC #
> +    SMMU (other)    6             TBU0        0                RC #
> +    SMMU (other)    6             TBU1        1                RC #
> +    SMMU (other)    6             TBU2        2                RC #
> +    SMMU (other)    6             TBU3        3                RC #
> +    SMMU (other)    6             TBU4        4                RC #
> +    SMMU (other)    6             TBU5        5                RC #
> +    SMMU (other)    6             TBU6        6                RC #
> +    SMMU (other)    6             TBU7        7                RC #
> +    SMMU (other)    6             TBU8        8                RC #
> +    SMMU (other)    6             TBU9        9                RC #
> +    PCIe AER (pcie) 7             Root        0                RC #
> +    PCIe AER (pcie) 7             Device      1                RC #
> +    PCIe RC (pcie)  8             RCA HB      0                RC #
> +    PCIe RC (pcie)  8             RCB HB      1                RC #
> +    PCIe RC (pcie)  8             RASDP       8                RC #
> +    OCM (other)     9             ERR0        0                0
> +    OCM (other)     9             ERR1        1                0
> +    OCM (other)     9             ERR2        2                0
> +    SMpro (other)   10            ERR0        0                0
> +    SMpro (other)   10            ERR1        1                0
> +    SMpro (other)   10            MPA_ERR     2                0
> +    PMpro (other)   11            ERR0        0                0
> +    PMpro (other)   11            ERR1        1                0
> +    PMpro (other)   11            MPA_ERR     2                0
> +    ============    ==========    =========   ===============  ====================================
> +
> +    For example:
> +    # cat error_other_ue
> +    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
> +
> +2) For the Internal SMpro/PMpro error types::
> +
> +The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
> +    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
> +
> +The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
> +    <4-byte hex value of Warning info>
> +
> +Reference to Altra SoC BMC Interface Specification for the details.
> +
> +3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
> +
> +The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
> +
> +    Example:
> +    #cat event_vrd_hot
> +    0000
> +
> +Sysfs entries
> +-------------
> +
> +The following sysfs files are supported:
> +
> +* Ampere(R) Altra(R):
> +
> +Alert Types:
> +
> +    ========================  =================  ==================================================
> +    Alert Type                Sysfs name         Description
> +    ========================  =================  ==================================================
> +    Core CE Error             error_core_ce      Trigger when Core has CE error
> +    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
> +    Core UE Error             error_core_ue      Trigger when Core has UE error
> +    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
> +    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
> +    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
> +    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
> +    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
> +    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
> +    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
> +    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
> +    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
> +    Other CE Error            error_other_ce     Trigger when any Others CE error
> +    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
> +    Other UE Error            error_other_ue     Trigger when any Others UE error
> +    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
> +    SMpro Error               error_smpro        Trigger when system have SMpro error
> +    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
> +    PMpro Error               error_pmpro        Trigger when system have PMpro error
> +    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
> +    ========================  =================  ==================================================
> +
> +Event Type:
> +
> +    ============================ ==========================
> +    Event Type                   Sysfs name
> +    ============================ ==========================
> +    VRD HOT                      event_vrd_hot
> +    VR Warn/Fault                event_vrd_warn_fault
> +    DIMM Hot                     event_dimm_hot
> +    ============================ ==========================
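
As a side note for anyone consuming these files from userspace: going by the Usage Notes, each read of an error_* file hands back the oldest pending record and clears it, an empty read means nothing is pending, and the matching overflow_* file reads 1 when records were dropped. A minimal Python sketch of that loop, under those assumptions, could look like the following. The helper names (drain_errors, sysfs_reader) and the limit guard are mine, not the driver's, and the directory holding the files is not given in the patch, so it is left as a parameter.

```python
from pathlib import Path


def sysfs_reader(directory, name):
    """Return a callable that reads one sysfs attribute as text.

    `directory` is hypothetical: the patch does not say where the files live.
    """
    path = Path(directory) / name
    return lambda: path.read_text()


def drain_errors(read_error, read_overflow, limit=1024):
    """Read records until the error file comes back empty.

    Returns (records, overflowed): records are the raw hex strings in the
    order they were read (oldest first), overflowed is True when the
    overflow file reads "1".
    """
    records = []
    for _ in range(limit):  # guard in case the source never goes empty
        text = read_error().strip()
        if not text:
            break
        records.append(text)
    overflowed = read_overflow().strip() == "1"
    return records, overflowed
```

With a real device this would be called as drain_errors(sysfs_reader(d, "error_core_ce"), sysfs_reader(d, "overflow_core_ce")), where d is the driver's sysfs directory. Taking callables rather than paths keeps the loop testable without hardware.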

The documentation above produces htmldocs warnings:
Documentation/misc-devices/smpro-errmon.rst:53: WARNING: Literal block expected; none found.
Documentation/misc-devices/smpro-errmon.rst:87: WARNING: Malformed table.
Text in column margin in table line 17.
<snipped>...
Documentation/misc-devices/smpro-errmon.rst:135: WARNING: Literal block expected; none found.
Documentation/misc-devices/smpro-errmon.rst:145: WARNING: Literal block expected; none found.

I have applied the fixup (with grammatical and formatting fixes):

---- >8 ----

diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
index b17f30a6cafdab..de8719cc47fd3c 100644
--- a/Documentation/misc-devices/smpro-errmon.rst
+++ b/Documentation/misc-devices/smpro-errmon.rst
@@ -7,18 +7,18 @@ Supported chips:
 
   * Ampere(R) Altra(R)
 
-    Prefix: 'smpro'
+    Prefix: `smpro`
 
-    Preference: Altra SoC BMC Interface Specification
+    Reference: `Altra SoC BMC Interface Specification`
 
 Author: Thu Nguyen <thu@os.amperecomputing.com>
 
 Description
 -----------
 
-This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
-SMpro co-processor (SMpro).
-The following SoC alert/event types are supported by the errmon driver:
+The smpro-errmon driver supports hardware monitoring for Ampere(R) Altra(R)
+SoCs based on the SMpro co-processor (SMpro). The following SoC alert/event
+types are supported by the driver:
 
 * Core CE/UE error
 * Memory CE/UE error
@@ -29,165 +29,178 @@ The following SoC alert/event types are supported by the errmon driver:
 * VRD warn/fault
 * DIMM Hot
 
-The SMpro interface provides the registers to query the status of the SoC alerts/events
-and their data and export to userspace by this driver.
+The SMpro interface provides the registers to query the status of the SoC
+alerts/events and their data, which this driver exports to userspace.
 
-The SoC alerts/events will be referenced as error below.
+The rest of this document refers to SoC alerts/events as errors.
 
 Usage Notes
 -----------
 
 SMpro errmon driver creates the sysfs files for each error type.
-Example: ``error_core_ce`` to get Core CE error type.
+See :ref:`smpro_sysfs` for the list of errors and the corresponding
+sysfs files.
 
-* If the error is absented, the sysfs file returns empty.
-* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
+* If there are no errors, the sysfs file is empty.
+* Otherwise, each read of the sysfs file returns the oldest error and
+  clears it. The next read returns the next error, until all the errors
+  have been read out.
 
-For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
-Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
+For each host error type, SMpro keeps only a limited number of the latest
+errors; older errors that were not read are dropped. In that case, reading
+the corresponding overflow sysfs file returns 1; otherwise it returns 0.
 
-The format of the error is depended on the error type.
+The error format depends on its type.
 
-1) For Core/Memory/PCIe/Other CE/UE error types::
+1) For Core/Memory/PCIe/Other CE/UE error types
 
-The return 48-byte in hex format in table below:
+   These errors are returned as 48 bytes in hex format, laid out as in the table below:
 
-    =======   =============   ===========   ==========================================
-    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
-    =======   =============   ===========   ==========================================
-    00        Error Type      1             See Table below for details
-    01        Subtype         1             See Table below for details
-    02        Instance        2             See Table below for details
-    04        Error status    4             See ARM RAS specification for details
-    08        Error Address   8             See ARM RAS specification for details
-    16        Error Misc 0    8             See ARM RAS specification for details
-    24        Error Misc 1    8             See ARM RAS specification for details
-    32        Error Misc 2    8             See ARM RAS specification for details
-    40        Error Misc 3    8             See ARM RAS specification for details
-    =======   =============   ===========   ==========================================
+   =======   =============   ===========   ==========================================================
+   OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
+   =======   =============   ===========   ==========================================================
+   00        Error Type      1             See :ref:`the table below <smpro-error-types>` for details
+   01        Subtype         1             See :ref:`the table below <smpro-error-types>` for details
+   02        Instance        2             See :ref:`the table below <smpro-error-types>` for details
+   04        Error status    4             See ARM RAS specification for details
+   08        Error Address   8             See ARM RAS specification for details
+   16        Error Misc 0    8             See ARM RAS specification for details
+   24        Error Misc 1    8             See ARM RAS specification for details
+   32        Error Misc 2    8             See ARM RAS specification for details
+   40        Error Misc 3    8             See ARM RAS specification for details
+   =======   =============   ===========   ==========================================================
 
-Below table defines the value of Error types, Sub Types, Sub component and instance:
+   The table below defines the values of error types and their subtypes, subcomponents and instances:
 
-    ============    ==========    =========   ===============  ====================================
-    Error Group     Error Type    Sub type    Sub component    Instance
-    ============    ==========    =========   ===============  ====================================
-    CPM (core)      0             0           Snoop-Logic      CPM #
-    CPM (core)      0             2           Armv8 Core 1     CPM #
-    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
-    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
-    MCU (mem)       1             3           ERR3             MCU #
-    MCU (mem)       1             4           ERR4             MCU #
-    MCU (mem)       1             5           ERR5             MCU #
-    MCU (mem)       1             6           ERR6             MCU #
-    MCU (mem)       1             7           Link Error       MCU #
-    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
-    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
-    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
-    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
-    2P Link (other) 3             0           N/A              Altra 2P Link #
-    GIC (other)     5             0           ERR0             0
-    GIC (other)     5             1           ERR1             0
-    GIC (other)     5             2           ERR2             0
-    GIC (other)     5             3           ERR3             0
-    GIC (other)     5             4           ERR4             0
-    GIC (other)     5             5           ERR5             0
-    GIC (other)     5             6           ERR6             0
-    GIC (other)     5             7           ERR7             0
-    GIC (other)     5             8           ERR8             0
-    GIC (other)     5             9           ERR9             0
-    GIC (other)     5             10          ERR10            0
-    GIC (other)     5             11          ERR11            0
-    GIC (other)     5             12          ERR12            0
-    GIC (other)     5             13-21       ERR13            RC# + 1
-    SMMU (other)    6             TCU         100              RC #
-    SMMU (other)    6             TBU0        0                RC #
-    SMMU (other)    6             TBU1        1                RC #
-    SMMU (other)    6             TBU2        2                RC #
-    SMMU (other)    6             TBU3        3                RC #
-    SMMU (other)    6             TBU4        4                RC #
-    SMMU (other)    6             TBU5        5                RC #
-    SMMU (other)    6             TBU6        6                RC #
-    SMMU (other)    6             TBU7        7                RC #
-    SMMU (other)    6             TBU8        8                RC #
-    SMMU (other)    6             TBU9        9                RC #
-    PCIe AER (pcie) 7             Root        0                RC #
-    PCIe AER (pcie) 7             Device      1                RC #
-    PCIe RC (pcie)  8             RCA HB      0                RC #
-    PCIe RC (pcie)  8             RCB HB      1                RC #
-    PCIe RC (pcie)  8             RASDP       8                RC #
-    OCM (other)     9             ERR0        0                0
-    OCM (other)     9             ERR1        1                0
-    OCM (other)     9             ERR2        2                0
-    SMpro (other)   10            ERR0        0                0
-    SMpro (other)   10            ERR1        1                0
-    SMpro (other)   10            MPA_ERR     2                0
-    PMpro (other)   11            ERR0        0                0
-    PMpro (other)   11            ERR1        1                0
-    PMpro (other)   11            MPA_ERR     2                0
-    ============    ==========    =========   ===============  ====================================
+   .. _smpro-error-types:
 
-    For example:
-    # cat error_other_ue
-    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
+   =============== ==========    =========   ===============  ====================================
+   Error Group     Error Type    Sub type    Sub component    Instance
+   =============== ==========    =========   ===============  ====================================
+   CPM (core)      0             0           Snoop-Logic      CPM #
+   CPM (core)      0             2           Armv8 Core 1     CPM #
+   MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
+   MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
+   MCU (mem)       1             3           ERR3             MCU #
+   MCU (mem)       1             4           ERR4             MCU #
+   MCU (mem)       1             5           ERR5             MCU #
+   MCU (mem)       1             6           ERR6             MCU #
+   MCU (mem)       1             7           Link Error       MCU #
+   Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
+   Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
+   Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
+   Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
+   2P Link (other) 3             0           N/A              Altra 2P Link #
+   GIC (other)     5             0           ERR0             0
+   GIC (other)     5             1           ERR1             0
+   GIC (other)     5             2           ERR2             0
+   GIC (other)     5             3           ERR3             0
+   GIC (other)     5             4           ERR4             0
+   GIC (other)     5             5           ERR5             0
+   GIC (other)     5             6           ERR6             0
+   GIC (other)     5             7           ERR7             0
+   GIC (other)     5             8           ERR8             0
+   GIC (other)     5             9           ERR9             0
+   GIC (other)     5             10          ERR10            0
+   GIC (other)     5             11          ERR11            0
+   GIC (other)     5             12          ERR12            0
+   GIC (other)     5             13-21       ERR13            RC# + 1
+   SMMU (other)    6             TCU         100              RC #
+   SMMU (other)    6             TBU0        0                RC #
+   SMMU (other)    6             TBU1        1                RC #
+   SMMU (other)    6             TBU2        2                RC #
+   SMMU (other)    6             TBU3        3                RC #
+   SMMU (other)    6             TBU4        4                RC #
+   SMMU (other)    6             TBU5        5                RC #
+   SMMU (other)    6             TBU6        6                RC #
+   SMMU (other)    6             TBU7        7                RC #
+   SMMU (other)    6             TBU8        8                RC #
+   SMMU (other)    6             TBU9        9                RC #
+   PCIe AER (pcie) 7             Root        0                RC #
+   PCIe AER (pcie) 7             Device      1                RC #
+   PCIe RC (pcie)  8             RCA HB      0                RC #
+   PCIe RC (pcie)  8             RCB HB      1                RC #
+   PCIe RC (pcie)  8             RASDP       8                RC #
+   OCM (other)     9             ERR0        0                0
+   OCM (other)     9             ERR1        1                0
+   OCM (other)     9             ERR2        2                0
+   SMpro (other)   10            ERR0        0                0
+   SMpro (other)   10            ERR1        1                0
+   SMpro (other)   10            MPA_ERR     2                0
+   PMpro (other)   11            ERR0        0                0
+   PMpro (other)   11            ERR1        1                0
+   PMpro (other)   11            MPA_ERR     2                0
+   =============== ==========    =========   ===============  ====================================
 
-2) For the Internal SMpro/PMpro error types::
+   Example::
 
-The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
-    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
+     # cat error_other_ue
+     880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
 
-The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
-    <4-byte hex value of Warning info>
+2) For the internal SMpro/PMpro error types
 
-Reference to Altra SoC BMC Interface Specification for the details.
+   The ``error_[smpro|pmpro]`` sysfs file returns a string holding an 8-byte hex value::
 
-3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
+     <4-byte hex value of Error info><4-byte hex value of Error extensive data>
 
-The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
+   The ``warn_[smpro|pmpro]`` sysfs file returns a string holding a 4-byte hex value::
 
-    Example:
-    #cat event_vrd_hot
-    0000
+     <4-byte hex value of Warning info>
+
+   Refer to `Altra SoC BMC Interface Specification` for details.
+
+3) For the VRD hot, VRD warn/fault, DIMM Hot events
+
+   The returned string is a 2-byte hex value. Refer to section `5.7 GPI
+   status register` in `Altra SoC BMC Interface Specification` for details.
+
+   Example::
+
+      # cat event_vrd_hot
+      0000
+
+.. _smpro_sysfs:
 
 Sysfs entries
 -------------
 
 The following sysfs files are supported:
 
-* Ampere(R) Altra(R):
+* Ampere(R) Altra(R)
 
-Alert Types:
+  Alert types:
 
-    ========================  =================  ==================================================
-    Alert Type                Sysfs name         Description
-    ========================  =================  ==================================================
-    Core CE Error             error_core_ce      Trigger when Core has CE error
-    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
-    Core UE Error             error_core_ue      Trigger when Core has UE error
-    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
-    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
-    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
-    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
-    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
-    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
-    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
-    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
-    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
-    Other CE Error            error_other_ce     Trigger when any Others CE error
-    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
-    Other UE Error            error_other_ue     Trigger when any Others UE error
-    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
-    SMpro Error               error_smpro        Trigger when system have SMpro error
-    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
-    PMpro Error               error_pmpro        Trigger when system have PMpro error
-    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
-    ========================  =================  ==================================================
+    ========================  =====================  ==================================================
+    Alert type                Sysfs name             Description (when the error is triggered)
+    ========================  =====================  ==================================================
+    Core CE Error             ``error_core_ce``      Core has CE error
+    Core CE Error overflow    ``overflow_core_ce``   Core CE error overflow
+    Core UE Error             ``error_core_ue``      Core has UE error
+    Core UE Error overflow    ``overflow_core_ue``   Core UE error overflow
+    Memory CE Error           ``error_mem_ce``       Memory has CE error
+    Memory CE Error overflow  ``overflow_mem_ce``    Memory CE error overflow
+    Memory UE Error           ``error_mem_ue``       Memory has UE error
+    Memory UE Error overflow  ``overflow_mem_ue``    Memory UE error overflow
+    PCIe CE Error             ``error_pcie_ce``      any PCIe controller has CE error
+    PCIe CE Error overflow    ``overflow_pcie_ce``   any PCIe controller CE error overflow
+    PCIe UE Error             ``error_pcie_ue``      any PCIe controller has UE error
+    PCIe UE Error overflow    ``overflow_pcie_ue``   any PCIe controller UE error overflow
+    Other CE Error            ``error_other_ce``     any other CE error
+    Other CE Error overflow   ``overflow_other_ce``  any other CE error overflow
+    Other UE Error            ``error_other_ue``     any other UE error
+    Other UE Error overflow   ``overflow_other_ue``  other UE error overflow
+    SMpro Error               ``error_smpro``        system has SMpro error
+    SMpro Warning             ``warn_smpro``         system has SMpro warning
+    PMpro Error               ``error_pmpro``        system has PMpro error
+    PMpro Warning             ``warn_pmpro``         system has PMpro warning
+    ========================  =====================  ==================================================
 
-Event Type:
+  Event types:
 
     ============================ ==========================
-    Event Type                   Sysfs name
+    Event type                   Sysfs name
     ============================ ==========================
-    VRD HOT                      event_vrd_hot
-    VR Warn/Fault                event_vrd_warn_fault
-    DIMM Hot                     event_dimm_hot
+    VRD HOT                      ``event_vrd_hot``
+    VR Warn/Fault                ``event_vrd_warn_fault``
+    DIMM Hot                     ``event_dimm_hot``
     ============================ ==========================
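
To make the 48-byte layout in the OFFSET/FIELD/SIZE table a bit more concrete, here is a rough Python sketch that slices one record into its fields. The field names and the split_record helper are my own shorthand for the table rows, not anything exported by the driver. The document does not state the byte order within multi-byte fields, so the sketch deliberately returns raw bytes and leaves any integer interpretation to the reader.

```python
# (name, offset, size in bytes), copied from the OFFSET/FIELD/SIZE table.
FIELDS = [
    ("error_type", 0, 1),
    ("subtype", 1, 1),
    ("instance", 2, 2),
    ("error_status", 4, 4),
    ("error_address", 8, 8),
    ("error_misc0", 16, 8),
    ("error_misc1", 24, 8),
    ("error_misc2", 32, 8),
    ("error_misc3", 40, 8),
]
RECORD_SIZE = 48


def split_record(hex_text):
    """Split one hex record into a dict of raw per-field byte strings.

    Returns None for an empty read (no error pending). Byte order inside
    multi-byte fields is not interpreted here.
    """
    text = hex_text.strip()
    if not text:
        return None
    raw = bytes.fromhex(text)
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    return {name: raw[off:off + size] for name, off, size in FIELDS}
```

Feeding it the output of cat error_other_ue should give nine entries whose sizes add up to 48 bytes; anything else raises ValueError rather than guessing.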

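The Instance column for the Mesh rows packs several values into one field as X | (Y << 5) | NS << 11 (plus device << 12 for Home Node(Mem)). Assuming those sub-fields do not overlap, and assuming the 2-byte Instance has already been converted to an integer (the byte order is not stated in the document), unpacking could be sketched as below. The function name and the bit widths derived from the shifts are my reading of the table, not something the specification text here confirms.

```python
def decode_mesh_instance(instance):
    """Unpack X | (Y << 5) | (NS << 11) | (device << 12) from an integer.

    Bit widths are inferred from the shift amounts, assuming the
    sub-fields do not overlap.
    """
    return {
        "x": instance & 0x1F,          # bits 0-4
        "y": (instance >> 5) & 0x3F,   # bits 5-10
        "ns": (instance >> 11) & 0x1,  # bit 11
        "device": instance >> 12,      # bits 12 and up, Home Node(Mem) only
    }
```

For rows other than Home Node(Mem) the device entry would simply be expected to read as 0.
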
Thanks.
Quan Nguyen Oct. 6, 2022, 7:46 a.m. UTC | #4
On 29/09/2022 16:56, Greg Kroah-Hartman wrote:
> On Thu, Sep 29, 2022 at 04:43:16PM +0700, Quan Nguyen wrote:
>> Adds documentation for Ampere(R)'s Altra(R) SMpro errmon driver.
>>
>> Signed-off-by: Thu Nguyen <thu@os.amperecomputing.com>
>> Signed-off-by: Quan Nguyen <quan@os.amperecomputing.com>
>> ---
>> Changes in v9:
>>    + Fix issue when building htmldocs                      [Bagas]
>>    + Remove unnecessary channel info for VRD and DIMM event [Quan]
>>    + Update SPDX license info                               [Greg]
>>    + Update document to align with new changes in sysfs     [Quan]
>>
>> Changes in v8:
>>    + Update to reflect single value per sysfs  [Quan]
>>
>> Changes in v7:
>>    + None
>>
>> Changes in v6:
>>    + First introduced in v6 [Quan]
>>
>>   Documentation/misc-devices/index.rst        |   1 +
>>   Documentation/misc-devices/smpro-errmon.rst | 193 ++++++++++++++++++++
>>   2 files changed, 194 insertions(+)
>>   create mode 100644 Documentation/misc-devices/smpro-errmon.rst
>>
>> diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
>> index 756be15a49a4..b74b3b34a235 100644
>> --- a/Documentation/misc-devices/index.rst
>> +++ b/Documentation/misc-devices/index.rst
>> @@ -27,6 +27,7 @@ fit into other categories.
>>      max6875
>>      oxsemi-tornado
>>      pci-endpoint-test
>> +   smpro-errmon
>>      spear-pcie-gadget
>>      uacce
>>      xilinx_sdfec
>> diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
>> new file mode 100644
>> index 000000000000..b17f30a6cafd
>> --- /dev/null
>> +++ b/Documentation/misc-devices/smpro-errmon.rst
>> @@ -0,0 +1,193 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +Kernel driver Ampere(R)'s Altra(R) SMpro errmon
>> +===============================================
>> +
>> +Supported chips:
>> +
>> +  * Ampere(R) Altra(R)
>> +
>> +    Prefix: 'smpro'
>> +
>> +    Preference: Altra SoC BMC Interface Specification
>> +
>> +Author: Thu Nguyen <thu@os.amperecomputing.com>
>> +
>> +Description
>> +-----------
>> +
>> +This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
>> +SMpro co-processor (SMpro).
>> +The following SoC alert/event types are supported by the errmon driver:
>> +
>> +* Core CE/UE error
>> +* Memory CE/UE error
>> +* PCIe CE/UE error
>> +* Other CE/UE error
>> +* Internal SMpro/PMpro error
>> +* VRD hot
>> +* VRD warn/fault
>> +* DIMM Hot
>> +
>> +The SMpro interface provides the registers to query the status of the SoC alerts/events
>> +and their data and export to userspace by this driver.
>> +
>> +The SoC alerts/events will be referenced as error below.
>> +
>> +Usage Notes
>> +-----------
>> +
>> +SMpro errmon driver creates the sysfs files for each error type.
>> +Example: ``error_core_ce`` to get Core CE error type.
>> +
>> +* If the error is absented, the sysfs file returns empty.
>> +* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
>> +
>> +For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
>> +Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
>> +
>> +The format of the error is depended on the error type.
>> +
>> +1) For Core/Memory/PCIe/Other CE/UE error types::
>> +
>> +The return 48-byte in hex format in table below:
>> +
>> +    =======   =============   ===========   ==========================================
>> +    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
>> +    =======   =============   ===========   ==========================================
>> +    00        Error Type      1             See Table below for details
>> +    01        Subtype         1             See Table below for details
>> +    02        Instance        2             See Table below for details
>> +    04        Error status    4             See ARM RAS specification for details
>> +    08        Error Address   8             See ARM RAS specification for details
>> +    16        Error Misc 0    8             See ARM RAS specification for details
>> +    24        Error Misc 1    8             See ARM RAS specification for details
>> +    32        Error Misc 2    8             See ARM RAS specification for details
>> +    40        Error Misc 3    8             See ARM RAS specification for details
>> +    =======   =============   ===========   ==========================================
>> +
>> +Below table defines the value of Error types, Sub Types, Sub component and instance:
>> +
>> +    ============    ==========    =========   ===============  ====================================
>> +    Error Group     Error Type    Sub type    Sub component    Instance
>> +    ============    ==========    =========   ===============  ====================================
>> +    CPM (core)      0             0           Snoop-Logic      CPM #
>> +    CPM (core)      0             2           Armv8 Core 1     CPM #
>> +    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
>> +    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
>> +    MCU (mem)       1             3           ERR3             MCU #
>> +    MCU (mem)       1             4           ERR4             MCU #
>> +    MCU (mem)       1             5           ERR5             MCU #
>> +    MCU (mem)       1             6           ERR6             MCU #
>> +    MCU (mem)       1             7           Link Error       MCU #
>> +    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
>> +    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
>> +    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
>> +    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
>> +    2P Link (other) 3             0           N/A              Altra 2P Link #
>> +    GIC (other)     5             0           ERR0             0
>> +    GIC (other)     5             1           ERR1             0
>> +    GIC (other)     5             2           ERR2             0
>> +    GIC (other)     5             3           ERR3             0
>> +    GIC (other)     5             4           ERR4             0
>> +    GIC (other)     5             5           ERR5             0
>> +    GIC (other)     5             6           ERR6             0
>> +    GIC (other)     5             7           ERR7             0
>> +    GIC (other)     5             8           ERR8             0
>> +    GIC (other)     5             9           ERR9             0
>> +    GIC (other)     5             10          ERR10            0
>> +    GIC (other)     5             11          ERR11            0
>> +    GIC (other)     5             12          ERR12            0
>> +    GIC (other)     5             13-21       ERR13            RC# + 1
>> +    SMMU (other)    6             TCU         100              RC #
>> +    SMMU (other)    6             TBU0        0                RC #
>> +    SMMU (other)    6             TBU1        1                RC #
>> +    SMMU (other)    6             TBU2        2                RC #
>> +    SMMU (other)    6             TBU3        3                RC #
>> +    SMMU (other)    6             TBU4        4                RC #
>> +    SMMU (other)    6             TBU5        5                RC #
>> +    SMMU (other)    6             TBU6        6                RC #
>> +    SMMU (other)    6             TBU7        7                RC #
>> +    SMMU (other)    6             TBU8        8                RC #
>> +    SMMU (other)    6             TBU9        9                RC #
>> +    PCIe AER (pcie) 7             Root        0                RC #
>> +    PCIe AER (pcie) 7             Device      1                RC #
>> +    PCIe RC (pcie)  8             RCA HB      0                RC #
>> +    PCIe RC (pcie)  8             RCB HB      1                RC #
>> +    PCIe RC (pcie)  8             RASDP       8                RC #
>> +    OCM (other)     9             ERR0        0                0
>> +    OCM (other)     9             ERR1        1                0
>> +    OCM (other)     9             ERR2        2                0
>> +    SMpro (other)   10            ERR0        0                0
>> +    SMpro (other)   10            ERR1        1                0
>> +    SMpro (other)   10            MPA_ERR     2                0
>> +    PMpro (other)   11            ERR0        0                0
>> +    PMpro (other)   11            ERR1        1                0
>> +    PMpro (other)   11            MPA_ERR     2                0
>> +    ============    ==========    =========   ===============  ====================================
>> +
>> +    For example:
>> +    # cat error_other_ue
>> +    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
>> +
>> +2) For the Internal SMpro/PMpro error types::
>> +
>> +The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
>> +    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
>> +
>> +The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
>> +    <4-byte hex value of Warning info>
>> +
>> +Reference to Altra SoC BMC Interface Specification for the details.
>> +
>> +3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
>> +
>> +The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
>> +
>> +    Example:
>> +    #cat event_vrd_hot
>> +    0000
>> +
>> +Sysfs entries
>> +-------------
>> +
>> +The following sysfs files are supported:
>> +
>> +* Ampere(R) Altra(R):
>> +
>> +Alert Types:
>> +
>> +    ========================  =================  ==================================================
>> +    Alert Type                Sysfs name         Description
>> +    ========================  =================  ==================================================
>> +    Core CE Error             error_core_ce      Trigger when Core has CE error
>> +    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
>> +    Core UE Error             error_core_ue      Trigger when Core has UE error
>> +    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
>> +    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
>> +    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
>> +    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
>> +    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
>> +    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
>> +    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
>> +    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
>> +    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
>> +    Other CE Error            error_other_ce     Trigger when any Others CE error
>> +    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
>> +    Other UE Error            error_other_ue     Trigger when any Others UE error
>> +    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
>> +    SMpro Error               error_smpro        Trigger when system have SMpro error
>> +    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
>> +    PMpro Error               error_pmpro        Trigger when system have PMpro error
>> +    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
>> +    ========================  =================  ==================================================
>> +
>> +Event Type:
>> +
>> +    ============================ ==========================
>> +    Event Type                   Sysfs name
>> +    ============================ ==========================
>> +    VRD HOT                      event_vrd_hot
>> +    VR Warn/Fault                event_vrd_warn_fault
>> +    DIMM Hot                     event_dimm_hot
>> +    ============================ ==========================
>> -- 
>> 2.35.1
>>
> 
> Why not just put this in the driver itself to be generated automatically
> instead of living in a file that will never be noticed if anything ever
> changes?
> 

I'm not sure what you mean by "to be generated automatically" but 
information can be documented in the driver code itself and in the 
Documentation/ABI entries.

I will drop this file and move its content to both the driver code and the
Documentation/ABI entries in the next version.

Thanks,
- Quan
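
For readers following the thread, the 48-byte record layout quoted above can be decoded in userspace roughly as shown here. This is only a sketch: the names `parse_error_record` and `ErrorRecord` are hypothetical, and the little-endian byte order for multi-byte fields is an assumption that the patch itself does not state.

```python
import struct
from typing import NamedTuple, Optional, Tuple


class ErrorRecord(NamedTuple):
    """One Core/Memory/PCIe/Other CE/UE record (hypothetical container)."""
    error_type: int
    subtype: int
    instance: int
    status: int
    address: int
    misc: Tuple[int, int, int, int]


RECORD_SIZE = 48

# Offsets from the quoted table: type(1) subtype(1) instance(2) status(4)
# address(8) misc0..misc3(4 x 8) = 48 bytes.
# ASSUMPTION: multi-byte fields are little-endian ("<"); verify against the
# Altra SoC BMC Interface Specification before relying on this.
_LAYOUT = struct.Struct("<BBHIQ4Q")


def parse_error_record(text: str) -> Optional[ErrorRecord]:
    """Decode the hex string read from one error_* sysfs file.

    Returns None for an empty read, which the quoted usage notes describe
    as "no error pending".
    """
    text = text.strip()
    if not text:
        return None
    raw = bytes.fromhex(text)
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    etype, subtype, instance, status, address, *misc = _LAYOUT.unpack(raw)
    return ErrorRecord(etype, subtype, instance, status, address, tuple(misc))
```

Because each read returns and clears the oldest pending error, a caller would typically invoke this once per read and stop at the first `None`.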
Quan Nguyen Oct. 6, 2022, 7:46 a.m. UTC | #5
On 30/09/2022 20:13, Bagas Sanjaya wrote:
> On Thu, Sep 29, 2022 at 04:43:16PM +0700, Quan Nguyen wrote:
>> Adds documentation for Ampere(R)'s Altra(R) SMpro errmon driver.
> 
> s/Adds/Add/
> 
>> diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
>> index 756be15a49a4..b74b3b34a235 100644
>> --- a/Documentation/misc-devices/index.rst
>> +++ b/Documentation/misc-devices/index.rst
>> @@ -27,6 +27,7 @@ fit into other categories.
>>      max6875
>>      oxsemi-tornado
>>      pci-endpoint-test
>> +   smpro-errmon
>>      spear-pcie-gadget
>>      uacce
>>      xilinx_sdfec
>> diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
>> new file mode 100644
>> index 000000000000..b17f30a6cafd
>> --- /dev/null
>> +++ b/Documentation/misc-devices/smpro-errmon.rst
>> @@ -0,0 +1,193 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +Kernel driver Ampere(R)'s Altra(R) SMpro errmon
>> +===============================================
>> +
>> +Supported chips:
>> +
>> +  * Ampere(R) Altra(R)
>> +
>> +    Prefix: 'smpro'
>> +
>> +    Preference: Altra SoC BMC Interface Specification
>> +
>> +Author: Thu Nguyen <thu@os.amperecomputing.com>
>> +
>> +Description
>> +-----------
>> +
>> +This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
>> +SMpro co-processor (SMpro).
>> +The following SoC alert/event types are supported by the errmon driver:
>> +
>> +* Core CE/UE error
>> +* Memory CE/UE error
>> +* PCIe CE/UE error
>> +* Other CE/UE error
>> +* Internal SMpro/PMpro error
>> +* VRD hot
>> +* VRD warn/fault
>> +* DIMM Hot
>> +
>> +The SMpro interface provides the registers to query the status of the SoC alerts/events
>> +and their data and export to userspace by this driver.
>> +
>> +The SoC alerts/events will be referenced as error below.
>> +
>> +Usage Notes
>> +-----------
>> +
>> +SMpro errmon driver creates the sysfs files for each error type.
>> +Example: ``error_core_ce`` to get Core CE error type.
>> +
>> +* If the error is absented, the sysfs file returns empty.
>> +* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
>> +
>> +For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
>> +Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
>> +
>> +The format of the error is depended on the error type.
>> +
>> +1) For Core/Memory/PCIe/Other CE/UE error types::
>> +
>> +The return 48-byte in hex format in table below:
>> +
>> +    =======   =============   ===========   ==========================================
>> +    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
>> +    =======   =============   ===========   ==========================================
>> +    00        Error Type      1             See Table below for details
>> +    01        Subtype         1             See Table below for details
>> +    02        Instance        2             See Table below for details
>> +    04        Error status    4             See ARM RAS specification for details
>> +    08        Error Address   8             See ARM RAS specification for details
>> +    16        Error Misc 0    8             See ARM RAS specification for details
>> +    24        Error Misc 1    8             See ARM RAS specification for details
>> +    32        Error Misc 2    8             See ARM RAS specification for details
>> +    40        Error Misc 3    8             See ARM RAS specification for details
>> +    =======   =============   ===========   ==========================================
>> +
>> +Below table defines the value of Error types, Sub Types, Sub component and instance:
>> +
>> +    ============    ==========    =========   ===============  ====================================
>> +    Error Group     Error Type    Sub type    Sub component    Instance
>> +    ============    ==========    =========   ===============  ====================================
>> +    CPM (core)      0             0           Snoop-Logic      CPM #
>> +    CPM (core)      0             2           Armv8 Core 1     CPM #
>> +    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
>> +    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
>> +    MCU (mem)       1             3           ERR3             MCU #
>> +    MCU (mem)       1             4           ERR4             MCU #
>> +    MCU (mem)       1             5           ERR5             MCU #
>> +    MCU (mem)       1             6           ERR6             MCU #
>> +    MCU (mem)       1             7           Link Error       MCU #
>> +    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
>> +    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
>> +    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
>> +    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
>> +    2P Link (other) 3             0           N/A              Altra 2P Link #
>> +    GIC (other)     5             0           ERR0             0
>> +    GIC (other)     5             1           ERR1             0
>> +    GIC (other)     5             2           ERR2             0
>> +    GIC (other)     5             3           ERR3             0
>> +    GIC (other)     5             4           ERR4             0
>> +    GIC (other)     5             5           ERR5             0
>> +    GIC (other)     5             6           ERR6             0
>> +    GIC (other)     5             7           ERR7             0
>> +    GIC (other)     5             8           ERR8             0
>> +    GIC (other)     5             9           ERR9             0
>> +    GIC (other)     5             10          ERR10            0
>> +    GIC (other)     5             11          ERR11            0
>> +    GIC (other)     5             12          ERR12            0
>> +    GIC (other)     5             13-21       ERR13            RC# + 1
>> +    SMMU (other)    6             TCU         100              RC #
>> +    SMMU (other)    6             TBU0        0                RC #
>> +    SMMU (other)    6             TBU1        1                RC #
>> +    SMMU (other)    6             TBU2        2                RC #
>> +    SMMU (other)    6             TBU3        3                RC #
>> +    SMMU (other)    6             TBU4        4                RC #
>> +    SMMU (other)    6             TBU5        5                RC #
>> +    SMMU (other)    6             TBU6        6                RC #
>> +    SMMU (other)    6             TBU7        7                RC #
>> +    SMMU (other)    6             TBU8        8                RC #
>> +    SMMU (other)    6             TBU9        9                RC #
>> +    PCIe AER (pcie) 7             Root        0                RC #
>> +    PCIe AER (pcie) 7             Device      1                RC #
>> +    PCIe RC (pcie)  8             RCA HB      0                RC #
>> +    PCIe RC (pcie)  8             RCB HB      1                RC #
>> +    PCIe RC (pcie)  8             RASDP       8                RC #
>> +    OCM (other)     9             ERR0        0                0
>> +    OCM (other)     9             ERR1        1                0
>> +    OCM (other)     9             ERR2        2                0
>> +    SMpro (other)   10            ERR0        0                0
>> +    SMpro (other)   10            ERR1        1                0
>> +    SMpro (other)   10            MPA_ERR     2                0
>> +    PMpro (other)   11            ERR0        0                0
>> +    PMpro (other)   11            ERR1        1                0
>> +    PMpro (other)   11            MPA_ERR     2                0
>> +    ============    ==========    =========   ===============  ====================================
>> +
>> +    For example:
>> +    # cat error_other_ue
>> +    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
>> +
>> +2) For the Internal SMpro/PMpro error types::
>> +
>> +The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
>> +    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
>> +
>> +The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
>> +    <4-byte hex value of Warning info>
>> +
>> +Reference to Altra SoC BMC Interface Specification for the details.
>> +
>> +3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
>> +
>> +The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
>> +
>> +    Example:
>> +    #cat event_vrd_hot
>> +    0000
>> +
>> +Sysfs entries
>> +-------------
>> +
>> +The following sysfs files are supported:
>> +
>> +* Ampere(R) Altra(R):
>> +
>> +Alert Types:
>> +
>> +    ========================  =================  ==================================================
>> +    Alert Type                Sysfs name         Description
>> +    ========================  =================  ==================================================
>> +    Core CE Error             error_core_ce      Trigger when Core has CE error
>> +    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
>> +    Core UE Error             error_core_ue      Trigger when Core has UE error
>> +    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
>> +    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
>> +    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
>> +    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
>> +    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
>> +    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
>> +    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
>> +    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
>> +    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
>> +    Other CE Error            error_other_ce     Trigger when any Others CE error
>> +    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
>> +    Other UE Error            error_other_ue     Trigger when any Others UE error
>> +    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
>> +    SMpro Error               error_smpro        Trigger when system have SMpro error
>> +    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
>> +    PMpro Error               error_pmpro        Trigger when system have PMpro error
>> +    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
>> +    ========================  =================  ==================================================
>> +
>> +Event Type:
>> +
>> +    ============================ ==========================
>> +    Event Type                   Sysfs name
>> +    ============================ ==========================
>> +    VRD HOT                      event_vrd_hot
>> +    VR Warn/Fault                event_vrd_warn_fault
>> +    DIMM Hot                     event_dimm_hot
>> +    ============================ ==========================
> 
> The documentation above produces htmldocs warnings:
> Documentation/misc-devices/smpro-errmon.rst:53: WARNING: Literal block expected; none found.
> Documentation/misc-devices/smpro-errmon.rst:87: WARNING: Malformed table.
> Text in column margin in table line 17.
> <snipped>...
> Documentation/misc-devices/smpro-errmon.rst:135: WARNING: Literal block expected; none found.
> Documentation/misc-devices/smpro-errmon.rst:145: WARNING: Literal block expected; none found.
> 
> I have applied the fixup (with grammatical and formatting fixes):
> 
> ---- >8 ----
> 
> diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
> index b17f30a6cafdab..de8719cc47fd3c 100644
> --- a/Documentation/misc-devices/smpro-errmon.rst
> +++ b/Documentation/misc-devices/smpro-errmon.rst
> @@ -7,18 +7,18 @@ Supported chips:
>   
>     * Ampere(R) Altra(R)
>   
> -    Prefix: 'smpro'
> +    Prefix: `smpro`
>   
> -    Preference: Altra SoC BMC Interface Specification
> +    Reference: `Altra SoC BMC Interface Specification`
>   
>   Author: Thu Nguyen <thu@os.amperecomputing.com>
>   
>   Description
>   -----------
>   
> -This driver supports hardware monitoring for Ampere(R) Altra(R) SoC's based on the
> -SMpro co-processor (SMpro).
> -The following SoC alert/event types are supported by the errmon driver:
> +The smpro-errmon driver supports hardware monitoring for Ampere(R) Altra(R)
> +SoCs based on the SMpro co-processor (SMpro). The following SoC alert/event
> +types are supported by the driver:
>   
>   * Core CE/UE error
>   * Memory CE/UE error
> @@ -29,165 +29,178 @@ The following SoC alert/event types are supported by the errmon driver:
>   * VRD warn/fault
>   * DIMM Hot
>   
> -The SMpro interface provides the registers to query the status of the SoC alerts/events
> -and their data and export to userspace by this driver.
> +The SMpro interface provides the registers to query the status of the SoC
> +alerts/events and their data and export to userspace by this driver.
>   
> -The SoC alerts/events will be referenced as error below.
> +The rest of this document refers to SoC alerts/events as errors.
>   
>   Usage Notes
>   -----------
>   
>   SMpro errmon driver creates the sysfs files for each error type.
> -Example: ``error_core_ce`` to get Core CE error type.
> +See :ref:`smpro_sysfs` for the list of errors and the corresponding
> +sysfs files.
>   
> -* If the error is absented, the sysfs file returns empty.
> -* If the errors are presented, one each read to the sysfs, the oldest error will be returned and clear, the next read will be returned with the next error until all the errors are read out.
> +* If there are no errors, the sysfs file is empty.
> +* Otherwise, when errors occur, the oldest error
> +  will be returned on sysfs file reading and cleared. The next read will
> +  return the next error until all the errors are read out.
>   
> -For each host error type, SMpro keeps a latest max number of errors. All the oldest errors that were not read out will be dropped. In that case, the read to the corresponding overflow sysfs will return 1, otherwise, return 0.
> -Example: ``overflow_core_ce`` to report the overflow status of Core CE error type.
> +For each host error type, SMpro keeps a latest max number of errors. All the
> +oldest errors that were not read will be dropped. In that case, the read
> +to the corresponding sysfs will return 1, otherwise return 0.
>   
> -The format of the error is depended on the error type.
> +The error format depends on its type.
>   
> -1) For Core/Memory/PCIe/Other CE/UE error types::
> +1) For Core/Memory/PCIe/Other CE/UE error types
>   
> -The return 48-byte in hex format in table below:
> +   These errors return 48 bytes in hex format according to the table below:
>   
> -    =======   =============   ===========   ==========================================
> -    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
> -    =======   =============   ===========   ==========================================
> -    00        Error Type      1             See Table below for details
> -    01        Subtype         1             See Table below for details
> -    02        Instance        2             See Table below for details
> -    04        Error status    4             See ARM RAS specification for details
> -    08        Error Address   8             See ARM RAS specification for details
> -    16        Error Misc 0    8             See ARM RAS specification for details
> -    24        Error Misc 1    8             See ARM RAS specification for details
> -    32        Error Misc 2    8             See ARM RAS specification for details
> -    40        Error Misc 3    8             See ARM RAS specification for details
> -    =======   =============   ===========   ==========================================
> +   =======   =============   ===========   ==========================================================
> +   OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
> +   =======   =============   ===========   ==========================================================
> +   00        Error Type      1             See :ref:`the table below <smpro-error-types>` for details
> +   01        Subtype         1             See :ref:`the table below <smpro-error-types>` for details
> +   02        Instance        2             See :ref:`the table below <smpro-error-types>` for details
> +   04        Error status    4             See ARM RAS specification for details
> +   08        Error Address   8             See ARM RAS specification for details
> +   16        Error Misc 0    8             See ARM RAS specification for details
> +   24        Error Misc 1    8             See ARM RAS specification for details
> +   32        Error Misc 2    8             See ARM RAS specification for details
> +   40        Error Misc 3    8             See ARM RAS specification for details
> +   =======   =============   ===========   ==========================================================
>   
> -Below table defines the value of Error types, Sub Types, Sub component and instance:
> +   The table below defines the value of error types, their subtype, subcomponent and instance:
>   
> -    ============    ==========    =========   ===============  ====================================
> -    Error Group     Error Type    Sub type    Sub component    Instance
> -    ============    ==========    =========   ===============  ====================================
> -    CPM (core)      0             0           Snoop-Logic      CPM #
> -    CPM (core)      0             2           Armv8 Core 1     CPM #
> -    MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
> -    MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
> -    MCU (mem)       1             3           ERR3             MCU #
> -    MCU (mem)       1             4           ERR4             MCU #
> -    MCU (mem)       1             5           ERR5             MCU #
> -    MCU (mem)       1             6           ERR6             MCU #
> -    MCU (mem)       1             7           Link Error       MCU #
> -    Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
> -    Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
> -    Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
> -    Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
> -    2P Link (other) 3             0           N/A              Altra 2P Link #
> -    GIC (other)     5             0           ERR0             0
> -    GIC (other)     5             1           ERR1             0
> -    GIC (other)     5             2           ERR2             0
> -    GIC (other)     5             3           ERR3             0
> -    GIC (other)     5             4           ERR4             0
> -    GIC (other)     5             5           ERR5             0
> -    GIC (other)     5             6           ERR6             0
> -    GIC (other)     5             7           ERR7             0
> -    GIC (other)     5             8           ERR8             0
> -    GIC (other)     5             9           ERR9             0
> -    GIC (other)     5             10          ERR10            0
> -    GIC (other)     5             11          ERR11            0
> -    GIC (other)     5             12          ERR12            0
> -    GIC (other)     5             13-21       ERR13            RC# + 1
> -    SMMU (other)    6             TCU         100              RC #
> -    SMMU (other)    6             TBU0        0                RC #
> -    SMMU (other)    6             TBU1        1                RC #
> -    SMMU (other)    6             TBU2        2                RC #
> -    SMMU (other)    6             TBU3        3                RC #
> -    SMMU (other)    6             TBU4        4                RC #
> -    SMMU (other)    6             TBU5        5                RC #
> -    SMMU (other)    6             TBU6        6                RC #
> -    SMMU (other)    6             TBU7        7                RC #
> -    SMMU (other)    6             TBU8        8                RC #
> -    SMMU (other)    6             TBU9        9                RC #
> -    PCIe AER (pcie) 7             Root        0                RC #
> -    PCIe AER (pcie) 7             Device      1                RC #
> -    PCIe RC (pcie)  8             RCA HB      0                RC #
> -    PCIe RC (pcie)  8             RCB HB      1                RC #
> -    PCIe RC (pcie)  8             RASDP       8                RC #
> -    OCM (other)     9             ERR0        0                0
> -    OCM (other)     9             ERR1        1                0
> -    OCM (other)     9             ERR2        2                0
> -    SMpro (other)   10            ERR0        0                0
> -    SMpro (other)   10            ERR1        1                0
> -    SMpro (other)   10            MPA_ERR     2                0
> -    PMpro (other)   11            ERR0        0                0
> -    PMpro (other)   11            ERR1        1                0
> -    PMpro (other)   11            MPA_ERR     2                0
> -    ============    ==========    =========   ===============  ====================================
> +   .. _smpro-error-types:
>   
> -    For example:
> -    # cat error_other_ue
> -    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
> +   =============== ==========    =========   ===============  ====================================
> +   Error Group     Error Type    Sub type    Sub component    Instance
> +   =============== ==========    =========   ===============  ====================================
> +   CPM (core)      0             0           Snoop-Logic      CPM #
> +   CPM (core)      0             2           Armv8 Core 1     CPM #
> +   MCU (mem)       1             1           ERR1             MCU # | SLOT << 11
> +   MCU (mem)       1             2           ERR2             MCU # | SLOT << 11
> +   MCU (mem)       1             3           ERR3             MCU #
> +   MCU (mem)       1             4           ERR4             MCU #
> +   MCU (mem)       1             5           ERR5             MCU #
> +   MCU (mem)       1             6           ERR6             MCU #
> +   MCU (mem)       1             7           Link Error       MCU #
> +   Mesh (other)    2             0           Cross Point      X | (Y << 5) | NS <<11
> +   Mesh (other)    2             1           Home Node(IO)    X | (Y << 5) | NS <<11
> +   Mesh (other)    2             2           Home Node(Mem)   X | (Y << 5) | NS <<11 | device<<12
> +   Mesh (other)    2             4           CCIX Node        X | (Y << 5) | NS <<11
> +   2P Link (other) 3             0           N/A              Altra 2P Link #
> +   GIC (other)     5             0           ERR0             0
> +   GIC (other)     5             1           ERR1             0
> +   GIC (other)     5             2           ERR2             0
> +   GIC (other)     5             3           ERR3             0
> +   GIC (other)     5             4           ERR4             0
> +   GIC (other)     5             5           ERR5             0
> +   GIC (other)     5             6           ERR6             0
> +   GIC (other)     5             7           ERR7             0
> +   GIC (other)     5             8           ERR8             0
> +   GIC (other)     5             9           ERR9             0
> +   GIC (other)     5             10          ERR10            0
> +   GIC (other)     5             11          ERR11            0
> +   GIC (other)     5             12          ERR12            0
> +   GIC (other)     5             13-21       ERR13            RC# + 1
> +   SMMU (other)    6             TCU         100              RC #
> +   SMMU (other)    6             TBU0        0                RC #
> +   SMMU (other)    6             TBU1        1                RC #
> +   SMMU (other)    6             TBU2        2                RC #
> +   SMMU (other)    6             TBU3        3                RC #
> +   SMMU (other)    6             TBU4        4                RC #
> +   SMMU (other)    6             TBU5        5                RC #
> +   SMMU (other)    6             TBU6        6                RC #
> +   SMMU (other)    6             TBU7        7                RC #
> +   SMMU (other)    6             TBU8        8                RC #
> +   SMMU (other)    6             TBU9        9                RC #
> +   PCIe AER (pcie) 7             Root        0                RC #
> +   PCIe AER (pcie) 7             Device      1                RC #
> +   PCIe RC (pcie)  8             RCA HB      0                RC #
> +   PCIe RC (pcie)  8             RCB HB      1                RC #
> +   PCIe RC (pcie)  8             RASDP       8                RC #
> +   OCM (other)     9             ERR0        0                0
> +   OCM (other)     9             ERR1        1                0
> +   OCM (other)     9             ERR2        2                0
> +   SMpro (other)   10            ERR0        0                0
> +   SMpro (other)   10            ERR1        1                0
> +   SMpro (other)   10            MPA_ERR     2                0
> +   PMpro (other)   11            ERR0        0                0
> +   PMpro (other)   11            ERR1        1                0
> +   PMpro (other)   11            MPA_ERR     2                0
> +   =============== ==========    =========   ===============  ====================================
>   
> -2) For the Internal SMpro/PMpro error types::
> +   Example::
>   
> -The error_[smpro|pmro] sysfs returns string of 8-byte hex value:
> -    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
> +     # cat error_other_ue
> +     880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
>   
> -The warn_[smpro|pmro] sysfs returns string of 4-byte hex value:
> -    <4-byte hex value of Warning info>
> +2) For the internal SMpro/PMpro error types
>   
> -Reference to Altra SoC BMC Interface Specification for the details.
> +   The ``error_[smpro|pmpro]`` sysfs returns a string of 8-byte hex value::
>   
> -3) For the VRD hot, VRD /warn/fault, DIMM Hot event::
> +     <4-byte hex value of Error info><4-byte hex value of Error extensive data>
>   
> -The return string is 2-byte hex string value. Reference to section 5.7 GPI status register in Altra SoC BMC Interface Specification for the details.
> +   The ``warn_[smpro|pmpro]`` sysfs returns a string of 4-byte hex value::
>   
> -    Example:
> -    #cat event_vrd_hot
> -    0000
> +     <4-byte hex value of Warning info>
> +
> +   Refer to `Altra SoC BMC Interface Specification` for details.
> +
> +3) For the VRD hot, VRD warn/fault, DIMM Hot event
> +
> +   The return string is 2-byte hex string value. Refer to section `5.7 GPI
> +   status register` in `Altra SoC BMC Interface Specification` for details.
> +
> +   Example::
> +
> +      #cat event_vrd_hot
> +      0000
> +
> +.. _smpro_sysfs:
>   
>   Sysfs entries
>   -------------
>   
>   The following sysfs files are supported:
>   
> -* Ampere(R) Altra(R):
> +* Ampere(R) Altra(R)
>   
> -Alert Types:
> +  Alert types:
>   
> -    ========================  =================  ==================================================
> -    Alert Type                Sysfs name         Description
> -    ========================  =================  ==================================================
> -    Core CE Error             error_core_ce      Trigger when Core has CE error
> -    Core CE Error overflow    overflow_core_ce   Trigger when Core CE error overflow
> -    Core UE Error             error_core_ue      Trigger when Core has UE error
> -    Core UE Error overflow    overflow_core_ue   Trigger when Core UE error overflow
> -    Memory CE Error           error_mem_ce       Trigger when Memory has CE error
> -    Memory CE Error overflow  overflow_mem_ce    Trigger when Memory CE error overflow
> -    Memory UE Error           error_mem_ue       Trigger when Memory has UE error
> -    Memory UE Error overflow  overflow_mem_ue    Trigger when Memory UE error overflow
> -    PCIe CE Error             error_pcie_ce      Trigger when any PCIe controller has CE error
> -    PCIe CE Error overflow    overflow_pcie_ce   Trigger when any PCIe controller CE error overflow
> -    PCIe UE Error             error_pcie_ue      Trigger when any PCIe controller has UE error
> -    PCIe UE Error overflow    overflow_pcie_ue   Trigger when any PCIe controller UE error overflow
> -    Other CE Error            error_other_ce     Trigger when any Others CE error
> -    Other CE Error overflow   overflow_other_ce  Trigger when any Others CE error overflow
> -    Other UE Error            error_other_ue     Trigger when any Others UE error
> -    Other UE Error overflow   overflow_other_ue  Trigger when Others UE error overflow
> -    SMpro Error               error_smpro        Trigger when system have SMpro error
> -    SMpro Warning             warn_smpro         Trigger when system have SMpro warning
> -    PMpro Error               error_pmpro        Trigger when system have PMpro error
> -    PMpro Warning             warn_pmpro         Trigger when system have PMpro warning
> -    ========================  =================  ==================================================
> +    ========================  =====================  ==================================================
> +    Alert type                Sysfs name             Description (when the error is triggered)
> +    ========================  =====================  ==================================================
> +    Core CE Error             ``error_core_ce``      Core has CE error
> +    Core CE Error overflow    ``overflow_core_ce``   Core CE error overflow
> +    Core UE Error             ``error_core_ue``      Core has UE error
> +    Core UE Error overflow    ``overflow_core_ue``   Core UE error overflow
> +    Memory CE Error           ``error_mem_ce``       Memory has CE error
> +    Memory CE Error overflow  ``overflow_mem_ce``    Memory CE error overflow
> +    Memory UE Error           ``error_mem_ue``       Memory has UE error
> +    Memory UE Error overflow  ``overflow_mem_ue``    Memory UE error overflow
> +    PCIe CE Error             ``error_pcie_ce``      any PCIe controller has CE error
> +    PCIe CE Error overflow    ``overflow_pcie_ce``   any PCIe controller CE error overflow
> +    PCIe UE Error             ``error_pcie_ue``      any PCIe controller has UE error
> +    PCIe UE Error overflow    ``overflow_pcie_ue``   any PCIe controller UE error overflow
> +    Other CE Error            ``error_other_ce``     any other CE error
> +    Other CE Error overflow   ``overflow_other_ce``  any other CE error overflow
> +    Other UE Error            ``error_other_ue``     any other UE error
> +    Other UE Error overflow   ``overflow_other_ue``  other UE error overflow
> +    SMpro Error               ``error_smpro``        system has SMpro error
> +    SMpro Warning             ``warn_smpro``         system has SMpro warning
> +    PMpro Error               ``error_pmpro``        system has PMpro error
> +    PMpro Warning             ``warn_pmpro``         system has PMpro warning
> +    ========================  =====================  ==================================================
>   
> -Event Type:
> +  Event types:
>   
>       ============================ ==========================
> -    Event Type                   Sysfs name
> +    Event type                   Sysfs name
>       ============================ ==========================
> -    VRD HOT                      event_vrd_hot
> -    VR Warn/Fault                event_vrd_warn_fault
> -    DIMM Hot                     event_dimm_hot
> +    VRD HOT                      ``event_vrd_hot``
> +    VR Warn/Fault                ``event_vrd_warn_fault``
> +    DIMM Hot                     ``event_dimm_hot``
>       ============================ ==========================
> 

Thanks Bagas and will apply in next version but it may be moved to 
Documentation/ABI as per Greg's suggestion.

- Quan

Patch

diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst
index 756be15a49a4..b74b3b34a235 100644
--- a/Documentation/misc-devices/index.rst
+++ b/Documentation/misc-devices/index.rst
@@ -27,6 +27,7 @@  fit into other categories.
    max6875
    oxsemi-tornado
    pci-endpoint-test
+   smpro-errmon
    spear-pcie-gadget
    uacce
    xilinx_sdfec
diff --git a/Documentation/misc-devices/smpro-errmon.rst b/Documentation/misc-devices/smpro-errmon.rst
new file mode 100644
index 000000000000..b17f30a6cafd
--- /dev/null
+++ b/Documentation/misc-devices/smpro-errmon.rst
@@ -0,0 +1,193 @@ 
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Kernel driver Ampere(R)'s Altra(R) SMpro errmon
+===============================================
+
+Supported chips:
+
+  * Ampere(R) Altra(R)
+
+    Prefix: 'smpro'
+
+    Reference: Altra SoC BMC Interface Specification
+
+Author: Thu Nguyen <thu@os.amperecomputing.com>
+
+Description
+-----------
+
+This driver supports hardware monitoring for Ampere(R) Altra(R) SoCs based on the
+SMpro co-processor (SMpro).
+The following SoC alert/event types are supported by the errmon driver:
+
+* Core CE/UE error
+* Memory CE/UE error
+* PCIe CE/UE error
+* Other CE/UE error
+* Internal SMpro/PMpro error
+* VRD hot
+* VRD warn/fault
+* DIMM Hot
+
+The SMpro interface provides registers to query the status and data of the SoC
+alerts/events, and this driver exports them to userspace.
+
+The SoC alerts/events are referred to as errors below.
+
+Usage Notes
+-----------
+
+The SMpro errmon driver creates a sysfs file for each error type.
+Example: read ``error_core_ce`` to get the Core CE errors.
+
+* If no error is present, reading the sysfs file returns an empty string.
+* If errors are present, each read of the sysfs file returns the oldest error and clears it; the next read returns the next error, until all errors have been read out.
+
+For each host error type, SMpro keeps only a limited number of the most recent errors. Once that limit is exceeded, the oldest errors that were not read out are dropped. In that case, reading the corresponding overflow sysfs file returns 1; otherwise it returns 0.
+Example: ``overflow_core_ce`` reports the overflow status of the Core CE error type.
+
+The format of the error depends on the error type.
+
+1) For Core/Memory/PCIe/Other CE/UE error types
+
+The sysfs file returns a 48-byte record as a hex string, laid out as in the table below:
+
+    =======   =============   ===========   ==========================================
+    OFFSET    FIELD           SIZE (BYTE)   DESCRIPTION
+    =======   =============   ===========   ==========================================
+    00        Error Type      1             See Table below for details
+    01        Subtype         1             See Table below for details
+    02        Instance        2             See Table below for details
+    04        Error status    4             See ARM RAS specification for details
+    08        Error Address   8             See ARM RAS specification for details
+    16        Error Misc 0    8             See ARM RAS specification for details
+    24        Error Misc 1    8             See ARM RAS specification for details
+    32        Error Misc 2    8             See ARM RAS specification for details
+    40        Error Misc 3    8             See ARM RAS specification for details
+    =======   =============   ===========   ==========================================
+
+The table below defines the values of Error type, Sub type, Sub component and Instance:
+
+    ===============  ==========  =========  ===============  ======================================
+    Error Group      Error Type  Sub type   Sub component    Instance
+    ===============  ==========  =========  ===============  ======================================
+    CPM (core)       0           0          Snoop-Logic      CPM #
+    CPM (core)       0           2          Armv8 Core 1     CPM #
+    MCU (mem)        1           1          ERR1             MCU # | SLOT << 11
+    MCU (mem)        1           2          ERR2             MCU # | SLOT << 11
+    MCU (mem)        1           3          ERR3             MCU #
+    MCU (mem)        1           4          ERR4             MCU #
+    MCU (mem)        1           5          ERR5             MCU #
+    MCU (mem)        1           6          ERR6             MCU #
+    MCU (mem)        1           7          Link Error       MCU #
+    Mesh (other)     2           0          Cross Point      X | (Y << 5) | NS << 11
+    Mesh (other)     2           1          Home Node(IO)    X | (Y << 5) | NS << 11
+    Mesh (other)     2           2          Home Node(Mem)   X | (Y << 5) | NS << 11 | device << 12
+    Mesh (other)     2           4          CCIX Node        X | (Y << 5) | NS << 11
+    2P Link (other)  3           0          N/A              Altra 2P Link #
+    GIC (other)      5           0          ERR0             0
+    GIC (other)      5           1          ERR1             0
+    GIC (other)      5           2          ERR2             0
+    GIC (other)      5           3          ERR3             0
+    GIC (other)      5           4          ERR4             0
+    GIC (other)      5           5          ERR5             0
+    GIC (other)      5           6          ERR6             0
+    GIC (other)      5           7          ERR7             0
+    GIC (other)      5           8          ERR8             0
+    GIC (other)      5           9          ERR9             0
+    GIC (other)      5           10         ERR10            0
+    GIC (other)      5           11         ERR11            0
+    GIC (other)      5           12         ERR12            0
+    GIC (other)      5           13-21      ERR13            RC # + 1
+    SMMU (other)     6           TCU        100              RC #
+    SMMU (other)     6           TBU0       0                RC #
+    SMMU (other)     6           TBU1       1                RC #
+    SMMU (other)     6           TBU2       2                RC #
+    SMMU (other)     6           TBU3       3                RC #
+    SMMU (other)     6           TBU4       4                RC #
+    SMMU (other)     6           TBU5       5                RC #
+    SMMU (other)     6           TBU6       6                RC #
+    SMMU (other)     6           TBU7       7                RC #
+    SMMU (other)     6           TBU8       8                RC #
+    SMMU (other)     6           TBU9       9                RC #
+    PCIe AER (pcie)  7           Root       0                RC #
+    PCIe AER (pcie)  7           Device     1                RC #
+    PCIe RC (pcie)   8           RCA HB     0                RC #
+    PCIe RC (pcie)   8           RCB HB     1                RC #
+    PCIe RC (pcie)   8           RASDP      8                RC #
+    OCM (other)      9           ERR0       0                0
+    OCM (other)      9           ERR1       1                0
+    OCM (other)      9           ERR2       2                0
+    SMpro (other)    10          ERR0       0                0
+    SMpro (other)    10          ERR1       1                0
+    SMpro (other)    10          MPA_ERR    2                0
+    PMpro (other)    11          ERR0       0                0
+    PMpro (other)    11          ERR1       1                0
+    PMpro (other)    11          MPA_ERR    2                0
+    ===============  ==========  =========  ===============  ======================================
+
+    For example:
+    # cat error_other_ue
+    880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000
+
+2) For the Internal SMpro/PMpro error types
+
+The error_[smpro|pmpro] sysfs files return an 8-byte value as a hex string:
+    <4-byte hex value of Error info><4-byte hex value of Error extensive data>
+
+The warn_[smpro|pmpro] sysfs files return a 4-byte value as a hex string:
+    <4-byte hex value of Warning info>
+
+Refer to the Altra SoC BMC Interface Specification for details.
+
+3) For the VRD hot, VRD warn/fault, DIMM Hot events
+
+The returned string is a 2-byte hex value. Refer to section 5.7 (GPI status register) in the Altra SoC BMC Interface Specification for details.
+
+    Example:
+    # cat event_vrd_hot
+    0000
+
+Sysfs entries
+-------------
+
+The following sysfs files are supported:
+
+* Ampere(R) Altra(R):
+
+Alert Types:
+
+    ========================  =================  ======================================================
+    Alert Type                Sysfs name         Description
+    ========================  =================  ======================================================
+    Core CE Error             error_core_ce      Triggered when a Core CE error occurs
+    Core CE Error overflow    overflow_core_ce   Triggered when Core CE errors overflow
+    Core UE Error             error_core_ue      Triggered when a Core UE error occurs
+    Core UE Error overflow    overflow_core_ue   Triggered when Core UE errors overflow
+    Memory CE Error           error_mem_ce       Triggered when a Memory CE error occurs
+    Memory CE Error overflow  overflow_mem_ce    Triggered when Memory CE errors overflow
+    Memory UE Error           error_mem_ue       Triggered when a Memory UE error occurs
+    Memory UE Error overflow  overflow_mem_ue    Triggered when Memory UE errors overflow
+    PCIe CE Error             error_pcie_ce      Triggered when any PCIe controller has a CE error
+    PCIe CE Error overflow    overflow_pcie_ce   Triggered when any PCIe controller CE errors overflow
+    PCIe UE Error             error_pcie_ue      Triggered when any PCIe controller has a UE error
+    PCIe UE Error overflow    overflow_pcie_ue   Triggered when any PCIe controller UE errors overflow
+    Other CE Error            error_other_ce     Triggered when any other CE error occurs
+    Other CE Error overflow   overflow_other_ce  Triggered when other CE errors overflow
+    Other UE Error            error_other_ue     Triggered when any other UE error occurs
+    Other UE Error overflow   overflow_other_ue  Triggered when other UE errors overflow
+    SMpro Error               error_smpro        Triggered when the system has an SMpro error
+    SMpro Warning             warn_smpro         Triggered when the system has an SMpro warning
+    PMpro Error               error_pmpro        Triggered when the system has a PMpro error
+    PMpro Warning             warn_pmpro         Triggered when the system has a PMpro warning
+    ========================  =================  ======================================================
+
+Event Type:
+
+    ============================ ==========================
+    Event Type                   Sysfs name
+    ============================ ==========================
+    VRD Hot                      event_vrd_hot
+    VRD Warn/Fault               event_vrd_warn_fault
+    DIMM Hot                     event_dimm_hot
+    ============================ ==========================
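
As an illustration of the record layout and read semantics documented in the patch, here is a minimal userspace sketch in Python. It is not part of the patch. The names `FIELDS`, `split_record` and `drain` are hypothetical, and the offsets and sizes are copied from the 48-byte field table above. The patch does not state the byte order of multi-byte fields, so the sketch keeps every field as raw bytes and leaves interpretation to the Altra SoC BMC Interface Specification.

```python
from typing import Callable, Dict, List

# (name, offset, size in bytes), copied from the field table in the patch.
FIELDS = [
    ("error_type", 0, 1),
    ("subtype", 1, 1),
    ("instance", 2, 2),
    ("error_status", 4, 4),
    ("error_address", 8, 8),
    ("error_misc0", 16, 8),
    ("error_misc1", 24, 8),
    ("error_misc2", 32, 8),
    ("error_misc3", 40, 8),
]
RECORD_SIZE = 48  # bytes, i.e. 96 hex characters


def split_record(hex_text: str) -> Dict[str, bytes]:
    """Split one hex-encoded 48-byte record into raw per-field byte strings.

    Byte order inside multi-byte fields is NOT assumed here; callers get
    the bytes exactly in the order the sysfs file printed them.
    """
    raw = bytes.fromhex(hex_text.strip())
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    return {name: raw[off:off + size] for name, off, size in FIELDS}


def drain(read_once: Callable[[], str]) -> List[Dict[str, bytes]]:
    """Call read_once() until it returns an empty string.

    This mirrors the usage notes: each read returns the oldest pending
    error and clears it, and an empty read means nothing is left.
    """
    records = []
    while True:
        text = read_once().strip()
        if not text:
            return records
        records.append(split_record(text))
```

On a real system, `read_once` could be a small function that opens and reads one of the `error_*_ce` or `error_*_ue` sysfs files. The directory those files live in depends on the platform and is not given in the patch, so it is left to the caller.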