[v4,2/2] cxl: avoid duplicated report from MCE & device

Message ID 20240808151328.707869-3-ruansy.fnst@fujitsu.com
State New
Series cxl: add device reporting poison handler

Commit Message

Shiyang Ruan Aug. 8, 2024, 3:13 p.m. UTC
Since a CXL device is a memory device, a CPU consuming a poisoned page
of a CXL device always triggers an MCE (via interrupt #18) and calls
memory_failure() to handle the POISON page, no matter which "-First"
path (FW-First or OS-First) is configured.  The CXL device can also find
and report the POISON itself; the kernel now not only traces such a
report but also calls memory_failure() to handle it, which is marked as
"NEW" in the figure below.
```
1.  MCE (interrupt #18, while CPU consuming POISON)
     -> do_machine_check()
       -> mce_log()
         -> notify chain (x86_mce_decoder_chain)
           -> memory_failure() <---------------------------- EXISTS
2.a FW-First (optional, CXL device proactively find&report)
     -> CXL device -> Firmware
       -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
                                                  \-> memory_failure()
                                                      ^----- NEW
2.b OS-First (optional, CXL device proactively find&report)
     -> CXL device -> MSI
       -> OS: CXL driver -> trace
                        \-> memory_failure()
                            ^------------------------------- NEW
```

But in this way, memory_failure() could be called twice, or even
concurrently, before the POISON page is cleared, as shown in the figure
above: (1.) and (2.a or 2.b).  memory_failure() has its own mutex, so
the two calls are actually serialized, and the later call is skipped
because the HWPoison bit has already been set.  However, consider the
following scenario: "CXL device reports POISON error" triggers the 1st
call; the user sees it in the log and tries to clear the poison by
executing the `cxl clear-poison` command; at the same time, a process
accesses this POISON page, which triggers an MCE (the 2nd call).  Since
there is no lock between the 2nd call and the clear-poison operation, a
race condition may occur, which may leave the HWPoison bit of the page
in an unknown state.

Thus, we have to avoid the 2nd call.  This patch[2] introduces a new
notifier_block into `x86_mce_decoder_chain` and a POISON cache list to
stop the 2nd call of memory_failure().  The notifier checks whether the
current poison page has already been reported; if so, it stops the
notifier chain so that the following memory_failure() is not called to
report the page again.
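
The idea can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the code from this patch:
every `sketch_*` / `SKETCH_*` name, the fixed-size array used as the
POISON cache, and the 4 KiB page shift are hypothetical, and
memory_failure() is modelled by a simple call counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's NOTIFY_DONE / NOTIFY_STOP results. */
enum { SKETCH_NOTIFY_DONE, SKETCH_NOTIFY_STOP };

#define SKETCH_PAGE_SHIFT  12	/* assumption: 4 KiB pages */
#define SKETCH_CACHE_SLOTS 16	/* assumption: tiny fixed-size cache */

/* Hypothetical POISON cache: page frames already reported by the device. */
struct sketch_state {
	uint64_t reported_pfn[SKETCH_CACHE_SLOTS];
	size_t nr_reported;
	unsigned int memory_failure_calls;	/* simulated memory_failure() calls */
};

static bool sketch_already_reported(const struct sketch_state *s, uint64_t hpa)
{
	uint64_t pfn = hpa >> SKETCH_PAGE_SHIFT;

	for (size_t i = 0; i < s->nr_reported; i++)
		if (s->reported_pfn[i] == pfn)
			return true;
	return false;
}

/* Paths 2.a / 2.b: the device reports poison; remember the page, handle it once. */
static void sketch_device_report(struct sketch_state *s, uint64_t hpa)
{
	assert(s != NULL);
	if (sketch_already_reported(s, hpa))
		return;
	if (s->nr_reported < SKETCH_CACHE_SLOTS)
		s->reported_pfn[s->nr_reported++] = hpa >> SKETCH_PAGE_SHIFT;
	s->memory_failure_calls++;
}

typedef int (*sketch_notifier_fn)(struct sketch_state *s, uint64_t hpa);

/* New notifier: stop the chain if the device already reported this page. */
static int sketch_cxl_notifier(struct sketch_state *s, uint64_t hpa)
{
	return sketch_already_reported(s, hpa) ? SKETCH_NOTIFY_STOP
					       : SKETCH_NOTIFY_DONE;
}

/* Stand-in for the existing chain member that ends in memory_failure(). */
static int sketch_memory_failure_notifier(struct sketch_state *s, uint64_t hpa)
{
	(void)hpa;
	s->memory_failure_calls++;
	return SKETCH_NOTIFY_DONE;
}

/* Path 1: walk the chain in order; a STOP result skips all later members. */
static int sketch_run_mce_chain(struct sketch_state *s, uint64_t hpa)
{
	static const sketch_notifier_fn chain[] = {
		sketch_cxl_notifier,
		sketch_memory_failure_notifier,
	};

	assert(s != NULL);
	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		if (chain[i](s, hpa) == SKETCH_NOTIFY_STOP)
			return SKETCH_NOTIFY_STOP;
	return SKETCH_NOTIFY_DONE;
}
```

Calling sketch_device_report() for an address and then
sketch_run_mce_chain() for any address in the same page leaves the
counter at 1, while an MCE on a page the device never reported still
reaches the memory_failure() stand-in.  The sketch takes no locks and
never evicts cache entries; a real implementation would need both.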

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 arch/x86/include/asm/mce.h |   1 +
 drivers/cxl/core/mbox.c    | 115 +++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/memdev.c  |   6 +-
 drivers/cxl/cxlmem.h       |   3 +
 4 files changed, 124 insertions(+), 1 deletion(-)

Comments

kernel test robot Aug. 9, 2024, 7:31 a.m. UTC | #1
Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091537.p9RKx1R2-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:220:44: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:240:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:241:51: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:248:26: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:355:48: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:358:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h: In function 'mce_hygon_feature_init':
>> arch/x86/include/asm/mce.h:358:103: error: passing argument 1 of 'mce_amd_feature_init' from incompatible pointer type [-Werror=incompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                                                                       ^
         |                                                                                                       |
         |                                                                                                       struct cpuinfo_x86 *
   arch/x86/include/asm/mce.h:355:61: note: expected 'struct cpuinfo_x86 *' but argument is of type 'struct cpuinfo_x86 *'
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                         ~~~~~~~~~~~~~~~~~~~~^
   In file included from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/key.h:14,
                    from include/linux/security.h:27,
                    from drivers/cxl/core/mbox.c:3:
   drivers/cxl/core/mbox.c: In function 'cxl_handle_mce':
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:25:17: note: in expansion of macro '__is_constexpr'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                 ^~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:24:28: error: first argument to '__builtin_choose_expr' not a constant
      24 |         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |                            ^~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                   ^
   include/linux/bits.h:24:10: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
      24 |         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |          ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   In file included from include/linux/bits.h:7,
                    from include/linux/ratelimit_types.h:5,
                    from include/linux/printk.h:9,
                    from include/asm-generic/bug.h:22,
                    from ./arch/um/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/um/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/rcupdate.h:27,
                    from include/linux/rbtree.h:24,
                    from include/linux/key.h:15:
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/uapi/linux/bits.h:13:52: note: in definition of macro '__GENMASK_ULL'
      13 |          (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                                    ^
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
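
A note on reading the log above: the config is um-allyesconfig, and the
log names `struct cpuinfo_um`, so `struct cpuinfo_x86` is apparently not
declared when drivers/cxl/core/mbox.c pulls in arch/x86/include/asm/mce.h.
In C, a struct tag first declared inside a parameter list has prototype
scope, so each prototype gets its own distinct incomplete type; that is
why gcc can report "expected 'struct cpuinfo_x86 *' but argument is of
type 'struct cpuinfo_x86 *'".  The minimal sketch below shows the
general C pattern that avoids this (a file-scope forward declaration)
and the run-time mask that MCI_ADDR_PHYSADDR computes.  All `sketch_*`
names are hypothetical, the init helpers return int only so the effect
is observable, and this is not a proposed fix for the patch (which more
likely needs to avoid including the x86 header on non-x86 builds).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * File-scope forward declaration: every later 'struct sketch_cpuinfo *'
 * refers to this one type.  Without this line, each prototype below would
 * declare its own prototype-scoped struct and the call would not type-check.
 */
struct sketch_cpuinfo;

static inline int sketch_amd_feature_init(struct sketch_cpuinfo *c)
{
	return c != NULL;
}

static inline int sketch_hygon_feature_init(struct sketch_cpuinfo *c)
{
	return sketch_amd_feature_init(c);	/* same type, so this compiles */
}

/* The full definition may come later; only member access needs it. */
struct sketch_cpuinfo {
	unsigned int phys_bits;		/* stand-in for x86_phys_bits */
};

/* Equivalent of GENMASK_ULL(phys_bits - 1, 0), evaluated at run time. */
static inline uint64_t sketch_phys_addr_mask(const struct sketch_cpuinfo *c)
{
	assert(c != NULL && c->phys_bits >= 1 && c->phys_bits <= 64);
	return ~UINT64_C(0) >> (64 - c->phys_bits);
}
```

With phys_bits set to 46, masking an address keeps only its low 46 bits,
which mirrors the `hpa = mce->addr & MCI_ADDR_PHYSADDR` line the log
points at.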


vim +/mce_amd_feature_init +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  210  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  216  			       u64 lapic_id);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  217  #else
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  222  					     u64 lapic_id) { return -EINVAL; }
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  223  #endif
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  224  
b5f2fa4ea00a17 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca5 arch/x86/include/asm/mce.h Greg Kroah-Hartman        2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  228  
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  229  /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  230  #define MAX_NR_BANKS 64
41fdff322e26c4 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  231  
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  232  #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  233  void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  234  void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  235  void cmci_clear(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  236  void cmci_reenable(void);
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  237  void cmci_rediscover(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  238  void cmci_recheck(void);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  239  #else
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  240  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  241  static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  242  static inline void cmci_clear(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  243  static inline void cmci_reenable(void) {}
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  244  static inline void cmci_rediscover(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  245  static inline void cmci_recheck(void) {}
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  246  #endif
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  247  
38736072d45488 arch/x86/include/asm/mce.h H. Peter Anvin            2009-05-28  248  int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec20 arch/x86/include/asm/mce.h Borislav Petkov           2017-05-19  249  bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee arch/x86/include/asm/mce.h Vishal Verma              2018-10-25  250  bool mce_is_correctable(struct mce *m);
1bae0cfe4a171c arch/x86/include/asm/mce.h Yazen Ghannam             2023-06-13  251  bool mce_usable_address(struct mce *m);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  252  
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  253  DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  254  DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  255  
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  256  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  257  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  258  
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  259  enum mcp_flags {
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  260  	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  261  	MCP_UC		= BIT(1),	/* log uncorrected errors */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  262  	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
3bff147b187d5d arch/x86/include/asm/mce.h Borislav Petkov           2021-08-23  263  	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  264  };
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  265  
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  266  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  267  
9ff36ee9668ff4 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  268  int mce_notify_irq(void);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  269  
ea149b36c7f511 arch/x86/include/asm/mce.h Andi Kleen                2009-04-29  270  DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f8 arch/x86/include/asm/mce.h Tony Luck                 2011-11-03  271  
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  272  /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  273  extern void mce_disable_bank(int bank);
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  274  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  275  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  276   * Exception handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  277   */
8cd501c1facc15 arch/x86/include/asm/mce.h Thomas Gleixner           2020-02-25  278  void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  279  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  280  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  281   * Threshold handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  282   */
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  283  extern void (*mce_threshold_vector)(void);
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  284  
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  285  /* Deferred error interrupt handler */
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  286  extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  287  
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  288  /*
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  289   * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  290   */
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  291  
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  292  struct cper_sec_mem_err;
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  293  extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  294  				      struct cper_sec_mem_err *mem_err);
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  295  
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  296  /*
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  297   * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  298   * Scalable MCA.
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  299   */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  300  #ifdef CONFIG_X86_MCE_AMD
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  301  
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  302  /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  303  enum smca_bank_types {
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  304  	SMCA_LS = 0,	/* Load Store */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  305  	SMCA_LS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  306  	SMCA_IF,	/* Instruction Fetch */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  307  	SMCA_L2_CACHE,	/* L2 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  308  	SMCA_DE,	/* Decoder Unit */
68627a697c1959 arch/x86/include/asm/mce.h Yazen Ghannam             2018-02-21  309  	SMCA_RESERVED,	/* Reserved */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  310  	SMCA_EX,	/* Execution Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  311  	SMCA_FP,	/* Floating Point */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  312  	SMCA_L3_CACHE,	/* L3 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  313  	SMCA_CS,	/* Coherent Slave */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  314  	SMCA_CS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  315  	SMCA_PIE,	/* Power, Interrupts, etc. */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  316  	SMCA_UMC,	/* Unified Memory Controller */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  317  	SMCA_UMC_V2,
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  318  	SMCA_MA_LLC,	/* Memory Attached Last Level Cache */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  319  	SMCA_PB,	/* Parameter Block */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  320  	SMCA_PSP,	/* Platform Security Processor */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  321  	SMCA_PSP_V2,
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  322  	SMCA_SMU,	/* System Management Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  323  	SMCA_SMU_V2,
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  324  	SMCA_MP5,	/* Microprocessor 5 Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  325  	SMCA_MPDMA,	/* MPDMA Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  326  	SMCA_NBIO,	/* Northbridge IO Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  327  	SMCA_PCIE,	/* PCI Express Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  328  	SMCA_PCIE_V2,
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  329  	SMCA_XGMI_PCS,	/* xGMI PCS Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  330  	SMCA_NBIF,	/* NBIF Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  331  	SMCA_SHUB,	/* System HUB Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  332  	SMCA_SATA,	/* SATA Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  333  	SMCA_USB,	/* USB Unit */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  334  	SMCA_USR_DP,	/* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  335  	SMCA_USR_CP,	/* Ultra Short Reach Control Plane Controller */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  336  	SMCA_GMI_PCS,	/* GMI PCS Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  337  	SMCA_XGMI_PHY,	/* xGMI PHY Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  338  	SMCA_WAFL_PHY,	/* WAFL PHY Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  339  	SMCA_GMI_PHY,	/* GMI PHY Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  340  	N_SMCA_BANK_TYPES
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  341  };
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  342  
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  343  extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  344  
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  345  extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  346  extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  347  
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  348  void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfae arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  349  enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  350  #else
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  351  
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  352  static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  353  static inline int mce_threshold_remove_device(unsigned int cpu)		{ return 0; };
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  354  static inline bool amd_mce_is_memory_error(struct mce *m)		{ return false; };
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  355  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)		{ }
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  356  #endif
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  357  
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22 @358  static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_amd_feature_init(c); }
e9c2a283e7d9d4 arch/x86/include/asm/mce.h Arnd Bergmann             2023-05-16  359

kernel test robot Aug. 9, 2024, 7:31 a.m. UTC | #2
Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project f86594788ce93b696675c94f54016d27a6c21d18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091543.UNFvPFFl-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:33:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     693 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     701 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     709 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     718 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     727 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     736 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^
   arch/x86/include/asm/mce.h:220:44: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^
   arch/x86/include/asm/mce.h:240:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^
   arch/x86/include/asm/mce.h:241:51: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^
   arch/x86/include/asm/mce.h:248:26: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^
   arch/x86/include/asm/mce.h:355:48: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                ^
   arch/x86/include/asm/mce.h:358:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                  ^
>> arch/x86/include/asm/mce.h:358:96: error: incompatible pointer types passing 'struct cpuinfo_x86 *' to parameter of type 'struct cpuinfo_x86 *' [-Werror,-Wincompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                                                                       ^
   arch/x86/include/asm/mce.h:355:61: note: passing argument to parameter 'c' here
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                             ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                              ^
   include/linux/bits.h:25:25: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                                       ^
   include/linux/compiler.h:290:48: note: expanded from macro '__is_constexpr'
     290 |         (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
         |                                                       ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                              ^
   include/linux/bits.h:25:37: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                                                   ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:45: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                                                    ^
   include/uapi/linux/bits.h:13:52: note: expanded from macro '__GENMASK_ULL'
      13 |          (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                                    ^
   20 warnings and 4 errors generated.


vim +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  210  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  216  			       u64 lapic_id);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  217  #else
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  222  					     u64 lapic_id) { return -EINVAL; }
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  223  #endif
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  224  
b5f2fa4ea00a179 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca54 arch/x86/include/asm/mce.h Greg Kroah-Hartman        2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  228  
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  229  /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  230  #define MAX_NR_BANKS 64
41fdff322e26c4a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  231  
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  232  #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  233  void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  234  void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  235  void cmci_clear(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  236  void cmci_reenable(void);
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  237  void cmci_rediscover(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  238  void cmci_recheck(void);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  239  #else
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  240  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  241  static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  242  static inline void cmci_clear(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  243  static inline void cmci_reenable(void) {}
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  244  static inline void cmci_rediscover(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  245  static inline void cmci_recheck(void) {}
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  246  #endif
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  247  
38736072d45488f arch/x86/include/asm/mce.h H. Peter Anvin            2009-05-28  248  int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec203 arch/x86/include/asm/mce.h Borislav Petkov           2017-05-19  249  bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee1 arch/x86/include/asm/mce.h Vishal Verma              2018-10-25  250  bool mce_is_correctable(struct mce *m);
1bae0cfe4a171cc arch/x86/include/asm/mce.h Yazen Ghannam             2023-06-13  251  bool mce_usable_address(struct mce *m);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  252  
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  253  DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f0 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  254  DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  255  
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  256  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  257  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  258  
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  259  enum mcp_flags {
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  260  	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  261  	MCP_UC		= BIT(1),	/* log uncorrected errors */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  262  	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
3bff147b187d5df arch/x86/include/asm/mce.h Borislav Petkov           2021-08-23  263  	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  264  };
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  265  
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  266  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  267  
9ff36ee9668ff41 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  268  int mce_notify_irq(void);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  269  
ea149b36c7f511d arch/x86/include/asm/mce.h Andi Kleen                2009-04-29  270  DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f81 arch/x86/include/asm/mce.h Tony Luck                 2011-11-03  271  
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  272  /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  273  extern void mce_disable_bank(int bank);
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  274  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  275  /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  276   * Exception handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  277   */
8cd501c1facc159 arch/x86/include/asm/mce.h Thomas Gleixner           2020-02-25  278  void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  279  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  280  /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  281   * Threshold handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  282   */
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  283  extern void (*mce_threshold_vector)(void);
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  284  
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  285  /* Deferred error interrupt handler */
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  286  extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  287  
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  288  /*
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  289   * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  290   */
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  291  
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  292  struct cper_sec_mem_err;
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  293  extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  294  				      struct cper_sec_mem_err *mem_err);
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  295  
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  296  /*
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  297   * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  298   * Scalable MCA.
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  299   */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  300  #ifdef CONFIG_X86_MCE_AMD
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  301  
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  302  /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  303  enum smca_bank_types {
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  304  	SMCA_LS = 0,	/* Load Store */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  305  	SMCA_LS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  306  	SMCA_IF,	/* Instruction Fetch */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  307  	SMCA_L2_CACHE,	/* L2 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  308  	SMCA_DE,	/* Decoder Unit */
68627a697c19593 arch/x86/include/asm/mce.h Yazen Ghannam             2018-02-21  309  	SMCA_RESERVED,	/* Reserved */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  310  	SMCA_EX,	/* Execution Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  311  	SMCA_FP,	/* Floating Point */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  312  	SMCA_L3_CACHE,	/* L3 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  313  	SMCA_CS,	/* Coherent Slave */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  314  	SMCA_CS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  315  	SMCA_PIE,	/* Power, Interrupts, etc. */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  316  	SMCA_UMC,	/* Unified Memory Controller */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  317  	SMCA_UMC_V2,
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  318  	SMCA_MA_LLC,	/* Memory Attached Last Level Cache */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  319  	SMCA_PB,	/* Parameter Block */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  320  	SMCA_PSP,	/* Platform Security Processor */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  321  	SMCA_PSP_V2,
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  322  	SMCA_SMU,	/* System Management Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  323  	SMCA_SMU_V2,
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  324  	SMCA_MP5,	/* Microprocessor 5 Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  325  	SMCA_MPDMA,	/* MPDMA Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  326  	SMCA_NBIO,	/* Northbridge IO Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  327  	SMCA_PCIE,	/* PCI Express Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  328  	SMCA_PCIE_V2,
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  329  	SMCA_XGMI_PCS,	/* xGMI PCS Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  330  	SMCA_NBIF,	/* NBIF Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  331  	SMCA_SHUB,	/* System HUB Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  332  	SMCA_SATA,	/* SATA Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  333  	SMCA_USB,	/* USB Unit */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  334  	SMCA_USR_DP,	/* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  335  	SMCA_USR_CP,	/* Ultra Short Reach Control Plane Controller */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  336  	SMCA_GMI_PCS,	/* GMI PCS Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  337  	SMCA_XGMI_PHY,	/* xGMI PHY Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  338  	SMCA_WAFL_PHY,	/* WAFL PHY Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  339  	SMCA_GMI_PHY,	/* GMI PHY Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  340  	N_SMCA_BANK_TYPES
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  341  };
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  342  
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  343  extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  344  
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  345  extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  346  extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  347  
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  348  void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfaee arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  349  enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  350  #else
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  351  
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  352  static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  353  static inline int mce_threshold_remove_device(unsigned int cpu)		{ return 0; };
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  354  static inline bool amd_mce_is_memory_error(struct mce *m)		{ return false; };
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  355  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)		{ }
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  356  #endif
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  357  
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22 @358  static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_amd_feature_init(c); }
e9c2a283e7d9d4e arch/x86/include/asm/mce.h Arnd Bergmann             2023-05-16  359
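
A note on the diagnostics at mce.h:219 and mce.h:358 above, offered as a
sketch and not as a diagnosis of the patch. In C, a struct tag that first
appears inside a parameter list is scoped to that one function. With no
earlier declaration of the tag, each stub gets its own distinct
'struct cpuinfo_x86', which is consistent with the -Wvisibility warnings
and with the odd-looking "incompatible pointer types" error between two
identically named types. The standalone example below uses hypothetical
names (cpuinfo_demo, amd_feature_init_demo, hygon_feature_init_demo,
run_demo), none of which exist in the kernel, to show that one file-scope
forward declaration makes every later mention refer to the same type.

```c
#include <assert.h>
#include <stddef.h>

/*
 * File-scope forward declaration (hypothetical stand-in for
 * struct cpuinfo_x86). Every later 'struct cpuinfo_demo' now names
 * this one incomplete type instead of a new function-local type.
 */
struct cpuinfo_demo;

int run_demo(void);

/* Counts how often the "AMD" stub ran, so the call chain can be checked. */
static int amd_init_calls;

static inline void amd_feature_init_demo(struct cpuinfo_demo *c)
{
	(void)c;
	amd_init_calls++;
}

/* Mirrors the shape of the stub at mce.h:358: forward to the AMD stub. */
static inline void hygon_feature_init_demo(struct cpuinfo_demo *c)
{
	amd_feature_init_demo(c);
}

/* Calls the wrapper once and returns how many times the AMD stub ran. */
int run_demo(void)
{
	amd_init_calls = 0;
	hygon_feature_init_demo(NULL);
	assert(amd_init_calls == 1);
	return amd_init_calls;
}
```

Deleting the forward declaration should produce comparable -Wvisibility
and incompatible-pointer-types diagnostics from clang. For the
um-allmodconfig build itself, the remaining errors come from
MCI_ADDR_PHYSADDR expanding to boot_cpu_data.x86_phys_bits, so a forward
declaration alone would not be enough there; restricting the <asm/mce.h>
include in drivers/cxl/core/mbox.c to x86 MCE configurations is one
possible direction, stated here as an assumption and not as something
this thread confirms.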
kernel test robot Aug. 9, 2024, 11:48 a.m. UTC | #3
Hi Shiyang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: x86_64-randconfig-121-20240809 (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091914.TFbjPuNQ-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/cxl/core/mbox.c:1465:1: sparse: sparse: symbol 'cxl_mce_records' was not declared. Should it be static?
   drivers/cxl/core/mbox.c: note: in included file (through include/linux/gfp.h, include/linux/xarray.h, include/linux/list_lru.h, ...):
   include/linux/mmzone.h:2018:40: sparse: sparse: self-comparison always evaluates to false

vim +/cxl_mce_records +1465 drivers/cxl/core/mbox.c

  1464	
> 1465	DEFINE_XARRAY(cxl_mce_records);
  1466
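
The sparse warning above fires because `cxl_mce_records` is defined at file scope with external linkage but has no earlier declaration. In the patch as posted, the symbol is only referenced from drivers/cxl/core/mbox.c, so one plausible fix is `static DEFINE_XARRAY(cxl_mce_records);`. The user-space sketch below is only an illustration: every name in it is hypothetical, and `struct record_table` merely stands in for the xarray. It shows the two conventional ways to keep sparse quiet, either internal linkage or a declaration that is visible before the definition.

```c
#include <stddef.h>

/* Hypothetical stand-in for the xarray; not a kernel type. */
struct record_table {
	size_t nr_entries;
};

/*
 * Option 1: file-local object. 'static' gives the symbol internal
 * linkage, so no header declaration is needed and sparse has nothing
 * to flag.
 */
static struct record_table local_records;

/*
 * Option 2: object shared between files. The extern declaration would
 * normally live in a header included by every user; it is placed here
 * only to keep the sketch in a single file.
 */
extern struct record_table shared_records;
struct record_table shared_records;

/* Accessors so that both objects are actually used. */
size_t local_record_count(void)
{
	return local_records.nr_entries;
}

size_t shared_record_count(void)
{
	return shared_records.nr_entries;
}
```

Option 1 matches a table that is private to one file; option 2 would only be needed if another compilation unit had to reach the table directly.
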
Jonathan Cameron Aug. 27, 2024, 3:52 p.m. UTC | #4
On Thu,  8 Aug 2024 23:13:28 +0800
Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:

> Since CXL device is a memory device, while CPU is consuming a poison
> page of CXL device, it always triggers a MCE (via interrupt #18) and
> calls memory_failure() to handle POISON page, no matter which-First path
> is configured.  CXL device could also find and report the POISON, kernel
> now not only traces but also calls memory_failure() to handle it, which
> is marked as "NEW" in the figure below.
> ```
> 1.  MCE (interrupt #18, while CPU consuming POISON)
>      -> do_machine_check()
>        -> mce_log()
>          -> notify chain (x86_mce_decoder_chain)
>            -> memory_failure() <---------------------------- EXISTS  
> 2.a FW-First (optional, CXL device proactively find&report)
>      -> CXL device -> Firmware
>        -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace  
>                                                   \-> memory_failure()
>                                                       ^----- NEW
> 2.b OS-First (optional, CXL device proactively find&report)
>      -> CXL device -> MSI
>        -> OS: CXL driver -> trace  
>                         \-> memory_failure()
>                             ^------------------------------- NEW
> ```
> 
> But in this way, the memory_failure() could be called twice or even at
> same time, as is shown in the figure above: (1.) and (2.a or 2.b),
> before the POISON page is cleared.  memory_failure() has its own mutex
> lock so it actually won't be called at same time and the later call
> could be avoided because HWPoison bit has been set.  However, assume
> such a scenario, "CXL device reports POISON error" triggers 1st call,
> user see it from log and want to clear the poison by executing `cxl
> clear-poison` command, and at the same time, a process tries to access
> this POISON page, which triggers MCE (it's the 2nd call).

Attempting to clear poison in a page that is online seems unwise.
Does that ever make sense today?

>  Since there
> is no lock between the 2nd call with clearing poison operation, race
> condition may happen, which may cause HWPoison bit of the page in an
> unknown state.

As long as that state is only ever wrong in the sense that we think it's
poisoned when it isn't, we don't care.
> 
> Thus, we have to avoid the 2nd call. This patch[2] introduces a new
> notifier_block into `x86_mce_decoder_chain` and a POISON cache list, to
> stop the 2nd call of memory_failure(). It checks whether the current
> poison page has been reported (if yes, stop the notifier chain, don't
> call the following memory_failure() to report again).
> 

If we do want to do this, it belongs in the generic code, not the
arch-specific part. Can we do something similar in the memory-failure code?

To RAS reviewers, this isn't a new problem unique to CXL. Does a solution
like this make sense in practice, or are we fine to always let two reports
for the same error get handled?


Jonathan
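
The "stop the notifier chain" idea discussed above can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel implementation: all `sketch_*` names and the `SKETCH_*` return values are hypothetical (the kernel's NOTIFY_STOP is encoded with a mask bit), and the one-slot "already reported" record is a deliberate simplification. It shows why the patch places MCE_PRIO_CXL above MCE_PRIO_UC in the enum: the higher-priority handler runs first and can prevent the later handler, which stands in for the one that calls memory_failure(), from running at all.

```c
#include <stddef.h>

/* Hypothetical return codes; simplified compared to the kernel's. */
enum { SKETCH_DONE = 0, SKETCH_OK = 1, SKETCH_STOP = 2 };

struct sketch_notifier {
	int (*call)(unsigned long addr);
	int priority;
	struct sketch_notifier *next;
};

/* Keep the chain sorted by descending priority. */
void sketch_chain_register(struct sketch_notifier **head,
			   struct sketch_notifier *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Walk the chain; a handler returning SKETCH_STOP ends the walk. */
int sketch_chain_call(struct sketch_notifier *head, unsigned long addr)
{
	int ret = SKETCH_DONE;

	for (; head; head = head->next) {
		ret = head->call(addr);
		if (ret == SKETCH_STOP)
			break;
	}
	return ret;
}

/* One-slot "already reported" record, enough for the illustration. */
unsigned long sketch_reported_addr;
int sketch_have_reported;
int sketch_failure_calls;

/* Stand-in for the dedup handler: stop the chain on a repeat address. */
int sketch_dedup_handler(unsigned long addr)
{
	if (sketch_have_reported && sketch_reported_addr == addr)
		return SKETCH_STOP;
	sketch_reported_addr = addr;
	sketch_have_reported = 1;
	return SKETCH_OK;
}

/* Stand-in for the lower-priority handler that reports the failure. */
int sketch_failure_handler(unsigned long addr)
{
	(void)addr;
	sketch_failure_calls++;
	return SKETCH_OK;
}
```

Registering `sketch_failure_handler` with priority 1 and `sketch_dedup_handler` with priority 2, then calling the chain twice with the same address, runs the failure handler only once. The same filtering could in principle live in generic code rather than in an x86 notifier, which is the question raised in the review.
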
Shiyang Ruan Sept. 2, 2024, 2:19 p.m. UTC | #5
On 2024/8/27 23:52, Jonathan Cameron wrote:
> On Thu,  8 Aug 2024 23:13:28 +0800
> Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
> 
>> Since CXL device is a memory device, while CPU is consuming a poison
>> page of CXL device, it always triggers a MCE (via interrupt #18) and
>> calls memory_failure() to handle POISON page, no matter which-First path
>> is configured.  CXL device could also find and report the POISON, kernel
>> now not only traces but also calls memory_failure() to handle it, which
>> is marked as "NEW" in the figure below.
>> ```
>> 1.  MCE (interrupt #18, while CPU consuming POISON)
>>       -> do_machine_check()
>>         -> mce_log()
>>           -> notify chain (x86_mce_decoder_chain)
>>             -> memory_failure() <---------------------------- EXISTS
>> 2.a FW-First (optional, CXL device proactively find&report)
>>       -> CXL device -> Firmware
>>         -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>>                                                    \-> memory_failure()
>>                                                        ^----- NEW
>> 2.b OS-First (optional, CXL device proactively find&report)
>>       -> CXL device -> MSI
>>         -> OS: CXL driver -> trace
>>                          \-> memory_failure()
>>                              ^------------------------------- NEW
>> ```
>>
>> But in this way, the memory_failure() could be called twice or even at
>> same time, as is shown in the figure above: (1.) and (2.a or 2.b),
>> before the POISON page is cleared.  memory_failure() has its own mutex
>> lock so it actually won't be called at same time and the later call
>> could be avoided because HWPoison bit has been set.  However, assume
>> such a scenario, "CXL device reports POISON error" triggers 1st call,
>> user see it from log and want to clear the poison by executing `cxl
>> clear-poison` command, and at the same time, a process tries to access
>> this POISON page, which triggers MCE (it's the 2nd call).
> 
> Attempting to clear poison in a page that is online seems unwise.
> Does that ever make sense today?

To be honest, I am not sure about this.  Even if the error from the CXL
device is recoverable, do we never reuse it?

> 
>>   Since there
>> is no lock between the 2nd call with clearing poison operation, race
>> condition may happen, which may cause HWPoison bit of the page in an
>> unknown state.
> 
> As long as that state is always wrong in the sense we think it's poisoned
> when it isn't we don't care.

The 2nd memory_failure() call needs this state to determine whether to
continue processing or return.

>>
>> Thus, we have to avoid the 2nd call. This patch[2] introduces a new
>> notifier_block into `x86_mce_decoder_chain` and a POISON cache list, to
>> stop the 2nd call of memory_failure(). It checks whether the current
>> poison page has been reported (if yes, stop the notifier chain, don't
>> call the following memory_failure() to report again).
>>
> 
> If we do want to do this, it belongs in the generic code, not arch specific
> part. Can we do similar in memory failure?

Yes, I saw the build error.  Will fix this.

> 
> To RAS reviewers, this isn't a new problem unique to CXL. Does a solution
> like this make sense in practice, or are we fine to always let two reports
> for the same error get handled?
> 
> 
> Jonathan
> 
> 


--
Thanks,
Ruan.
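
The dependency described in this reply, where the second memory_failure() call looks at the HWPoison state to decide whether to proceed, can be modelled with an atomic test-and-set. The sketch below is a user-space illustration only: `struct sketch_page` and the `sketch_*` functions are hypothetical, and it assumes the kernel behaves as the commit message says, namely that a later call is skipped once the HWPoison bit is set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page descriptor holding only the poison flag. */
struct sketch_page {
	atomic_bool hwpoison;
};

/*
 * Atomically set the flag and report whether this caller was first.
 * Returns true when the caller should go on to handle the failure,
 * false when the page was already marked and the call can return early.
 */
bool sketch_mark_poisoned(struct sketch_page *page)
{
	return !atomic_exchange(&page->hwpoison, true);
}

/* Model of a clear-poison operation racing with a second report. */
void sketch_clear_poison(struct sketch_page *page)
{
	atomic_store(&page->hwpoison, false);
}
```

If `sketch_clear_poison()` runs between two reports of the same error, the second `sketch_mark_poisoned()` returns true and the failure is handled again. That is the window the commit message worries about, although whether it matters in practice is exactly what the reviewers question.
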
Patch

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..5da45e870858 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@  enum mce_notifier_prios {
 	MCE_PRIO_NFIT,
 	MCE_PRIO_EXTLOG,
 	MCE_PRIO_UC,
+	MCE_PRIO_CXL,
 	MCE_PRIO_EARLY,
 	MCE_PRIO_CEC,
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0cb6ef2e6600..b21700428c35 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,6 +4,8 @@ 
 #include <linux/debugfs.h>
 #include <linux/ktime.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <asm/mce.h>
 #include <asm/unaligned.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
@@ -925,6 +927,9 @@  void cxl_event_handle_record(struct cxl_memdev *cxlmd,
 		if (cxlr)
 			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
 
+		if (hpa != ULLONG_MAX && cxl_mce_recorded(hpa))
+			return;
+
 		if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
 			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
 						&evt->gen_media);
@@ -1457,6 +1462,112 @@  int cxl_poison_state_init(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
 
+DEFINE_XARRAY(cxl_mce_records);
+
+bool cxl_mce_recorded(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_unlock_irq(&xas);
+		return true;
+	}
+	entry = xa_mk_value(hpa);
+	xas_store(&xas, entry);
+	xas_unlock_irq(&xas);
+
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_recorded, CXL);
+
+void cxl_mce_clear(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_store(&xas, NULL);
+	}
+	xas_unlock_irq(&xas);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_clear, CXL);
+
+struct cxl_contains_hpa_context {
+	bool contains;
+	u64 hpa;
+};
+
+static int __cxl_contains_hpa(struct device *dev, void *arg)
+{
+	struct cxl_contains_hpa_context *ctx = arg;
+	struct cxl_endpoint_decoder *cxled;
+	struct range *range;
+	u64 hpa = ctx->hpa;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	range = &cxled->cxld.hpa_range;
+
+	if (range->start <= hpa && hpa <= range->end) {
+		ctx->contains = true;
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool cxl_contains_hpa(const struct cxl_memdev *cxlmd, u64 hpa)
+{
+	struct cxl_contains_hpa_context ctx = {
+		.contains = false,
+		.hpa = hpa,
+	};
+	struct cxl_port *port;
+
+	port = cxlmd->endpoint;
+	guard(rwsem_write)(&cxl_region_rwsem);
+	if (port && cxl_num_decoders_committed(port))
+		device_for_each_child(&port->dev, &ctx, __cxl_contains_hpa);
+
+	return ctx.contains;
+}
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+			  void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+						    mce_notifier);
+	u64 hpa;
+
+	if (!mce || !mce_usable_address(mce))
+		return NOTIFY_DONE;
+
+	hpa = mce->addr & MCI_ADDR_PHYSADDR;
+
+	/* Check if the PFN is located on this CXL device */
+	if (!pfn_valid(hpa >> PAGE_SHIFT) &&
+	    !cxl_contains_hpa(mds->cxlds.cxlmd, hpa))
+		return NOTIFY_DONE;
+
+	/*
+	 * Search PFN in the cxl_mce_records, if already exists, don't continue
+	 * to do memory_failure() to avoid a poison address being reported
+	 * more than once.
+	 */
+	if (cxl_mce_recorded(hpa))
+		return NOTIFY_STOP;
+	else
+		return NOTIFY_OK;
+}
+
 struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 {
 	struct cxl_memdev_state *mds;
@@ -1476,6 +1587,10 @@  struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 	mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
 	mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
 
+	mds->mce_notifier.notifier_call = cxl_handle_mce;
+	mds->mce_notifier.priority = MCE_PRIO_CXL;
+	mce_register_decode_chain(&mds->mce_notifier);
+
 	return mds;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726afd04..9d4ed4dc4d51 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -376,10 +376,14 @@  int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 		goto out;
 
 	cxlr = cxl_dpa_to_region(cxlmd, dpa);
-	if (cxlr)
+	if (cxlr) {
+		u64 hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+
+		cxl_mce_clear(hpa);
 		dev_warn_once(mds->cxlds.dev,
 			      "poison clear dpa:%#llx region: %s\n", dpa,
 			      dev_name(&cxlr->dev));
+	}
 
 	record = (struct cxl_poison_record) {
 		.address = cpu_to_le64(dpa),
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5c4810dcbdeb..d2d906c26755 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -502,6 +502,7 @@  struct cxl_memdev_state {
 	struct cxl_fw_state fw;
 
 	struct rcuwait mbox_wait;
+	struct notifier_block mce_notifier;
 	int (*mbox_send)(struct cxl_memdev_state *mds,
 			 struct cxl_mbox_cmd *cmd);
 };
@@ -837,6 +838,8 @@  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
 int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
 int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+bool cxl_mce_recorded(u64 pfn);
+void cxl_mce_clear(u64 pfn);
 
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
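
The record-keeping in the patch has "record if new" semantics: cxl_mce_recorded() returns true when the address is already present and otherwise stores it and returns false, while cxl_mce_clear() removes it. A minimal user-space sketch of those semantics follows, together with the inclusive range test used in __cxl_contains_hpa(). The names are hypothetical, a fixed-size array replaces the xarray, and there is no locking, whereas the patch holds the xarray lock with interrupts disabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_RECORDS 16

/* Hypothetical fixed-size replacement for the xarray of records. */
struct sketch_records {
	uint64_t hpa[SKETCH_MAX_RECORDS];
	bool used[SKETCH_MAX_RECORDS];
};

/*
 * Return true if hpa is already recorded. Otherwise record it in the
 * first free slot (if any) and return false.
 */
bool sketch_recorded(struct sketch_records *r, uint64_t hpa)
{
	size_t free_slot = SKETCH_MAX_RECORDS;

	for (size_t i = 0; i < SKETCH_MAX_RECORDS; i++) {
		if (r->used[i] && r->hpa[i] == hpa)
			return true;
		if (!r->used[i] && free_slot == SKETCH_MAX_RECORDS)
			free_slot = i;
	}
	if (free_slot < SKETCH_MAX_RECORDS) {
		r->hpa[free_slot] = hpa;
		r->used[free_slot] = true;
	}
	return false;
}

/* Forget hpa so that a later report of it is treated as new. */
void sketch_clear(struct sketch_records *r, uint64_t hpa)
{
	for (size_t i = 0; i < SKETCH_MAX_RECORDS; i++) {
		if (r->used[i] && r->hpa[i] == hpa)
			r->used[i] = false;
	}
}

/* Inclusive range test: both start and end belong to the range. */
bool sketch_range_contains(uint64_t start, uint64_t end, uint64_t hpa)
{
	return start <= hpa && hpa <= end;
}
```

Both reporting paths have to present the same key for the lookup to match. In the patch the definitions key on the HPA, while the comment in cxl_handle_mce() and the prototypes in cxlmem.h speak of a PFN, so the granularity of the key may deserve a second look.
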