Message ID | 20240808151328.707869-3-ruansy.fnst@fujitsu.com (mailing list archive) |
---|---|
State | New |
Series | cxl: add device reporting poison handler |

Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base: tip/x86/core
patch link: https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091537.p9RKx1R2-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:220:44: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:240:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:241:51: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:248:26: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:355:48: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
         |                                                ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:358:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h: In function 'mce_hygon_feature_init':
>> arch/x86/include/asm/mce.h:358:103: error: passing argument 1 of 'mce_amd_feature_init' from incompatible pointer type [-Werror=incompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
         |                                                                                                ^
         |                                                                                                |
         |                                                                                                struct cpuinfo_x86 *
   arch/x86/include/asm/mce.h:355:61: note: expected 'struct cpuinfo_x86 *' but argument is of type 'struct cpuinfo_x86 *'
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
         |                                         ~~~~~~~~~~~~~~~~~~~~^
   In file included from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/key.h:14,
                    from include/linux/security.h:27,
                    from drivers/cxl/core/mbox.c:3:
   drivers/cxl/core/mbox.c: In function 'cxl_handle_mce':
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                    ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:25:17: note: in expansion of macro '__is_constexpr'
      25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
         | ^~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |  ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                           ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                    ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |  ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                           ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:24:28: error: first argument to '__builtin_choose_expr' not a constant
      24 | (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |                    ^~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |  ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                           ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                   ^
   include/linux/bits.h:24:10: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
      24 | (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |  ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |  ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                           ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   In file included from include/linux/bits.h:7,
                    from include/linux/ratelimit_types.h:5,
                    from include/linux/printk.h:9,
                    from include/asm-generic/bug.h:22,
                    from ./arch/um/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/um/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/rcupdate.h:27,
                    from include/linux/rbtree.h:24,
                    from include/linux/key.h:15:
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                    ^
   include/uapi/linux/bits.h:13:52: note: in definition of macro '__GENMASK_ULL'
      13 | (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                           ^
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                           ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/mce_amd_feature_init +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  210
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  216  u64 lapic_id);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  217  #else
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  222  u64 lapic_id) { return -EINVAL; }
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  223  #endif
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  224
b5f2fa4ea00a17 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca5 arch/x86/include/asm/mce.h Greg Kroah-Hartman 2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  228
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28  229  /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28  230  #define MAX_NR_BANKS 64
41fdff322e26c4 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  231
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  232  #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  233  void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  234  void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  235  void cmci_clear(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  236  void cmci_reenable(void);
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20  237  void cmci_rediscover(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  238  void cmci_recheck(void);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  239  #else
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  240  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  241  static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  242  static inline void cmci_clear(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  243  static inline void cmci_reenable(void) {}
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20  244  static inline void cmci_rediscover(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  245  static inline void cmci_recheck(void) {}
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  246  #endif
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  247
38736072d45488 arch/x86/include/asm/mce.h H. Peter Anvin 2009-05-28  248  int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec20 arch/x86/include/asm/mce.h Borislav Petkov 2017-05-19  249  bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee arch/x86/include/asm/mce.h Vishal Verma 2018-10-25  250  bool mce_is_correctable(struct mce *m);
1bae0cfe4a171c arch/x86/include/asm/mce.h Yazen Ghannam 2023-06-13  251  bool mce_usable_address(struct mce *m);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  252
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen 2009-05-27  253  DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f arch/x86/include/asm/mce.h Andi Kleen 2009-05-27  254  DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen 2009-05-27  255
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  256  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  257  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  258
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  259  enum mcp_flags {
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13  260  MCP_TIMESTAMP = BIT(0), /* log time stamp */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13  261  MCP_UC = BIT(1), /* log uncorrected errors */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13  262  MCP_DONTLOG = BIT(2), /* only clear, don't log */
3bff147b187d5d arch/x86/include/asm/mce.h Borislav Petkov 2021-08-23  263  MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  264  };
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23  265
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23  266  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  267
9ff36ee9668ff4 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27  268  int mce_notify_irq(void);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17  269
ea149b36c7f511 arch/x86/include/asm/mce.h Andi Kleen 2009-04-29  270  DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f8 arch/x86/include/asm/mce.h Tony Luck 2011-11-03  271
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01  272  /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01  273  extern void mce_disable_bank(int bank);
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01  274
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  275  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  276  * Exception handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  277  */
8cd501c1facc15 arch/x86/include/asm/mce.h Thomas Gleixner 2020-02-25  278  void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  279
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  280  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  281  * Threshold handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  282  */
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  283  extern void (*mce_threshold_vector)(void);
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  284
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06  285  /* Deferred error interrupt handler */
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06  286  extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06  287
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  288  /*
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  289  * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  290  */
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  291
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  292  struct cper_sec_mem_err;
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  293  extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  294  struct cper_sec_mem_err *mem_err);
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18  295
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  296  /*
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  297  * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  298  * Scalable MCA.
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  299  */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  300  #ifdef CONFIG_X86_MCE_AMD
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  301
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  302  /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  303  enum smca_bank_types {
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  304  SMCA_LS = 0, /* Load Store */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  305  SMCA_LS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  306  SMCA_IF, /* Instruction Fetch */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  307  SMCA_L2_CACHE, /* L2 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  308  SMCA_DE, /* Decoder Unit */
68627a697c1959 arch/x86/include/asm/mce.h Yazen Ghannam 2018-02-21  309  SMCA_RESERVED, /* Reserved */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  310  SMCA_EX, /* Execution Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  311  SMCA_FP, /* Floating Point */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  312  SMCA_L3_CACHE, /* L3 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  313  SMCA_CS, /* Coherent Slave */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  314  SMCA_CS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  315  SMCA_PIE, /* Power, Interrupts, etc. */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  316  SMCA_UMC, /* Unified Memory Controller */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  317  SMCA_UMC_V2,
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02  318  SMCA_MA_LLC, /* Memory Attached Last Level Cache */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  319  SMCA_PB, /* Parameter Block */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  320  SMCA_PSP, /* Platform Security Processor */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  321  SMCA_PSP_V2,
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  322  SMCA_SMU, /* System Management Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  323  SMCA_SMU_V2,
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01  324  SMCA_MP5, /* Microprocessor 5 Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  325  SMCA_MPDMA, /* MPDMA Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01  326  SMCA_NBIO, /* Northbridge IO Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01  327  SMCA_PCIE, /* PCI Express Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  328  SMCA_PCIE_V2,
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  329  SMCA_XGMI_PCS, /* xGMI PCS Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  330  SMCA_NBIF, /* NBIF Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  331  SMCA_SHUB, /* System HUB Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  332  SMCA_SATA, /* SATA Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  333  SMCA_USB, /* USB Unit */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02  334  SMCA_USR_DP, /* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02  335  SMCA_USR_CP, /* Ultra Short Reach Control Plane Controller */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  336  SMCA_GMI_PCS, /* GMI PCS Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  337  SMCA_XGMI_PHY, /* xGMI PHY Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26  338  SMCA_WAFL_PHY, /* WAFL PHY Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  339  SMCA_GMI_PHY, /* GMI PHY Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  340  N_SMCA_BANK_TYPES
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  341  };
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  342
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18  343  extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12  344
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  345  extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  346  extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12  347
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22  348  void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfae arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16  349  enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  350  #else
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12  351
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  352  static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  353  static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; };
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18  354  static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22  355  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  356  #endif
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07  357
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 @358  static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
e9c2a283e7d9d4 arch/x86/include/asm/mce.h Arnd Bergmann 2023-05-16  359

Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base: tip/x86/core
patch link: https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project f86594788ce93b696675c94f54016d27a6c21d18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091543.UNFvPFFl-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:33:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                       ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 | val = __raw_readb(PCI_IOBASE + addr);
         |                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                 ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                 ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 | __raw_writeb(value, PCI_IOBASE + addr);
         |                     ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     693 | readsb(PCI_IOBASE + addr, buffer, count);
         |        ~~~~~~~~~~ ^
   include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     701 | readsw(PCI_IOBASE + addr, buffer, count);
         |        ~~~~~~~~~~ ^
   include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     709 | readsl(PCI_IOBASE + addr, buffer, count);
         |        ~~~~~~~~~~ ^
   include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     718 | writesb(PCI_IOBASE + addr, buffer, count);
         |         ~~~~~~~~~~ ^
   include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     727 | writesw(PCI_IOBASE + addr, buffer, count);
         |         ~~~~~~~~~~ ^
   include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     736 | writesl(PCI_IOBASE + addr, buffer, count);
         |         ~~~~~~~~~~ ^
   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^
   arch/x86/include/asm/mce.h:220:44: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^
   arch/x86/include/asm/mce.h:240:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^
   arch/x86/include/asm/mce.h:241:51: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^
   arch/x86/include/asm/mce.h:248:26: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^
   arch/x86/include/asm/mce.h:355:48: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
         |                                                ^
   arch/x86/include/asm/mce.h:358:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
         |                                                  ^
>> arch/x86/include/asm/mce.h:358:96: error: incompatible pointer types passing 'struct cpuinfo_x86 *' to parameter of type 'struct cpuinfo_x86 *' [-Werror,-Wincompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
         |                                                                                                ^
   arch/x86/include/asm/mce.h:355:61: note: passing argument to parameter 'c' here
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
         |                                                             ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                       ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                      ^
   include/linux/bits.h:25:25: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                       ^
   include/linux/compiler.h:290:48: note: expanded from macro '__is_constexpr'
     290 | (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
         |                                               ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                       ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                      ^
   include/linux/bits.h:25:37: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                                   ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                   ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                       ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:45: note: expanded from macro 'GENMASK_ULL'
      37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                                            ^
   include/uapi/linux/bits.h:13:52: note: expanded from macro '__GENMASK_ULL'
      13 | (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                           ^
   20 warnings and 4 errors generated.

vim +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  210
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  216  u64 lapic_id);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  217  #else
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19  222  u64 lapic_id) { return -EINVAL; }
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  223  #endif
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15  224
b5f2fa4ea00a179 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca54 arch/x86/include/asm/mce.h Greg Kroah-Hartman 2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a4
include/asm-x86/mce.h Thomas Gleixner 2007-10-17 228 a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 229 /* Maximum number of MCA banks per CPU. */ a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 230 #define MAX_NR_BANKS 64 41fdff322e26c4a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 231 e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 232 #ifdef CONFIG_X86_MCE_INTEL e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 233 void mce_intel_feature_init(struct cpuinfo_x86 *c); 8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 234 void mce_intel_feature_clear(struct cpuinfo_x86 *c); 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 235 void cmci_clear(void); 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 236 void cmci_reenable(void); 7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20 237 void cmci_rediscover(void); 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 238 void cmci_recheck(void); e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 239 #else e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 240 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { } 8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 241 static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { } 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 242 static inline void cmci_clear(void) {} 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 243 static inline void cmci_reenable(void) {} 7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. 
Bhat 2013-03-20 244 static inline void cmci_rediscover(void) {} 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 245 static inline void cmci_recheck(void) {} e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 246 #endif e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 247 38736072d45488f arch/x86/include/asm/mce.h H. Peter Anvin 2009-05-28 248 int mce_available(struct cpuinfo_x86 *c); 2d1f406139ec203 arch/x86/include/asm/mce.h Borislav Petkov 2017-05-19 249 bool mce_is_memory_error(struct mce *m); 5d96c9342c23ee1 arch/x86/include/asm/mce.h Vishal Verma 2018-10-25 250 bool mce_is_correctable(struct mce *m); 1bae0cfe4a171cc arch/x86/include/asm/mce.h Yazen Ghannam 2023-06-13 251 bool mce_usable_address(struct mce *m); 88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 252 01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 253 DECLARE_PER_CPU(unsigned, mce_exception_count); ca84f69697da0f0 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 254 DECLARE_PER_CPU(unsigned, mce_poll_count); 01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 255 ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 256 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS); ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 257 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks); ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 258 b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 259 enum mcp_flags { 3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 260 MCP_TIMESTAMP = BIT(0), /* log time stamp */ 3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 261 MCP_UC = BIT(1), /* log uncorrected errors */ 3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 262 MCP_DONTLOG = BIT(2), /* only clear, don't log */ 3bff147b187d5df arch/x86/include/asm/mce.h Borislav Petkov 2021-08-23 263 MCP_QUEUE_LOG = BIT(3), /* only 
queue to genpool */ b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 264 }; 5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 265 5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 266 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b); b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 267 9ff36ee9668ff41 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 268 int mce_notify_irq(void); e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 269 ea149b36c7f511d arch/x86/include/asm/mce.h Andi Kleen 2009-04-29 270 DECLARE_PER_CPU(struct mce, injectm); 66f5ddf30a59f81 arch/x86/include/asm/mce.h Tony Luck 2011-11-03 271 c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 272 /* Disable CMCI/polling for MCA bank claimed by firmware */ c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 273 extern void mce_disable_bank(int bank); c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 274 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 275 /* 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 276 * Exception handler 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 277 */ 8cd501c1facc159 arch/x86/include/asm/mce.h Thomas Gleixner 2020-02-25 278 void do_machine_check(struct pt_regs *pt_regs); 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 279 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 280 /* 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 281 * Threshold handler 58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 282 */ b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 283 extern void (*mce_threshold_vector)(void); b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 284 24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 285 /* Deferred error interrupt 
handler */ 24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 286 extern void (*deferred_error_int_vector)(void); 24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 287 d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 288 /* d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 289 * Used by APEI to report memory error via /dev/mcelog d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 290 */ d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 291 d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 292 struct cper_sec_mem_err; d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 293 extern void apei_mce_report_mem_error(int corrected, d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 294 struct cper_sec_mem_err *mem_err); d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 295 be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 296 /* be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 297 * Enumerate new IP types and HWID values in AMD processors which support be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 298 * Scalable MCA. 
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 299 */ be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 300 #ifdef CONFIG_X86_MCE_AMD 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 301 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 302 /* These may be used by multiple smca_hwid_mcatypes */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 303 enum smca_bank_types { 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 304 SMCA_LS = 0, /* Load Store */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 305 SMCA_LS_V2, 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 306 SMCA_IF, /* Instruction Fetch */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 307 SMCA_L2_CACHE, /* L2 Cache */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 308 SMCA_DE, /* Decoder Unit */ 68627a697c19593 arch/x86/include/asm/mce.h Yazen Ghannam 2018-02-21 309 SMCA_RESERVED, /* Reserved */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 310 SMCA_EX, /* Execution Unit */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 311 SMCA_FP, /* Floating Point */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 312 SMCA_L3_CACHE, /* L3 Cache */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 313 SMCA_CS, /* Coherent Slave */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 314 SMCA_CS_V2, 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 315 SMCA_PIE, /* Power, Interrupts, etc. 
*/ be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 316 SMCA_UMC, /* Unified Memory Controller */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 317 SMCA_UMC_V2, 47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 318 SMCA_MA_LLC, /* Memory Attached Last Level Cache */ be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 319 SMCA_PB, /* Parameter Block */ be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 320 SMCA_PSP, /* Platform Security Processor */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 321 SMCA_PSP_V2, be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 322 SMCA_SMU, /* System Management Unit */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 323 SMCA_SMU_V2, cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 324 SMCA_MP5, /* Microprocessor 5 Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 325 SMCA_MPDMA, /* MPDMA Unit */ cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 326 SMCA_NBIO, /* Northbridge IO Unit */ cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 327 SMCA_PCIE, /* PCI Express Unit */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 328 SMCA_PCIE_V2, 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 329 SMCA_XGMI_PCS, /* xGMI PCS Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 330 SMCA_NBIF, /* NBIF Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 331 SMCA_SHUB, /* System HUB Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 332 SMCA_SATA, /* SATA Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 333 SMCA_USB, /* USB Unit */ 47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 334 SMCA_USR_DP, /* Ultra Short 
Reach Data Plane Controller */ 47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 335 SMCA_USR_CP, /* Ultra Short Reach Control Plane Controller */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 336 SMCA_GMI_PCS, /* GMI PCS Unit */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 337 SMCA_XGMI_PHY, /* xGMI PHY Unit */ 94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 338 SMCA_WAFL_PHY, /* WAFL PHY Unit */ 5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 339 SMCA_GMI_PHY, /* GMI PHY Unit */ 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 340 N_SMCA_BANK_TYPES be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 341 }; be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 342 c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 343 extern bool amd_mce_is_memory_error(struct mce *m); e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 344 4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 345 extern int mce_threshold_create_device(unsigned int cpu); 4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 346 extern int mce_threshold_remove_device(unsigned int cpu); e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 347 9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 348 void mce_amd_feature_init(struct cpuinfo_x86 *c); 91f75eb481cfaee arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 349 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank); 4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 350 #else 5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 351 4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 352 static inline int mce_threshold_create_device(unsigned int cpu) { return 0; 
}; 4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 353 static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; }; c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 354 static inline bool amd_mce_is_memory_error(struct mce *m) { return false; }; 9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 355 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { } be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 356 #endif be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 357 9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 @358 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); } e9c2a283e7d9d4e arch/x86/include/asm/mce.h Arnd Bergmann 2023-05-16 359
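The "declared inside parameter list" / -Wvisibility warnings above, and the clang error about passing 'struct cpuinfo_x86 *' to a parameter of type 'struct cpuinfo_x86 *', both come from a struct tag that is first seen inside a prototype: each such prototype declares its own private type. A minimal standalone sketch of the well-formed case follows. The names (struct cpu_info, feature_init_amd(), feature_init_hygon()) are hypothetical stand-ins, not kernel symbols, and this illustrates the diagnostic only; it is not a proposed fix for the patch.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for struct cpuinfo_x86. Declaring the tag at
 * file scope BEFORE any prototype mentions it makes every prototype
 * below refer to the same type. Without this line, each parameter
 * list would introduce a distinct, prototype-scoped struct cpu_info.
 */
struct cpu_info;

/* Illustrative helper: returns 1 when given a CPU descriptor, else 0. */
static inline int feature_init_amd(struct cpu_info *c)
{
	return c == NULL ? 0 : 1;
}

/* Forwards to the helper above; the pointer types are compatible. */
static inline int feature_init_hygon(struct cpu_info *c)
{
	return feature_init_amd(c);
}

/* The full definition may come later; the forward declaration suffices. */
struct cpu_info {
	unsigned int phys_bits;
};
```

Removing the file-scope `struct cpu_info;` line (and the later definition) reproduces the class of diagnostics in the report, because the two parameter lists then name two unrelated types.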
Hi Shiyang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: x86_64-randconfig-121-20240809 (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091914.TFbjPuNQ-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/cxl/core/mbox.c:1465:1: sparse: sparse: symbol 'cxl_mce_records' was not declared. Should it be static?
   drivers/cxl/core/mbox.c: note: in included file (through include/linux/gfp.h, include/linux/xarray.h, include/linux/list_lru.h, ...):
   include/linux/mmzone.h:2018:40: sparse: sparse: self-comparison always evaluates to false

vim +/cxl_mce_records +1465 drivers/cxl/core/mbox.c

  1464
> 1465	DEFINE_XARRAY(cxl_mce_records);
  1466
On Thu, 8 Aug 2024 23:13:28 +0800
Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:

> Since a CXL device is a memory device, a CPU consuming a poisoned page
> of a CXL device always triggers an MCE (via interrupt #18) and calls
> memory_failure() to handle the POISON page, no matter which path
> (FW-First or OS-First) is configured. The CXL device can also find and
> report the POISON; the kernel now not only traces it but also calls
> memory_failure() to handle it, which is marked as "NEW" in the figure
> below.
> ```
> 1.  MCE (interrupt #18, while CPU consuming POISON)
>      -> do_machine_check()
>        -> mce_log()
>          -> notify chain (x86_mce_decoder_chain)
>            -> memory_failure() <---------------------------- EXISTS
> 2.a FW-First (optional, CXL device proactively find&report)
>      -> CXL device -> Firmware
>        -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>                                                   \-> memory_failure()
>                                                       ^----- NEW
> 2.b OS-First (optional, CXL device proactively find&report)
>      -> CXL device -> MSI
>        -> OS: CXL driver -> trace
>                         \-> memory_failure()
>                             ^------------------------------- NEW
> ```
>
> But this way, memory_failure() could be called twice, or even at the
> same time, for (1.) and (2.a or 2.b) in the figure above, before the
> POISON page is cleared. memory_failure() has its own mutex lock, so the
> calls won't actually run at the same time, and the later call can be
> skipped because the HWPoison bit has already been set. However,
> consider this scenario: "CXL device reports POISON error" triggers the
> 1st call; the user sees it in the log and wants to clear the poison by
> executing the `cxl clear-poison` command; and at the same time a process
> tries to access this POISON page, which triggers an MCE (the 2nd call).

Attempting to clear poison in a page that is online seems unwise.
Does that ever make sense today?

> Since there is no lock between the 2nd call and the clear-poison
> operation, a race condition may happen, which may leave the HWPoison bit
> of the page in an unknown state.

As long as that state is always wrong in the sense that we think it's
poisoned when it isn't, we don't care.

>
> Thus, we have to avoid the 2nd call. This patch[2] introduces a new
> notifier_block into `x86_mce_decoder_chain` and a POISON cache list to
> stop the 2nd call of memory_failure(). It checks whether the current
> poison page has already been reported (if yes, it stops the notifier
> chain so that the following memory_failure() does not report it again).
>

If we do want to do this, it belongs in the generic code, not the
arch-specific part. Can we do something similar in memory_failure()?

To RAS reviewers: this isn't a new problem unique to CXL. Does a
solution like this make sense in practice, or are we fine to always let
two reports for the same error get handled?

Jonathan
On 2024/8/27 23:52, Jonathan Cameron wrote:
> On Thu, 8 Aug 2024 23:13:28 +0800
> Shiyang Ruan <ruansy.fnst@fujitsu.com> wrote:
>
>> Since a CXL device is a memory device, a CPU consuming a poisoned page
>> of a CXL device always triggers an MCE (via interrupt #18) and calls
>> memory_failure() to handle the POISON page, no matter which path
>> (FW-First or OS-First) is configured. The CXL device can also find and
>> report the POISON; the kernel now not only traces it but also calls
>> memory_failure() to handle it, which is marked as "NEW" in the figure
>> below.
>> ```
>> 1.  MCE (interrupt #18, while CPU consuming POISON)
>>      -> do_machine_check()
>>        -> mce_log()
>>          -> notify chain (x86_mce_decoder_chain)
>>            -> memory_failure() <---------------------------- EXISTS
>> 2.a FW-First (optional, CXL device proactively find&report)
>>      -> CXL device -> Firmware
>>        -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>>                                                   \-> memory_failure()
>>                                                       ^----- NEW
>> 2.b OS-First (optional, CXL device proactively find&report)
>>      -> CXL device -> MSI
>>        -> OS: CXL driver -> trace
>>                         \-> memory_failure()
>>                             ^------------------------------- NEW
>> ```
>>
>> But this way, memory_failure() could be called twice, or even at the
>> same time, for (1.) and (2.a or 2.b) in the figure above, before the
>> POISON page is cleared. memory_failure() has its own mutex lock, so the
>> calls won't actually run at the same time, and the later call can be
>> skipped because the HWPoison bit has already been set. However,
>> consider this scenario: "CXL device reports POISON error" triggers the
>> 1st call; the user sees it in the log and wants to clear the poison by
>> executing the `cxl clear-poison` command; and at the same time a process
>> tries to access this POISON page, which triggers an MCE (the 2nd call).
>
> Attempting to clear poison in a page that is online seems unwise.
> Does that ever make sense today?

To be honest, I am not sure about this. Even if the error from the CXL
device is recoverable, we don't reuse it again?

>
>> Since there is no lock between the 2nd call and the clear-poison
>> operation, a race condition may happen, which may leave the HWPoison bit
>> of the page in an unknown state.
>
> As long as that state is always wrong in the sense that we think it's
> poisoned when it isn't, we don't care.

The 2nd memory_failure() needs this state to determine whether to
continue its processing or return.

>>
>> Thus, we have to avoid the 2nd call. This patch[2] introduces a new
>> notifier_block into `x86_mce_decoder_chain` and a POISON cache list to
>> stop the 2nd call of memory_failure(). It checks whether the current
>> poison page has already been reported (if yes, it stops the notifier
>> chain so that the following memory_failure() does not report it again).
>>
>
> If we do want to do this, it belongs in the generic code, not the
> arch-specific part. Can we do something similar in memory_failure()?

Yes, I saw the build error. Will fix this.

>
> To RAS reviewers: this isn't a new problem unique to CXL. Does a
> solution like this make sense in practice, or are we fine to always let
> two reports for the same error get handled?
>
>
> Jonathan
>
>

--
Thanks,
Ruan.
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..5da45e870858 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@ enum mce_notifier_prios {
 	MCE_PRIO_NFIT,
 	MCE_PRIO_EXTLOG,
 	MCE_PRIO_UC,
+	MCE_PRIO_CXL,
 	MCE_PRIO_EARLY,
 	MCE_PRIO_CEC,
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0cb6ef2e6600..b21700428c35 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,6 +4,8 @@
 #include <linux/debugfs.h>
 #include <linux/ktime.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <asm/mce.h>
 #include <asm/unaligned.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
@@ -925,6 +927,9 @@ void cxl_event_handle_record(struct cxl_memdev *cxlmd,
 	if (cxlr)
 		hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
 
+	if (hpa != ULLONG_MAX && cxl_mce_recorded(hpa))
+		return;
+
 	if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
 		trace_cxl_general_media(cxlmd, type, cxlr, hpa,
 					&evt->gen_media);
@@ -1457,6 +1462,112 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
 
+DEFINE_XARRAY(cxl_mce_records);
+
+bool cxl_mce_recorded(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_unlock_irq(&xas);
+		return true;
+	}
+	entry = xa_mk_value(hpa);
+	xas_store(&xas, entry);
+	xas_unlock_irq(&xas);
+
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_recorded, CXL);
+
+void cxl_mce_clear(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_store(&xas, NULL);
+	}
+	xas_unlock_irq(&xas);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_clear, CXL);
+
+struct cxl_contains_hpa_context {
+	bool contains;
+	u64 hpa;
+};
+
+static int __cxl_contains_hpa(struct device *dev, void *arg)
+{
+	struct cxl_contains_hpa_context *ctx = arg;
+	struct cxl_endpoint_decoder *cxled;
+	struct range *range;
+	u64 hpa = ctx->hpa;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	range = &cxled->cxld.hpa_range;
+
+	if (range->start <= hpa && hpa <= range->end) {
+		ctx->contains = true;
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool cxl_contains_hpa(const struct cxl_memdev *cxlmd, u64 hpa)
+{
+	struct cxl_contains_hpa_context ctx = {
+		.contains = false,
+		.hpa = hpa,
+	};
+	struct cxl_port *port;
+
+	port = cxlmd->endpoint;
+	guard(rwsem_write)(&cxl_region_rwsem);
+	if (port && cxl_num_decoders_committed(port))
+		device_for_each_child(&port->dev, &ctx, __cxl_contains_hpa);
+
+	return ctx.contains;
+}
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+			  void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+						    mce_notifier);
+	u64 hpa;
+
+	if (!mce || !mce_usable_address(mce))
+		return NOTIFY_DONE;
+
+	hpa = mce->addr & MCI_ADDR_PHYSADDR;
+
+	/* Check if the PFN is located on this CXL device */
+	if (!pfn_valid(hpa >> PAGE_SHIFT) &&
+	    !cxl_contains_hpa(mds->cxlds.cxlmd, hpa))
+		return NOTIFY_DONE;
+
+	/*
+	 * Search PFN in the cxl_mce_records, if already exists, don't continue
+	 * to do memory_failure() to avoid a poison address being reported
+	 * more than once.
+	 */
+	if (cxl_mce_recorded(hpa))
+		return NOTIFY_STOP;
+	else
+		return NOTIFY_OK;
+}
+
 struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 {
 	struct cxl_memdev_state *mds;
@@ -1476,6 +1587,10 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 	mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
 	mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
 
+	mds->mce_notifier.notifier_call = cxl_handle_mce;
+	mds->mce_notifier.priority = MCE_PRIO_CXL;
+	mce_register_decode_chain(&mds->mce_notifier);
+
 	return mds;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726afd04..9d4ed4dc4d51 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -376,10 +376,14 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 		goto out;
 
 	cxlr = cxl_dpa_to_region(cxlmd, dpa);
-	if (cxlr)
+	if (cxlr) {
+		u64 hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+
+		cxl_mce_clear(hpa);
 		dev_warn_once(mds->cxlds.dev,
 			      "poison clear dpa:%#llx region: %s\n", dpa,
 			      dev_name(&cxlr->dev));
+	}
 
 	record = (struct cxl_poison_record) {
 		.address = cpu_to_le64(dpa),
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5c4810dcbdeb..d2d906c26755 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -502,6 +502,7 @@ struct cxl_memdev_state {
 	struct cxl_fw_state fw;
 
 	struct rcuwait mbox_wait;
+	struct notifier_block mce_notifier;
 	int (*mbox_send)(struct cxl_memdev_state *mds,
 			 struct cxl_mbox_cmd *cmd);
 };
@@ -837,6 +838,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
 int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
 int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+bool cxl_mce_recorded(u64 pfn);
+void cxl_mce_clear(u64 pfn);
 
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
Since a CXL device is a memory device, a CPU consuming a poisoned page
of a CXL device always triggers an MCE (via interrupt #18) and calls
memory_failure() to handle the POISON page, no matter which path
(FW-First or OS-First) is configured. The CXL device can also find and
report the POISON; the kernel now not only traces it but also calls
memory_failure() to handle it, which is marked as "NEW" in the figure
below.
```
1.  MCE (interrupt #18, while CPU consuming POISON)
     -> do_machine_check()
       -> mce_log()
         -> notify chain (x86_mce_decoder_chain)
           -> memory_failure() <---------------------------- EXISTS
2.a FW-First (optional, CXL device proactively find&report)
     -> CXL device -> Firmware
       -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
                                                  \-> memory_failure()
                                                      ^----- NEW
2.b OS-First (optional, CXL device proactively find&report)
     -> CXL device -> MSI
       -> OS: CXL driver -> trace
                        \-> memory_failure()
                            ^------------------------------- NEW
```

But this way, memory_failure() could be called twice, or even at the
same time, for (1.) and (2.a or 2.b) in the figure above, before the
POISON page is cleared. memory_failure() has its own mutex lock, so the
calls won't actually run at the same time, and the later call can be
skipped because the HWPoison bit has already been set. However,
consider this scenario: "CXL device reports POISON error" triggers the
1st call; the user sees it in the log and wants to clear the poison by
executing the `cxl clear-poison` command; and at the same time a process
tries to access this POISON page, which triggers an MCE (the 2nd call).
Since there is no lock between the 2nd call and the clear-poison
operation, a race condition may happen, which may leave the HWPoison bit
of the page in an unknown state.

Thus, we have to avoid the 2nd call. This patch[2] introduces a new
notifier_block into `x86_mce_decoder_chain` and a POISON cache list to
stop the 2nd call of memory_failure(). It checks whether the current
poison page has already been reported (if yes, it stops the notifier
chain so that the following memory_failure() does not report it again).

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 arch/x86/include/asm/mce.h |   1 +
 drivers/cxl/core/mbox.c    | 115 +++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/memdev.c  |   6 +-
 drivers/cxl/cxlmem.h       |   3 +
 4 files changed, 124 insertions(+), 1 deletion(-)
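
The record-once idea in the commit message can be made concrete with a small userspace sketch in C. This is not the patch's code: the fixed-size table, poison_recorded() and poison_clear() are hypothetical names that stand in for the XArray cxl_mce_records, cxl_mce_recorded() and cxl_mce_clear(), and the sketch deliberately leaves out the locking (xas_lock_irq() in the patch) that a kernel version would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size table standing in for the patch's XArray. */
#define MAX_RECORDS 64

static uint64_t records[MAX_RECORDS];
static bool used[MAX_RECORDS];

/*
 * Test-and-set: return true if hpa was already recorded by an earlier
 * reporter. Otherwise record it (if a slot is free) and return false,
 * meaning the caller is the first reporter and should handle the error.
 * Assumption: single-threaded use; no locking is done here.
 */
static bool poison_recorded(uint64_t hpa)
{
	size_t free_slot = MAX_RECORDS;

	for (size_t i = 0; i < MAX_RECORDS; i++) {
		if (used[i] && records[i] == hpa)
			return true;
		if (!used[i] && free_slot == MAX_RECORDS)
			free_slot = i;	/* remember the first free slot */
	}

	/* If the table is full the address is simply not remembered. */
	if (free_slot < MAX_RECORDS) {
		used[free_slot] = true;
		records[free_slot] = hpa;
	}
	return false;
}

/* Forget hpa, e.g. after the poison has been cleared on the device. */
static void poison_clear(uint64_t hpa)
{
	for (size_t i = 0; i < MAX_RECORDS; i++) {
		if (used[i] && records[i] == hpa)
			used[i] = false;
	}
}
```

Whichever path reports an address first gets false and goes on to handle it; the other path gets true and skips its memory_failure() call. After poison_clear(), the same address can be reported again.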