[V4,0/4] rasdaemon: Add support for the CXL error events

Message ID 20230214112143.798-1-shiju.jose@huawei.com (mailing list archive)

Shiju Jose Feb. 14, 2023, 11:21 a.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Log and record the following CXL errors reported through the kernel
trace events: CXL poison errors, CXL AER uncorrectable errors, and CXL AER
correctable errors.

Note 1: The default poll method that rasdaemon uses to receive the trace
events does not work because of a commit in the kernel trace system.
There are two ways around this.

Solution 1:
Use a kernel that includes commit
3e46d910d8acf94e5360126593b68bf4fee4c4a1 ("tracing: Fix poll() and select()
do not work on per_cpu trace_pipe and trace_pipe_raw"), which is in
Linux 6.2-rc7 or later, and
https://lore.kernel.org/lkml/20230204193345.842-1-shiju.jose@huawei.com/T/

Solution 2:
Use the pthread fallback instead, which is how the CXL error events were
tested. To do so, make the following change in ras-events.c:
<change start ...>
/* rc = read_ras_event_all_cpus(data, cpus); */
rc = -255;
< ...change end >
/* Poll doesn't work on this kernel. Fallback to pthread way */
if (rc == -255) {
...
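
For readers unfamiliar with the fallback, here is a minimal standalone sketch
of the control flow that the change above forces. It is not the rasdaemon
code: poll_all_cpus(), per_cpu_reader(), read_events(), MAX_CPUS and
struct cpu_reader are hypothetical names used only for illustration, and the
sketch assumes the fallback starts one reader thread per CPU. Only the -255
sentinel is taken from the snippet above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POLL_UNSUPPORTED (-255) /* sentinel value taken from the snippet above */
#define MAX_CPUS 64             /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the poll-based reader; it always reports that
 * poll() is unusable, mirroring the forced "rc = -255". */
static int poll_all_cpus(int cpus)
{
    (void)cpus;
    return POLL_UNSUPPORTED;
}

struct cpu_reader {
    int cpu;     /* CPU this reader is responsible for */
    int handled; /* set to 1 once the reader has run */
};

/* Hypothetical per-CPU worker. A real reader would block on that CPU's
 * trace_pipe_raw; here it only marks itself as done. */
static void *per_cpu_reader(void *arg)
{
    struct cpu_reader *r = arg;

    assert(r != NULL);
    r->handled = 1;
    return NULL;
}

/* Try the poll path first. If it reports POLL_UNSUPPORTED, start one thread
 * per CPU and return how many readers ran. Returns -1 on error. */
static int read_events(int cpus)
{
    pthread_t threads[MAX_CPUS];
    struct cpu_reader readers[MAX_CPUS];
    int i, rc, started = 0, serviced = 0;

    if (cpus < 1 || cpus > MAX_CPUS)
        return -1;

    rc = poll_all_cpus(cpus);
    if (rc != POLL_UNSUPPORTED)
        return rc;

    /* Poll does not work here: fall back to the pthread way. */
    for (i = 0; i < cpus; i++) {
        readers[i].cpu = i;
        readers[i].handled = 0;
        if (pthread_create(&threads[i], NULL, per_cpu_reader, &readers[i]) != 0)
            break;
        started++;
    }

    /* Join only the threads that were actually created. */
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        serviced += readers[i].handled;
    }

    return started == cpus ? serviced : -1;
}
```

Calling read_events(4) in this sketch takes the fallback path and runs four
reader threads. Build with -pthread.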

Shiju Jose (4):
  rasdaemon: Move definition for BIT and BIT_ULL to a common file
  rasdaemon: Add support for the CXL poison events
  rasdaemon: Add support for the CXL AER uncorrectable errors
  rasdaemon: Add support for the CXL AER correctable errors

Changes:
V3 -> V4
1. Modifications for the changes in the kernel patches
   a) https://lore.kernel.org/lkml/cover.1675983077.git.alison.schofield@intel.com/
   b) https://lore.kernel.org/linux-cxl/63e5ed38d77d9_138fbc2947a@iweiny-mobl.notmuch/T/#t

V2 -> V3
1. Address the comments from Dave Jiang.

RFC V1 -> V2
1. Rename uuid to region_uuid in the log and SQLite DB.
2. Rebase to the latest rasdaemon code.
3. Update to match the renamed interface structures and functions in
   the latest libtraceevent-dev, which rasdaemon uses.


 Makefile.am                |   7 +-
 configure.ac               |  11 +
 ras-cxl-handler.c          | 398 +++++++++++++++++++++++++++++++++++++
 ras-cxl-handler.h          |  32 +++
 ras-events.c               |  33 +++
 ras-events.h               |   3 +
 ras-non-standard-handler.h |   3 -
 ras-record.c               | 209 +++++++++++++++++++
 ras-record.h               |  52 +++++
 ras-report.c               | 225 +++++++++++++++++++++
 ras-report.h               |   6 +
 11 files changed, 975 insertions(+), 4 deletions(-)
 create mode 100644 ras-cxl-handler.c
 create mode 100644 ras-cxl-handler.h