From patchwork Tue Aug 27 11:30:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116739 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48FD114E5 for ; Tue, 27 Aug 2019 11:30:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 26BD32184D for ; Tue, 27 Aug 2019 11:30:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726972AbfH0La4 (ORCPT ); Tue, 27 Aug 2019 07:30:56 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:36842 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726527AbfH0La4 (ORCPT ); Tue, 27 Aug 2019 07:30:56 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 277A4F007C42F0CB8E46; Tue, 27 Aug 2019 19:30:54 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:43 +0800 From: Jonathan Cameron To: Mauro Carvalho Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 1/6] rasdaemon: CCIX: memory error support Date: Tue, 27 Aug 2019 19:30:05 +0800 Message-ID: <20190827113010.50405-2-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Adds support for basic decoding and logging of CCIX memory errors, and for storing them in the sqlite3 DB.
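As a rough illustration of the decoding side (a sketch only, not the code in this patch): a validation bitmask can gate which fields end up in the log string. The demo_* names and the trimmed-down struct are made up for this example; the bit values are copied from the CCIX_MEM_ERR_*_VALID defines in ras-ccix-handler.h, and the bounded snprintf() is a substitution for the unbounded sprintf() the patch uses.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit values copied from the CCIX_MEM_ERR_*_VALID defines in this patch. */
#define DEMO_CARD_VALID 0x0008
#define DEMO_BANK_VALID 0x0010
#define DEMO_ROW_VALID  0x0040

/* Hypothetical subset of the record; the patch uses cper_ccix_mem_err_compact. */
struct demo_mem_err {
	uint32_t validation_bits;
	uint8_t card;
	uint16_t bank;
	uint32_t row;
};

/* Append "name: value " at offset off without writing past size; returns the new offset. */
static size_t demo_append(char *buf, size_t size, size_t off,
			  const char *name, unsigned int value)
{
	int n;

	if (off + 1 >= size)
		return off;
	n = snprintf(buf + off, size - off, "%s: %u ", name, value);
	if (n < 0)
		return off;
	off += (size_t)n;
	/* On truncation snprintf() reports the would-be length, so clamp it. */
	return off < size ? off : size - 1;
}

/* Render only the fields whose validation bit is set. */
static const char *demo_format_mem_err(const struct demo_mem_err *e,
				       char *buf, size_t size)
{
	size_t off = 0;

	if (size == 0)
		return buf;
	buf[0] = '\0';
	if (e->validation_bits & DEMO_CARD_VALID)
		off = demo_append(buf, size, off, "card", e->card);
	if (e->validation_bits & DEMO_BANK_VALID)
		off = demo_append(buf, size, off, "bank", e->bank);
	if (e->validation_bits & DEMO_ROW_VALID)
		off = demo_append(buf, size, off, "row", e->row);
	/* Drop the trailing space left by the last field, if any. */
	if (off > 0 && buf[off - 1] == ' ')
		buf[off - 1] = '\0';
	return buf;
}
```

With only the card and row bits set, the bank value is skipped even though the struct carries one. With no bits set, the result is an empty string rather than an empty pair of brackets.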
Given that the CCIX memory record is very tightly defined by the specification and that databases with large blobs in them are not particularly useful, I have separately exposed all of the standard fields. Note that this means setting them NULL if the validation bits indicate that the field is not valid. Includes making a few ras-record.c functions available from other files to allow us to split off the CCIX error recording functionality. Signed-off-by: Jonathan Cameron --- Makefile.am | 8 +- configure.ac | 10 ++ ras-ccix-handler.c | 244 +++++++++++++++++++++++++++++++++++++++++++++ ras-ccix-handler.h | 61 ++++++++++++ ras-events.c | 16 +++ ras-record-ccix.c | 204 +++++++++++++++++++++++++++++++++++++ ras-record.c | 15 ++- ras-record.h | 28 ++++++ ras-report.h | 6 +- 9 files changed, 585 insertions(+), 7 deletions(-) diff --git a/Makefile.am b/Makefile.am index 3d89672..9d54390 100644 --- a/Makefile.am +++ b/Makefile.am @@ -20,10 +20,16 @@ rasdaemon_SOURCES = rasdaemon.c ras-events.c ras-mc-handler.c \ bitfield.c if WITH_SQLITE3 rasdaemon_SOURCES += ras-record.c +if WITH_CCIX + rasdaemon_SOURCES += ras-record-ccix.c +endif endif if WITH_AER rasdaemon_SOURCES += ras-aer-handler.c endif +if WITH_CCIX + rasdaemon_SOURCES += ras-ccix-handler.c +endif if WITH_NON_STANDARD rasdaemon_SOURCES += ras-non-standard-handler.c endif @@ -56,7 +62,7 @@ rasdaemon_LDADD = -lpthread $(SQLITE3_LIBS) libtrace/libtrace.a include_HEADERS = config.h ras-events.h ras-logger.h ras-mc-handler.h \ ras-aer-handler.h ras-mce-handler.h ras-record.h bitfield.h ras-report.h \ ras-extlog-handler.h ras-arm-handler.h ras-non-standard-handler.h \ - ras-devlink-handler.h + ras-devlink-handler.h ras-ccix-handler.h # This rule can't be called with more than one Makefile job (like make -j8) # I can't figure out a way to fix that diff --git a/configure.ac b/configure.ac index fecff51..ca8977c 100644 --- a/configure.ac +++ b/configure.ac @@ -44,6 +44,15 @@ AS_IF([test "x$enable_aer" = "xyes"], [ ]) 
AM_CONDITIONAL([WITH_AER], [test x$enable_aer = xyes]) +AC_ARG_ENABLE([ccix], + AS_HELP_STRING([--enable-ccix], [enable CCIX PER events (currently experimental)])) + +AS_IF([test "x$enable_ccix" = "xyes"], [ + AC_DEFINE(HAVE_CCIX,1,"have CCIX PER events collect") + AC_SUBST([WITH_CCIX]) +]) +AM_CONDITIONAL([WITH_CCIX], [test x$enable_ccix = xyes]) + AC_ARG_ENABLE([non_standard], AS_HELP_STRING([--enable-non-standard], [enable NON_STANDARD events (currently experimental)])) @@ -137,4 +146,5 @@ compile time options summary HIP07 SAS HW errors : $enable_hisi_ns_decode ARM events : $enable_arm DEVLINK : $enable_devlink + CCIX : $enable_ccix EOF diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c new file mode 100644 index 0000000..2be413f --- /dev/null +++ b/ras-ccix-handler.c @@ -0,0 +1,244 @@ +/* + * Copyright (c) 2019 Hisilicon Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ +#include +#include +#include +#include +#include "libtrace/kbuffer.h" +#include "ras-record.h" +#include "ras-logger.h" +#include "bitfield.h" +#include "ras-report.h" + +static char *ccix_mem_pool_type(uint8_t pt) +{ + switch (pt) { + case 0: return "other/not-specified"; + case 1: return "ROM"; + case 2: return "volatile"; + case 3: return "non-volatile"; + case 4: return "device/register"; + } + if (pt >= 0x80) + return "vendor"; + return "unknown"; +} + +static char *ccix_mem_spec_type(uint8_t st) +{ + switch (st) { + case 0: return "other/not-specified"; + case 1: return "SRAM"; + case 2: return "DDR"; + case 3: return "NVDIMM-F"; + case 4: return "NVDIMM-N"; + case 5: return "HBM"; + case 6: return "flash"; + } + if (st >= 0x80) + return "vendor"; + return "unknown"; +} + +static char *ccix_mem_op(uint8_t op) +{ + switch (op) { + case 0: return "generic"; + case 1: return "read"; + case 2: return "write"; + case 4: return "scrub"; + } + return "unknown"; +} + +static char *ccix_mem_err_type(int etype) +{ + switch (etype) { + case 0: return "unknown"; + case 1: return "no error"; + case 2: return "single-bit ECC"; + case 3: return "multi-bit ECC"; + case 4: return "single-symbol chipkill ECC"; + case 5: return "multi-symbol chipkill ECC"; + case 6: return "master abort"; + case 7: return "target abort"; + case 8: return "parity error"; + case 9: return "watchdog timeout"; + case 10: return "invalid address"; + case 11: return "mirror Broken"; + case 12: return "memory sparing"; + case 13: return "scrub"; + case 14: return "physical memory map-out event"; + } + return "unknown-type"; +} + +static char *ccix_mem_err_cper_data(const char *c) +{ + const struct cper_ccix_mem_err_compact *cpd = + (struct cper_ccix_mem_err_compact *)c; + static char buf[1024]; + char *p = buf; + + p += sprintf(p, " ("); + p += sprintf(p, "fru: %u ", cpd->fru); + if (cpd->validation_bits & CCIX_MEM_ERR_MEM_ERR_TYPE_VALID) + p += sprintf(p, "error: %s ", + 
ccix_mem_err_type(cpd->mem_err_type)); + if (cpd->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID) + p += sprintf(p, "type: %s ", + ccix_mem_pool_type(cpd->pool_generic_type)); + if (cpd->validation_bits & CCIX_MEM_ERR_SPEC_TYPE_VALID) + p += sprintf(p, "sub_type: %s ", + ccix_mem_spec_type(cpd->pool_specific_type)); + if (cpd->validation_bits & CCIX_MEM_ERR_OP_VALID) + p += sprintf(p, "op: %s ", ccix_mem_op(cpd->op_type)); + if (cpd->validation_bits & CCIX_MEM_ERR_CARD_VALID) + p += sprintf(p, "card: %u ", cpd->card); + if (cpd->validation_bits & CCIX_MEM_ERR_MOD_VALID) + p += sprintf(p, "mod: %u ", cpd->module); + if (cpd->validation_bits & CCIX_MEM_ERR_BANK_VALID) + p += sprintf(p, "bank: %u ", cpd->bank); + if (cpd->validation_bits & CCIX_MEM_ERR_DEVICE_VALID) + p += sprintf(p, "device: %u ", cpd->device); + if (cpd->validation_bits & CCIX_MEM_ERR_ROW_VALID) + p += sprintf(p, "row: %u ", cpd->row); + if (cpd->validation_bits & CCIX_MEM_ERR_COL_VALID) + p += sprintf(p, "col: %u ", cpd->column); + if (cpd->validation_bits & CCIX_MEM_ERR_RANK_VALID) + p += sprintf(p, "rank: %u ", cpd->rank); + if (cpd->validation_bits & CCIX_MEM_ERR_BIT_POS_VALID) + p += sprintf(p, "bitpos: %u ", cpd->bit_pos); + if (cpd->validation_bits & CCIX_MEM_ERR_CHIP_ID_VALID) + p += sprintf(p, "chipid: %u ", cpd->chip_id); + p += sprintf(p - 1, ")"); + + return buf; +} + +static char *ccix_component_type(int type) +{ + switch (type) { + case 0: return "RA"; + case 1: return "HA"; + case 2: return "SA"; + case 3: return "Port"; + case 4: return "CCIX-Link"; + } + return "unknown-component"; +} + +static char *err_severity(int severity) +{ + switch (severity) { + case 0: return "recoverable"; + case 1: return "fatal"; + case 2: return "corrected"; + case 3: return "informational"; + } + return "unknown-severity"; +} + +static unsigned long long err_mask(int lsb) +{ + if (lsb == 0xff) + return ~0ull; + return ~((1ull << lsb) - 1); +} + +static int ras_ccix_common_parse(struct trace_seq *s, + 
struct pevent_record *record, + struct event_format *event, void *context, + struct ras_ccix_event *ev) +{ + unsigned long long val; + int len; + + if (pevent_get_field_val(s, event, "err_seq", record, &val, 1) < 0) + return -1; + ev->error_seq = val; + if (pevent_get_field_val(s, event, "sev", record, &val, 1) < 0) + return -1; + ev->severity = val; + if (pevent_get_field_val(s, event, "sevdetail", record, &val, 1) < 0) + return -1; + ev->severity_detail = val; + if (pevent_get_field_val(s, event, "pa", record, &val, 1) < 0) + return -1; + ev->address = val; + if (pevent_get_field_val(s, event, "pa_mask_lsb", record, &val, 1) < 0) + return -1; + ev->pa_mask_lsb = val; + if (pevent_get_field_val(s, event, "source", record, &val, 1) < 0) + return -1; + ev->source = val; + if (pevent_get_field_val(s, event, "component", record, &val, 1) < 0) + return -1; + ev->component = val; + + ev->cper_data = pevent_get_field_raw(s, event, "data", record, &len, 1); + ev->cper_data_length = len; + + if (pevent_get_field_val(s, event, "vendor_data_length", record, &val, + 1)) + return -1; + ev->vendor_data_length = val; + + ev->vendor_data = pevent_get_field_raw(s, event, "vendor_data", record, + &len, 1); + + return 0; +} + +int ras_ccix_memory_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX memory error %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s", + ev.error_seq, err_severity(ev.severity), 
+ ev.source, ccix_component_type(ev.component), + (ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb), + ccix_mem_err_cper_data(ev.cper_data)); + + ras_store_ccix_memory_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h new file mode 100644 index 0000000..f6d25b1 --- /dev/null +++ b/ras-ccix-handler.h @@ -0,0 +1,61 @@ +/* + * Copyright (c) 2019 Hisilicon Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#ifndef __RAS_CCIX_HANDLER_H +#define __RAS_CCIX_HANDLER_H + +#include "ras-events.h" +#include "libtrace/event-parse.h" + +int ras_ccix_memory_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); + +/* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ +#pragma pack(1) +struct cper_ccix_mem_err_compact { + uint32_t validation_bits; + uint8_t mem_err_type; + uint8_t pool_generic_type; + uint8_t pool_specific_type; + uint8_t op_type; + uint8_t card; + uint16_t module; + uint16_t bank; + uint32_t device; + uint32_t row; + uint32_t column; + uint32_t rank; + uint8_t bit_pos; + uint8_t chip_id; + uint8_t fru; +}; +#pragma pack() + +#define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 +#define CCIX_MEM_ERR_OP_VALID 0x0002 +#define CCIX_MEM_ERR_MEM_ERR_TYPE_VALID 0x0004 +#define CCIX_MEM_ERR_CARD_VALID 0x0008 +#define CCIX_MEM_ERR_BANK_VALID 0x0010 +#define CCIX_MEM_ERR_DEVICE_VALID 0x0020 +#define CCIX_MEM_ERR_ROW_VALID 0x0040 +#define CCIX_MEM_ERR_COL_VALID 0x0080 +#define CCIX_MEM_ERR_RANK_VALID 0x0100 +#define CCIX_MEM_ERR_BIT_POS_VALID 0x0200 +#define CCIX_MEM_ERR_CHIP_ID_VALID 0x0400 +#define CCIX_MEM_ERR_VENDOR_DATA_VALID 0x0800 +#define CCIX_MEM_ERR_MOD_VALID 0x1000 +#define CCIX_MEM_ERR_SPEC_TYPE_VALID 0x2000 + +#endif diff --git a/ras-events.c b/ras-events.c index 6ba7a6a..e365d97 100644 --- a/ras-events.c +++ b/ras-events.c @@ -29,6 +29,7 @@ #include "libtrace/event-parse.h" #include "ras-mc-handler.h" #include "ras-aer-handler.h" +#include "ras-ccix-handler.h" #include "ras-non-standard-handler.h" #include "ras-arm-handler.h" #include "ras-mce-handler.h" @@ -203,6 +204,10 @@ int toggle_ras_mc_event(int enable) rc |= __toggle_ras_mc_event(ras, "ras", "aer_event", enable); #endif +#ifdef HAVE_CCIX + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable); +#endif + #ifdef HAVE_MCE rc |= __toggle_ras_mc_event(ras, "mce", "mce_record", enable); #endif @@ -717,6 +722,17 
@@ int handle_ras_events(int record_events) "ras", "aer_event"); #endif +#ifdef HAVE_CCIX + rc = add_event_handler(ras, pevent, page_size, "ras", + "ccix_memory_error_event", + ras_ccix_memory_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_memory_event"); +#endif + #ifdef HAVE_NON_STANDARD rc = add_event_handler(ras, pevent, page_size, "ras", "non_standard_event", ras_non_standard_event_handler, NULL); diff --git a/ras-record-ccix.c b/ras-record-ccix.c new file mode 100644 index 0000000..6e46b40 --- /dev/null +++ b/ras-record-ccix.c @@ -0,0 +1,204 @@ +/* + * Copyright (C) 2019 Jonathan Cameron + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include +#include +#include +#include +#include "bitfield.h" +#include "ras-ccix-handler.h" +#include "ras-logger.h" +#include "ras-record.h" +#include "ras-report.h" + +enum { + ccix_field_id, + ccix_field_timestamp, + ccix_field_error_count, + ccix_field_severity, + ccix_field_severity_detail, + ccix_field_address, + ccix_field_address_mask, + ccix_field_source, + ccix_field_component, + ccix_field_common_end +}; + +#define CCIX_COMMON_FIELDS \ + [ccix_field_id] = { .name = "id", .type = "INTEGER PRIMARY KEY" }, \ + [ccix_field_timestamp] = { .name = "timestamp", .type = "TEXT" }, \ + [ccix_field_error_count] = { .name = "error_count", .type = "INTEGER" }, \ + [ccix_field_severity] = { .name = "severity", .type = "INTEGER" }, \ + [ccix_field_severity_detail] = { .name = "severity_detail", .type = "INTEGER" }, \ + [ccix_field_address] = { .name = "address", .type = "INTEGER" }, \ + [ccix_field_address_mask] = { .name = "address_mask", .type = "INTEGER" }, \ + [ccix_field_source] = { .name = "source", .type = "INTEGER" }, \ + [ccix_field_component] = { .name = "component", .type = "INTEGER" } + +enum { + ccix_mem_field_error_type = ccix_field_common_end, + ccix_mem_field_fru, + ccix_mem_field_type, + ccix_mem_field_sub_type, + ccix_mem_field_operation, + ccix_mem_field_card, + ccix_mem_field_mod, + ccix_mem_field_bank, + ccix_mem_field_device, + ccix_mem_field_row, + ccix_mem_field_col, + ccix_mem_field_rank, + ccix_mem_field_bit_pos, + ccix_mem_field_chip_id, + ccix_mem_field_vendor +}; + +static const struct db_fields ccix_memory_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_mem_field_error_type] = { .name = "mem_err_type", .type = "INTEGER" }, + [ccix_mem_field_fru] = { .name = "fru", .type = "INTEGER" }, + [ccix_mem_field_type] = { 
.name = "type", .type = "INTEGER" }, + [ccix_mem_field_sub_type] = { .name = "sub_type", .type = "INTEGER" }, + [ccix_mem_field_operation] = { .name = "operation", .type = "INTEGER" }, + [ccix_mem_field_card] = { .name = "card", .type = "INTEGER" }, + [ccix_mem_field_mod] = { .name = "mod", .type = "INTEGER" }, + [ccix_mem_field_bank] = { .name = "bank", .type = "INTEGER" }, + [ccix_mem_field_device] = { .name = "device", .type = "INTEGER" }, + [ccix_mem_field_row] = { .name = "row", .type = "INTEGER" }, + [ccix_mem_field_col] = { .name = "col", .type = "INTEGER" }, + [ccix_mem_field_rank] = { .name = "rank", .type = "INTEGER" }, + [ccix_mem_field_bit_pos] = { .name = "bit_position", .type = "INTEGER" }, + [ccix_mem_field_chip_id] = { .name = "chip_id", .type = "INTEGER" }, + [ccix_mem_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_memory_event_tab = { + .name = "ccix_memory_event", + .fields = ccix_memory_event_fields, + .num_fields = ARRAY_SIZE(ccix_memory_event_fields), +}; + +static void ras_store_ccix_common(sqlite3_stmt *record, + struct ras_ccix_event *ev) +{ + sqlite3_bind_text(record, ccix_field_timestamp, ev->timestamp, -1, + NULL); + sqlite3_bind_int(record, ccix_field_error_count, ev->error_seq); + sqlite3_bind_int(record, ccix_field_severity, ev->severity); + sqlite3_bind_int(record, ccix_field_severity_detail, + ev->severity_detail); + sqlite3_bind_int64(record, ccix_field_address, ev->address); + sqlite3_bind_int64(record, ccix_field_address_mask, ev->pa_mask_lsb); + sqlite3_bind_int(record, ccix_field_source, ev->source); + sqlite3_bind_int(record, ccix_field_component, ev->component); +} + +int ras_store_ccix_memory_event(struct ras_events *ras, + struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_mem_err_compact *mem = + (struct cper_ccix_mem_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv ? priv->stmt_ccix_mem_record : NULL; + + if (!priv
|| !rec) + return 0; + log(TERM, LOG_INFO, "ccix_memory_eventstore: %p\n", rec); + + ras_store_ccix_common(rec, ev); + + sqlite3_bind_int(rec, ccix_mem_field_fru, mem->fru); + + if (mem->validation_bits & CCIX_MEM_ERR_MEM_ERR_TYPE_VALID) + sqlite3_bind_int(rec, ccix_mem_field_error_type, + mem->mem_err_type); + + if (mem->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID) + sqlite3_bind_int(rec, ccix_mem_field_type, + mem->pool_generic_type); + + if (mem->validation_bits & CCIX_MEM_ERR_SPEC_TYPE_VALID) + sqlite3_bind_int(rec, ccix_mem_field_sub_type, + mem->pool_specific_type); + + if (mem->validation_bits & CCIX_MEM_ERR_OP_VALID) + sqlite3_bind_int(rec, ccix_mem_field_operation, mem->op_type); + + if (mem->validation_bits & CCIX_MEM_ERR_CARD_VALID) + sqlite3_bind_int(rec, ccix_mem_field_card, mem->card); + + if (mem->validation_bits & CCIX_MEM_ERR_MOD_VALID) + sqlite3_bind_int(rec, ccix_mem_field_mod, mem->module); + + if (mem->validation_bits & CCIX_MEM_ERR_BANK_VALID) + sqlite3_bind_int(rec, ccix_mem_field_bank, mem->bank); + + if (mem->validation_bits & CCIX_MEM_ERR_DEVICE_VALID) + sqlite3_bind_int(rec, ccix_mem_field_device, mem->device); + + if (mem->validation_bits & CCIX_MEM_ERR_ROW_VALID) + sqlite3_bind_int(rec, ccix_mem_field_row, mem->row); + + if (mem->validation_bits & CCIX_MEM_ERR_COL_VALID) + sqlite3_bind_int(rec, ccix_mem_field_col, mem->column); + + if (mem->validation_bits & CCIX_MEM_ERR_RANK_VALID) + sqlite3_bind_int(rec, ccix_mem_field_rank, mem->rank); + + if (mem->validation_bits & CCIX_MEM_ERR_BIT_POS_VALID) + sqlite3_bind_int(rec, ccix_mem_field_bit_pos, mem->bit_pos); + + if (mem->validation_bits & CCIX_MEM_ERR_CHIP_ID_VALID) + sqlite3_bind_int(rec, ccix_mem_field_chip_id, mem->chip_id); + + if (mem->validation_bits & CCIX_MEM_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_mem_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + 
"Failed to do ccix_mem_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_mem_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_mem_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + +void ras_ccix_create_table(struct sqlite3_priv *priv) +{ + int rc; + + rc = ras_mc_create_table(priv, &ccix_memory_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_mem_record, + &ccix_memory_event_tab); +} diff --git a/ras-record.c b/ras-record.c index b212607..874902c 100644 --- a/ras-record.c +++ b/ras-record.c @@ -28,6 +28,7 @@ #include "ras-events.h" #include "ras-mc-handler.h" #include "ras-aer-handler.h" +#include "ras-ccix-handler.h" #include "ras-mce-handler.h" #include "ras-logger.h" @@ -449,9 +450,9 @@ int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) * Generic code */ -static int ras_mc_prepare_stmt(struct sqlite3_priv *priv, - sqlite3_stmt **stmt, - const struct db_table_descriptor *db_tab) +int ras_mc_prepare_stmt(struct sqlite3_priv *priv, + sqlite3_stmt **stmt, + const struct db_table_descriptor *db_tab) { int i, rc; @@ -495,8 +496,8 @@ static int ras_mc_prepare_stmt(struct sqlite3_priv *priv, return rc; } -static int ras_mc_create_table(struct sqlite3_priv *priv, - const struct db_table_descriptor *db_tab) +int ras_mc_create_table(struct sqlite3_priv *priv, + const struct db_table_descriptor *db_tab) { const struct db_fields *field; char sql[1024], *p = sql, *end = sql + sizeof(sql); @@ -604,6 +605,10 @@ int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras) &extlog_event_tab); #endif +#ifdef HAVE_CCIX + ras_ccix_create_table(priv); +#endif + #ifdef HAVE_MCE rc = ras_mc_create_table(priv, &mce_record_tab); if (rc == SQLITE_OK) diff 
--git a/ras-record.h b/ras-record.h index 432a571..c094c91 100644 --- a/ras-record.h +++ b/ras-record.h @@ -44,6 +44,21 @@ struct ras_aer_event { const char *msg; }; +struct ras_ccix_event { + char timestamp[64]; + int32_t error_seq; + int8_t severity; + int8_t severity_detail; + unsigned long long address; + int8_t pa_mask_lsb; + uint8_t source; + uint8_t component; + const char *cper_data; + unsigned short cper_data_length; + uint16_t vendor_data_length; + const char *vendor_data; +}; + struct ras_extlog_event { char timestamp[64]; int32_t error_seq; @@ -108,6 +123,9 @@ struct sqlite3_priv { #ifdef HAVE_EXTLOG sqlite3_stmt *stmt_extlog_record; #endif +#ifdef HAVE_CCIX + sqlite3_stmt *stmt_ccix_mem_record; +#endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; #endif @@ -131,12 +149,20 @@ struct db_table_descriptor { }; int ras_mc_event_opendb(unsigned cpu, struct ras_events *ras); +int ras_mc_prepare_stmt(struct sqlite3_priv *priv, + sqlite3_stmt **stmt, + const struct db_table_descriptor *db_tab); +int ras_mc_create_table(struct sqlite3_priv *priv, + const struct db_table_descriptor *db_tab); + int ras_mc_add_vendor_table(struct ras_events *ras, sqlite3_stmt **stmt, const struct db_table_descriptor *db_tab); int ras_store_mc_event(struct ras_events *ras, struct ras_mc_event *ev); int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev); int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev); int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev); +void ras_ccix_create_table(struct sqlite3_priv *priv); +int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -147,6 +173,8 @@ static inline int 
ras_store_mc_event(struct ras_events *ras, struct ras_mc_event static inline int ras_store_aer_event(struct ras_events *ras, struct ras_aer_event *ev) { return 0; }; static inline int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev) { return 0; }; static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev) { return 0; }; +static inline void ras_ccix_create_table(void *priv) {}; +static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; diff --git a/ras-report.h b/ras-report.h index cb133a1..4684fdc 100644 --- a/ras-report.h +++ b/ras-report.h @@ -19,6 +19,7 @@ #include "ras-mc-handler.h" #include "ras-mce-handler.h" #include "ras-aer-handler.h" +#include "ras-ccix-handler.h" /* Maximal length of backtrace. 
*/ #define MAX_BACKTRACE_SIZE (1024*1024) @@ -35,7 +36,8 @@ enum { AER_EVENT, NON_STANDARD_EVENT, ARM_EVENT, - DEVLINK_EVENT + DEVLINK_EVENT, + CCIX_EVENT, }; #ifdef HAVE_ABRT_REPORT @@ -46,6 +48,7 @@ int ras_report_mce_event(struct ras_events *ras, struct mce_event *ev); int ras_report_non_standard_event(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_report_arm_event(struct ras_events *ras, struct ras_arm_event *ev); int ras_report_devlink_event(struct ras_events *ras, struct devlink_event *ev); +int ras_report_ccix_event(struct ras_events *ras, struct ras_ccix_event *ev); #else @@ -55,6 +58,7 @@ static inline int ras_report_mce_event(struct ras_events *ras, struct mce_event static inline int ras_report_non_standard_event(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_report_arm_event(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_report_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; +static inline int ras_report_ccix_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; }; #endif From patchwork Tue Aug 27 11:30:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116737 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D1EE16B1 for ; Tue, 27 Aug 2019 11:30:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EFDED2184D for ; Tue, 27 Aug 2019 11:30:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726772AbfH0La4 (ORCPT ); Tue, 27 Aug 2019 07:30:56 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:36814 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with 
ESMTP id S1726621AbfH0La4 (ORCPT ); Tue, 27 Aug 2019 07:30:56 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 170882AD328075993044; Tue, 27 Aug 2019 19:30:54 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:45 +0800 From: Jonathan Cameron To: Mauro Carvalho Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 2/6] rasdaemon: CCIX: Cache error support Date: Tue, 27 Aug 2019 19:30:06 +0800 Message-ID: <20190827113010.50405-3-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Adds support for CCIX cache error reporting and logging to sqlite3.
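For readers skimming the series, the store path binds a column only when its validation bit is set. A parameter that is never bound is stored by sqlite3 as NULL. The following is a sqlite-free sketch of that decision for three of the cache fields, under stated assumptions: struct demo_column and the demo_* helpers are hypothetical stand-ins, and only the bit values come from the CCIX_CACHE_ERR_*_VALID defines in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit values copied from the CCIX_CACHE_ERR_*_VALID defines in this patch. */
#define DEMO_CACHE_LEVEL_VALID 0x0008
#define DEMO_CACHE_SET_VALID   0x0010
#define DEMO_CACHE_WAY_VALID   0x0020

/* Hypothetical subset of the record; the patch uses cper_ccix_cache_err_compact. */
struct demo_cache_err {
	uint32_t validation_bits;
	uint32_t set;
	uint32_t way;
	uint8_t cache_level;
};

/* Hypothetical stand-in for one database column: either NULL or an integer. */
struct demo_column {
	const char *name;
	int is_null;
	long long value;
};

/* Record a value only if its validation bit was set; otherwise mark it NULL. */
static void demo_fill(struct demo_column *col, const char *name,
		      int valid, long long value)
{
	col->name = name;
	col->is_null = !valid;
	col->value = valid ? value : 0;
}

/* Map the record onto three columns and return how many were filled in. */
static size_t demo_cache_columns(const struct demo_cache_err *e,
				 struct demo_column cols[3])
{
	demo_fill(&cols[0], "level",
		  !!(e->validation_bits & DEMO_CACHE_LEVEL_VALID),
		  e->cache_level);
	demo_fill(&cols[1], "set",
		  !!(e->validation_bits & DEMO_CACHE_SET_VALID), e->set);
	demo_fill(&cols[2], "way",
		  !!(e->validation_bits & DEMO_CACHE_WAY_VALID), e->way);
	return 3;
}
```

Keeping invalid fields as NULL rather than 0 lets a query tell "way 0" apart from "way not reported", which appears to be the point of exposing each field as its own column.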
Signed-off-by: Jonathan Cameron --- ras-ccix-handler.c | 114 +++++++++++++++++++++++++++++++++++++++++++++ ras-ccix-handler.h | 24 ++++++++++ ras-events.c | 9 ++++ ras-record-ccix.c | 100 +++++++++++++++++++++++++++++++++++++++ ras-record.h | 3 ++ 5 files changed, 250 insertions(+) diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c index 2be413f..f68c297 100644 --- a/ras-ccix-handler.c +++ b/ras-ccix-handler.c @@ -127,6 +127,79 @@ static char *ccix_mem_err_cper_data(const char *c) return buf; } +static char *ccix_cache_type(uint8_t type) +{ + switch (type) { + case 0: return "instruction"; + case 1: return "data"; + case 2: return "generic/unified"; + case 3: return "snoop filter directory"; + } + return "unknown"; +} + +static char *ccix_cache_err_type(int etype) +{ + switch (etype) { + case 0: return "data"; + case 1: return "tag"; + case 2: return "timeout"; + case 3: return "hang"; + case 4: return "data loss"; + case 5: return "invalid address"; + } + return "unknown-type"; +} + +static char *ccix_cache_op(uint8_t op) +{ + switch (op) { + case 0: return "generic"; + case 1: return "generic read"; + case 2: return "generic write"; + case 3: return "data read"; + case 4: return "data write"; + case 5: return "instruction fetch"; + case 6: return "prefetch"; + case 7: return "eviction"; + case 8: return "snooping"; + case 9: return "snooped"; + case 10: return "management/command"; + } + return "unknown"; +} + +static char *ccix_cache_err_cper_data(const char *c) +{ + const struct cper_ccix_cache_err_compact *cpd = + (struct cper_ccix_cache_err_compact *)c; + static char buf[1024]; + char *p = buf; + + if (!(cpd->validation_bits)) + return ""; + + p += sprintf(p, " ("); + if (cpd->validation_bits & CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID) + p += sprintf(p, "error: %s ", + ccix_cache_err_type(cpd->cache_error_type)); + if (cpd->validation_bits & CCIX_CACHE_ERR_TYPE_VALID) + p += sprintf(p, "type: %s ", ccix_cache_type(cpd->cache_type)); + if (cpd->validation_bits & 
CCIX_CACHE_ERR_OP_VALID) + p += sprintf(p, "op: %s ", ccix_cache_op(cpd->op_type)); + if (cpd->validation_bits & CCIX_CACHE_ERR_LEVEL_VALID) + p += sprintf(p, "level: %u ", cpd->cache_level); + if (cpd->validation_bits & CCIX_CACHE_ERR_SET_VALID) + p += sprintf(p, "set: %u ", cpd->set); + if (cpd->validation_bits & CCIX_CACHE_ERR_WAY_VALID) + p += sprintf(p, "way: %u ", cpd->way); + if (cpd->validation_bits & CCIX_CACHE_ERR_INSTANCE_ID_VALID) + p += sprintf(p, "instance: %u ", cpd->instance); + p += sprintf(p - 1, ")"); + + return buf; +} + static char *ccix_component_type(int type) { switch (type) { @@ -242,3 +315,44 @@ int ras_ccix_memory_event_handler(struct trace_seq *s, return 0; } + +int ras_ccix_cache_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX cache error %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s", + ev.error_seq, err_severity(ev.severity), + ev.source, ccix_component_type(ev.component), + (ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 
1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb), + ccix_cache_err_cper_data(ev.cper_data)); + + ras_store_ccix_cache_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h index f6d25b1..629ccbe 100644 --- a/ras-ccix-handler.h +++ b/ras-ccix-handler.h @@ -21,6 +21,9 @@ int ras_ccix_memory_event_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context); +int ras_ccix_cache_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ #pragma pack(1) @@ -41,6 +44,18 @@ struct cper_ccix_mem_err_compact { uint8_t chip_id; uint8_t fru; }; + +struct cper_ccix_cache_err_compact { + uint32_t validation_bits; + uint32_t set; + uint32_t way; + uint8_t cache_type; + uint8_t op_type; + uint8_t cache_error_type; + uint8_t cache_level; + uint8_t instance; +}; + #pragma pack() #define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 @@ -58,4 +73,13 @@ struct cper_ccix_mem_err_compact { #define CCIX_MEM_ERR_MOD_VALID 0x1000 #define CCIX_MEM_ERR_SPEC_TYPE_VALID 0x2000 +#define CCIX_CACHE_ERR_TYPE_VALID 0x0001 +#define CCIX_CACHE_ERR_OP_VALID 0x0002 +#define CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID 0x0004 +#define CCIX_CACHE_ERR_LEVEL_VALID 0x0008 +#define CCIX_CACHE_ERR_SET_VALID 0x0010 +#define CCIX_CACHE_ERR_WAY_VALID 0x0020 +#define CCIX_CACHE_ERR_INSTANCE_ID_VALID 0x0040 +#define CCIX_CACHE_ERR_VENDOR_DATA_VALID 0x0080 + #endif diff --git a/ras-events.c b/ras-events.c index e365d97..f1b67cd 100644 --- a/ras-events.c +++ b/ras-events.c @@ -206,6 +206,7 @@ int toggle_ras_mc_event(int enable) #ifdef HAVE_CCIX rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable); + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable); #endif #ifdef HAVE_MCE @@ -731,6 +732,14 @@ int handle_ras_events(int record_events) else log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", "ras", 
"ccix_memory_event"); + rc = add_event_handler(ras, pevent, page_size, "ras", + "ccix_cache_error_event", + ras_ccix_cache_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_cache_event"); #endif #ifdef HAVE_NON_STANDARD diff --git a/ras-record-ccix.c b/ras-record-ccix.c index 6e46b40..5b6e044 100644 --- a/ras-record-ccix.c +++ b/ras-record-ccix.c @@ -193,6 +193,101 @@ int ras_store_ccix_memory_event(struct ras_events *ras, return rc; } +enum { + ccix_cache_field_type = ccix_field_common_end, + ccix_cache_field_operation, + ccix_cache_field_error_type, + ccix_cache_field_level, + ccix_cache_field_set, + ccix_cache_field_way, + ccix_cache_field_instance, + ccix_cache_field_vendor, +}; + +static const struct db_fields ccix_cache_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_cache_field_type] = { .name = "type", .type = "INTEGER" }, + [ccix_cache_field_operation] = { .name = "operation", .type = "INTEGER" }, + [ccix_cache_field_error_type] = { .name = "cache_err_type", .type = "INTEGER" }, + [ccix_cache_field_level] = { .name = "\"level\"", .type = "INTEGER" }, + [ccix_cache_field_set] = { .name = "\"set\"", .type = "INTEGER" }, + [ccix_cache_field_way] = { .name = "way", .type = "INTEGER" }, + [ccix_cache_field_instance] = { .name = "instance", .type = "INTEGER" }, + [ccix_cache_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_cache_event_tab = { + .name = "ccix_cache_event", + .fields = ccix_cache_event_fields, + .num_fields = ARRAY_SIZE(ccix_cache_event_fields), +}; + +int ras_store_ccix_cache_event(struct ras_events *ras, + struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_cache_err_compact *cache = + (struct cper_ccix_cache_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv ? priv->stmt_ccix_cache_record : NULL; + + if (!priv || !rec) + return 0; + log(TERM, LOG_INFO, 
"ccix_cache_eventstore: %p\n", rec); + + ras_store_ccix_common(rec, ev); + + if (cache->validation_bits & CCIX_CACHE_ERR_CACHE_ERR_TYPE_VALID) + sqlite3_bind_int(rec, ccix_cache_field_error_type, + cache->cache_error_type); + + if (cache->validation_bits & CCIX_CACHE_ERR_TYPE_VALID) + sqlite3_bind_int(rec, ccix_cache_field_type, cache->cache_type); + + if (cache->validation_bits & CCIX_CACHE_ERR_OP_VALID) + sqlite3_bind_int(rec, ccix_cache_field_operation, + cache->op_type); + + if (cache->validation_bits & CCIX_CACHE_ERR_LEVEL_VALID) + sqlite3_bind_int(rec, ccix_cache_field_level, + cache->cache_level); + + if (cache->validation_bits & CCIX_CACHE_ERR_SET_VALID) + sqlite3_bind_int(rec, ccix_cache_field_set, cache->set); + + if (cache->validation_bits & CCIX_CACHE_ERR_WAY_VALID) + sqlite3_bind_int(rec, ccix_cache_field_way, cache->way); + + if (cache->validation_bits & CCIX_CACHE_ERR_INSTANCE_ID_VALID) + sqlite3_bind_int(rec, ccix_cache_field_instance, + cache->instance); + + if (cache->validation_bits & CCIX_CACHE_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_cache_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to do ccix_cache_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_cache_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_cache_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + void ras_ccix_create_table(struct sqlite3_priv *priv) { int rc; @@ -201,4 +296,9 @@ void ras_ccix_create_table(struct sqlite3_priv *priv) if (rc == SQLITE_OK) rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_mem_record, &ccix_memory_event_tab); + + rc = ras_mc_create_table(priv, 
&ccix_cache_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_cache_record, + &ccix_cache_event_tab); } diff --git a/ras-record.h b/ras-record.h index c094c91..ac25ffc 100644 --- a/ras-record.h +++ b/ras-record.h @@ -125,6 +125,7 @@ struct sqlite3_priv { #endif #ifdef HAVE_CCIX sqlite3_stmt *stmt_ccix_mem_record; + sqlite3_stmt *stmt_ccix_cache_record; #endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; @@ -163,6 +164,7 @@ int ras_store_mce_record(struct ras_events *ras, struct mce_event *ev); int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev); void ras_ccix_create_table(struct sqlite3_priv *priv); int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev); +int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -175,6 +177,7 @@ static inline int ras_store_mce_record(struct ras_events *ras, struct mce_event static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event *ev) { return 0; }; static inline void ras_ccix_create_table(void *priv) {}; static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; }; +static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; From patchwork Tue Aug 27 11:30:07 2019 Content-Type: 
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A6D0416B1 for ; Tue, 27 Aug 2019 11:47:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 83F2D2173E for ; Tue, 27 Aug 2019 11:47:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726537AbfH0LrM (ORCPT ); Tue, 27 Aug 2019 07:47:12 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:52704 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728763AbfH0LrM (ORCPT ); Tue, 27 Aug 2019 07:47:12 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 1D7EC2763C2188B19219; Tue, 27 Aug 2019 19:30:54 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:47 +0800 From: Jonathan Cameron To: Mauro Carvalho Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 3/6] rasdaemon: CCIX: ATC error support Date: Tue, 27 Aug 2019 19:30:07 +0800 Message-ID: <20190827113010.50405-4-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Adds support for CCIX address translation cache (ATC) errors. 
Signed-off-by: Jonathan Cameron --- ras-ccix-handler.c | 61 ++++++++++++++++++++++++++++++++++++++++ ras-ccix-handler.h | 13 +++++++++ ras-events.c | 9 ++++++ ras-record-ccix.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++ ras-record.h | 3 ++ 5 files changed, 155 insertions(+) diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c index f68c297..f7b9e8e 100644 --- a/ras-ccix-handler.c +++ b/ras-ccix-handler.c @@ -200,6 +200,26 @@ static char *ccix_cache_err_cper_data(const char *c) return buf; } +static char *ccix_atc_err_cper_data(const char *c) +{ + const struct cper_ccix_atc_err_compact *cpd = + (struct cper_ccix_atc_err_compact *)c; + static char buf[1024]; + char *p = buf; + + if (!cpd->validation_bits) + return ""; + + p += sprintf(p, " ("); + if (cpd->validation_bits & CCIX_ATC_ERR_OP_VALID) + p += sprintf(p, "op: %s ", ccix_cache_op(cpd->op_type)); + if (cpd->validation_bits & CCIX_ATC_ERR_INSTANCE_ID_VALID) + p += sprintf(p, "instance: %u ", cpd->instance); + p += sprintf(p - 1, ")"); + + return buf; +} + static char *ccix_component_type(int type) { switch (type) { @@ -356,3 +376,44 @@ int ras_ccix_cache_event_handler(struct trace_seq *s, return 0; } + +int ras_ccix_atc_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX ATC error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s", + ev.error_seq, err_severity(ev.severity), + ev.source, ccix_component_type(ev.component), + 
(ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb), + ccix_atc_err_cper_data(ev.cper_data)); + + ras_store_ccix_atc_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h index 629ccbe..4528af7 100644 --- a/ras-ccix-handler.h +++ b/ras-ccix-handler.h @@ -24,6 +24,9 @@ int ras_ccix_memory_event_handler(struct trace_seq *s, int ras_ccix_cache_event_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context); +int ras_ccix_atc_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ #pragma pack(1) @@ -56,6 +59,12 @@ struct cper_ccix_cache_err_compact { uint8_t instance; }; +struct cper_ccix_atc_err_compact { + uint32_t validation_bits; + uint8_t op_type; + uint8_t instance; +}; + #pragma pack() #define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 @@ -82,4 +91,8 @@ struct cper_ccix_cache_err_compact { #define CCIX_CACHE_ERR_INSTANCE_ID_VALID 0x0040 #define CCIX_CACHE_ERR_VENDOR_DATA_VALID 0x0080 +#define CCIX_ATC_ERR_OP_VALID 0x0001 +#define CCIX_ATC_ERR_INSTANCE_ID_VALID 0x0002 +#define CCIX_ATC_ERR_VENDOR_DATA_VALID 0x0004 + #endif diff --git a/ras-events.c b/ras-events.c index f1b67cd..68ed246 100644 --- a/ras-events.c +++ b/ras-events.c @@ -207,6 +207,7 @@ int toggle_ras_mc_event(int enable) #ifdef HAVE_CCIX rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable); + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable); #endif #ifdef HAVE_MCE @@ -740,6 +741,14 @@ int handle_ras_events(int record_events) else log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", "ras", "ccix_cache_event"); + rc = add_event_handler(ras, pevent, page_size, "ras", 
+ "ccix_atc_error_event", + ras_ccix_atc_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_atc_event"); #endif #ifdef HAVE_NON_STANDARD diff --git a/ras-record-ccix.c b/ras-record-ccix.c index 5b6e044..df68eef 100644 --- a/ras-record-ccix.c +++ b/ras-record-ccix.c @@ -288,6 +288,70 @@ int ras_store_ccix_cache_event(struct ras_events *ras, return rc; } +enum { + ccix_atc_field_operation = ccix_field_common_end, + ccix_atc_field_instance, + ccix_atc_field_vendor, +}; + +static const struct db_fields ccix_atc_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_atc_field_operation] = { .name = "operation", .type = "INTEGER" }, + [ccix_atc_field_instance] = { .name = "instance", .type = "INTEGER" }, + [ccix_atc_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_atc_event_tab = { + .name = "ccix_atc_event", + .fields = ccix_atc_event_fields, + .num_fields = ARRAY_SIZE(ccix_atc_event_fields), +}; + +int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_atc_err_compact *atc = + (struct cper_ccix_atc_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv ? priv->stmt_ccix_atc_record : NULL; + + if (!priv || !rec) + return 0; + log(TERM, LOG_INFO, "ccix_atc_eventstore: %p\n", rec); + + ras_store_ccix_common(priv->stmt_ccix_atc_record, ev); + if (atc->validation_bits & CCIX_ATC_ERR_OP_VALID) + sqlite3_bind_int(rec, ccix_atc_field_operation, atc->op_type); + + if (atc->validation_bits & CCIX_ATC_ERR_INSTANCE_ID_VALID) + sqlite3_bind_int(rec, ccix_atc_field_instance, atc->instance); + + if (atc->validation_bits & CCIX_ATC_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_atc_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to do 
ccix_atc_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_atc_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_atc_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + void ras_ccix_create_table(struct sqlite3_priv *priv) { int rc; @@ -301,4 +365,9 @@ void ras_ccix_create_table(struct sqlite3_priv *priv) if (rc == SQLITE_OK) rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_cache_record, &ccix_cache_event_tab); + + rc = ras_mc_create_table(priv, &ccix_atc_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_atc_record, + &ccix_atc_event_tab); } diff --git a/ras-record.h b/ras-record.h index ac25ffc..c3b3586 100644 --- a/ras-record.h +++ b/ras-record.h @@ -126,6 +126,7 @@ struct sqlite3_priv { #ifdef HAVE_CCIX sqlite3_stmt *stmt_ccix_mem_record; sqlite3_stmt *stmt_ccix_cache_record; + sqlite3_stmt *stmt_ccix_atc_record; #endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; @@ -165,6 +166,7 @@ int ras_store_extlog_mem_record(struct ras_events *ras, struct ras_extlog_event void ras_ccix_create_table(struct sqlite3_priv *priv); int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev); +int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -178,6 +180,7 @@ static inline int ras_store_extlog_mem_record(struct ras_events *ras, struct ras static inline void 
ras_ccix_create_table(void *priv) {}; static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; }; static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; +static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; From patchwork Tue Aug 27 11:30:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116741 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40D5A112C for ; Tue, 27 Aug 2019 11:31:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1659B21872 for ; Tue, 27 Aug 2019 11:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726527AbfH0LbA (ORCPT ); Tue, 27 Aug 2019 07:31:00 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:36926 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726140AbfH0LbA (ORCPT ); Tue, 27 Aug 2019 07:31:00 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 2B6448C40989623C2C54; Tue, 27 Aug 2019 19:30:59 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:49 +0800 From: Jonathan Cameron To: Mauro Carvalho 
Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 4/6] rasdaemon: CCIX: Port error support Date: Tue, 27 Aug 2019 19:30:08 +0800 Message-ID: <20190827113010.50405-5-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Add support for reporting and storing to sqlite3 for CCIX Port errors. Signed-off-by: Jonathan Cameron --- ras-ccix-handler.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++ ras-ccix-handler.h | 14 +++++++ ras-events.c | 9 +++++ ras-record-ccix.c | 75 +++++++++++++++++++++++++++++++++++++ ras-record.h | 3 ++ 5 files changed, 194 insertions(+) diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c index f7b9e8e..0a79627 100644 --- a/ras-ccix-handler.c +++ b/ras-ccix-handler.c @@ -220,6 +220,58 @@ static char *ccix_atc_err_cper_data(const char *c) return buf; } +static char *ccix_port_op(uint8_t op) +{ + switch (op) { + case 0: return "command"; + case 1: return "read"; + case 2: return "write"; + } + return "unknown"; +} + +static char *ccix_port_err_type(uint8_t type) +{ + switch (type) { + case 0: return "generic bus / slave error"; + case 1: return "bus parity / ECC error"; + case 2: return "BDF not present"; + case 3: return "invalid address"; + case 4: return "invalid agent ID"; + case 5: return "bus timeout"; + case 6: return "hang"; + case 7: return "egress blocked"; + } + return "unknown-type"; +}; + +static char *ccix_port_err_cper_data(const char *c) +{ + const struct cper_ccix_port_err_compact *cpd = + (struct cper_ccix_port_err_compact *)c; + static char buf[1024]; + char *p = buf; + int i; + + if (!cpd->validation_bits) + return ""; + + p += sprintf(p, " ("); + if (cpd->validation_bits & 
CCIX_PORT_ERR_TYPE_VALID) + p += sprintf(p, "error: %s ", + ccix_port_err_type(cpd->err_type)); + if (cpd->validation_bits & CCIX_PORT_ERR_OP_VALID) + p += sprintf(p, "op: %s ", ccix_port_op(cpd->op_type)); + if (cpd->validation_bits & CCIX_PORT_ERR_MESSAGE_VALID) { + p += sprintf(p, "message: "); + for (i = 0; i < 8; i++) + p += sprintf(p, "0x%08x ", cpd->message[i]); + } + p += sprintf(p - 1, ")"); + + return buf; +} + static char *ccix_component_type(int type) { switch (type) { @@ -417,3 +469,44 @@ int ras_ccix_atc_event_handler(struct trace_seq *s, return 0; } + +int ras_ccix_port_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX Port error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s", + ev.error_seq, err_severity(ev.severity), + ev.source, ccix_component_type(ev.component), + (ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 
1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb), + ccix_port_err_cper_data(ev.cper_data)); + + ras_store_ccix_port_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h index 4528af7..e824aed 100644 --- a/ras-ccix-handler.h +++ b/ras-ccix-handler.h @@ -27,6 +27,9 @@ int ras_ccix_cache_event_handler(struct trace_seq *s, int ras_ccix_atc_event_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context); +int ras_ccix_port_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ #pragma pack(1) @@ -65,6 +68,12 @@ struct cper_ccix_atc_err_compact { uint8_t instance; }; +struct cper_ccix_port_err_compact { + uint32_t validation_bits; + uint32_t message[8]; + uint8_t err_type; + uint8_t op_type; +}; #pragma pack() #define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 @@ -95,4 +104,9 @@ struct cper_ccix_atc_err_compact { #define CCIX_ATC_ERR_INSTANCE_ID_VALID 0x0002 #define CCIX_ATC_ERR_VENDOR_DATA_VALID 0x0004 +#define CCIX_PORT_ERR_OP_VALID 0x0001 +#define CCIX_PORT_ERR_TYPE_VALID 0x0002 +#define CCIX_PORT_ERR_MESSAGE_VALID 0x0004 +#define CCIX_PORT_ERR_VENDOR_DATA_VALID 0x0008 + #endif diff --git a/ras-events.c b/ras-events.c index 68ed246..83e28a7 100644 --- a/ras-events.c +++ b/ras-events.c @@ -208,6 +208,7 @@ int toggle_ras_mc_event(int enable) rc |= __toggle_ras_mc_event(ras, "ras", "ccix_memory_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable); + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable); #endif #ifdef HAVE_MCE @@ -749,6 +750,14 @@ int handle_ras_events(int record_events) else log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", "ras", "ccix_atc_event"); + rc = add_event_handler(ras, pevent, page_size, "ras", + "ccix_port_error_event", + 
ras_ccix_port_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_port_event"); #endif #ifdef HAVE_NON_STANDARD diff --git a/ras-record-ccix.c b/ras-record-ccix.c index df68eef..e1c5df4 100644 --- a/ras-record-ccix.c +++ b/ras-record-ccix.c @@ -352,6 +352,76 @@ int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) return rc; } +enum { + ccix_port_field_operation = ccix_field_common_end, + ccix_port_field_etype, + ccix_port_field_message, + ccix_port_field_vendor, +}; + +static const struct db_fields ccix_port_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_port_field_operation] = { .name = "operation", .type = "INTEGER" }, + [ccix_port_field_etype] = { .name = "etype", .type = "INTEGER" }, + [ccix_port_field_message] = { .name = "message", .type = "BLOB" }, + [ccix_port_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_port_event_tab = { + .name = "ccix_port_event", + .fields = ccix_port_event_fields, + .num_fields = ARRAY_SIZE(ccix_port_event_fields), +}; + +int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_port_err_compact *port = + (struct cper_ccix_port_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv ? priv->stmt_ccix_port_record : NULL; + + if (!priv || !rec) + return 0; + log(TERM, LOG_INFO, "ccix_port_eventstore: %p\n", rec); + + ras_store_ccix_common(rec, ev); + if (port->validation_bits & CCIX_PORT_ERR_OP_VALID) + sqlite3_bind_int(rec, ccix_port_field_operation, port->op_type); + + if (port->validation_bits & CCIX_PORT_ERR_TYPE_VALID) + sqlite3_bind_int(rec, ccix_port_field_etype, port->err_type); + + if (port->validation_bits & CCIX_PORT_ERR_MESSAGE_VALID) + sqlite3_bind_blob(rec, ccix_port_field_message, + port->message, sizeof(port->message), NULL); + + if (port->validation_bits & 
CCIX_PORT_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_port_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to do ccix_port_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_port_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_port_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + void ras_ccix_create_table(struct sqlite3_priv *priv) { int rc; @@ -370,4 +440,9 @@ void ras_ccix_create_table(struct sqlite3_priv *priv) if (rc == SQLITE_OK) rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_atc_record, &ccix_atc_event_tab); + + rc = ras_mc_create_table(priv, &ccix_port_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_port_record, + &ccix_port_event_tab); } diff --git a/ras-record.h b/ras-record.h index c3b3586..778de25 100644 --- a/ras-record.h +++ b/ras-record.h @@ -127,6 +127,7 @@ struct sqlite3_priv { sqlite3_stmt *stmt_ccix_mem_record; sqlite3_stmt *stmt_ccix_cache_record; sqlite3_stmt *stmt_ccix_atc_record; + sqlite3_stmt *stmt_ccix_port_record; #endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; @@ -167,6 +168,7 @@ void ras_ccix_create_table(struct sqlite3_priv *priv); int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev); +int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int 
ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -181,6 +183,7 @@ static inline void ras_ccix_create_table(void *priv) {}; static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *ev) { return 0; }; static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; +static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; From patchwork Tue Aug 27 11:30:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116743 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D2FED14E5 for ; Tue, 27 Aug 2019 11:31:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AF92321872 for ; Tue, 27 Aug 2019 11:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726140AbfH0LbB (ORCPT ); Tue, 27 Aug 2019 07:31:01 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:36932 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726420AbfH0LbB (ORCPT ); Tue, 27 Aug 2019 07:31:01 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint 
Email with ESMTP id 2563956575B8FB7F4310; Tue, 27 Aug 2019 19:30:59 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:51 +0800 From: Jonathan Cameron To: Mauro Carvalho Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 5/6] rasdaemon: CCIX: Link error support Date: Tue, 27 Aug 2019 19:30:09 +0800 Message-ID: <20190827113010.50405-6-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Add support for reporting and storing to sqlite3 of CCIX Link errors. Signed-off-by: Jonathan Cameron --- ras-ccix-handler.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++ ras-ccix-handler.h | 19 +++++++++ ras-events.c | 9 +++++ ras-record-ccix.c | 87 +++++++++++++++++++++++++++++++++++++++++ ras-record.h | 3 ++ 5 files changed, 214 insertions(+) diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c index 0a79627..69baa48 100644 --- a/ras-ccix-handler.c +++ b/ras-ccix-handler.c @@ -272,6 +272,61 @@ static char *ccix_port_err_cper_data(const char *c) return buf; } +static char *ccix_link_err_type(uint8_t err) +{ + switch (err) { + case 0: return "generic"; + case 1: return "credit underflow"; + case 2: return "credit overflow"; + case 3: return "unusable credit"; + case 4: return "credit timeout"; + } + return "unknown"; +}; + +static char *ccix_link_credit(uint8_t credit) +{ + switch (credit) { + case 0: return "memory"; + case 1: return "snoop"; + case 2: return "data"; + case 3: return "misc"; + } + return "unknown"; +}; + +static char *ccix_link_err_cper_data(const char *c) +{ + const struct cper_ccix_link_err_compact *cpd 
= + (struct cper_ccix_link_err_compact *)c; + static char buf[1024]; + char *p = buf; + int i; + + if (!cpd->validation_bits) + return ""; + + p += sprintf(p, " ("); + if (cpd->validation_bits & CCIX_LINK_ERR_TYPE_VALID) + p += sprintf(p, "error: %s ", + ccix_link_err_type(cpd->err_type)); + if (cpd->validation_bits & CCIX_LINK_ERR_OP_VALID) + p += sprintf(p, "op: %s ", ccix_port_op(cpd->op_type)); + if (cpd->validation_bits & CCIX_LINK_ERR_LINK_ID_VALID) + p += sprintf(p, "id: %u ", cpd->link_id); + if (cpd->validation_bits & CCIX_LINK_ERR_CREDIT_TYPE_VALID) + p += sprintf(p, "credit-type: %s ", + ccix_link_credit(cpd->credit_type)); + if (cpd->validation_bits & CCIX_LINK_ERR_MESSAGE_VALID) { + p += sprintf(p, "message: "); + for (i = 0; i < 8; i++) + p += sprintf(p, "0x%08x ", cpd->message[i]); + } + p += sprintf(p - 1, ")"); + + return buf; +} + static char *ccix_component_type(int type) { switch (type) { @@ -510,3 +565,44 @@ int ras_ccix_port_event_handler(struct trace_seq *s, return 0; } + +int ras_ccix_link_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX Link error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx %s", + ev.error_seq, err_severity(ev.severity), + ev.source, ccix_component_type(ev.component), + (ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 
1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb), + ccix_link_err_cper_data(ev.cper_data)); + + ras_store_ccix_link_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h index e824aed..3def534 100644 --- a/ras-ccix-handler.h +++ b/ras-ccix-handler.h @@ -30,6 +30,9 @@ int ras_ccix_atc_event_handler(struct trace_seq *s, int ras_ccix_port_event_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context); +int ras_ccix_link_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ #pragma pack(1) @@ -74,6 +77,15 @@ struct cper_ccix_port_err_compact { uint8_t err_type; uint8_t op_type; }; + +struct cper_ccix_link_err_compact { + uint32_t validation_bits; + uint32_t message[8]; + uint8_t err_type; + uint8_t op_type; + uint8_t link_id; + uint8_t credit_type; +}; #pragma pack() #define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 @@ -109,4 +121,11 @@ struct cper_ccix_port_err_compact { #define CCIX_PORT_ERR_MESSAGE_VALID 0x0004 #define CCIX_PORT_ERR_VENDOR_DATA_VALID 0x0008 +#define CCIX_LINK_ERR_OP_VALID 0x0001 +#define CCIX_LINK_ERR_TYPE_VALID 0x0002 +#define CCIX_LINK_ERR_LINK_ID_VALID 0x0004 +#define CCIX_LINK_ERR_CREDIT_TYPE_VALID 0x0008 +#define CCIX_LINK_ERR_MESSAGE_VALID 0x0010 +#define CCIX_LINK_ERR_VENDOR_DATA_VALID 0x0020 + #endif diff --git a/ras-events.c b/ras-events.c index 83e28a7..c73a36d 100644 --- a/ras-events.c +++ b/ras-events.c @@ -209,6 +209,7 @@ int toggle_ras_mc_event(int enable) rc |= __toggle_ras_mc_event(ras, "ras", "ccix_cache_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable); + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_link_event", enable); #endif #ifdef HAVE_MCE @@ -758,6 +759,14 @@ int handle_ras_events(int record_events) else log(ALL, LOG_ERR, 
"Can't get traces from %s:%s\n", "ras", "ccix_port_event"); + rc = add_event_handler(ras, pevent, page_size, "ras", + "ccix_link_error_event", + ras_ccix_link_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_link_event"); #endif #ifdef HAVE_NON_STANDARD diff --git a/ras-record-ccix.c b/ras-record-ccix.c index e1c5df4..1e03e84 100644 --- a/ras-record-ccix.c +++ b/ras-record-ccix.c @@ -422,6 +422,88 @@ int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) return rc; } +enum { + ccix_link_field_operation = ccix_field_common_end, + ccix_link_field_etype, + ccix_link_field_link_id, + ccix_link_field_credit_type, + ccix_link_field_message, + ccix_link_field_vendor, +}; + +static const struct db_fields ccix_link_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_link_field_operation] = { .name = "operation", .type = "INTEGER" }, + [ccix_link_field_etype] = { .name = "etype", .type = "INTEGER" }, + [ccix_link_field_link_id] = { .name = "credit_id", .type = "INTEGER" }, + [ccix_link_field_credit_type] = { .name = "credit_type", .type = "INTEGER" }, + [ccix_link_field_message] = { .name = "message", .type = "BLOB" }, + [ccix_link_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_link_event_tab = { + .name = "ccix_link_event", + .fields = ccix_link_event_fields, + .num_fields = ARRAY_SIZE(ccix_link_event_fields), +}; + +int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_link_err_compact *link = + (struct cper_ccix_link_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv->stmt_ccix_link_record; + + if (!priv || !rec) + return 0; + log(TERM, LOG_INFO, "ccix_link_eventstore: %p\n", rec); + + ras_store_ccix_common(rec, ev); + if (link->validation_bits & CCIX_LINK_ERR_OP_VALID) + sqlite3_bind_int(rec, 
ccix_link_field_operation, link->op_type); + + if (link->validation_bits & CCIX_LINK_ERR_TYPE_VALID) + sqlite3_bind_int(rec, ccix_link_field_etype, + link->err_type); + + if (link->validation_bits & CCIX_LINK_ERR_LINK_ID_VALID) + sqlite3_bind_int(rec, ccix_link_field_link_id, link->link_id); + + if (link->validation_bits & CCIX_LINK_ERR_CREDIT_TYPE_VALID) + sqlite3_bind_int(rec, ccix_link_field_credit_type, + link->credit_type); + + if (link->validation_bits & CCIX_LINK_ERR_MESSAGE_VALID) + sqlite3_bind_blob(rec, ccix_link_field_message, + link->message, sizeof(link->message), NULL); + + if (link->validation_bits & CCIX_LINK_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_link_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to do ccix_link_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_link_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_link_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + void ras_ccix_create_table(struct sqlite3_priv *priv) { int rc; @@ -445,4 +527,9 @@ void ras_ccix_create_table(struct sqlite3_priv *priv) if (rc == SQLITE_OK) rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_port_record, &ccix_port_event_tab); + + rc = ras_mc_create_table(priv, &ccix_link_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_link_record, + &ccix_link_event_tab); } diff --git a/ras-record.h b/ras-record.h index 778de25..f13e286 100644 --- a/ras-record.h +++ b/ras-record.h @@ -128,6 +128,7 @@ struct sqlite3_priv { sqlite3_stmt *stmt_ccix_cache_record; sqlite3_stmt *stmt_ccix_atc_record; sqlite3_stmt *stmt_ccix_port_record; + 
sqlite3_stmt *stmt_ccix_link_record; #endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; @@ -169,6 +170,7 @@ int ras_store_ccix_memory_event(struct ras_events *ras, struct ras_ccix_event *e int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev); +int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -184,6 +186,7 @@ static inline int ras_store_ccix_memory_event(struct ras_events *ras, struct ras static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; +static inline int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; }; From patchwork Tue Aug 27 11:30:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Cameron X-Patchwork-Id: 11116745 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org 
(Postfix) with ESMTP id DB2A8112C for ; Tue, 27 Aug 2019 11:31:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BA3E5217F5 for ; Tue, 27 Aug 2019 11:31:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726420AbfH0LbI (ORCPT ); Tue, 27 Aug 2019 07:31:08 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5667 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725793AbfH0LbI (ORCPT ); Tue, 27 Aug 2019 07:31:08 -0400 Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 35077F285919A08D096B; Tue, 27 Aug 2019 19:31:04 +0800 (CST) Received: from lhrphicprd00229.huawei.com (10.123.41.22) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.439.0; Tue, 27 Aug 2019 19:30:53 +0800 From: Jonathan Cameron To: Mauro Carvalho Chehab , CC: , , , "Jonathan Cameron" Subject: [PATCH V2 6/6] rasdaemon: CCIX: Agent Internal error support Date: Tue, 27 Aug 2019 19:30:10 +0800 Message-ID: <20190827113010.50405-7-Jonathan.Cameron@huawei.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> References: <20190827113010.50405-1-Jonathan.Cameron@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.123.41.22] X-CFilter-Loop: Reflected Sender: linux-edac-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org Add support for reporting and storing to sqlite3 of CCIX Agent Internal errors. In the current 1.0 CCIX specification these only have vendor_data defined. However, they are structured to allow additional fields in future so we handle them the same way as all the other CCIX error types. 
Signed-off-by: Jonathan Cameron --- ras-ccix-handler.c | 40 ++++++++++++++++++++++++++++++ ras-ccix-handler.h | 8 ++++++ ras-events.c | 9 +++++++ ras-record-ccix.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++ ras-record.h | 3 +++ 5 files changed, 121 insertions(+) diff --git a/ras-ccix-handler.c b/ras-ccix-handler.c index 69baa48..2088790 100644 --- a/ras-ccix-handler.c +++ b/ras-ccix-handler.c @@ -606,3 +606,43 @@ int ras_ccix_link_event_handler(struct trace_seq *s, return 0; } + +int ras_ccix_agent_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context) +{ + struct ras_events *ras = context; + struct tm *tm; + struct ras_ccix_event ev; + time_t now; + int ret; + + if (ras->use_uptime) + now = record->ts/user_hz + ras->uptime_diff; + else + now = time(NULL); + + tm = localtime(&now); + + if (tm) + strftime(ev.timestamp, sizeof(ev.timestamp), + "%Y-%m-%d %H:%M:%S %z", tm); + trace_seq_printf(s, "%s ", ev.timestamp); + ret = ras_ccix_common_parse(s, record, event, context, &ev); + if (ret) + return ret; + + trace_seq_printf(s, "%d %s id:%d CCIX Agent Internal error: %s ue:%d nocomm:%d degraded:%d deferred:%d physical addr: 0x%llx mask: 0x%llx", + ev.error_seq, err_severity(ev.severity), + ev.source, ccix_component_type(ev.component), + (ev.severity_detail & 0x1) ? 1 : 0, + (ev.severity_detail & 0x2) ? 1 : 0, + (ev.severity_detail & 0x4) ? 1 : 0, + (ev.severity_detail & 0x8) ? 
1 : 0, + ev.address, + err_mask(ev.pa_mask_lsb)); + + ras_store_ccix_agent_event(ras, &ev); + + return 0; +} diff --git a/ras-ccix-handler.h b/ras-ccix-handler.h index 3def534..c53e3ee 100644 --- a/ras-ccix-handler.h +++ b/ras-ccix-handler.h @@ -33,6 +33,9 @@ int ras_ccix_port_event_handler(struct trace_seq *s, int ras_ccix_link_event_handler(struct trace_seq *s, struct pevent_record *record, struct event_format *event, void *context); +int ras_ccix_agent_event_handler(struct trace_seq *s, + struct pevent_record *record, + struct event_format *event, void *context); /* Perhaps unnecessary paranoia, but the tracepoint structure is packed */ #pragma pack(1) @@ -86,6 +89,10 @@ struct cper_ccix_link_err_compact { uint8_t link_id; uint8_t credit_type; }; + +struct cper_ccix_agent_internal_err_compact { + uint32_t validation_bits; +}; #pragma pack() #define CCIX_MEM_ERR_GENERIC_MEM_VALID 0x0001 @@ -128,4 +135,5 @@ struct cper_ccix_link_err_compact { #define CCIX_LINK_ERR_MESSAGE_VALID 0x0010 #define CCIX_LINK_ERR_VENDOR_DATA_VALID 0x0020 +#define CCIX_AGENT_ERR_VENDOR_DATA_VALID 0x0001 #endif diff --git a/ras-events.c b/ras-events.c index c73a36d..4de28b7 100644 --- a/ras-events.c +++ b/ras-events.c @@ -210,6 +210,7 @@ int toggle_ras_mc_event(int enable) rc |= __toggle_ras_mc_event(ras, "ras", "ccix_atc_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_port_event", enable); rc |= __toggle_ras_mc_event(ras, "ras", "ccix_link_event", enable); + rc |= __toggle_ras_mc_event(ras, "ras", "ccix_agent_event", enable); #endif #ifdef HAVE_MCE @@ -767,6 +768,14 @@ int handle_ras_events(int record_events) else log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", "ras", "ccix_link_event"); + rc = add_event_handler(ras, pevent, page_size, "ras", + "ccix_agent_error_event", + ras_ccix_agent_event_handler, NULL); + if (!rc) + num_events++; + else + log(ALL, LOG_ERR, "Can't get traces from %s:%s\n", + "ras", "ccix_agent_error_event"); #endif #ifdef HAVE_NON_STANDARD diff --git 
a/ras-record-ccix.c b/ras-record-ccix.c index 1e03e84..79c6e52 100644 --- a/ras-record-ccix.c +++ b/ras-record-ccix.c @@ -504,6 +504,62 @@ int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) return rc; } +enum { + ccix_agent_field_vendor = ccix_field_common_end, +}; + +static const struct db_fields ccix_agent_event_fields[] = { + CCIX_COMMON_FIELDS, + [ccix_agent_field_vendor] = { .name = "vendor_data", .type = "BLOB" }, +}; + +static const struct db_table_descriptor ccix_agent_event_tab = { + .name = "ccix_agent_event", + .fields = ccix_agent_event_fields, + .num_fields = ARRAY_SIZE(ccix_agent_event_fields), +}; + +int ras_store_ccix_agent_event(struct ras_events *ras, + struct ras_ccix_event *ev) +{ + int rc; + struct sqlite3_priv *priv = ras->db_priv; + struct cper_ccix_agent_internal_err_compact *agent = + (struct cper_ccix_agent_internal_err_compact *)ev->cper_data; + sqlite3_stmt *rec = priv->stmt_ccix_agent_record; + + if (!priv || !rec) + return 0; + log(TERM, LOG_INFO, "ccix_agent_eventstore: %p\n", rec); + + ras_store_ccix_common(rec, ev); + + if (agent->validation_bits & CCIX_AGENT_ERR_VENDOR_DATA_VALID) + sqlite3_bind_blob(rec, ccix_agent_field_vendor, + ev->vendor_data, ev->vendor_data_length, + NULL); + + rc = sqlite3_step(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to do ccix_agent_record step on sqlite: error = %d\n", + rc); + + rc = sqlite3_reset(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed reset ccix_agent_record on sqlite: error = %d\n", + rc); + + rc = sqlite3_clear_bindings(rec); + if (rc != SQLITE_OK && rc != SQLITE_DONE) + log(TERM, LOG_ERR, + "Failed to clear ccix_agent_record: error %d\n", + rc); + log(TERM, LOG_INFO, "register inserted at db\n"); + return rc; +} + void ras_ccix_create_table(struct sqlite3_priv *priv) { int rc; @@ -532,4 +588,9 @@ void ras_ccix_create_table(struct sqlite3_priv *priv) if (rc == SQLITE_OK) rc = 
ras_mc_prepare_stmt(priv, &priv->stmt_ccix_link_record, &ccix_link_event_tab); + + rc = ras_mc_create_table(priv, &ccix_agent_event_tab); + if (rc == SQLITE_OK) + rc = ras_mc_prepare_stmt(priv, &priv->stmt_ccix_agent_record, + &ccix_agent_event_tab); } diff --git a/ras-record.h b/ras-record.h index f13e286..4f78e1d 100644 --- a/ras-record.h +++ b/ras-record.h @@ -129,6 +129,7 @@ struct sqlite3_priv { sqlite3_stmt *stmt_ccix_atc_record; sqlite3_stmt *stmt_ccix_port_record; sqlite3_stmt *stmt_ccix_link_record; + sqlite3_stmt *stmt_ccix_agent_record; #endif #ifdef HAVE_NON_STANDARD sqlite3_stmt *stmt_non_standard_record; @@ -171,6 +172,7 @@ int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ccix_event *ev int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev); +int ras_store_ccix_agent_event(struct ras_events *ras, struct ras_ccix_event *ev); int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev); int ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev); int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev); @@ -187,6 +189,7 @@ static inline int ras_store_ccix_cache_event(struct ras_events *ras, struct ras_ static inline int ras_store_ccix_atc_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_ccix_port_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_ccix_link_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; +static inline int ras_store_ccix_agent_event(struct ras_events *ras, struct ras_ccix_event *ev) {return 0; }; static inline int ras_store_non_standard_record(struct ras_events *ras, struct ras_non_standard_event *ev) { return 0; }; static inline int 
ras_store_arm_record(struct ras_events *ras, struct ras_arm_event *ev) { return 0; }; static inline int ras_store_devlink_event(struct ras_events *ras, struct devlink_event *ev) { return 0; };
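For readers following the series, the validation-bit gating that patch 5/6 uses when decoding a CCIX Link error can be sketched outside rasdaemon as below. This is only an illustrative sketch, not the code from the patch: the struct layout and the first four bit values are copied from ras-ccix-handler.h above, while the names link_err_compact, sketch_append() and sketch_link_err_decode() are hypothetical and the fields are printed as raw numbers rather than through ccix_link_err_type() and friends. Unlike the static buffer and sprintf() in the patch, the sketch writes into a caller-supplied buffer and bounds every write with snprintf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same packed layout as cper_ccix_link_err_compact in the patch. */
#pragma pack(1)
struct link_err_compact {
	uint32_t validation_bits;
	uint32_t message[8];
	uint8_t err_type;
	uint8_t op_type;
	uint8_t link_id;
	uint8_t credit_type;
};
#pragma pack()

/* Bit values as defined in ras-ccix-handler.h by the patch. */
#define LINK_ERR_OP_VALID          0x0001
#define LINK_ERR_TYPE_VALID        0x0002
#define LINK_ERR_LINK_ID_VALID     0x0004
#define LINK_ERR_CREDIT_TYPE_VALID 0x0008

/* Hypothetical helper: append one formatted value, never writing past len. */
static size_t sketch_append(char *buf, size_t len, size_t used,
			    const char *fmt, unsigned int v)
{
	int n;

	if (used >= len)
		return used;
	n = snprintf(buf + used, len - used, fmt, v);
	if (n < 0)
		return used;
	if ((size_t)n >= len - used)
		return len - 1;	/* output was truncated */
	return used + (size_t)n;
}

/*
 * Hypothetical decoder: a field is only reported when its validation
 * bit is set; an all-zero validation mask yields an empty string.
 */
const char *sketch_link_err_decode(const struct link_err_compact *cpd,
				   char *buf, size_t len)
{
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	if (!cpd->validation_bits)
		return buf;

	if (cpd->validation_bits & LINK_ERR_TYPE_VALID)
		used = sketch_append(buf, len, used, "error:%u ", cpd->err_type);
	if (cpd->validation_bits & LINK_ERR_OP_VALID)
		used = sketch_append(buf, len, used, "op:%u ", cpd->op_type);
	if (cpd->validation_bits & LINK_ERR_LINK_ID_VALID)
		used = sketch_append(buf, len, used, "id:%u ", cpd->link_id);
	if (cpd->validation_bits & LINK_ERR_CREDIT_TYPE_VALID)
		used = sketch_append(buf, len, used, "credit-type:%u ",
				     cpd->credit_type);

	/* Drop the trailing separator, if any. */
	if (used > 0 && buf[used - 1] == ' ')
		buf[used - 1] = '\0';
	return buf;
}
```

A caller would fill a struct link_err_compact from the tracepoint payload and pass a local buffer; fields whose validation bit is clear are simply skipped, which matches the intent of leaving the corresponding sqlite3 columns NULL.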