From patchwork Wed Oct 12 21:28:17 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13005457
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id CD010C433FE
	for ; Wed, 12 Oct 2022 21:28:28 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229792AbiJLV21 (ORCPT ); Wed, 12 Oct 2022 17:28:27 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58130 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229495AbiJLV20 (ORCPT );
	Wed, 12 Oct 2022 17:28:26 -0400
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B4251187B8;
	Wed, 12 Oct 2022 14:28:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
	t=1665610106; x=1697146106;
	h=from:to:cc:subject:date:message-id:in-reply-to:
	references:mime-version:content-transfer-encoding;
	bh=GDE0F2gRE+3RNCoyAc/vJbM1NA8sddfEo7OjEj0BcYs=;
	b=arHSSmtW342f62+QrpArmWllg6/Y4RJvSJxnlL+QqJBYym8Pw22S2z9d
	+bmMlFV3REYJ6AJkk3U8gwhlqv649kU6EP0QPSSfHE0Mz56mUjZ3FM3Ch
	57/OX9p803cNPYVHNBVxrYyFnFvuVpvNsA5xJ6P5yZW7CL7lBbNFux4Pz
	uEiTllMff+Yj78gbR6JAkuu56cOChvwqzYWHzw2re/8BfW15Vkp7tL0CG
	76dpDe257LptcG7i/qE2Ha282eaaJwLrWrP5wGUjnbYqZXY3Y/7mdFoMa
	lSR6oJCpBItvBXZistnFDxZKf34FYua05o3iKKEd26A+Axh4d777rvjsl
	g==;
X-IronPort-AV: E=McAfee;i="6500,9779,10498"; a="306543870"
X-IronPort-AV: E=Sophos;i="5.95,180,1661842800";
	d="scan'208";a="306543870"
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
	by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	12 Oct 2022 14:28:25 -0700
X-IronPort-AV: E=McAfee;i="6500,9779,10498"; a="689834231"
X-IronPort-AV: E=Sophos;i="5.95,180,1661842800";
	d="scan'208";a="689834231"
Received: from aschofie-mobl2.amr.corp.intel.com (HELO localhost)
	([10.251.3.191]) by fmsmga008-auth.fm.intel.com with
	ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Oct 2022 14:28:24 -0700
From: alison.schofield@intel.com
To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky,
	Steven Rostedt, Ingo Molnar
Cc: Alison Schofield, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/4] trace, cxl: Introduce a TRACE_EVENT for CXL poison
	records
Date: Wed, 12 Oct 2022 14:28:17 -0700
Message-Id: <17ee0f309e4287510e4e68f2cbcfc9d111a6e69d.1665606782.git.alison.schofield@intel.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-cxl@vger.kernel.org

From: Alison Schofield

CXL devices may support the retrieval of a device poison list.
Introduce a trace event that the CXL subsystem can use to log
the error records.

Signed-off-by: Alison Schofield
---
 include/trace/events/cxl.h | 88 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 include/trace/events/cxl.h

diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
new file mode 100644
index 000000000000..9613b0f18011
--- /dev/null
+++ b/include/trace/events/cxl.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl
+
+#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_H
+
+#include
+
+/* CXL 8.2.9.5.4.1 Get Poison List: Poison Source */
+#define CXL_POISON_SOURCE_UNKNOWN	0
+#define CXL_POISON_SOURCE_EXTERNAL	1
+#define CXL_POISON_SOURCE_INTERNAL	2
+#define CXL_POISON_SOURCE_INJECTED	3
+#define CXL_POISON_SOURCE_VENDOR	7
+
+#define show_poison_source(source)				\
+	__print_symbolic(source,				\
+		{ CXL_POISON_SOURCE_UNKNOWN,	"Unknown"  },	\
+		{ CXL_POISON_SOURCE_EXTERNAL,	"External" },	\
+		{ CXL_POISON_SOURCE_INTERNAL,	"Internal" },	\
+		{ CXL_POISON_SOURCE_INJECTED,	"Injected" },	\
+		{ CXL_POISON_SOURCE_VENDOR,	"Vendor"   })
+
+/* CXL 8.2.9.5.4.1 Get Poison List: Payload out flags */
+#define CXL_POISON_FLAG_MORE		BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW	BIT(1)
+#define CXL_POISON_FLAG_SCANNING	BIT(2)
+
+#define show_poison_flags(flags)				\
+	__print_flags(flags, "|",				\
+		{ CXL_POISON_FLAG_MORE,		"More"     },	\
+		{ CXL_POISON_FLAG_OVERFLOW,	"Overflow" },	\
+		{ CXL_POISON_FLAG_SCANNING,	"Scanning" })
+
+TRACE_EVENT(cxl_poison,
+
+	TP_PROTO(pid_t pid, const char *region, const char *memdev,
+		 const char *pcidev, u64 hpa, u64 dpa, u32 length,
+		 u8 source, u8 flags, u64 overflow_t),
+
+	TP_ARGS(pid, region, memdev, pcidev, hpa, dpa,
+		length, source, flags, overflow_t),
+
+	TP_STRUCT__entry(
+		__field(pid_t, pid)
+		__string(region, region ? region : "")
+		__string(memdev, memdev)
+		__string(pcidev, pcidev)
+		__field(u64, hpa)
+		__field(u64, dpa)
+		__field(u32, length)
+		__field(u8, source)
+		__field(u8, flags)
+		__field(u64, overflow_t)
+	),
+
+	TP_fast_assign(
+		__entry->pid = pid;
+		__assign_str(region, region ? region : "");
+		__assign_str(memdev, memdev);
+		__assign_str(pcidev, pcidev);
+		__entry->hpa = hpa;
+		__entry->dpa = dpa;
+		__entry->length = length;
+		__entry->source = source;
+		__entry->flags = flags;
+		__entry->overflow_t = overflow_t;
+	),
+
+	TP_printk("pid:%d region:%s memdev:%s pcidev:%s hpa:0x%llx dpa:0x%llx length:0x%x source:%s flags:%s overflow_time:%llu",
+		__entry->pid,
+		__get_str(region),
+		__get_str(memdev),
+		__get_str(pcidev),
+		__entry->hpa,
+		__entry->dpa,
+		__entry->length,
+		show_poison_source(__entry->source),
+		show_poison_flags(__entry->flags),
+		__entry->overflow_t)
+);
+#endif /* _CXL_TRACE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl
+#include

From patchwork Wed Oct 12 21:28:18 2022
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13005458
From: alison.schofield@intel.com
To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky,
	Steven Rostedt, Ingo Molnar
Cc: Alison Schofield, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 2/4] cxl/mbox: Add GET_POISON_LIST mailbox command
Date: Wed, 12 Oct 2022 14:28:18 -0700
Message-Id: <54b9c0b570636c04f1caaff5ac66e56128568732.1665606782.git.alison.schofield@intel.com>

From: Alison Schofield

CXL devices maintain a list of locations that are poisoned or result
in poison if the addresses are accessed by the host.

Per the spec (CXL 3.0 8.2.9.8.4.1), the device returns this Poison
list as a set of Media Error Records that include the source of the
error, the starting device physical address and length. The length is
the number of adjacent DPAs in the record and is in units of 64 bytes.
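
As a rough illustration of the record layout described above, here is a
userspace sketch (not part of this patch) that splits one Media Error
Record into source, starting DPA and byte length. The masks mirror
CXL_POISON_SOURCE_MASK, CXL_POISON_START_MASK and CXL_POISON_LEN_MULT
from the cxlmem.h hunk of this patch; the struct and function names are
hypothetical, and the inputs are assumed to be already converted to CPU
byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed userspace mirrors of the masks this patch adds to cxlmem.h. */
#define POISON_SOURCE_MASK	0x7ULL		/* address bits [2:0]  */
#define POISON_START_MASK	(~0x3FULL)	/* address bits [63:6] */
#define POISON_LEN_MULT		64ULL		/* length unit: 64 bytes */

/* Hypothetical decoded form of one Media Error Record. */
struct poison_extent {
	uint64_t dpa;		/* starting device physical address */
	uint64_t bytes;		/* extent size in bytes */
	uint8_t source;		/* poison source code (0..7) */
};

/* Split the raw 'address' and 'length' fields of a record. */
static struct poison_extent decode_media_error(uint64_t address,
						uint32_t length)
{
	struct poison_extent e;

	e.source = (uint8_t)(address & POISON_SOURCE_MASK);
	e.dpa = address & POISON_START_MASK;
	e.bytes = (uint64_t)length * POISON_LEN_MULT;

	/* The starting DPA is always 64-byte aligned after masking. */
	assert((e.dpa & 0x3FULL) == 0);
	return e;
}
```

For example, decode_media_error(0x1043, 2) gives source 3 (Injected),
DPA 0x1040 and a 128-byte extent. The sketch keeps the byte count in
64 bits so that a large record length cannot wrap; that is a choice
made for the illustration only.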

Retrieve the list and log each Media Error Record as a trace event of
type 'cxl_poison'. Use an optional 'region_name' parameter and include
it in the trace event, to identify per region poison collection.

Signed-off-by: Alison Schofield
---

Trace field 'hpa' is always zero here. Another patch doing the address
translation trails this one, and will fill it in. Happy to remove it
here if the foreshadowing is unwanted.

 drivers/cxl/core/mbox.c | 69 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 42 +++++++++++++++++++++++++
 2 files changed, 111 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 40e3ccb2bf3e..f982645e35e4 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -9,6 +9,9 @@
 
 #include "core.h"
 
+#define CREATE_TRACE_POINTS
+#include
+
 static bool cxl_raw_allow_all;
 
 /**
@@ -750,6 +753,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 {
 	/* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
 	struct cxl_mbox_identify id;
+	__le32 val = 0;
 	int rc;
 
 	rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_IDENTIFY, NULL, 0, &id,
@@ -769,6 +773,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
 	cxlds->lsa_size = le32_to_cpu(id.lsa_size);
 	memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
 
+	memcpy(&val, id.poison_list_max_mer, 3);
+	cxlds->poison_max = min_t(u32, le32_to_cpu(val), CXL_POISON_LIST_MAX);
+
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
@@ -833,6 +840,67 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
 
+int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
+		       const char *region_name)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_poison_payload_out *po;
+	struct cxl_mbox_poison_payload_in pi;
+	int nr_records = 0;
+	int rc, i;
+
+	po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
+	if (!po)
+		return -ENOMEM;
+
+	pi.offset = cpu_to_le64(offset);
+	pi.length = cpu_to_le64(len);
+
+	rc = mutex_lock_interruptible(&cxlds->poison_list_mutex);
+	if (rc)
+		goto out;
+
+	do {
+		u64 overflow_t = 0;
+
+		rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
+				       sizeof(pi), po, cxlds->payload_size);
+		if (rc)
+			break;
+
+		if (po->flags & CXL_POISON_FLAG_OVERFLOW)
+			overflow_t = le64_to_cpu(po->overflow_timestamp);
+
+		for (i = 0; i < le16_to_cpu(po->count); i++) {
+			u32 len = le32_to_cpu(po->record[i].length) *
+					CXL_POISON_LEN_MULT;
+			u64 addr = le64_to_cpu(po->record[i].address);
+			u8 source = addr & CXL_POISON_SOURCE_MASK;
+			u64 dpa = addr & CXL_POISON_START_MASK;
+			u64 hpa = 0;
+
+			trace_cxl_poison(current->pid, region_name,
+					 dev_name(&cxlmd->dev),
+					 dev_name(cxlds->dev), hpa, dpa, len,
+					 source, po->flags, overflow_t);
+		}
+
+		/* Protect against an uncleared _FLAG_MORE */
+		nr_records = nr_records + le16_to_cpu(po->count);
+		if (nr_records >= cxlds->poison_max) {
+			dev_dbg(&cxlmd->dev, "Max Error Records reached: %d\n",
+				nr_records);
+			break;
+		}
+	} while (po->flags & CXL_POISON_FLAG_MORE);
+
+	mutex_unlock(&cxlds->poison_list_mutex);
+out:
+	kvfree(po);
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, CXL);
+
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
 {
 	struct cxl_dev_state *cxlds;
@@ -844,6 +912,7 @@ struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
 	}
 
 	mutex_init(&cxlds->mbox_mutex);
+	mutex_init(&cxlds->poison_list_mutex);
 	cxlds->dev = dev;
 
 	return cxlds;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a83bb6782d23..f5c6992de236 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -192,6 +192,8 @@ struct cxl_endpoint_dvsec_info {
  *                (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
  * @lsa_size: Size of Label Storage Area
  *                (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
+ * @poison_max: maximum media error records held in device cache
+ * @poison_list_mutex: Mutex to synchronize poison list retrieval
  * @mbox_mutex: Mutex to synchronize mailbox access.
  * @firmware_version: Firmware version for the memory device.
  * @enabled_cmds: Hardware commands found enabled in CEL.
@@ -223,6 +225,8 @@ struct cxl_dev_state {
 
 	size_t payload_size;
 	size_t lsa_size;
+	u32 poison_max;
+	struct mutex poison_list_mutex;  /* Protect reads of poison list */
 	struct mutex mbox_mutex; /* Protects device mailbox and firmware */
 	char firmware_version[0x10];
 	DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
@@ -344,6 +348,42 @@ struct cxl_mbox_set_partition_info {
 
 #define  CXL_SET_PARTITION_IMMEDIATE_FLAG	BIT(0)
 
+struct cxl_mbox_poison_payload_in {
+	__le64 offset;
+	__le64 length;
+} __packed;
+
+struct cxl_mbox_poison_payload_out {
+	u8 flags;
+	u8 rsvd1;
+	__le64 overflow_timestamp;
+	__le16 count;
+	u8 rsvd2[0x14];
+	struct cxl_poison_record {
+		__le64 address;
+		__le32 length;
+		__le32 rsvd;
+	} __packed record[];
+} __packed;
+
+/* CXL 8.2.9.5.4.1 Get Poison List payload out flags */
+#define CXL_POISON_FLAG_MORE		BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW	BIT(1)
+#define CXL_POISON_FLAG_SCANNING	BIT(2)
+
+/*
+ * CXL 8.2.9.5.4.1 Get Poison List address field encodes both the
+ * starting address of poison, and the source of the poison.
+ */
+#define CXL_POISON_START_MASK	GENMASK_ULL(63, 6)
+#define CXL_POISON_SOURCE_MASK	GENMASK(2, 0)
+
+/* CXL 8.2.9.5.4.1 Table 188: Length is in units of 64 bytes */
+#define CXL_POISON_LEN_MULT	64
+
+/* Kernel maximum for a cache of media poison errors */
+#define CXL_POISON_LIST_MAX	1024
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
@@ -378,6 +418,8 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
 struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
 void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
+		       const char *region_name);
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
 void cxl_mem_active_dec(void);

From patchwork Wed Oct 12 21:28:19 2022
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13005460
From: alison.schofield@intel.com
To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky,
	Steven Rostedt, Ingo Molnar
Cc: Alison Schofield, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 3/4] cxl/memdev: Add trigger_poison_list sysfs attribute
Date: Wed, 12 Oct 2022 14:28:19 -0700
Message-Id: <6dbadd279a2cd870638b2dbd0e463b1578009dfa.1665606782.git.alison.schofield@intel.com>

From: Alison Schofield

When a boolean 'true' is written to this attribute the memdev driver
retrieves the poison list from the device. The list includes addresses
that are poisoned, or would result in poison if accessed, and the
source of the poison.
This attribute is only visible for devices supporting the
capability. The retrieved errors are logged as kernel trace events
with the label 'cxl_poison'.

Signed-off-by: Alison Schofield
---
 Documentation/ABI/testing/sysfs-bus-cxl | 14 +++++++++
 drivers/cxl/core/memdev.c               | 41 +++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 0debe2955f34..ab3665f8738e 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -354,3 +354,17 @@ Description:
 		1), and checks that the hardware accepts the commit request.
 		Reading this value indicates whether the region is committed or
 		not.
+
+
+What:		/sys/bus/cxl/devices/memX/trigger_poison_list
+Date:		October, 2022
+KernelVersion:	v6.2
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a boolean 'true' is written to this attribute the
+		memdev driver retrieves the poison list from the device. The
+		list includes addresses that are poisoned or would result in
+		poison if accessed, and the source of the poison. This
+		attribute is only visible for devices supporting the
+		capability. The retrieved errors are logged as kernel
+		trace events with the label 'cxl_poison'.
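
For reference, a minimal userspace sketch (not part of this patch) of
how the write-only attribute documented above might be poked from C.
The helper name is hypothetical; it only assumes the attribute accepts
the string "1" as a boolean 'true', which is what kstrtobool() parses.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a boolean 'true' ("1") to a write-only sysfs attribute.
 * Returns 0 on success or a negative errno value on failure.
 */
static int trigger_poison_list(const char *attr_path)
{
	static const char val[] = "1\n";
	ssize_t n;
	int fd, err = 0;

	assert(attr_path != NULL);

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, strlen(val));
	if (n < 0)
		err = -errno;
	else if ((size_t)n != strlen(val))
		err = -EIO;	/* short write: treat as an I/O error */

	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

A caller would pass a path such as
/sys/bus/cxl/devices/mem0/trigger_poison_list, where "mem0" is only an
example device name. On a device without the capability the attribute
is not visible, so open() is expected to fail with -ENOENT. The poison
records themselves are reported through the 'cxl_poison' trace event,
not through this file.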
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 20ce488a7754..06d265db5127 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -106,12 +106,45 @@ static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(numa_node);
 
+static ssize_t trigger_poison_list_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	u64 offset, length;
+	bool tmp;
+	int rc;
+
+	if (kstrtobool(buf, &tmp))
+		return -EINVAL;
+
+	/* Per CXL Spec, separate the pmem and ram poison list reads */
+	if (resource_size(&cxlds->pmem_res)) {
+		offset = cxlds->pmem_res.start;
+		length = resource_size(&cxlds->pmem_res);
+		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+		if (rc)
+			return rc;
+	}
+	if (resource_size(&cxlds->ram_res)) {
+		offset = cxlds->ram_res.start;
+		length = resource_size(&cxlds->ram_res);
+		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
+		if (rc)
+			return rc;
+	}
+	return len;
+}
+static DEVICE_ATTR_WO(trigger_poison_list);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
 	&dev_attr_payload_max.attr,
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
+	&dev_attr_trigger_poison_list.attr,
 	NULL,
 };
 
@@ -130,6 +163,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 {
 	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
 		return 0;
+
+	if (a == &dev_attr_trigger_poison_list.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
+			return 0;
+	}
 	return a->mode;
 }
 

From patchwork Wed Oct 12 21:28:20 2022
X-Patchwork-Submitter: Alison Schofield
X-Patchwork-Id: 13005459
From: alison.schofield@intel.com
To: Dan Williams, Ira Weiny, Vishal Verma, Dave Jiang, Ben Widawsky,
	Steven Rostedt, Ingo Molnar
Cc: Alison Schofield, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v2 4/4] cxl/region: Add trigger_poison_list sysfs attribute
Date: Wed, 12 Oct 2022 14:28:20 -0700

From: Alison Schofield

When a boolean 'true' is written to this attribute the region driver
retrieves the poison list for the capacity each device contributes to
this region. The list includes addresses that are poisoned, or would
result in poison if accessed, and the source of the poison. The
retrieved errors are logged as kernel trace events with the label
'cxl_poison'.

Devices not supporting the poison list capability are ignored.

Signed-off-by: Alison Schofield
---
 Documentation/ABI/testing/sysfs-bus-cxl | 14 ++++++++++
 drivers/cxl/core/region.c               | 34 +++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index ab3665f8738e..7e33f6ee4992 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -368,3 +368,17 @@ Description:
 		attribute is only visible for devices supporting the
 		capability. The retrieved errors are logged as kernel
 		trace events with the label 'cxl_poison'.
+
+
+What:		/sys/bus/cxl/devices/regionZ/trigger_poison_list
+Date:		October, 2022
+KernelVersion:	v6.2
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a boolean 'true' is written to this attribute the
+		region driver retrieves the poison list for the capacity
+		each device contributes to this region. The list includes
+		addresses that are poisoned, or would result in poison if
+		accessed, and the source of the poison. The retrieved
+		errors are logged as kernel trace events with the label
+		'cxl_poison'.
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ad21b2aa3b0a..e20207934336 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -72,6 +72,38 @@ static int is_dup(struct device *match, void *data)
 	return 0;
 }
 
+static ssize_t trigger_poison_list_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t len)
+{
+	struct cxl_region *cxlr = to_cxl_region(dev);
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_endpoint_decoder *cxled;
+	struct cxl_memdev *cxlmd;
+	u64 offset, length;
+	int rc, i;
+	bool tmp;
+
+	if (kstrtobool(buf, &tmp))
+		return -EINVAL;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		cxled = p->targets[i];
+		cxlmd = cxled_to_memdev(cxled);
+		if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
+			      cxlmd->cxlds->enabled_cmds))
+			continue;
+		offset = cxl_dpa_resource(cxled);
+		length = cxl_dpa_size(cxled);
+		rc = cxl_mem_get_poison(cxlmd, offset, length,
+					dev_name(&cxlr->dev));
+		if (rc)
+			return rc;
+	}
+	return len;
+}
+static DEVICE_ATTR_WO(trigger_poison_list);
+
 static ssize_t uuid_store(struct device *dev, struct device_attribute *attr,
 			  const char *buf, size_t len)
 {
@@ -282,6 +314,7 @@ static umode_t cxl_region_visible(struct kobject *kobj, struct attribute *a,
 
 	if (a == &dev_attr_uuid.attr && cxlr->mode != CXL_DECODER_PMEM)
 		return 0;
+
 	return a->mode;
 }
 
@@ -555,6 +588,7 @@ static struct attribute *cxl_region_attrs[] = {
 	&dev_attr_interleave_granularity.attr,
 	&dev_attr_resource.attr,
 	&dev_attr_size.attr,
+	&dev_attr_trigger_poison_list.attr,
 	NULL,
 };
 
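
Since both attributes report their results only as 'cxl_poison' trace
events, a consumer has to pick the fields back out of the event text.
The following is a userspace sketch (not part of this series) that
assumes the "key:value" layout of the TP_printk() format string in
patch 1/4. The helper names are hypothetical, and any sample line fed
to it is the caller's own; no real trace output is reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the value that follows "key" (e.g. "dpa:") into out, stopping at
 * the next space. Returns 0 if the key is present, -1 otherwise. An
 * empty value, as expected for "region:" when the read was triggered on
 * a memdev, yields an empty string.
 */
static int trace_field(const char *line, const char *key,
		       char *out, size_t outsz)
{
	const char *p = line;
	size_t klen = strlen(key), n = 0;

	assert(outsz > 0);
	while ((p = strstr(p, key)) != NULL) {
		/* Accept the key only at the start of a token. */
		if (p == line || p[-1] == ' ')
			break;
		p += klen;
	}
	if (!p)
		return -1;

	p += klen;
	while (*p && *p != ' ' && n + 1 < outsz)
		out[n++] = *p++;
	out[n] = '\0';
	return 0;
}

/* Parse a hexadecimal field such as "dpa:0x1040". Returns 0 on success. */
static int trace_field_hex(const char *line, const char *key, uint64_t *val)
{
	char buf[32];
	char *end;

	if (trace_field(line, key, buf, sizeof(buf)) || buf[0] == '\0')
		return -1;
	*val = strtoull(buf, &end, 16);
	return *end == '\0' ? 0 : -1;
}
```

Looking fields up by key, rather than using one sscanf() over the whole
format, is a deliberate choice in this sketch: the region and flags
fields can be empty, and a "%s" conversion would then swallow the next
key instead.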