From patchwork Mon Jan 16 19:39:00 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13103610
From: Jiaqi Yan
Date: Mon, 16 Jan 2023 19:39:00 +0000
Subject: [PATCH v1 1/3] mm: memory-failure: Add memory failure stats to sysfs
To: tony.luck@intel.com, naoya.horiguchi@nec.com
Cc: jiaqiyan@google.com, duenwen@google.com, rientjes@google.com, linux-mm@kvack.org, shy828301@gmail.com, akpm@linux-foundation.org, wangkefeng.wang@huawei.com
Message-ID: <20230116193902.1315236-2-jiaqiyan@google.com>
In-Reply-To: <20230116193902.1315236-1-jiaqiyan@google.com>
References: <20230116193902.1315236-1-jiaqiyan@google.com>

Today the kernel provides the following memory error info to userspace,
but each has its own disadvantage:
* HardwareCorrupted in /proc/meminfo: total number of bytes poisoned,
  with no per-NUMA-node breakdown
* ras:memory_failure_event tracepoint: only available after being
  explicitly enabled
* /dev/mcelog: provides much useful info about MCEs, but doesn't capture
  how memory_failure recovered from memory MCEs
* kernel logs: userspace needs to parse log text

Expose per NUMA
node memory error stats as sysfs entries:

  /sys/devices/system/node/node${X}/memory_failure/pages_poisoned
  /sys/devices/system/node/node${X}/memory_failure/pages_recovered
  /sys/devices/system/node/node${X}/memory_failure/pages_ignored
  /sys/devices/system/node/node${X}/memory_failure/pages_failed
  /sys/devices/system/node/node${X}/memory_failure/pages_delayed

These counters describe how many raw pages are poisoned and, after the
kernel's attempted recoveries, their resolutions: how many are recovered,
ignored, failed, or delayed, respectively. The following math holds for
the statistics:
* pages_poisoned = pages_recovered + pages_ignored + pages_failed +
  pages_delayed
* pages_poisoned * PAGE_SIZE, summed over all nodes, equals
  HardwareCorrupted in /proc/meminfo (which reports it in kB)

Acked-by: David Rientjes
Signed-off-by: Jiaqi Yan
---
 drivers/base/node.c    |  3 +++
 include/linux/mm.h     |  5 +++++
 include/linux/mmzone.h | 28 ++++++++++++++++++++++++++++
 mm/memory-failure.c    | 35 +++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index faf3597a96da..b46db17124f3 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -586,6 +586,9 @@ static const struct attribute_group *node_dev_groups[] = {
 	&node_dev_group,
 #ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
 	&arch_node_dev_group,
+#endif
+#ifdef CONFIG_MEMORY_FAILURE
+	&memory_failure_attr_group,
 #endif
 	NULL
 };
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f3f196e4d66d..888576884eb9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3521,6 +3521,11 @@ enum mf_action_page_type {
 	MF_MSG_UNKNOWN,
 };
 
+/*
+ * Sysfs entries for memory failure handling statistics.
+ */
+extern const struct attribute_group memory_failure_attr_group;
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
 			    unsigned long addr_hint,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..0a14b35a96da 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1110,6 +1110,31 @@ struct deferred_split {
 };
 #endif
 
+#ifdef CONFIG_MEMORY_FAILURE
+/*
+ * Per NUMA node memory failure handling statistics.
+ */
+struct memory_failure_stats {
+	/*
+	 * Number of pages poisoned.
+	 * Cases not accounted: memory outside kernel control, offline page,
+	 * arch-specific memory_failure (SGX), and hwpoison_filter()
+	 * filtered error events.
+	 */
+	unsigned long pages_poisoned;
+	/*
+	 * Recovery results of poisoned pages handled by memory_failure,
+	 * in sync with mf_result.
+	 * pages_poisoned = pages_ignored + pages_failed +
+	 *		    pages_delayed + pages_recovered
+	 */
+	unsigned long pages_ignored;
+	unsigned long pages_failed;
+	unsigned long pages_delayed;
+	unsigned long pages_recovered;
+};
+#endif
+
 /*
  * On NUMA machines, each NUMA node would have a pg_data_t to describe
  * it's memory layout. On UMA machines there is a single pglist_data which
@@ -1253,6 +1278,9 @@ typedef struct pglist_data {
 #ifdef CONFIG_NUMA
 	struct memory_tier __rcu *memtier;
 #endif
+#ifdef CONFIG_MEMORY_FAILURE
+	struct memory_failure_stats mf_stats;
+#endif
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c77a9e37e27e..cb782fa552d5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -87,6 +87,41 @@ inline void num_poisoned_pages_sub(unsigned long pfn, long i)
 	memblk_nr_poison_sub(pfn, i);
 }
 
+/**
+ * MF_ATTR_RO - Create sysfs entry for a memory failure statistic.
+ * @_name: name of the file in the per NUMA sysfs directory.
+ */
+#define MF_ATTR_RO(_name)					\
+static ssize_t _name##_show(struct device *dev,		\
+			    struct device_attribute *attr,	\
+			    char *buf)				\
+{								\
+	struct memory_failure_stats *mf_stats =			\
+		&NODE_DATA(dev->id)->mf_stats;			\
+	return sprintf(buf, "%lu\n", mf_stats->_name);		\
+}								\
+static DEVICE_ATTR_RO(_name)
+
+MF_ATTR_RO(pages_poisoned);
+MF_ATTR_RO(pages_ignored);
+MF_ATTR_RO(pages_failed);
+MF_ATTR_RO(pages_delayed);
+MF_ATTR_RO(pages_recovered);
+
+static struct attribute *memory_failure_attr[] = {
+	&dev_attr_pages_poisoned.attr,
+	&dev_attr_pages_ignored.attr,
+	&dev_attr_pages_failed.attr,
+	&dev_attr_pages_delayed.attr,
+	&dev_attr_pages_recovered.attr,
+	NULL,
+};
+
+const struct attribute_group memory_failure_attr_group = {
+	.name = "memory_failure",
+	.attrs = memory_failure_attr,
+};
+
 /*
  * Return values:
  * 1: the page is dissolved (if needed) and taken off from buddy,