Message ID | 20230116193902.1315236-2-jiaqiyan@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Introduce per NUMA node memory error statistics | expand |
On Mon, 16 Jan 2023 19:39:00 +0000 Jiaqi Yan <jiaqiyan@google.com> wrote: > +/* > + * Per NUMA node memory failure handling statistics. > + */ > +struct memory_failure_stats { > + /* > + * Number of pages poisoned. > + * Cases not accounted: memory outside kernel control, offline page, > + * arch-specific memory_failure (SGX), and hwpoison_filter() > + * filtered error events. > + */ > + unsigned long pages_poisoned; > + /* > + * Recovery results of poisoned pages handled by memory_failure, > + * in sync with mf_result. > + * pages_poisoned = pages_ignored + pages_failed + > + * pages_delayed + pages_recovered > + */ > + unsigned long pages_ignored; > + unsigned long pages_failed; > + unsigned long pages_delayed; > + unsigned long pages_recovered; > +}; I don't think the "pages_" here add much value - a simple code comment saying "everything counts in pages" should suffice. If you're feeling deprived of columnar space, maybe just remove? Or not ;)
On Mon, Jan 16, 2023 at 07:39:00PM +0000, Jiaqi Yan wrote: > Today kernel provides following memory error info to userspace, but each > has its own disadvantage > * HardwareCorrupted in /proc/meminfo: number of bytes poisoned in total, > not per NUMA node stats though > * ras:memory_failure_event: only available after explicitly enabled > * /dev/mcelog provides many useful info about the MCEs, but > doesn't capture how memory_failure recovered memory MCEs > * kernel logs: userspace needs to process log text > > Exposes per NUMA node memory error stats as sysfs entries: > > /sys/devices/system/node/node${X}/memory_failure/pages_poisoned > /sys/devices/system/node/node${X}/memory_failure/pages_recovered > /sys/devices/system/node/node${X}/memory_failure/pages_ignored > /sys/devices/system/node/node${X}/memory_failure/pages_failed > /sys/devices/system/node/node${X}/memory_failure/pages_delayed > > These counters describe how many raw pages are poisoned and after the > attempted recoveries by the kernel, their resolutions: how many are > recovered, ignored, failed, or delayed respectively. > > The following math holds for the statistics: > * pages_poisoned = pages_recovered + pages_ignored + pages_failed + > pages_delayed > * pages_poisoned * PAGE_SIZE = /proc/meminfo/HardwareCorrupted > > Acked-by: David Rientjes <rientjes@google.com> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> ... > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index cd28a100d9e4..0a14b35a96da 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -1110,6 +1110,31 @@ struct deferred_split { > }; > #endif > > +#ifdef CONFIG_MEMORY_FAILURE > +/* > + * Per NUMA node memory failure handling statistics. > + */ > +struct memory_failure_stats { > + /* > + * Number of pages poisoned. > + * Cases not accounted: memory outside kernel control, offline page, > + * arch-specific memory_failure (SGX), and hwpoison_filter() > + * filtered error events. > + */ Yes, this comment is important. So the sum of the pages_poisoned counters over NUMA nodes can be mismatched to the global counter shown in /proc/meminfo. But this makes code simple, and maybe the new stats info is useful enough even without supporting the special cases. So I'm OK with this. BTW, maybe "unpoison" can be also mentioned here? Thanks, Naoya Horiguchi
On Mon, Jan 16, 2023 at 12:15:33PM -0800, Andrew Morton wrote: > On Mon, 16 Jan 2023 19:39:00 +0000 Jiaqi Yan <jiaqiyan@google.com> wrote: > > > +/* > > + * Per NUMA node memory failure handling statistics. > > + */ > > +struct memory_failure_stats { > > + /* > > + * Number of pages poisoned. > > + * Cases not accounted: memory outside kernel control, offline page, > > + * arch-specific memory_failure (SGX), and hwpoison_filter() > > + * filtered error events. > > + */ > > + unsigned long pages_poisoned; > > + /* > > + * Recovery results of poisoned pages handled by memory_failure, > > + * in sync with mf_result. > > + * pages_poisoned = pages_ignored + pages_failed + > > + * pages_delayed + pages_recovered > > + */ > > + unsigned long pages_ignored; > > + unsigned long pages_failed; > > + unsigned long pages_delayed; > > + unsigned long pages_recovered; > > +}; > > I don't think the "pages_" here add much value - a simple code comment > saying "everything counts in pages" should suffice. If you're feeling > deprived of columnar space, maybe just remove? Or not ;) I personally feel more like removing the "pages_". And I feel that the main counter can be renamed with pages_total or total, because it show the relationship among counters, and the structure name "memory_failure_stats" already implies that the counter is about hwpoisoned pages. Thanks, Naoya Horiguchi
On Tue, Jan 17, 2023 at 1:14 AM HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote: > > On Mon, Jan 16, 2023 at 12:15:33PM -0800, Andrew Morton wrote: > > On Mon, 16 Jan 2023 19:39:00 +0000 Jiaqi Yan <jiaqiyan@google.com> wrote: > > > > > +/* > > > + * Per NUMA node memory failure handling statistics. > > > + */ > > > +struct memory_failure_stats { > > > + /* > > > + * Number of pages poisoned. > > > + * Cases not accounted: memory outside kernel control, offline page, > > > + * arch-specific memory_failure (SGX), and hwpoison_filter() > > > + * filtered error events. > > > + */ > > > + unsigned long pages_poisoned; > > > + /* > > > + * Recovery results of poisoned pages handled by memory_failure, > > > + * in sync with mf_result. > > > + * pages_poisoned = pages_ignored + pages_failed + > > > + * pages_delayed + pages_recovered > > > + */ > > > + unsigned long pages_ignored; > > > + unsigned long pages_failed; > > > + unsigned long pages_delayed; > > > + unsigned long pages_recovered; > > > +}; > > > > I don't think the "pages_" here add much value - a simple code comment > > saying "everything counts in pages" should suffice. If you're feeling > > deprived of columnar space, maybe just remove? Or not ;) > > I personally feel more like removing the "pages_". And I feel that the > main counter can be renamed with pages_total or total, because it show the > relationship among counters, and the structure name "memory_failure_stats" > already implies that the counter is about hwpoisoned pages. > > Thanks, > Naoya Horiguchi Thanks, v2 will 1) rename pages_poisoned to total and 2) remove the "pages_" prefix from the rest counters.
diff --git a/drivers/base/node.c b/drivers/base/node.c index faf3597a96da..b46db17124f3 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -586,6 +586,9 @@ static const struct attribute_group *node_dev_groups[] = { &node_dev_group, #ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP &arch_node_dev_group, +#endif +#ifdef CONFIG_MEMORY_FAILURE + &memory_failure_attr_group, #endif NULL }; diff --git a/include/linux/mm.h b/include/linux/mm.h index f3f196e4d66d..888576884eb9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3521,6 +3521,11 @@ enum mf_action_page_type { MF_MSG_UNKNOWN, }; +/* + * Sysfs entries for memory failure handling statistics. + */ +extern const struct attribute_group memory_failure_attr_group; + #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) extern void clear_huge_page(struct page *page, unsigned long addr_hint, diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index cd28a100d9e4..0a14b35a96da 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1110,6 +1110,31 @@ struct deferred_split { }; #endif +#ifdef CONFIG_MEMORY_FAILURE +/* + * Per NUMA node memory failure handling statistics. + */ +struct memory_failure_stats { + /* + * Number of pages poisoned. + * Cases not accounted: memory outside kernel control, offline page, + * arch-specific memory_failure (SGX), and hwpoison_filter() + * filtered error events. + */ + unsigned long pages_poisoned; + /* + * Recovery results of poisoned pages handled by memory_failure, + * in sync with mf_result. + * pages_poisoned = pages_ignored + pages_failed + + * pages_delayed + pages_recovered + */ + unsigned long pages_ignored; + unsigned long pages_failed; + unsigned long pages_delayed; + unsigned long pages_recovered; +}; +#endif + /* * On NUMA machines, each NUMA node would have a pg_data_t to describe * it's memory layout. On UMA machines there is a single pglist_data which @@ -1253,6 +1278,9 @@ typedef struct pglist_data { #ifdef CONFIG_NUMA struct memory_tier __rcu *memtier; #endif +#ifdef CONFIG_MEMORY_FAILURE + struct memory_failure_stats mf_stats; +#endif } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index c77a9e37e27e..cb782fa552d5 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -87,6 +87,41 @@ inline void num_poisoned_pages_sub(unsigned long pfn, long i) memblk_nr_poison_sub(pfn, i); } +/** + * MF_ATTR_RO - Create sysfs entry for each memory failure statistics. + * @_name: name of the file in the per NUMA sysfs directory. + */ +#define MF_ATTR_RO(_name) \ +static ssize_t _name##_show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + struct memory_failure_stats *mf_stats = \ + &NODE_DATA(dev->id)->mf_stats; \ + return sprintf(buf, "%lu\n", mf_stats->_name); \ +} \ +static DEVICE_ATTR_RO(_name) + +MF_ATTR_RO(pages_poisoned); +MF_ATTR_RO(pages_ignored); +MF_ATTR_RO(pages_failed); +MF_ATTR_RO(pages_delayed); +MF_ATTR_RO(pages_recovered); + +static struct attribute *memory_failure_attr[] = { + &dev_attr_pages_poisoned.attr, + &dev_attr_pages_ignored.attr, + &dev_attr_pages_failed.attr, + &dev_attr_pages_delayed.attr, + &dev_attr_pages_recovered.attr, + NULL, +}; + +const struct attribute_group memory_failure_attr_group = { + .name = "memory_failure", + .attrs = memory_failure_attr, +}; + /* * Return values: * 1: the page is dissolved (if needed) and taken off from buddy,