From patchwork Thu Nov 21 04:55:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tomohiro Misono
X-Patchwork-Id: 13881652
From: Tomohiro Misono
To: Andrew Morton, Miaohe Lin, Naoya Horiguchi
Cc: jiaqiyan@google.com, misono.tomohiro@fujitsu.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [RFC PATCH] mm: memory-failure: add soft-offline stat in mf_stats
Date: Thu, 21 Nov 2024 04:55:04 +0000
Message-Id: <20241121045504.2233544-1-misono.tomohiro@fujitsu.com>
X-Mailer: git-send-email 2.34.1

Commit 44b8f8bf2438 ("mm: memory-failure: add memory failure stats to
sysfs") introduced per-NUMA-node memory error stats, which show a
breakdown of HardwareCorrupted from /proc/meminfo under
/sys/devices/system/node/nodeX/memory_failure.

However, HardwareCorrupted also counts soft-offlined pages. So, add
soft-offline stats to mf_stats as well, so that the per-node stats
represent the status more accurately.

This updates the total count to:

  total = recovered + ignored + failed + delayed + soft_offline

Test example:

1) # grep HardwareCorrupted /proc/meminfo
   HardwareCorrupted: 0 kB
2) soft-offline 1 page by madvise(MADV_SOFT_OFFLINE)
3) # grep HardwareCorrupted /proc/meminfo
   HardwareCorrupted: 4 kB
   # grep -r "" /sys/devices/system/node/node0/memory_failure
   /sys/devices/system/node/node0/memory_failure/total:1
   /sys/devices/system/node/node0/memory_failure/soft_offline:1
   /sys/devices/system/node/node0/memory_failure/recovered:0
   /sys/devices/system/node/node0/memory_failure/ignored:0
   /sys/devices/system/node/node0/memory_failure/failed:0
   /sys/devices/system/node/node0/memory_failure/delayed:0

Signed-off-by: Tomohiro Misono
---
Hello

This is an RFC because I'm not sure that adding SOFT_OFFLINE to enum
mf_result is the right approach. Also, would it be better to move
update_per_node_mf_stats() into num_poisoned_pages_inc()?

I omitted some cleanups and the sysfs doc update in this version to
highlight the changes. I'd appreciate any suggestions.

Regards,
Tomohiro Misono

 include/linux/mm.h     | 2 ++
 include/linux/mmzone.h | 4 +++-
 mm/memory-failure.c    | 9 +++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5d6cd523c7c0..7f93f6883760 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3991,6 +3991,8 @@ enum mf_result {
 	MF_FAILED,	/* Error: handling failed */
 	MF_DELAYED,	/* Will be handled later */
 	MF_RECOVERED,	/* Successfully recovered */
+
+	MF_RES_SOFT_OFFLINE, /* Soft-offline */
 };
 
 enum mf_action_page_type {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b36124145a16..6a030610cba3 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1282,13 +1282,15 @@ struct memory_failure_stats {
 	/*
 	 * Recovery results of poisoned raw pages handled by memory_failure,
 	 * in sync with mf_result.
-	 * total = ignored + failed + delayed + recovered.
+	 * total = ignored + failed + delayed + recovered + soft_offline.
 	 * total * PAGE_SIZE * #nodes = /proc/meminfo/HardwareCorrupted.
 	 */
 	unsigned long ignored;
 	unsigned long failed;
 	unsigned long delayed;
 	unsigned long recovered;
+
+	unsigned long soft_offline;
 };
 #endif
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a7b8ccd29b6f..02f845a222cc 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -109,6 +109,7 @@ MF_ATTR_RO(ignored);
 MF_ATTR_RO(failed);
 MF_ATTR_RO(delayed);
 MF_ATTR_RO(recovered);
+MF_ATTR_RO(soft_offline);
 
 static struct attribute *memory_failure_attr[] = {
 	&dev_attr_total.attr,
@@ -116,6 +117,7 @@ static struct attribute *memory_failure_attr[] = {
 	&dev_attr_failed.attr,
 	&dev_attr_delayed.attr,
 	&dev_attr_recovered.attr,
+	&dev_attr_soft_offline.attr,
 	NULL,
 };
 
@@ -185,6 +187,9 @@ static int __page_handle_poison(struct page *page)
 	return ret;
 }
 
+static void update_per_node_mf_stats(unsigned long pfn,
+				     enum mf_result result);
+
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
 {
 	if (hugepage_or_freepage) {
@@ -208,6 +213,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		put_page(page);
 	page_ref_inc(page);
 	num_poisoned_pages_inc(page_to_pfn(page));
+	update_per_node_mf_stats(page_to_pfn(page), MF_RES_SOFT_OFFLINE);
 
 	return true;
 }
@@ -1314,6 +1320,9 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	case MF_RECOVERED:
 		++mf_stats->recovered;
 		break;
+	case MF_RES_SOFT_OFFLINE:
+		++mf_stats->soft_offline;
+		break;
 	default:
 		WARN_ONCE(1, "Memory failure: mf_result=%d is not properly handled", result);
 		break;
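
For illustration only, the accounting rule this patch proposes (total =
ignored + failed + delayed + recovered + soft_offline) can be sketched as
a small userspace model. This is not kernel code and not part of the
patch. The model_* function names are hypothetical, the struct mirrors
only the counters named in the diff and the sysfs output, MF_IGNORED is
assumed from the "ignored" counter, and the model assumes that every
handled result also bumps total (which the total:1 line in the test
example suggests). It covers a single node and omits the pfn-to-node
lookup and any locking.

```c
#include <assert.h>

/* Userspace stand-in for enum mf_result, including the value added by the patch. */
enum mf_result {
	MF_IGNORED,		/* assumed from the "ignored" sysfs counter */
	MF_FAILED,
	MF_DELAYED,
	MF_RECOVERED,
	MF_RES_SOFT_OFFLINE,	/* new in this patch */
};

/* Mirrors the per-node counters exposed under .../memory_failure/. */
struct memory_failure_stats {
	unsigned long total;
	unsigned long ignored;
	unsigned long failed;
	unsigned long delayed;
	unsigned long recovered;
	unsigned long soft_offline;
};

/*
 * Hypothetical, simplified model of update_per_node_mf_stats() for one node.
 * Returns 0 on success and -1 for an unknown result (where the kernel code
 * in the diff issues a WARN_ONCE instead); unknown results are not counted
 * in this model.
 */
static int model_update_mf_stats(struct memory_failure_stats *s,
				 enum mf_result result)
{
	switch (result) {
	case MF_IGNORED:
		++s->ignored;
		break;
	case MF_FAILED:
		++s->failed;
		break;
	case MF_DELAYED:
		++s->delayed;
		break;
	case MF_RECOVERED:
		++s->recovered;
		break;
	case MF_RES_SOFT_OFFLINE:
		++s->soft_offline;
		break;
	default:
		return -1;
	}
	++s->total;	/* assumption: total counts every handled result */
	return 0;
}

/* Sum of the per-result counters, i.e. the right-hand side of the rule. */
static unsigned long model_sum_results(const struct memory_failure_stats *s)
{
	return s->ignored + s->failed + s->delayed + s->recovered +
	       s->soft_offline;
}

/* Checks the invariant stated in the updated mmzone.h comment. */
static void model_check_invariant(const struct memory_failure_stats *s)
{
	assert(s->total == model_sum_results(s));
}
```

In this model, calling model_update_mf_stats(&s, MF_RES_SOFT_OFFLINE)
once on a zero-initialized struct leaves both total and soft_offline at
1, which matches the sysfs output in step 3 of the test example.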