From patchwork Wed Oct 20 21:07:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yang Shi X-Patchwork-Id: 12573367 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87030C433F5 for ; Wed, 20 Oct 2021 21:08:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4570760F9E for ; Wed, 20 Oct 2021 21:08:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 4570760F9E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id DA44094000B; Wed, 20 Oct 2021 17:08:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D0D4E940009; Wed, 20 Oct 2021 17:08:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B30EE94000B; Wed, 20 Oct 2021 17:08:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0113.hostedemail.com [216.40.44.113]) by kanga.kvack.org (Postfix) with ESMTP id A00A7940009 for ; Wed, 20 Oct 2021 17:08:08 -0400 (EDT) Received: from smtpin36.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 6482082499A8 for ; Wed, 20 Oct 2021 21:08:08 +0000 (UTC) X-FDA: 78718053456.36.C427FB5 Received: from mail-pg1-f177.google.com (mail-pg1-f177.google.com [209.85.215.177]) by imf21.hostedemail.com (Postfix) with ESMTP id 1A569D042B5A for ; Wed, 20 Oct 2021 21:08:06 +0000 (UTC) Received: by mail-pg1-f177.google.com with SMTP id s136so20399622pgs.4 for ; Wed, 20 Oct 2021 14:08:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=7O2jemVYe0cdkuCpXRsCTx09Gh0RPAfbpmg1m/i8Imk=; b=qgzRo9cTGlk9YDJCuVhg/rOQ5tvo0oepgrPcqRlLrzwGEWDG6hhjkZLkWaUTYecWUc zGovGv89xq6siwFDIUH15dWBH+vR+yF4x1Dc5jaz1XiB9sd4yQWCHzeRZnGufQoLQuDL gDemw7tufwdXcaSChFSOLqVciBnLUdVHRiGWTN8LPdYreag6ewu8x2MJPK/23VRRinqc 5OFeLyB6NedfZxrmsfUJdJUBAbQZOUPfqhtccwifaoI0nxR11YYCZeraWxNTT8cPj3SS /szKISBseE64pjUWFPgcPAS5LkLtbM8kpD1O4lC4iOCKO4iQFHuGzZBWr00o7oFSRqfZ quZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7O2jemVYe0cdkuCpXRsCTx09Gh0RPAfbpmg1m/i8Imk=; b=7EJcetybRViqGpZq5BwlT2tbNsgjXhyDA/Y3ETbyexHUCZDGV1YPY7bENmZqCEi+XO 3NbVvF7r+rZ0a2b2ISmAVeW4F7KeUapTUQ0OhLjHfE+9DvyVP0Xxs5ojQEG2AqOHcPgJ KUUyDr6qlCejoX1uS8wWYzfMjBrSkCoiS+Y/8bggyMxk6JCOpmJ0LMSkqBmmP3+dOl4k cR9c2YnDqmYUK1vxEd+HYx3Y22KIrRCeQKjsP56pajw59BwJ2hVHrhH8NXgyCH4FlA32 c9nASDuKZTU4i7pQOepu8T25gbCRUzAnMtuEpGPeeYoALlKXr+aI4RqmP5BGYal5ju3l KuCA== X-Gm-Message-State: AOAM532FWMwVbmg7QnC92ovum1Krr0F0lQROyVwvtDSLp3j44D+yn0m8 ZxK1qLTjI8aFMn4j1r+q9qU= X-Google-Smtp-Source: ABdhPJxUCPR5Kgu+vcT+c3DRMEm0+BInCFJ2AU660nrM2JGkEjwBBrdIw+EBYmJIvla9nLs9G1EjNw== X-Received: by 2002:a63:e216:: with SMTP id q22mr1218827pgh.3.1634764086956; Wed, 20 Oct 2021 14:08:06 -0700 (PDT) Received: from localhost.localdomain (c-73-93-239-127.hsd1.ca.comcast.net. [73.93.239.127]) by smtp.gmail.com with ESMTPSA id i8sm3403143pfo.117.2021.10.20.14.08.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 20 Oct 2021 14:08:05 -0700 (PDT) From: Yang Shi To: naoya.horiguchi@nec.com, hughd@google.com, kirill.shutemov@linux.intel.com, willy@infradead.org, peterx@redhat.com, osalvador@suse.de, akpm@linux-foundation.org Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [v5 PATCH 4/6] mm: hwpoison: refactor refcount check handling Date: Wed, 20 Oct 2021 14:07:53 -0700 Message-Id: <20211020210755.23964-5-shy828301@gmail.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20211020210755.23964-1-shy828301@gmail.com> References: <20211020210755.23964-1-shy828301@gmail.com> MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 1A569D042B5A X-Stat-Signature: 6hmpjqo3gp44exwwebpeswtyc87jf9m5 Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=qgzRo9cT; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf21.hostedemail.com: domain of shy828301@gmail.com designates 209.85.215.177 as permitted sender) smtp.mailfrom=shy828301@gmail.com X-HE-Tag: 1634764086-746174 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Memory failure will report failure if the page still has extra pinned refcount other than from hwpoison after the handler is done. Actually the check is not necessary for all handlers, so move the check into specific handlers. This would make the following keeping shmem page in page cache patch easier. There may be expected extra pin for some cases, for example, when the page is dirty and in swapcache. Suggested-by: Naoya Horiguchi Signed-off-by: Naoya Horiguchi Signed-off-by: Yang Shi --- mm/memory-failure.c | 93 +++++++++++++++++++++++++++++++-------------- 1 file changed, 64 insertions(+), 29 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index bdbbb32211a5..aaeda93d26fb 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -806,12 +806,44 @@ static int truncate_error_page(struct page *p, unsigned long pfn, return ret; } +struct page_state { + unsigned long mask; + unsigned long res; + enum mf_action_page_type type; + + /* Callback ->action() has to unlock the relevant page inside it. */ + int (*action)(struct page_state *ps, struct page *p); +}; + +/* + * Return true if page is still referenced by others, otherwise return + * false. + * + * The extra_pins is true when one extra refcount is expected. + */ +static bool has_extra_refcount(struct page_state *ps, struct page *p, + bool extra_pins) +{ + int count = page_count(p) - 1; + + if (extra_pins) + count -= 1; + + if (count > 0) { + pr_err("Memory failure: %#lx: %s still referenced by %d users\n", + page_to_pfn(p), action_page_types[ps->type], count); + return true; + } + + return false; +} + /* * Error hit kernel page. * Do nothing, try to be lucky and not touch this instead. For a few cases we * could be more sophisticated. */ -static int me_kernel(struct page *p, unsigned long pfn) +static int me_kernel(struct page_state *ps, struct page *p) { unlock_page(p); return MF_IGNORED; @@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn) /* * Page in unknown state. Do nothing. */ -static int me_unknown(struct page *p, unsigned long pfn) +static int me_unknown(struct page_state *ps, struct page *p) { - pr_err("Memory failure: %#lx: Unknown page state\n", pfn); + pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p)); unlock_page(p); return MF_FAILED; } @@ -830,7 +862,7 @@ static int me_unknown(struct page *p, unsigned long pfn) /* * Clean (or cleaned) page cache page. */ -static int me_pagecache_clean(struct page *p, unsigned long pfn) +static int me_pagecache_clean(struct page_state *ps, struct page *p) { int ret; struct address_space *mapping; @@ -867,9 +899,13 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn) * * Open: to take i_rwsem or not for this? Right now we don't. */ - ret = truncate_error_page(p, pfn, mapping); + ret = truncate_error_page(p, page_to_pfn(p), mapping); out: unlock_page(p); + + if (has_extra_refcount(ps, p, false)) + ret = MF_FAILED; + return ret; } @@ -878,7 +914,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn) * Issues: when the error hit a hole page the error is not properly * propagated. */ -static int me_pagecache_dirty(struct page *p, unsigned long pfn) +static int me_pagecache_dirty(struct page_state *ps, struct page *p) { struct address_space *mapping = page_mapping(p); @@ -922,7 +958,7 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn) mapping_set_error(mapping, -EIO); } - return me_pagecache_clean(p, pfn); + return me_pagecache_clean(ps, p); } /* @@ -944,9 +980,10 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn) * Clean swap cache pages can be directly isolated. A later page fault will * bring in the known good data from disk. */ -static int me_swapcache_dirty(struct page *p, unsigned long pfn) +static int me_swapcache_dirty(struct page_state *ps, struct page *p) { int ret; + bool extra_pins = false; ClearPageDirty(p); /* Trigger EIO in shmem: */ @@ -954,10 +991,17 @@ static int me_swapcache_dirty(struct page *p, unsigned long pfn) ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED; unlock_page(p); + + if (ret == MF_DELAYED) + extra_pins = true; + + if (has_extra_refcount(ps, p, extra_pins)) + ret = MF_FAILED; + return ret; } -static int me_swapcache_clean(struct page *p, unsigned long pfn) +static int me_swapcache_clean(struct page_state *ps, struct page *p) { int ret; @@ -965,6 +1009,10 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn) ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED; unlock_page(p); + + if (has_extra_refcount(ps, p, false)) + ret = MF_FAILED; + return ret; } @@ -974,7 +1022,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn) * - Error on hugepage is contained in hugepage unit (not in raw page unit.) * To narrow down kill region to one page, we need to break up pmd. */ -static int me_huge_page(struct page *p, unsigned long pfn) +static int me_huge_page(struct page_state *ps, struct page *p) { int res; struct page *hpage = compound_head(p); @@ -985,7 +1033,7 @@ static int me_huge_page(struct page *p, unsigned long pfn) mapping = page_mapping(hpage); if (mapping) { - res = truncate_error_page(hpage, pfn, mapping); + res = truncate_error_page(hpage, page_to_pfn(p), mapping); unlock_page(hpage); } else { res = MF_FAILED; @@ -1003,6 +1051,9 @@ static int me_huge_page(struct page *p, unsigned long pfn) } } + if (has_extra_refcount(ps, p, false)) + res = MF_FAILED; + return res; } @@ -1028,14 +1079,7 @@ static int me_huge_page(struct page *p, unsigned long pfn) #define slab (1UL << PG_slab) #define reserved (1UL << PG_reserved) -static struct page_state { - unsigned long mask; - unsigned long res; - enum mf_action_page_type type; - - /* Callback ->action() has to unlock the relevant page inside it. */ - int (*action)(struct page *p, unsigned long pfn); -} error_states[] = { +static struct page_state error_states[] = { { reserved, reserved, MF_MSG_KERNEL, me_kernel }, /* * free pages are specially detected outside this table: @@ -1095,19 +1139,10 @@ static int page_action(struct page_state *ps, struct page *p, unsigned long pfn) { int result; - int count; /* page p should be unlocked after returning from ps->action(). */ - result = ps->action(p, pfn); + result = ps->action(ps, p); - count = page_count(p) - 1; - if (ps->action == me_swapcache_dirty && result == MF_DELAYED) - count--; - if (count > 0) { - pr_err("Memory failure: %#lx: %s still referenced by %d users\n", - pfn, action_page_types[ps->type], count); - result = MF_FAILED; - } action_result(pfn, ps->type, result); /* Could do more checks here if page looks ok */