[v3,3/5] mm: hwpoison: refactor refcount check handling

Message ID 20210930215311.240774-4-shy828301@gmail.com
State New, archived
Series Solve silent data loss caused by poisoned page cache (shmem/tmpfs)

Commit Message

Yang Shi Sept. 30, 2021, 9:53 p.m. UTC
Memory failure will report failure if the page still has an extra pinned
refcount, other than the one held by hwpoison itself, after the handler is
done.  The check is not necessary for all handlers, so move it into the
specific handlers that need it.  This will make the following patch, which
keeps the shmem page in the page cache, easier.
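
In short, the check moves out of page_action() and into the individual
handlers, roughly (a simplified sketch; see the diff for the real change):

	/* Before: page_action() applied one check to every handler. */
	result = ps->action(p, pfn);
	count = page_count(p) - 1;
	if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
		count--;
	if (count > 0)
		result = MF_FAILED;

	/* After: only the handlers that need the check perform it. */
	result = ps->action(ps, p);
	/* ... and e.g. me_pagecache_clean() now ends with: */
	if (has_extra_refcount(ps, p, false))
		ret = MF_FAILED;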

Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/memory-failure.c | 93 +++++++++++++++++++++++++++++++--------------
 1 file changed, 64 insertions(+), 29 deletions(-)

Comments

Peter Xu Oct. 6, 2021, 10:01 p.m. UTC | #1
On Thu, Sep 30, 2021 at 02:53:09PM -0700, Yang Shi wrote:
> +/*
> + * Return true if page is still referenced by others, otherwise return
> + * false.
> + *
> + * The dec is true when one extra refcount is expected.
> + */
> +static bool has_extra_refcount(struct page_state *ps, struct page *p,
> +			       bool dec)

Nit: would it be nicer to keep using something like "extra_pins", so we pass
in 1 for the dirty swapcache case and 0 for the rest?  Then it'll also match
most of the similar cases in e.g. huge_memory.c (try grepping for "extra_pins"
there).
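
Something like this, I mean (a rough, untested sketch just to illustrate the
suggested signature, not the posted code):

	static bool has_extra_refcount(struct page_state *ps, struct page *p,
				       int extra_pins)
	{
		/* One refcount is held by hwpoison itself. */
		int count = page_count(p) - 1 - extra_pins;

		if (count > 0) {
			pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
			       page_to_pfn(p), action_page_types[ps->type], count);
			return true;
		}

		return false;
	}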

> +{
> +	int count = page_count(p) - 1;
> +
> +	if (dec)
> +		count -= 1;
> +
> +	if (count > 0) {
> +		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
> +		       page_to_pfn(p), action_page_types[ps->type], count);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Error hit kernel page.
>   * Do nothing, try to be lucky and not touch this instead. For a few cases we
>   * could be more sophisticated.
>   */
> -static int me_kernel(struct page *p, unsigned long pfn)
> +static int me_kernel(struct page_state *ps, struct page *p)

Not sure whether it's intended, but some of the ->action() hooks do not call
the refcount check now, while in the past they all did.  Just double checking
that this is expected, e.g. for this one and me_unknown().

>  {
>  	unlock_page(p);
>  	return MF_IGNORED;
> @@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
>  /*
>   * Page in unknown state. Do nothing.
>   */
> -static int me_unknown(struct page *p, unsigned long pfn)
> +static int me_unknown(struct page_state *ps, struct page *p)
>  {
> -	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
> +	pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
>  	unlock_page(p);
>  	return MF_FAILED;
>  }

Thanks,
Yang Shi Oct. 7, 2021, 2:47 a.m. UTC | #2
On Wed, Oct 6, 2021 at 3:02 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Sep 30, 2021 at 02:53:09PM -0700, Yang Shi wrote:
> > +/*
> > + * Return true if page is still referenced by others, otherwise return
> > + * false.
> > + *
> > + * The dec is true when one extra refcount is expected.
> > + */
> > +static bool has_extra_refcount(struct page_state *ps, struct page *p,
> > +                            bool dec)
>
> Nit: would it be nicer to keep using something like "extra_pins", so we pass
> in 1 for the dirty swapcache case and 0 for the rest?  Then it'll also match
> most of the similar cases in e.g. huge_memory.c (try grepping for "extra_pins"
> there).

Thanks for the suggestion. Yes, it makes sense to me. And the code
comment in patch 4/5 (the version suggested by Naoya) does say:

/*
 * The shmem page is kept in page cache instead of truncating
 * so is expected to have an extra refcount after error-handling.
 */

Will rename it in the new version.
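
For the dirty swapcache handler that would become, roughly (a sketch of what
the rename might look like in the next version, not final code):

	int extra_pins = 0;

	...
	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
	unlock_page(p);

	if (ret == MF_DELAYED)
		extra_pins++;

	if (has_extra_refcount(ps, p, extra_pins))
		ret = MF_FAILED;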

>
> > +{
> > +     int count = page_count(p) - 1;
> > +
> > +     if (dec)
> > +             count -= 1;
> > +
> > +     if (count > 0) {
> > +             pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
> > +                    page_to_pfn(p), action_page_types[ps->type], count);
> > +             return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> >  /*
> >   * Error hit kernel page.
> >   * Do nothing, try to be lucky and not touch this instead. For a few cases we
> >   * could be more sophisticated.
> >   */
> > -static int me_kernel(struct page *p, unsigned long pfn)
> > +static int me_kernel(struct page_state *ps, struct page *p)
>
> Not sure whether it's intended, but some of the ->action() hooks do not call
> the refcount check now, while in the past they all did.  Just double checking
> that this is expected, e.g. for this one and me_unknown().

Yeah, it is intentional. Before this change all the me_* handlers checked
the refcount even though it was not necessary for some of them, for
example me_kernel() and me_unknown().

>
> >  {
> >       unlock_page(p);
> >       return MF_IGNORED;
> > @@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
> >  /*
> >   * Page in unknown state. Do nothing.
> >   */
> > -static int me_unknown(struct page *p, unsigned long pfn)
> > +static int me_unknown(struct page_state *ps, struct page *p)
> >  {
> > -     pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
> > +     pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
> >       unlock_page(p);
> >       return MF_FAILED;
> >  }
>
> Thanks,
>
> --
> Peter Xu
>
Peter Xu Oct. 7, 2021, 4:18 p.m. UTC | #3
On Wed, Oct 06, 2021 at 07:47:20PM -0700, Yang Shi wrote:
> Yeah, it is intentional. Before this change all the me_* handlers checked
> the refcount even though it was not necessary for some of them, for
> example me_kernel() and me_unknown().

Would you mind adding, when you respin, some explanation to the commit message
about which handlers dropped the refcount check and why it can be dropped there?
Thanks a lot.

Patch

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a79a38374a14..562bcf335bd2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -806,12 +806,44 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
 	return ret;
 }
 
+struct page_state {
+	unsigned long mask;
+	unsigned long res;
+	enum mf_action_page_type type;
+
+	/* Callback ->action() has to unlock the relevant page inside it. */
+	int (*action)(struct page_state *ps, struct page *p);
+};
+
+/*
+ * Return true if page is still referenced by others, otherwise return
+ * false.
+ *
+ * The dec is true when one extra refcount is expected.
+ */
+static bool has_extra_refcount(struct page_state *ps, struct page *p,
+			       bool dec)
+{
+	int count = page_count(p) - 1;
+
+	if (dec)
+		count -= 1;
+
+	if (count > 0) {
+		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
+		       page_to_pfn(p), action_page_types[ps->type], count);
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Error hit kernel page.
  * Do nothing, try to be lucky and not touch this instead. For a few cases we
  * could be more sophisticated.
  */
-static int me_kernel(struct page *p, unsigned long pfn)
+static int me_kernel(struct page_state *ps, struct page *p)
 {
 	unlock_page(p);
 	return MF_IGNORED;
@@ -820,9 +852,9 @@ static int me_kernel(struct page *p, unsigned long pfn)
 /*
  * Page in unknown state. Do nothing.
  */
-static int me_unknown(struct page *p, unsigned long pfn)
+static int me_unknown(struct page_state *ps, struct page *p)
 {
-	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+	pr_err("Memory failure: %#lx: Unknown page state\n", page_to_pfn(p));
 	unlock_page(p);
 	return MF_FAILED;
 }
@@ -830,7 +862,7 @@ static int me_unknown(struct page *p, unsigned long pfn)
 /*
  * Clean (or cleaned) page cache page.
  */
-static int me_pagecache_clean(struct page *p, unsigned long pfn)
+static int me_pagecache_clean(struct page_state *ps, struct page *p)
 {
 	int ret;
 	struct address_space *mapping;
@@ -867,9 +899,13 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 *
 	 * Open: to take i_rwsem or not for this? Right now we don't.
 	 */
-	ret = truncate_error_page(p, pfn, mapping);
+	ret = truncate_error_page(p, page_to_pfn(p), mapping);
 out:
 	unlock_page(p);
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
@@ -878,7 +914,7 @@ static int me_pagecache_clean(struct page *p, unsigned long pfn)
  * Issues: when the error hit a hole page the error is not properly
  * propagated.
  */
-static int me_pagecache_dirty(struct page *p, unsigned long pfn)
+static int me_pagecache_dirty(struct page_state *ps, struct page *p)
 {
 	struct address_space *mapping = page_mapping(p);
 
@@ -922,7 +958,7 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
 		mapping_set_error(mapping, -EIO);
 	}
 
-	return me_pagecache_clean(p, pfn);
+	return me_pagecache_clean(ps, p);
 }
 
 /*
@@ -944,9 +980,10 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
  * Clean swap cache pages can be directly isolated. A later page fault will
  * bring in the known good data from disk.
  */
-static int me_swapcache_dirty(struct page *p, unsigned long pfn)
+static int me_swapcache_dirty(struct page_state *ps, struct page *p)
 {
 	int ret;
+	bool dec = false;
 
 	ClearPageDirty(p);
 	/* Trigger EIO in shmem: */
@@ -954,10 +991,17 @@ static int me_swapcache_dirty(struct page *p, unsigned long pfn)
 
 	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
 	unlock_page(p);
+
+	if (ret == MF_DELAYED)
+		dec = true;
+
+	if (has_extra_refcount(ps, p, dec))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
-static int me_swapcache_clean(struct page *p, unsigned long pfn)
+static int me_swapcache_clean(struct page_state *ps, struct page *p)
 {
 	int ret;
 
@@ -965,6 +1009,10 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
 
 	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
 	unlock_page(p);
+
+	if (has_extra_refcount(ps, p, false))
+		ret = MF_FAILED;
+
 	return ret;
 }
 
@@ -974,7 +1022,7 @@ static int me_swapcache_clean(struct page *p, unsigned long pfn)
  * - Error on hugepage is contained in hugepage unit (not in raw page unit.)
  *   To narrow down kill region to one page, we need to break up pmd.
  */
-static int me_huge_page(struct page *p, unsigned long pfn)
+static int me_huge_page(struct page_state *ps, struct page *p)
 {
 	int res;
 	struct page *hpage = compound_head(p);
@@ -985,7 +1033,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 
 	mapping = page_mapping(hpage);
 	if (mapping) {
-		res = truncate_error_page(hpage, pfn, mapping);
+		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 		unlock_page(hpage);
 	} else {
 		res = MF_FAILED;
@@ -1003,6 +1051,9 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 		}
 	}
 
+	if (has_extra_refcount(ps, p, false))
+		res = MF_FAILED;
+
 	return res;
 }
 
@@ -1028,14 +1079,7 @@ static int me_huge_page(struct page *p, unsigned long pfn)
 #define slab		(1UL << PG_slab)
 #define reserved	(1UL << PG_reserved)
 
-static struct page_state {
-	unsigned long mask;
-	unsigned long res;
-	enum mf_action_page_type type;
-
-	/* Callback ->action() has to unlock the relevant page inside it. */
-	int (*action)(struct page *p, unsigned long pfn);
-} error_states[] = {
+static struct page_state error_states[] = {
 	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
@@ -1095,19 +1139,10 @@ static int page_action(struct page_state *ps, struct page *p,
 			unsigned long pfn)
 {
 	int result;
-	int count;
 
 	/* page p should be unlocked after returning from ps->action().  */
-	result = ps->action(p, pfn);
+	result = ps->action(ps, p);
 
-	count = page_count(p) - 1;
-	if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
-		count--;
-	if (count > 0) {
-		pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
-		       pfn, action_page_types[ps->type], count);
-		result = MF_FAILED;
-	}
 	action_result(pfn, ps->type, result);
 
 	/* Could do more checks here if page looks ok */