diff mbox series

[v3] mm/hwpoison: do not lock page again when me_huge_page() successfully recovers

Message ID 20210609072029.74645-1-nao.horiguchi@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v3] mm/hwpoison: do not lock page again when me_huge_page() successfully recovers | expand

Commit Message

Naoya Horiguchi June 9, 2021, 7:20 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later.  My testcase (which calls hard-offline on
some tail page in a hugetlb, then accesses the address of the hugetlb
range) showed that page allocation code detects this page lock on buddy
page and printed out "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
page, so this flag works as kind of filter, but this filtering doesn't
work in this case because the "bad page" is not the actual hwpoisoned
page. So stop locking page again.  Actions to be taken depend on the
page type of the error, so page unlocking should be done in ->action()
callbacks.  So let's make it assumed and change all existing callbacks
that way.

Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: stable@vger.kernel.org
---
ChangeLog v3:
- switch to "unlock page in ->action()" approach based on suggestion from Linus
  https://lore.kernel.org/linux-mm/CAHk-=wht_gk_d9k+NZs7eJvBeLOQT4xGcykgaCRHuiQ+LbReRw@mail.gmail.com/
---
 mm/memory-failure.c | 44 ++++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 14 deletions(-)

Comments

Andrew Morton June 15, 2021, 2:35 a.m. UTC | #1
On Fri, 11 Jun 2021 00:23:29 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> > 
> > --- mm/memory-failure.c
> > +++ mm/memory-failure.c
> > @@ -1782,6 +1796,8 @@ int memory_failure(unsigned long pfn, int flags)
> >  
> >  identify_page_state:
> >  	res = identify_page_state(pfn, p, page_flags);
> > +	mutex_unlock(&mf_mutex);
> > +	return res;
> >  unlock_page:
> >  	unlock_page(p);
> >  unlock_mutex:
> > 
> > and...  That mutex_unlock() looks odd.  The patch adds no matching
> > mutex_lock?
> 
> Yes, memory_failure() already has one mutex_lock (introduced by
> mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch,
> sorry for not clarifying that), and the change introduces a separate
> return path.  But I now think that I should have used "goto unlock_mutex"
> to use existing return path.

But mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
is part of Tony's three patch series which is not marked for -stable. 
So it isn't appropriate that this patch be based on top of that three
patch series.

Should Tony's patchset also be targeted to -stable?  If so then OK.

If not then please let's prepare your -stable patch against current
mainline, as it is higher priority than the 5.14-rc1 material in
linux-next.
HORIGUCHI NAOYA(堀口 直也) June 15, 2021, 3:58 a.m. UTC | #2
On Mon, Jun 14, 2021 at 07:35:28PM -0700, Andrew Morton wrote:
> On Fri, 11 Jun 2021 00:23:29 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:
> 
> > > 
> > > --- mm/memory-failure.c
> > > +++ mm/memory-failure.c
> > > @@ -1782,6 +1796,8 @@ int memory_failure(unsigned long pfn, int flags)
> > >  
> > >  identify_page_state:
> > >  	res = identify_page_state(pfn, p, page_flags);
> > > +	mutex_unlock(&mf_mutex);
> > > +	return res;
> > >  unlock_page:
> > >  	unlock_page(p);
> > >  unlock_mutex:
> > > 
> > > and...  That mutex_unlock() looks odd.  The patch adds no matching
> > > mutex_lock?
> > 
> > Yes, memory_failure() already has one mutex_lock (introduced by
> > mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch,
> > sorry for not clarifying that), and the change introduces a separate
> > return path.  But I now think that I should have used "goto unlock_mutex"
> > to use existing return path.
> 
> But mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
> is part of Tony's three patch series which is not marked for -stable. 
> So it isn't appropriate that this patch be based on top of that three
> patch series.
> 
> Should Tony's patchset also be targeted to -stable?  If so then OK.

Yes, that's fine. And I think that the first two patches
(mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.pathc and
mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch)
can be marked to stable, but 3/3 (mmhwpoison-send-sigbus-with-error-virutal-address.patch)
may not because it's a little too large and not the main part of the fixes.

Thanks,
Naoya Horiguchi

> 
> If not then please let's prepare your -stable patch against current
> mainline, as it is higher priority than the 5.14-rc1 material in
> linux-next.
>
Andrew Morton June 15, 2021, 11:48 p.m. UTC | #3
On Tue, 15 Jun 2021 03:58:47 +0000 HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@nec.com> wrote:

> > Should Tony's patchset also be targeted to -stable?  If so then OK.
> 
> Yes, that's fine. And I think that the first two patches
> (mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.pathc and
> mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch)
> can be marked to stable, but 3/3 (mmhwpoison-send-sigbus-with-error-virutal-address.patch)
> may not because it's a little too large and not the main part of the fixes.

OK, thanks, I queued

mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races.patch
mm-memory-failure-use-a-mutex-to-avoid-memory_failure-races-fix.patch
mmhwpoison-return-ehwpoison-to-denote-that-the-page-has-already-been-poisoned.patch
#
mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers.patch

for 5.13 with cc:stable and

mmhwpoison-send-sigbus-with-error-virutal-address.patch
mmhwpoison-send-sigbus-with-error-virutal-address-fix.patch

for 5.14.
diff mbox series

Patch

diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 763955dd6431..e5a1531f7f4e 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -801,6 +801,7 @@  static int truncate_error_page(struct page *p, unsigned long pfn,
  */
 static int me_kernel(struct page *p, unsigned long pfn)
 {
+	unlock_page(p);
 	return MF_IGNORED;
 }
 
@@ -810,6 +811,7 @@  static int me_kernel(struct page *p, unsigned long pfn)
 static int me_unknown(struct page *p, unsigned long pfn)
 {
 	pr_err("Memory failure: %#lx: Unknown page state\n", pfn);
+	unlock_page(p);
 	return MF_FAILED;
 }
 
@@ -818,6 +820,7 @@  static int me_unknown(struct page *p, unsigned long pfn)
  */
 static int me_pagecache_clean(struct page *p, unsigned long pfn)
 {
+	int ret;
 	struct address_space *mapping;
 
 	delete_from_lru_cache(p);
@@ -826,8 +829,10 @@  static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 * For anonymous pages we're done the only reference left
 	 * should be the one m_f() holds.
 	 */
-	if (PageAnon(p))
-		return MF_RECOVERED;
+	if (PageAnon(p)) {
+		ret = MF_RECOVERED;
+		goto out;
+	}
 
 	/*
 	 * Now truncate the page in the page cache. This is really
@@ -841,7 +846,8 @@  static int me_pagecache_clean(struct page *p, unsigned long pfn)
 		/*
 		 * Page has been teared down in the meanwhile
 		 */
-		return MF_FAILED;
+		ret = MF_FAILED;
+		goto out;
 	}
 
 	/*
@@ -849,7 +855,10 @@  static int me_pagecache_clean(struct page *p, unsigned long pfn)
 	 *
 	 * Open: to take i_mutex or not for this? Right now we don't.
 	 */
-	return truncate_error_page(p, pfn, mapping);
+	ret = truncate_error_page(p, pfn, mapping);
+out:
+	unlock_page(p);
+	return ret;
 }
 
 /*
@@ -925,24 +934,26 @@  static int me_pagecache_dirty(struct page *p, unsigned long pfn)
  */
 static int me_swapcache_dirty(struct page *p, unsigned long pfn)
 {
+	int ret;
+
 	ClearPageDirty(p);
 	/* Trigger EIO in shmem: */
 	ClearPageUptodate(p);
 
-	if (!delete_from_lru_cache(p))
-		return MF_DELAYED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_DELAYED;
+	unlock_page(p);
+	return ret;
 }
 
 static int me_swapcache_clean(struct page *p, unsigned long pfn)
 {
+	int ret;
+
 	delete_from_swap_cache(p);
 
-	if (!delete_from_lru_cache(p))
-		return MF_RECOVERED;
-	else
-		return MF_FAILED;
+	ret = delete_from_lru_cache(p) ? MF_FAILED : MF_RECOVERED;
+	unlock_page(p);
+	return ret;
 }
 
 /*
@@ -963,6 +974,7 @@  static int me_huge_page(struct page *p, unsigned long pfn)
 	mapping = page_mapping(hpage);
 	if (mapping) {
 		res = truncate_error_page(hpage, pfn, mapping);
+		unlock_page(hpage);
 	} else {
 		res = MF_FAILED;
 		unlock_page(hpage);
@@ -977,7 +989,6 @@  static int me_huge_page(struct page *p, unsigned long pfn)
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-		lock_page(hpage);
 	}
 
 	return res;
@@ -1009,6 +1020,8 @@  static struct page_state {
 	unsigned long mask;
 	unsigned long res;
 	enum mf_action_page_type type;
+
+	/* Callback ->action() has to unlock the relevant page inside it. */
 	int (*action)(struct page *p, unsigned long pfn);
 } error_states[] = {
 	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
@@ -1072,6 +1085,7 @@  static int page_action(struct page_state *ps, struct page *p,
 	int result;
 	int count;
 
+	/* page p should be unlocked after returning from ps->action().  */
 	result = ps->action(p, pfn);
 
 	count = page_count(p) - 1;
@@ -1476,7 +1490,7 @@  static int memory_failure_hugetlb(unsigned long pfn, int flags)
 		goto out;
 	}
 
-	res = identify_page_state(pfn, p, page_flags);
+	return identify_page_state(pfn, p, page_flags);
 out:
 	unlock_page(head);
 	return res;
@@ -1768,6 +1782,8 @@  int memory_failure(unsigned long pfn, int flags)
 
 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
+	mutex_unlock(&mf_mutex);
+	return res;
 unlock_page:
 	unlock_page(p);
 unlock_mutex: