diff mbox series

[v3,3/3] mm/hwpoison: fix unpoison_memory()

Message ID 20211105055058.3152564-4-naoya.horiguchi@linux.dev (mailing list archive)
State New
Headers show
Series mm/hwpoison: fix unpoison_memory() | expand

Commit Message

Naoya Horiguchi Nov. 5, 2021, 5:50 a.m. UTC
From: Naoya Horiguchi <naoya.horiguchi@nec.com>

After recent soft-offline rework, error pages can be taken off from
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation.  Moreover, due to the recent change on
__get_hwpoison_page(), get_page_unless_zero() is hardly called for
hwpoisoned pages.  So __get_hwpoison_page() highly likely returns -EBUSY
(meaning to fail to grab page refcount) and unpoison just clears
PG_hwpoison without releasing a refcount.  That does not lead to a
critical issue like kernel panic, but unpoisoned pages never get back to
buddy (leaked permanently), which is not good.

To (partially) fix this, we need to identify "taken off" pages from
other types of hwpoisoned pages.  We can't use refcount or page flags
for this purpose, so a pseudo flag is defined by hacking ->private
field.  Someone might think that put_page() is enough to cancel
taken-off pages, but the normal free path contains some operations not
suitable for the current purpose, and can fire VM_BUG_ON().

Note that unpoison_memory() is now supposed to be cancel hwpoison events
injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
not by MCE injection, so please don't try to use unpoison when testing
with MCE injection.

Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
ChangeLog v3:
- fix description
- add PageTable check in unpoison_memory()
- fix return value of clear_page_hwpoison()
- pass page instead head to PageHWPoisonTakenOff check.
- rename take_page_back_buddy() with put_page_back_buddy()

ChangeLog v2:
- unpoison_memory() returns as commented
- explicitly avoids unpoisoning slab pages
- separates internal pinning function into __get_unpoison_page()
---
 include/linux/mm.h         |   1 +
 include/linux/page-flags.h |   4 ++
 mm/memory-failure.c        | 107 ++++++++++++++++++++++++++++++-------
 mm/page_alloc.c            |  23 ++++++++
 4 files changed, 116 insertions(+), 19 deletions(-)

Comments

kernel test robot Nov. 6, 2021, 11:25 p.m. UTC | #1
Hi Naoya,

I love your patch! Yet something to improve:

[auto build test ERROR on hnaz-mm/master]

url:    https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-hwpoison-fix-unpoison_memory/20211105-135311
base:   https://github.com/hnaz/linux-mm master
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/1b78bd027ee061f9378cf47687f9f4649a160fa4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Naoya-Horiguchi/mm-hwpoison-fix-unpoison_memory/20211105-135311
        git checkout 1b78bd027ee061f9378cf47687f9f4649a160fa4
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/mmzone.h:22,
                    from include/linux/gfp.h:6,
                    from include/linux/mm.h:10,
                    from mm/memory-failure.c:37:
   mm/memory-failure.c: In function 'SetPageHWPoisonTakenOff':
>> include/linux/page-flags.h:525:24: error: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '5212723395167473486' to '1230196558' [-Werror=overflow]
     525 | #define MAGIC_HWPOISON 0x4857504f49534f4e
         |                        ^~~~~~~~~~~~~~~~~~
   mm/memory-failure.c:1170:25: note: in expansion of macro 'MAGIC_HWPOISON'
    1170 |  set_page_private(page, MAGIC_HWPOISON);
         |                         ^~~~~~~~~~~~~~
   cc1: all warnings being treated as errors


vim +525 include/linux/page-flags.h

   520	
   521	#ifdef CONFIG_MEMORY_FAILURE
   522	PAGEFLAG(HWPoison, hwpoison, PF_ANY)
   523	TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
   524	#define __PG_HWPOISON (1UL << PG_hwpoison)
 > 525	#define MAGIC_HWPOISON	0x4857504f49534f4e
   526	extern void SetPageHWPoisonTakenOff(struct page *page);
   527	extern void ClearPageHWPoisonTakenOff(struct page *page);
   528	extern bool take_page_off_buddy(struct page *page);
   529	extern bool put_page_back_buddy(struct page *page);
   530	#else
   531	PAGEFLAG_FALSE(HWPoison, hwpoison)
   532	#define __PG_HWPOISON 0
   533	#endif
   534	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Yang Shi Nov. 8, 2021, 11:27 p.m. UTC | #2
On Thu, Nov 4, 2021 at 10:52 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
>
> After recent soft-offline rework, error pages can be taken off from
> buddy allocator, but the existing unpoison_memory() does not properly
> undo the operation.  Moreover, due to the recent change on
> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> hwpoisoned pages.  So __get_hwpoison_page() highly likely returns -EBUSY
> (meaning to fail to grab page refcount) and unpoison just clears
> PG_hwpoison without releasing a refcount.  That does not lead to a
> critical issue like kernel panic, but unpoisoned pages never get back to
> buddy (leaked permanently), which is not good.
>
> To (partially) fix this, we need to identify "taken off" pages from
> other types of hwpoisoned pages.  We can't use refcount or page flags
> for this purpose, so a pseudo flag is defined by hacking ->private
> field.  Someone might think that put_page() is enough to cancel
> taken-off pages, but the normal free path contains some operations not
> suitable for the current purpose, and can fire VM_BUG_ON().
>
> Note that unpoison_memory() is now supposed to be cancel hwpoison events
> injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
> not by MCE injection, so please don't try to use unpoison when testing
> with MCE injection.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> ---
> ChangeLog v3:
> - fix description
> - add PageTable check in unpoison_memory()
> - fix return value of clear_page_hwpoison()
> - pass page instead head to PageHWPoisonTakenOff check.
> - rename take_page_back_buddy() with put_page_back_buddy()
>
> ChangeLog v2:
> - unpoison_memory() returns as commented
> - explicitly avoids unpoisoning slab pages
> - separates internal pinning function into __get_unpoison_page()
> ---
>  include/linux/mm.h         |   1 +
>  include/linux/page-flags.h |   4 ++
>  mm/memory-failure.c        | 107 ++++++++++++++++++++++++++++++-------
>  mm/page_alloc.c            |  23 ++++++++
>  4 files changed, 116 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7941bca871dc..8bbb7205ef9f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3220,6 +3220,7 @@ enum mf_flags {
>         MF_ACTION_REQUIRED = 1 << 1,
>         MF_MUST_KILL = 1 << 2,
>         MF_SOFT_OFFLINE = 1 << 3,
> +       MF_UNPOISON = 1 << 4,
>  };
>  extern int memory_failure(unsigned long pfn, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int flags);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 52ec4b5e5615..ccfe083152c4 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
>  PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>  TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
>  #define __PG_HWPOISON (1UL << PG_hwpoison)
> +#define MAGIC_HWPOISON 0x4857504f49534f4e
> +extern void SetPageHWPoisonTakenOff(struct page *page);
> +extern void ClearPageHWPoisonTakenOff(struct page *page);
>  extern bool take_page_off_buddy(struct page *page);
> +extern bool put_page_back_buddy(struct page *page);
>  #else
>  PAGEFLAG_FALSE(HWPoison, hwpoison)
>  #define __PG_HWPOISON 0
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 722036539b44..f406e85eb733 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
>         return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
>  }
>
> +static inline bool PageHWPoisonTakenOff(struct page *page)
> +{
> +       return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
> +}
> +
> +void SetPageHWPoisonTakenOff(struct page *page)
> +{
> +       set_page_private(page, MAGIC_HWPOISON);
> +}
> +
> +void ClearPageHWPoisonTakenOff(struct page *page)
> +{
> +       if (PageHWPoison(page))
> +               set_page_private(page, 0);
> +}
> +
>  /*
>   * Return true if a page type of a given page is supported by hwpoison
>   * mechanism (while handling could fail), otherwise false.  This function
> @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
>         return ret;
>  }
>
> +static int __get_unpoison_page(struct page *page)
> +{
> +       struct page *head = compound_head(page);
> +       int ret = 0;
> +       bool hugetlb = false;
> +
> +       ret = get_hwpoison_huge_page(head, &hugetlb);
> +       if (hugetlb)
> +               return ret;
> +
> +       /*
> +        * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> +        * but also isolated from buddy freelist, so need to identify the
> +        * state and have to cancel both operations to unpoison.
> +        */
> +       if (PageHWPoisonTakenOff(page))
> +               return -EHWPOISON;
> +
> +       return get_page_unless_zero(page) ? 1 : 0;
> +}
> +
>  /**
>   * get_hwpoison_page() - Get refcount for memory error handling
>   * @p:         Raw error page (hit by memory error)
> @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
>   * extra care for the error page's state (as done in __get_hwpoison_page()),
>   * and has some retry logic in get_any_page().
>   *
> + * When called from unpoison_memory(), the caller should already ensure that
> + * the given page has PG_hwpoison. So it's never reused for other page
> + * allocations, and __get_unpoison_page() never races with them.
> + *
>   * Return: 0 on failure,
>   *         1 on success for in-use pages in a well-defined state,
>   *         -EIO for pages on which we can not handle memory errors,
>   *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
> - *         operations like allocation and free.
> + *         operations like allocation and free,
> + *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
>   */
>  static int get_hwpoison_page(struct page *p, unsigned long flags)
>  {
>         int ret;
>
>         zone_pcp_disable(page_zone(p));
> -       ret = get_any_page(p, flags);
> +       if (flags & MF_UNPOISON)
> +               ret = __get_unpoison_page(p);
> +       else
> +               ret = get_any_page(p, flags);
>         zone_pcp_enable(page_zone(p));
>
>         return ret;
> @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
>                 pr_info(fmt, pfn);                      \
>  })
>
> +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> +{
> +       if (TestClearPageHWPoison(p)) {
> +               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> +                                page_to_pfn(p), rs);
> +               num_poisoned_pages_dec();
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> +                                         struct page *p)
> +{
> +       if (put_page_back_buddy(p) && clear_page_hwpoison(rs, p))

It seems there might be race condition between free and allocation to
kick the page out of buddy before the hwpoisoned flag is cleared IIUC?

        CPU A                     CPU B
Free:                            Allocation:
acquire zone lock
put back to buddy
release zone lock
                                     acquire zone lock
                                     try to allocate this page
                                     if it is hwpoisoned then reset
PageBuddy (not buddy page anymore)
                                     release zone lock

Did I miss something?

> +               return 0;
> +       else
> +               return -EBUSY;
> +}
> +
>  /**
>   * unpoison_memory - Unpoison a previously poisoned page
>   * @pfn: Page number of the to be unpoisoned page
> @@ -1958,9 +2023,7 @@ int unpoison_memory(unsigned long pfn)
>  {
>         struct page *page;
>         struct page *p;
> -       int freeit = 0;
> -       int ret = 0;
> -       unsigned long flags = 0;
> +       int ret = -EBUSY;
>         static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>                                         DEFAULT_RATELIMIT_BURST);
>
> @@ -1996,24 +2059,30 @@ int unpoison_memory(unsigned long pfn)
>                 goto unlock_mutex;
>         }
>
> -       if (!get_hwpoison_page(p, flags)) {
> -               if (TestClearPageHWPoison(p))
> -                       num_poisoned_pages_dec();
> -               unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> -                                pfn, &unpoison_rs);
> +       if (PageSlab(page) || PageTable(page))
>                 goto unlock_mutex;
> -       }
>
> -       if (TestClearPageHWPoison(page)) {
> -               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> -                                pfn, &unpoison_rs);
> -               num_poisoned_pages_dec();
> -               freeit = 1;
> -       }
> +       ret = get_hwpoison_page(p, MF_UNPOISON);
> +       if (!ret) {
> +               if (clear_page_hwpoison(&unpoison_rs, page))
> +                       ret = 0;
> +               else
> +                       ret = -EBUSY;
> +       } else if (ret < 0) {
> +               if (ret == -EHWPOISON) {
> +                       ret = unpoison_taken_off_page(&unpoison_rs, p);
> +               } else
> +                       unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> +                                        pfn, &unpoison_rs);
> +       } else {
> +               int freeit = clear_page_hwpoison(&unpoison_rs, p);
>
> -       put_page(page);
> -       if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
>                 put_page(page);
> +               if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
> +                       put_page(page);
> +                       ret = 0;
> +               }
> +       }
>
>  unlock_mutex:
>         mutex_unlock(&mf_mutex);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..3a2e23e6c769 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9448,6 +9448,7 @@ bool take_page_off_buddy(struct page *page)
>                         del_page_from_free_list(page_head, zone, page_order);
>                         break_down_buddy_pages(zone, page_head, page, 0,
>                                                 page_order, migratetype);
> +                       SetPageHWPoisonTakenOff(page);
>                         if (!is_migrate_isolate(migratetype))
>                                 __mod_zone_freepage_state(zone, -1, migratetype);
>                         ret = true;
> @@ -9459,4 +9460,26 @@ bool take_page_off_buddy(struct page *page)
>         spin_unlock_irqrestore(&zone->lock, flags);
>         return ret;
>  }
> +
> +/*
> + * Cancel takeoff done by take_page_off_buddy().
> + */
> +bool put_page_back_buddy(struct page *page)
> +{
> +       struct zone *zone = page_zone(page);
> +       unsigned long pfn = page_to_pfn(page);
> +       unsigned long flags;
> +       int migratetype = get_pfnblock_migratetype(page, pfn);
> +       bool ret = false;
> +
> +       spin_lock_irqsave(&zone->lock, flags);
> +       if (put_page_testzero(page)) {
> +               ClearPageHWPoisonTakenOff(page);
> +               __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
> +               ret = true;
> +       }
> +       spin_unlock_irqrestore(&zone->lock, flags);
> +
> +       return ret;
> +}
>  #endif
> --
> 2.25.1
>
>
Naoya Horiguchi Nov. 9, 2021, 12:53 a.m. UTC | #3
On Mon, Nov 08, 2021 at 03:27:55PM -0800, Yang Shi wrote:
> On Thu, Nov 4, 2021 at 10:52 PM Naoya Horiguchi
> <naoya.horiguchi@linux.dev> wrote:
...
> > @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
> >                 pr_info(fmt, pfn);                      \
> >  })
> >
> > +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> > +{
> > +       if (TestClearPageHWPoison(p)) {
> > +               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> > +                                page_to_pfn(p), rs);
> > +               num_poisoned_pages_dec();
> > +               return 1;
> > +       }
> > +       return 0;
> > +}
> > +
> > +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> > +                                         struct page *p)
> > +{
> > +       if (put_page_back_buddy(p) && clear_page_hwpoison(rs, p))
> 
> It seems there might be race condition between free and allocation to
> kick the page out of buddy before the hwpoisoned flag is cleared IIUC?
> 
>         CPU A                     CPU B
> Free:                            Allocation:
> acquire zone lock
> put back to buddy
> release zone lock
>                                      acquire zone lock
>                                      try to allocate this page
>                                      if it is hwpoisoned then reset
> PageBuddy (not buddy page anymore)
>                                      release zone lock
> 
> Did I miss something?

No, this is a nice catch. CPU B finds hwpoisoned page on free list then
calling bad_page().  So doing "put back to buddy" and clear_page_hwpoison
inside a single zone lock should prevent this race.

Thanks,
Naoya Horiguchi
Yang Shi Nov. 9, 2021, 1:03 a.m. UTC | #4
On Mon, Nov 8, 2021 at 4:53 PM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Mon, Nov 08, 2021 at 03:27:55PM -0800, Yang Shi wrote:
> > On Thu, Nov 4, 2021 at 10:52 PM Naoya Horiguchi
> > <naoya.horiguchi@linux.dev> wrote:
> ...
> > > @@ -1942,6 +1987,26 @@ core_initcall(memory_failure_init);
> > >                 pr_info(fmt, pfn);                      \
> > >  })
> > >
> > > +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> > > +{
> > > +       if (TestClearPageHWPoison(p)) {
> > > +               unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> > > +                                page_to_pfn(p), rs);
> > > +               num_poisoned_pages_dec();
> > > +               return 1;
> > > +       }
> > > +       return 0;
> > > +}
> > > +
> > > +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> > > +                                         struct page *p)
> > > +{
> > > +       if (put_page_back_buddy(p) && clear_page_hwpoison(rs, p))
> >
> > It seems there might be race condition between free and allocation to
> > kick the page out of buddy before the hwpoisoned flag is cleared IIUC?
> >
> >         CPU A                     CPU B
> > Free:                            Allocation:
> > acquire zone lock
> > put back to buddy
> > release zone lock
> >                                      acquire zone lock
> >                                      try to allocate this page
> >                                      if it is hwpoisoned then reset
> > PageBuddy (not buddy page anymore)
> >                                      release zone lock
> >
> > Did I miss something?
>
> No, this is a nice catch. CPU B finds hwpoisoned page on free list then
> calling bad_page().  So doing "put back to buddy" and clear_page_hwpoison
> inside a single zone lock should prevent this race.

Yeah, or clear the flag before putting it back to buddy? Anyway
clearing the flag under zone lock is not a big deal either.

>
> Thanks,
> Naoya Horiguchi
Naoya Horiguchi Nov. 10, 2021, 2:48 a.m. UTC | #5
On Sun, Nov 07, 2021 at 07:25:13AM +0800, kernel test robot wrote:
> Hi Naoya,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on hnaz-mm/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Naoya-Horiguchi/mm-hwpoison-fix-unpoison_memory/20211105-135311
> base:   https://github.com/hnaz/linux-mm master
> config: i386-allyesconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/0day-ci/linux/commit/1b78bd027ee061f9378cf47687f9f4649a160fa4
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Naoya-Horiguchi/mm-hwpoison-fix-unpoison_memory/20211105-135311
>         git checkout 1b78bd027ee061f9378cf47687f9f4649a160fa4
>         # save the attached .config to linux build tree
>         make W=1 ARCH=i386 

I can't reproduce the same error with the above procedure (maybe because I
use newer version of gcc?), but anyway I suspect that memory-failure.c
should be compiled for i386 (which should not support software-recoverable
memory error).

X86_SUPPORTS_MEMORY_FAILURE depends on (X86_64 || !SPARSEMEM) in arch/x86/Kconfig,
where according to commit d949f36f1865 ("x86: Fix hwpoison code related build
failure on 32-bit NUMAQ"), dependency on !SPRASEMEM is just to avoid exhaustion of
page flag field. So I'm thinking of fixing this error by removing the dependency
on !SPARSEMEM like below.


diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bc9503674249..9a047cb71bba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -717,11 +717,10 @@ config X86_32_NON_STANDARD
 config X86_SUPPORTS_MEMORY_FAILURE
        def_bool y
        # MCE code calls memory_failure():
        depends on X86_MCE
-       # On 32-bit this adds too big of NODES_SHIFT and we run out of page flags:
-       # On 32-bit SPARSEMEM adds too big of SECTIONS_WIDTH:
-       depends on X86_64 || !SPARSEMEM
+       # On 32-bit software recoverable memory error is not supported:
+       depends on X86_64
        select ARCH_SUPPORTS_MEMORY_FAILURE

 config STA2X11
        bool "STA2X11 Companion Chip Support"


> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>

I'll add the tag. Thank you for the report.

- Naoya Horiguchi

> 
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/mmzone.h:22,
>                     from include/linux/gfp.h:6,
>                     from include/linux/mm.h:10,
>                     from mm/memory-failure.c:37:
>    mm/memory-failure.c: In function 'SetPageHWPoisonTakenOff':
> >> include/linux/page-flags.h:525:24: error: unsigned conversion from 'long long int' to 'long unsigned int' changes value from '5212723395167473486' to '1230196558' [-Werror=overflow]
>      525 | #define MAGIC_HWPOISON 0x4857504f49534f4e
>          |                        ^~~~~~~~~~~~~~~~~~
>    mm/memory-failure.c:1170:25: note: in expansion of macro 'MAGIC_HWPOISON'
>     1170 |  set_page_private(page, MAGIC_HWPOISON);
>          |                         ^~~~~~~~~~~~~~
>    cc1: all warnings being treated as errors
> 
> 
> vim +525 include/linux/page-flags.h
> 
>    520	
>    521	#ifdef CONFIG_MEMORY_FAILURE
>    522	PAGEFLAG(HWPoison, hwpoison, PF_ANY)
>    523	TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
>    524	#define __PG_HWPOISON (1UL << PG_hwpoison)
>  > 525	#define MAGIC_HWPOISON	0x4857504f49534f4e
>    526	extern void SetPageHWPoisonTakenOff(struct page *page);
>    527	extern void ClearPageHWPoisonTakenOff(struct page *page);
>    528	extern bool take_page_off_buddy(struct page *page);
>    529	extern bool put_page_back_buddy(struct page *page);
>    530	#else
>    531	PAGEFLAG_FALSE(HWPoison, hwpoison)
>    532	#define __PG_HWPOISON 0
>    533	#endif
>    534	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7941bca871dc..8bbb7205ef9f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3220,6 +3220,7 @@  enum mf_flags {
 	MF_ACTION_REQUIRED = 1 << 1,
 	MF_MUST_KILL = 1 << 2,
 	MF_SOFT_OFFLINE = 1 << 3,
+	MF_UNPOISON = 1 << 4,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 52ec4b5e5615..ccfe083152c4 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -522,7 +522,11 @@  PAGEFLAG_FALSE(Uncached, uncached)
 PAGEFLAG(HWPoison, hwpoison, PF_ANY)
 TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
 #define __PG_HWPOISON (1UL << PG_hwpoison)
+#define MAGIC_HWPOISON	0x4857504f49534f4e
+extern void SetPageHWPoisonTakenOff(struct page *page);
+extern void ClearPageHWPoisonTakenOff(struct page *page);
 extern bool take_page_off_buddy(struct page *page);
+extern bool put_page_back_buddy(struct page *page);
 #else
 PAGEFLAG_FALSE(HWPoison, hwpoison)
 #define __PG_HWPOISON 0
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 722036539b44..f406e85eb733 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1160,6 +1160,22 @@  static int page_action(struct page_state *ps, struct page *p,
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
+static inline bool PageHWPoisonTakenOff(struct page *page)
+{
+	return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
+}
+
+void SetPageHWPoisonTakenOff(struct page *page)
+{
+	set_page_private(page, MAGIC_HWPOISON);
+}
+
+void ClearPageHWPoisonTakenOff(struct page *page)
+{
+	if (PageHWPoison(page))
+		set_page_private(page, 0);
+}
+
 /*
  * Return true if a page type of a given page is supported by hwpoison
  * mechanism (while handling could fail), otherwise false.  This function
@@ -1262,6 +1278,27 @@  static int get_any_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+static int __get_unpoison_page(struct page *page)
+{
+	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	/*
+	 * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
+	 * but also isolated from buddy freelist, so need to identify the
+	 * state and have to cancel both operations to unpoison.
+	 */
+	if (PageHWPoisonTakenOff(page))
+		return -EHWPOISON;
+
+	return get_page_unless_zero(page) ? 1 : 0;
+}
+
 /**
  * get_hwpoison_page() - Get refcount for memory error handling
  * @p:		Raw error page (hit by memory error)
@@ -1278,18 +1315,26 @@  static int get_any_page(struct page *p, unsigned long flags)
  * extra care for the error page's state (as done in __get_hwpoison_page()),
  * and has some retry logic in get_any_page().
  *
+ * When called from unpoison_memory(), the caller should already ensure that
+ * the given page has PG_hwpoison. So it's never reused for other page
+ * allocations, and __get_unpoison_page() never races with them.
+ *
  * Return: 0 on failure,
  *         1 on success for in-use pages in a well-defined state,
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
- *         operations like allocation and free.
+ *         operations like allocation and free,
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
 	int ret;
 
 	zone_pcp_disable(page_zone(p));
-	ret = get_any_page(p, flags);
+	if (flags & MF_UNPOISON)
+		ret = __get_unpoison_page(p);
+	else
+		ret = get_any_page(p, flags);
 	zone_pcp_enable(page_zone(p));
 
 	return ret;
@@ -1942,6 +1987,26 @@  core_initcall(memory_failure_init);
 		pr_info(fmt, pfn);			\
 })
 
+static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
+{
+	if (TestClearPageHWPoison(p)) {
+		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+				 page_to_pfn(p), rs);
+		num_poisoned_pages_dec();
+		return 1;
+	}
+	return 0;
+}
+
+static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
+					  struct page *p)
+{
+	if (put_page_back_buddy(p) && clear_page_hwpoison(rs, p))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * unpoison_memory - Unpoison a previously poisoned page
  * @pfn: Page number of the to be unpoisoned page
@@ -1958,9 +2023,7 @@  int unpoison_memory(unsigned long pfn)
 {
 	struct page *page;
 	struct page *p;
-	int freeit = 0;
-	int ret = 0;
-	unsigned long flags = 0;
+	int ret = -EBUSY;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
 					DEFAULT_RATELIMIT_BURST);
 
@@ -1996,24 +2059,30 @@  int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
-	if (!get_hwpoison_page(p, flags)) {
-		if (TestClearPageHWPoison(p))
-			num_poisoned_pages_dec();
-		unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
-				 pfn, &unpoison_rs);
+	if (PageSlab(page) || PageTable(page))
 		goto unlock_mutex;
-	}
 
-	if (TestClearPageHWPoison(page)) {
-		unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
-				 pfn, &unpoison_rs);
-		num_poisoned_pages_dec();
-		freeit = 1;
-	}
+	ret = get_hwpoison_page(p, MF_UNPOISON);
+	if (!ret) {
+		if (clear_page_hwpoison(&unpoison_rs, page))
+			ret = 0;
+		else
+			ret = -EBUSY;
+	} else if (ret < 0) {
+		if (ret == -EHWPOISON) {
+			ret = unpoison_taken_off_page(&unpoison_rs, p);
+		} else
+			unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
+					 pfn, &unpoison_rs);
+	} else {
+		int freeit = clear_page_hwpoison(&unpoison_rs, p);
 
-	put_page(page);
-	if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
 		put_page(page);
+		if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
+			put_page(page);
+			ret = 0;
+		}
+	}
 
 unlock_mutex:
 	mutex_unlock(&mf_mutex);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..3a2e23e6c769 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9448,6 +9448,7 @@  bool take_page_off_buddy(struct page *page)
 			del_page_from_free_list(page_head, zone, page_order);
 			break_down_buddy_pages(zone, page_head, page, 0,
 						page_order, migratetype);
+			SetPageHWPoisonTakenOff(page);
 			if (!is_migrate_isolate(migratetype))
 				__mod_zone_freepage_state(zone, -1, migratetype);
 			ret = true;
@@ -9459,4 +9460,26 @@  bool take_page_off_buddy(struct page *page)
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret;
 }
+
+/*
+ * Cancel takeoff done by take_page_off_buddy().
+ */
+bool put_page_back_buddy(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long flags;
+	int migratetype = get_pfnblock_migratetype(page, pfn);
+	bool ret = false;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	if (put_page_testzero(page)) {
+		ClearPageHWPoisonTakenOff(page);
+		__free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
+		ret = true;
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	return ret;
+}
 #endif