diff mbox series

[02/10] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED

Message ID 20190910103016.14290-3-osalvador@suse.de (mailing list archive)
State New, archived
Headers show
Series Hwpoison soft-offline rework | expand

Commit Message

Oscar Salvador Sept. 10, 2019, 10:30 a.m. UTC
From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Currently madvise_inject_error() pins the target via get_user_pages_fast.
The call to get_user_pages_fast is only to get the respective page
of a given address, but it is the job of the memory-poisoning handler
to deal with races, so drop the refcount grabbed by get_user_pages_fast.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/madvise.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

Comments

David Hildenbrand Sept. 11, 2019, 10:23 a.m. UTC | #1
On 10.09.19 12:30, Oscar Salvador wrote:
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Currently madvise_inject_error() pins the target via get_user_pages_fast.
> The call to get_user_pages_fast is only to get the respective page
> of a given address, but it is the job of the memory-poisoning handler
> to deal with races, so drop the refcount grabbed by get_user_pages_fast.
> 
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/madvise.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6e023414f5c1..fbe6d402232c 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -883,6 +883,16 @@ static int madvise_inject_error(int behavior,
>  		ret = get_user_pages_fast(start, 1, 0, &page);
>  		if (ret != 1)
>  			return ret;
> +		/*
> +		 * The get_user_pages_fast() is just to get the pfn of the
> +		 * given address, and the refcount has nothing to do with
> +		 * what we try to test, so it should be released immediately.
> +		 * This is racy but it's intended because the real hardware
> +		 * errors could happen at any moment and memory error handlers
> +		 * must properly handle the race.
> +		 */
> +		put_page(page);
> +

I wonder if it would be clearer to do that after the page has been fully
used  - e.g. after getting the pfn and the order (and then e.g.,
symbolically setting the page pointer to 0).

I guess the important part of this patch is to not have an elevated
refcount while calling soft_offline_page().

>  		pfn = page_to_pfn(page);
>  
>  		/*
> @@ -892,16 +902,11 @@ static int madvise_inject_error(int behavior,
>  		 */
>  		order = compound_order(compound_head(page));
>  
> -		if (PageHWPoison(page)) {
> -			put_page(page);
> -			continue;
> -		}

This change is not reflected in the changelog. I would have expected
that only the put_page() would go. If this should go completely, I
suggest a separate patch.

> -
>  		if (behavior == MADV_SOFT_OFFLINE) {
>  			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
>  					pfn, start);
>  
> -			ret = soft_offline_page(page, MF_COUNT_INCREASED);
> +			ret = soft_offline_page(page, 0);
>  			if (ret)
>  				return ret;
>  			continue;
> @@ -909,14 +914,6 @@ static int madvise_inject_error(int behavior,
>  
>  		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				pfn, start);
> -
> -		/*
> -		 * Drop the page reference taken by get_user_pages_fast(). In
> -		 * the absence of MF_COUNT_INCREASED the memory_failure()
> -		 * routine is responsible for pinning the page to prevent it
> -		 * from being released back to the page allocator.
> -		 */
> -		put_page(page);
>  		ret = memory_failure(pfn, 0);
>  		if (ret)
>  			return ret;
>
David Hildenbrand Sept. 11, 2019, 10:27 a.m. UTC | #2
On 10.09.19 12:30, Oscar Salvador wrote:
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> Currently madvise_inject_error() pins the target via get_user_pages_fast.
> The call to get_user_pages_fast is only to get the respective page
> of a given address, but it is the job of the memory-poisoning handler
> to deal with races, so drop the refcount grabbed by get_user_pages_fast.
> 

Oh, and another question "it is the job of the memory-poisoning handler"
- is that already properly implemented? (newbee question ¯\_(ツ)_/¯)
Naoya Horiguchi Sept. 12, 2019, 1:28 a.m. UTC | #3
Hi David,

On Wed, Sep 11, 2019 at 12:23:24PM +0200, David Hildenbrand wrote:
> On 10.09.19 12:30, Oscar Salvador wrote:
> > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > 
> > Currently madvise_inject_error() pins the target via get_user_pages_fast.
> > The call to get_user_pages_fast is only to get the respective page
> > of a given address, but it is the job of the memory-poisoning handler
> > to deal with races, so drop the refcount grabbed by get_user_pages_fast.
> > 
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > ---
> >  mm/madvise.c | 25 +++++++++++--------------
> >  1 file changed, 11 insertions(+), 14 deletions(-)
> > 
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6e023414f5c1..fbe6d402232c 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -883,6 +883,16 @@ static int madvise_inject_error(int behavior,
> >  		ret = get_user_pages_fast(start, 1, 0, &page);
> >  		if (ret != 1)
> >  			return ret;
> > +		/*
> > +		 * The get_user_pages_fast() is just to get the pfn of the
> > +		 * given address, and the refcount has nothing to do with
> > +		 * what we try to test, so it should be released immediately.
> > +		 * This is racy but it's intended because the real hardware
> > +		 * errors could happen at any moment and memory error handlers
> > +		 * must properly handle the race.
> > +		 */
> > +		put_page(page);
> > +
> 
> I wonder if it would be clearer to do that after the page has been fully
> used  - e.g. after getting the pfn and the order (and then e.g.,
> symbolically setting the page pointer to 0).

Yes, this could be called just after page_to_pfn() below.

> I guess the important part of this patch is to not have an elevated
> refcount while calling soft_offline_page().
> 

That's right.

> >  		pfn = page_to_pfn(page);
> >  
> >  		/*
> > @@ -892,16 +902,11 @@ static int madvise_inject_error(int behavior,
> >  		 */
> >  		order = compound_order(compound_head(page));
> >  
> > -		if (PageHWPoison(page)) {
> > -			put_page(page);
> > -			continue;
> > -		}
> 
> This change is not reflected in the changelog. I would have expected
> that only the put_page() would go. If this should go completely, I
> suggest a separate patch.
> 

I forget why I tried to remove the if block, and now I think only the
put_page() should go as you point out.

Thanks for the comment.

- Naoya
Naoya Horiguchi Sept. 12, 2019, 1:37 a.m. UTC | #4
On Wed, Sep 11, 2019 at 12:27:22PM +0200, David Hildenbrand wrote:
> On 10.09.19 12:30, Oscar Salvador wrote:
> > From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > 
> > Currently madvise_inject_error() pins the target via get_user_pages_fast.
> > The call to get_user_pages_fast is only to get the respective page
> > of a given address, but it is the job of the memory-poisoning handler
> > to deal with races, so drop the refcount grabbed by get_user_pages_fast.
> > 
> 
> Oh, and another question "it is the job of the memory-poisoning handler"
> - is that already properly implemented? (newbee question ¯\_(ツ)_/¯)

The above description might be confusing, sorry. It's intended likes

  The call to get_user_pages_fast is only to get the pointer to struct
  page of a given address, pinning it is memory-poisoning handler's job,
  so drop the refcount grabbed by get_user_pages_fast.

And pinning is done in get_hwpoison_page() for hard-offline and
get_any_page() for soft-offline.  For soft-offline case, the semantics of
refcount of poisoned pages is what this patchset tries to change/improve.

Thanks,
Naoya Horiguchi
David Hildenbrand Sept. 13, 2019, 8:37 a.m. UTC | #5
On 12.09.19 03:37, Naoya Horiguchi wrote:
> On Wed, Sep 11, 2019 at 12:27:22PM +0200, David Hildenbrand wrote:
>> On 10.09.19 12:30, Oscar Salvador wrote:
>>> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>>
>>> Currently madvise_inject_error() pins the target via get_user_pages_fast.
>>> The call to get_user_pages_fast is only to get the respective page
>>> of a given address, but it is the job of the memory-poisoning handler
>>> to deal with races, so drop the refcount grabbed by get_user_pages_fast.
>>>
>>
>> Oh, and another question "it is the job of the memory-poisoning handler"
>> - is that already properly implemented? (newbee question ¯\_(ツ)_/¯)
> 
> The above description might be confusing, sorry. It's intended likes
> 
>   The call to get_user_pages_fast is only to get the pointer to struct
>   page of a given address, pinning it is memory-poisoning handler's job,
>   so drop the refcount grabbed by get_user_pages_fast.
> 
> And pinning is done in get_hwpoison_page() for hard-offline and
> get_any_page() for soft-offline.  For soft-offline case, the semantics of
> refcount of poisoned pages is what this patchset tries to change/improve.

Thanks for the clarification!
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 6e023414f5c1..fbe6d402232c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -883,6 +883,16 @@  static int madvise_inject_error(int behavior,
 		ret = get_user_pages_fast(start, 1, 0, &page);
 		if (ret != 1)
 			return ret;
+		/*
+		 * The get_user_pages_fast() is just to get the pfn of the
+		 * given address, and the refcount has nothing to do with
+		 * what we try to test, so it should be released immediately.
+		 * This is racy but it's intended because the real hardware
+		 * errors could happen at any moment and memory error handlers
+		 * must properly handle the race.
+		 */
+		put_page(page);
+
 		pfn = page_to_pfn(page);
 
 		/*
@@ -892,16 +902,11 @@  static int madvise_inject_error(int behavior,
 		 */
 		order = compound_order(compound_head(page));
 
-		if (PageHWPoison(page)) {
-			put_page(page);
-			continue;
-		}
-
 		if (behavior == MADV_SOFT_OFFLINE) {
 			pr_info("Soft offlining pfn %#lx at process virtual address %#lx\n",
 					pfn, start);
 
-			ret = soft_offline_page(page, MF_COUNT_INCREASED);
+			ret = soft_offline_page(page, 0);
 			if (ret)
 				return ret;
 			continue;
@@ -909,14 +914,6 @@  static int madvise_inject_error(int behavior,
 
 		pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
 				pfn, start);
-
-		/*
-		 * Drop the page reference taken by get_user_pages_fast(). In
-		 * the absence of MF_COUNT_INCREASED the memory_failure()
-		 * routine is responsible for pinning the page to prevent it
-		 * from being released back to the page allocator.
-		 */
-		put_page(page);
 		ret = memory_failure(pfn, 0);
 		if (ret)
 			return ret;