mm, memory_hotplug: Don't bail out in do_migrate_range prematurely
diff mbox series

Message ID 20181211085042.2696-1-osalvador@suse.de
State New
Headers show
Series
  • mm, memory_hotplug: Don't bail out in do_migrate_range prematurely
Related show

Commit Message

Oscar Salvador Dec. 11, 2018, 8:50 a.m. UTC
do_migrate_ranges() takes a memory range and tries to isolate the
pages to put them into a list.
This list will be later on used in migrate_pages() to know
the pages we need to migrate.

Currently, if we fail to isolate a single page, we put all already
isolated pages back to their LRU and we bail out from the function.
This is quite suboptimal, as this will force us to start over again
because scan_movable_pages will give us the same range.
If there is no chance that we can isolate that page, we will loop here
forever.

Issue debugged in 4d0c7db96 ("hwpoison, memory_hotplug: allow hwpoisoned
pages to be offlined") has proved that.
During the debugging of that issue, it was noticed that if
do_migrate_ranges() fails to isolate a single page, we will
just discard the work we have done so far and bail out, which means
that scan_movable_pages() will find again the same set of pages.

Instead, we can just skip the error, keep isolating as much pages
as possible and then proceed with the call to migrate_pages().
This will allow us to do some work at least.

There is no danger in the skipped pages to be lost, because
scan_movable_pages() will give us them as they could not
be isolated and therefore migrated.

Although this patch has proved to be useful when dealing with
4d0c7db96 because it allows us to move forward as long as the
page is not in LRU, we still need 4d0c7db96
("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
to handle the LRU case and the unmapping of the page if needed.
So, this is just a follow-up cleanup.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory_hotplug.c | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

Comments

Oscar Salvador Dec. 11, 2018, 8:57 a.m. UTC | #1
On 2018-12-11 09:50, Oscar Salvador wrote:

> -		} else {
> -			pr_warn("failed to isolate pfn %lx\n", pfn);
> -			dump_page(page, "isolation failed");
> -			put_page(page);
> -			/* Because we don't have big zone->lock. we should
> -			   check this again here. */
> -			if (page_count(page)) {
> -				not_managed++;
> -				ret = -EBUSY;
> -				break;

I forgot that here we should at least leave the put_page().
But leave also the dump_page() and the pr_warn().

--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1394,6 +1394,10 @@ do_migrate_range(unsigned long start_pfn, 
unsigned long end_pfn)
                                 inc_node_page_state(page, 
NR_ISOLATED_ANON +
                                                     
page_is_file_cache(page));

+               } else {
+                       pr_warn("failed to isolate pfn %lx\n", pfn);
+                       dump_page(page, "isolation failed");
+                       put_page(page);
                 }
         }
         if (!list_empty(&source)) {
David Hildenbrand Dec. 11, 2018, 9:35 a.m. UTC | #2
On 11.12.18 09:50, Oscar Salvador wrote:
> do_migrate_ranges() takes a memory range and tries to isolate the
> pages to put them into a list.
> This list will be later on used in migrate_pages() to know
> the pages we need to migrate.
> 
> Currently, if we fail to isolate a single page, we put all already
> isolated pages back to their LRU and we bail out from the function.
> This is quite suboptimal, as this will force us to start over again
> because scan_movable_pages will give us the same range.
> If there is no chance that we can isolate that page, we will loop here
> forever.
> 
> Issue debugged in 4d0c7db96 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") has proved that.
> During the debugging of that issue, it was noticed that if
> do_migrate_ranges() fails to isolate a single page, we will
> just discard the work we have done so far and bail out, which means
> that scan_movable_pages() will find again the same set of pages.
> 
> Instead, we can just skip the error, keep isolating as much pages
> as possible and then proceed with the call to migrate_pages().
> This will allow us to do some work at least.
> 
> There is no danger in the skipped pages to be lost, because
> scan_movable_pages() will give us them as they could not
> be isolated and therefore migrated.
> 
> Although this patch has proved to be useful when dealing with
> 4d0c7db96 because it allows us to move forward as long as the
> page is not in LRU, we still need 4d0c7db96
> ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
> to handle the LRU case and the unmapping of the page if needed.
> So, this is just a follow-up cleanup.

So if I get it right the behavior of the function is now

"If at least one page could be migrated, return 0. If no pages could be
migrated, return an error."

Don't we want that to be

"If all pages could be migrated, return 0. If only a subset could be
migrated, return -EAGAIN. If no pages could be migrated, return the
causing error."

So somehow remember if we had issues with one page and instead of
reporting 0, report e.g. -EAGAIN?

Meaning something like "If you get -EAGAIN, try again to get the real
reason".

> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/memory_hotplug.c | 19 +------------------
>  1 file changed, 1 insertion(+), 18 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 86ab673fc4e3..804d0280d2ab 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1339,7 +1339,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  	unsigned long pfn;
>  	struct page *page;
>  	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> -	int not_managed = 0;
>  	int ret = 0;
>  	LIST_HEAD(source);
>  
> @@ -1395,25 +1394,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  				inc_node_page_state(page, NR_ISOLATED_ANON +
>  						    page_is_file_cache(page));
>  
> -		} else {
> -			pr_warn("failed to isolate pfn %lx\n", pfn);
> -			dump_page(page, "isolation failed");
> -			put_page(page);
> -			/* Because we don't have big zone->lock. we should
> -			   check this again here. */
> -			if (page_count(page)) {
> -				not_managed++;
> -				ret = -EBUSY;
> -				break;
> -			}
>  		}
>  	}
>  	if (!list_empty(&source)) {
> -		if (not_managed) {
> -			putback_movable_pages(&source);
> -			goto out;
> -		}
> -
>  		/* Allocate a new page from the nearest neighbor node */
>  		ret = migrate_pages(&source, new_node_page, NULL, 0,
>  					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
> @@ -1426,7 +1409,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			putback_movable_pages(&source);
>  		}
>  	}
> -out:
> +
>  	return ret;
>  }
>  
>
David Hildenbrand Dec. 11, 2018, 9:37 a.m. UTC | #3
On 11.12.18 09:57, osalvador@suse.de wrote:
> On 2018-12-11 09:50, Oscar Salvador wrote:
> 
>> -		} else {
>> -			pr_warn("failed to isolate pfn %lx\n", pfn);
>> -			dump_page(page, "isolation failed");
>> -			put_page(page);
>> -			/* Because we don't have big zone->lock. we should
>> -			   check this again here. */
>> -			if (page_count(page)) {
>> -				not_managed++;
>> -				ret = -EBUSY;
>> -				break;
> 
> I forgot that here we should at least leave the put_page().
> But leave also the dump_page() and the pr_warn().
> 
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1394,6 +1394,10 @@ do_migrate_range(unsigned long start_pfn, 
> unsigned long end_pfn)
>                                  inc_node_page_state(page, 
> NR_ISOLATED_ANON +
>                                                      
> page_is_file_cache(page));
> 
> +               } else {
> +                       pr_warn("failed to isolate pfn %lx\n", pfn);
> +                       dump_page(page, "isolation failed");
> +                       put_page(page);

When we're stuck with one problematic page, and we keep looping over
that function, won't that print out *way too much* messages? Shouldn't
that be rate limited somehow (same applies to other callers in this file)

>                  }
>          }
>          if (!list_empty(&source)) {
>
Michal Hocko Dec. 11, 2018, 10:18 a.m. UTC | #4
On Tue 11-12-18 09:50:42, Oscar Salvador wrote:
> do_migrate_ranges() takes a memory range and tries to isolate the
> pages to put them into a list.
> This list will be later on used in migrate_pages() to know
> the pages we need to migrate.
> 
> Currently, if we fail to isolate a single page, we put all already
> isolated pages back to their LRU and we bail out from the function.
> This is quite suboptimal, as this will force us to start over again
> because scan_movable_pages will give us the same range.
> If there is no chance that we can isolate that page, we will loop here
> forever.

This is true but reorganizing the code will not help the underlying
issue. Because the permanently failing page will be still there for
scan_movable_pages to encounter.
 
> Issue debugged in 4d0c7db96 ("hwpoison, memory_hotplug: allow hwpoisoned
> pages to be offlined") has proved that.

I assume that 4d0c7db96 is a sha1 from the linux-next. Please note that
this is not going to be the case when merged upstream. So I would use a
link.

> During the debugging of that issue, it was noticed that if
> do_migrate_ranges() fails to isolate a single page, we will
> just discard the work we have done so far and bail out, which means
> that scan_movable_pages() will find again the same set of pages.
> 
> Instead, we can just skip the error, keep isolating as much pages
> as possible and then proceed with the call to migrate_pages().
> This will allow us to do some work at least.

Yes, this makes sense to me.

> There is no danger in the skipped pages to be lost, because
> scan_movable_pages() will give us them as they could not
> be isolated and therefore migrated.
> 
> Although this patch has proved to be useful when dealing with
> 4d0c7db96 because it allows us to move forward as long as the
> page is not in LRU, we still need 4d0c7db96
> ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
> to handle the LRU case and the unmapping of the page if needed.
> So, this is just a follow-up cleanup.

I suspect the above paragraph is adding more confusion than necessary. I
would just drop it.

The main question here is. Do we want to migrate as much as possible or
do we want to be conservative and bail out early. The later could be an
advantage if the next attempt could fail the whole operation because the
impact of the failed operation would be somehow reduced. The former
should be better for throughput because easily done stuff is done first.

I would go with the throuput because our failure mode is to bail out
much earlier - even before we try to migrate. Even though the detection
is not perfect it works reasonably well for most usecases.

[...]
> @@ -1395,25 +1394,9 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  				inc_node_page_state(page, NR_ISOLATED_ANON +
>  						    page_is_file_cache(page));
>  
> -		} else {
> -			pr_warn("failed to isolate pfn %lx\n", pfn);
> -			dump_page(page, "isolation failed");
> -			put_page(page);
> -			/* Because we don't have big zone->lock. we should
> -			   check this again here. */
> -			if (page_count(page)) {
> -				not_managed++;
> -				ret = -EBUSY;
> -				break;
> -			}
>  		}

you really want to keep this branch. You just do not want to bail out.
We want to know about pages which fail to isolate and you definitely do
not want to keep the reference elevated behind. not_managed stuff can go
away.
Michal Hocko Dec. 11, 2018, 10:20 a.m. UTC | #5
On Tue 11-12-18 10:35:53, David Hildenbrand wrote:
> So somehow remember if we had issues with one page and instead of
> reporting 0, report e.g. -EAGAIN?

There is no consumer of the return value right now and it is not really
clear whether we need one. I would just make do_migrate_range return void.
David Hildenbrand Dec. 11, 2018, 10:35 a.m. UTC | #6
On 11.12.18 11:20, Michal Hocko wrote:
> On Tue 11-12-18 10:35:53, David Hildenbrand wrote:
>> So somehow remember if we had issues with one page and instead of
>> reporting 0, report e.g. -EAGAIN?
> 
> There is no consumer of the return value right now and it is not really
> clear whether we need one. I would just make do_migrate_range return void.
> 

Well, this would allow optimizations like "No need to check if
everything has been migrated, I can tell you right away that it has been
done".
Oscar Salvador Dec. 11, 2018, 12:22 p.m. UTC | #7
On 2018-12-11 11:18, Michal Hocko wrote:
>> Currently, if we fail to isolate a single page, we put all already
>> isolated pages back to their LRU and we bail out from the function.
>> This is quite suboptimal, as this will force us to start over again
>> because scan_movable_pages will give us the same range.
>> If there is no chance that we can isolate that page, we will loop here
>> forever.
> 
> This is true but reorganizing the code will not help the underlying
> issue. Because the permanently failing page will be still there for
> scan_movable_pages to encounter.

Well, it would only help in case the page is neither LRU nor
non-movable page, then we would fail to isolate it in do_migrate_range
and we will start over.
Letting do_migrate_range do some work, would mean that at some point
the permanently failing page will not be within a range but the first 
one
of a range, and so scan_movable_pages will skip it.


> 
>> Issue debugged in 4d0c7db96 ("hwpoison, memory_hotplug: allow 
>> hwpoisoned
>> pages to be offlined") has proved that.
> 
> I assume that 4d0c7db96 is a sha1 from the linux-next. Please note that
> this is not going to be the case when merged upstream. So I would use a
> link.

I will replace the sha1 with the link in the next version.

>> Although this patch has proved to be useful when dealing with
>> 4d0c7db96 because it allows us to move forward as long as the
>> page is not in LRU, we still need 4d0c7db96
>> ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>> to handle the LRU case and the unmapping of the page if needed.
>> So, this is just a follow-up cleanup.
> 
> I suspect the above paragraph is adding more confusion than necessary. 
> I
> would just drop it.

Fair enough, I will drop it.

> The main question here is. Do we want to migrate as much as possible or
> do we want to be conservative and bail out early. The later could be an
> advantage if the next attempt could fail the whole operation because 
> the
> impact of the failed operation would be somehow reduced. The former
> should be better for throughput because easily done stuff is done 
> first.
> 
> I would go with the throuput because our failure mode is to bail out
> much earlier - even before we try to migrate. Even though the detection
> is not perfect it works reasonably well for most usecases.

I agree here.
I think it is better to do as much work as possible at once.


> you really want to keep this branch. You just do not want to bail out.
> We want to know about pages which fail to isolate and you definitely do
> not want to keep the reference elevated behind. not_managed stuff can 
> go
> away.

Yeah, I just realized when I sent it.
I will fix it up in v2.

Thanks
Michal Hocko Dec. 11, 2018, 12:52 p.m. UTC | #8
On Tue 11-12-18 13:22:27, osalvador@suse.de wrote:
> On 2018-12-11 11:18, Michal Hocko wrote:
[...]
> > The main question here is. Do we want to migrate as much as possible or
> > do we want to be conservative and bail out early. The later could be an
> > advantage if the next attempt could fail the whole operation because the
> > impact of the failed operation would be somehow reduced. The former
> > should be better for throughput because easily done stuff is done first.
> > 
> > I would go with the throuput because our failure mode is to bail out
> > much earlier - even before we try to migrate. Even though the detection
> > is not perfect it works reasonably well for most usecases.
> 
> I agree here.
> I think it is better to do as much work as possible at once.

This would be great to mention in the changelog. Because that is the
real justification for the change IMHO.

Patch
diff mbox series

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86ab673fc4e3..804d0280d2ab 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,7 +1339,6 @@  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	unsigned long pfn;
 	struct page *page;
 	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
-	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
 
@@ -1395,25 +1394,9 @@  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 				inc_node_page_state(page, NR_ISOLATED_ANON +
 						    page_is_file_cache(page));
 
-		} else {
-			pr_warn("failed to isolate pfn %lx\n", pfn);
-			dump_page(page, "isolation failed");
-			put_page(page);
-			/* Because we don't have big zone->lock. we should
-			   check this again here. */
-			if (page_count(page)) {
-				not_managed++;
-				ret = -EBUSY;
-				break;
-			}
 		}
 	}
 	if (!list_empty(&source)) {
-		if (not_managed) {
-			putback_movable_pages(&source);
-			goto out;
-		}
-
 		/* Allocate a new page from the nearest neighbor node */
 		ret = migrate_pages(&source, new_node_page, NULL, 0,
 					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
@@ -1426,7 +1409,7 @@  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			putback_movable_pages(&source);
 		}
 	}
-out:
+
 	return ret;
 }