
mm, memory_hotplug: teach has_unmovable_pages about off-LRU migrateable pages

Message ID 20181102155528.20358-1-mhocko@kernel.org (mailing list archive)
State New, archived
Series mm, memory_hotplug: teach has_unmovable_pages about off-LRU migrateable pages

Commit Message

Michal Hocko Nov. 2, 2018, 3:55 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

Baoquan He has noticed that 15c30bc09085 ("mm, memory_hotplug: make
has_unmovable_pages more robust") is causing memory offlining failures
on a movable node. After further debugging it turned out that
has_unmovable_pages fails prematurely because it stumbles over off-LRU
pages. Those pages are off the LRU only because they are still waiting
in the pcp LRU caches (an example of __dump_page output added by a
debugging patch):
[  560.923297] page:ffffea043f39fa80 count:1 mapcount:0 mapping:ffff880e5dce1b59 index:0x7f6eec459
[  560.931967] flags: 0x5fffffc0080024(uptodate|active|swapbacked)
[  560.937867] raw: 005fffffc0080024 dead000000000100 dead000000000200 ffff880e5dce1b59
[  560.945606] raw: 00000007f6eec459 0000000000000000 00000001ffffffff ffff880e43ae8000
[  560.953323] page dumped because: hotplug
[  560.957238] page->mem_cgroup:ffff880e43ae8000
[  560.961620] has_unmovable_pages: pfn:0x10fd030d, found:0x1, count:0x0
[  560.968127] page:ffffea043f40c340 count:2 mapcount:0 mapping:ffff880e2f2d8628 index:0x0
[  560.976104] flags: 0x5fffffc0000006(referenced|uptodate)
[  560.981401] raw: 005fffffc0000006 dead000000000100 dead000000000200 ffff880e2f2d8628
[  560.989119] raw: 0000000000000000 0000000000000000 00000002ffffffff ffff88010a8f5000
[  560.996833] page dumped because: hotplug

The issue could be worked around by calling lru_add_drain_all, but we
can do better than that. We know that all swap backed pages are
migrateable, and the same applies to pages which implement the
migratepage callback.

Reported-by: Baoquan He <bhe@redhat.com>
Fixes: 15c30bc09085  ("mm, memory_hotplug: make has_unmovable_pages more robust")
Cc: stable
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi,
we have been discussing the issue reported by Baoquan [1] mostly
off-list and he has confirmed that the patch solves the failures he is
seeing. I believe that has_unmovable_pages begs for a much better
implementation and/or a substantial rethink of the page isolation
design, but let's first close the bug, which can be really annoying.

[1] http://lkml.kernel.org/r/20181101091055.GA15166@MiWiFi-R3L-srv

 mm/page_alloc.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
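The movability test the commit message describes can be sketched in plain userspace C. This is a minimal model, not the kernel's implementation: the flag bits, `struct page`, `struct aops`, and `shmem_like_aops` below are hypothetical simplified stand-ins for the real definitions in include/linux/page-flags.h and include/linux/fs.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified flag bits (stand-ins for PG_lru and
 * PG_swapbacked in the kernel's page-flags.h). */
#define PG_LRU        (1UL << 0)
#define PG_SWAPBACKED (1UL << 1)

struct aops { int (*migratepage)(void); };      /* simplified a_ops */
struct page {
    unsigned long flags;
    struct aops *a_ops;         /* stands in for page->mapping->a_ops */
};

static int generic_migratepage(void) { return 0; }  /* placeholder callback */

/* Example mapping whose a_ops implement migratepage (hypothetical). */
static struct aops shmem_like_aops = { .migratepage = generic_migratepage };

/* Mirrors the commit message's claim: a page is migrateable if it is
 * on the LRU, is swap backed, or its mapping implements migratepage. */
static bool page_is_migratable(const struct page *page)
{
    if (page->flags & PG_LRU)
        return true;
    if (page->flags & PG_SWAPBACKED)
        return true;
    if (page->a_ops && page->a_ops->migratepage)
        return true;
    return false;
}
```

The point of the patch is that has_unmovable_pages should treat all three cases as movable, rather than counting every off-LRU page as unmovable.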

Comments

Baoquan He Nov. 5, 2018, 12:20 a.m. UTC | #1
Hi Michal,

On 11/02/18 at 04:55pm, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Baoquan He has noticed that 15c30bc09085  ("mm, memory_hotplug: make
> has_unmovable_pages more robust") is causing memory offlining failures
> on a movable node. After a further debugging it turned out that
> has_unmovable_pages fails prematurely because it stumbles over off-LRU
> pages. Nevertheless those pages are not on LRU because they are waiting
> on the pcp LRU caches (an example of __dump_page added by a debugging
> patch)
> [  560.923297] page:ffffea043f39fa80 count:1 mapcount:0 mapping:ffff880e5dce1b59 index:0x7f6eec459
> [  560.931967] flags: 0x5fffffc0080024(uptodate|active|swapbacked)
> [  560.937867] raw: 005fffffc0080024 dead000000000100 dead000000000200 ffff880e5dce1b59
> [  560.945606] raw: 00000007f6eec459 0000000000000000 00000001ffffffff ffff880e43ae8000
> [  560.953323] page dumped because: hotplug
> [  560.957238] page->mem_cgroup:ffff880e43ae8000
> [  560.961620] has_unmovable_pages: pfn:0x10fd030d, found:0x1, count:0x0
> [  560.968127] page:ffffea043f40c340 count:2 mapcount:0 mapping:ffff880e2f2d8628 index:0x0
> [  560.976104] flags: 0x5fffffc0000006(referenced|uptodate)
> [  560.981401] raw: 005fffffc0000006 dead000000000100 dead000000000200 ffff880e2f2d8628
> [  560.989119] raw: 0000000000000000 0000000000000000 00000002ffffffff ffff88010a8f5000
> [  560.996833] page dumped because: hotplug

Sorry, last week I didn't test this patch under memory pressure. Today
I used "stress -m 200 -t 2h" to add pressure, and hot removing failed.
I will send you the output log. Without memory pressure it sometimes
succeeds. I saw one failure last night where it still showed
un-removable as 0 on the hotpluggable node; I worried it might be
caused by a compiling mistake on my side, so I recompiled and tried
again this morning.

Thanks
Baoquan
Michal Hocko Nov. 5, 2018, 9:14 a.m. UTC | #2
On Mon 05-11-18 08:20:09, Baoquan He wrote:
> Hi Michal,
> 
> On 11/02/18 at 04:55pm, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Baoquan He has noticed that 15c30bc09085  ("mm, memory_hotplug: make
> > has_unmovable_pages more robust") is causing memory offlining failures
> > on a movable node. After a further debugging it turned out that
> > has_unmovable_pages fails prematurely because it stumbles over off-LRU
> > pages. Nevertheless those pages are not on LRU because they are waiting
> > on the pcp LRU caches (an example of __dump_page added by a debugging
> > patch)
> > [  560.923297] page:ffffea043f39fa80 count:1 mapcount:0 mapping:ffff880e5dce1b59 index:0x7f6eec459
> > [  560.931967] flags: 0x5fffffc0080024(uptodate|active|swapbacked)
> > [  560.937867] raw: 005fffffc0080024 dead000000000100 dead000000000200 ffff880e5dce1b59
> > [  560.945606] raw: 00000007f6eec459 0000000000000000 00000001ffffffff ffff880e43ae8000
> > [  560.953323] page dumped because: hotplug
> > [  560.957238] page->mem_cgroup:ffff880e43ae8000
> > [  560.961620] has_unmovable_pages: pfn:0x10fd030d, found:0x1, count:0x0
> > [  560.968127] page:ffffea043f40c340 count:2 mapcount:0 mapping:ffff880e2f2d8628 index:0x0
> > [  560.976104] flags: 0x5fffffc0000006(referenced|uptodate)
> > [  560.981401] raw: 005fffffc0000006 dead000000000100 dead000000000200 ffff880e2f2d8628
> > [  560.989119] raw: 0000000000000000 0000000000000000 00000002ffffffff ffff88010a8f5000
> > [  560.996833] page dumped because: hotplug
> 
> Sorry, last week I didn't test this patch with memory pressure adding.
> Today use "stress -m 200 -t 2h" to add pressure, hot removing failed.
> Will send you output log. W/o memory pressure, it sometimes succeed. I
> saw one failure last night, it still show un-removable as 0 in
> hotpluggable node one time, I worried it might be caused by my compiling
> mistake, so compile and try again this morning.

In a private email you have sent this (let's assume this is correctly
testing the patch I have posted):

: [43283.914082] has_unmovable_pages: pfn:0x10e62600, found:0x1, count:0x0 
: [43283.920669] page:ffffea0439898000 count:1 mapcount:1 mapping:ffff880e5639d3c9 index:0x7f2430400 compound_mapcount: 1
: [43283.931219] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
: [43283.937954] raw: 005fffffc0090034 ffffea043ffcb888 ffffea043f728008 ffff880e5639d3c9
: [43283.945722] raw: 00000007f2430400 0000000000000000 00000001ffffffff ffff880e2baad000
: [43283.955381] page dumped because: hotplug

The page is both LRU and SwapBacked, which should hit some of the
checks in has_unmovable_pages. The fact that it hasn't means we have
clearly raced with the page being allocated and marked SwapBacked/LRU.
This is certainly possible, and there is no universal way to prevent it
for all types of potentially migrateable pages. The race window should
be relatively small. Maybe we can add a retry for movable zone pages.

How reproducible is this?

But, as I've said, memory isolation resp. has_unmovable_pages begs for
a complete redesign.
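The retry idea floated above (snapshot page->flags, run the checks, start over if the flags changed underneath us) can be modeled in userspace C. Assumptions: `PG_LRU` is a hypothetical simplified flag bit, and C11 `atomic_load` stands in for the kernel's READ_ONCE(); this only demonstrates the pattern, it does not close the race.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_LRU (1UL << 0)       /* hypothetical simplified flag bit */

struct page { _Atomic unsigned long flags; };

/* Returns true if the page looks unmovable on a stable flags snapshot.
 * If the flags change between the snapshot and the re-check, retry so
 * that we judge a consistent view; *retries counts the restarts. */
static bool page_looks_unmovable(struct page *page, int *retries)
{
    unsigned long saved_flags;

retry:
    saved_flags = atomic_load(&page->flags);

    if (saved_flags & PG_LRU)
        return false;           /* on LRU: treated as movable */

    /* The state may have been updated while we were checking it. */
    if (atomic_load(&page->flags) != saved_flags) {
        (*retries)++;
        goto retry;
    }
    return true;                /* off-LRU and state was stable */
}
```

As the rest of the thread shows, this pattern only narrows the window: a page can still be fully allocated and flagged just after the second load.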
Baoquan He Nov. 5, 2018, 9:26 a.m. UTC | #3
On 11/05/18 at 10:14am, Michal Hocko wrote:
> On Mon 05-11-18 08:20:09, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 11/02/18 at 04:55pm, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Baoquan He has noticed that 15c30bc09085  ("mm, memory_hotplug: make
> > > has_unmovable_pages more robust") is causing memory offlining failures
> > > on a movable node. After a further debugging it turned out that
> > > has_unmovable_pages fails prematurely because it stumbles over off-LRU
> > > pages. Nevertheless those pages are not on LRU because they are waiting
> > > on the pcp LRU caches (an example of __dump_page added by a debugging
> > > patch)
> > > [  560.923297] page:ffffea043f39fa80 count:1 mapcount:0 mapping:ffff880e5dce1b59 index:0x7f6eec459
> > > [  560.931967] flags: 0x5fffffc0080024(uptodate|active|swapbacked)
> > > [  560.937867] raw: 005fffffc0080024 dead000000000100 dead000000000200 ffff880e5dce1b59
> > > [  560.945606] raw: 00000007f6eec459 0000000000000000 00000001ffffffff ffff880e43ae8000
> > > [  560.953323] page dumped because: hotplug
> > > [  560.957238] page->mem_cgroup:ffff880e43ae8000
> > > [  560.961620] has_unmovable_pages: pfn:0x10fd030d, found:0x1, count:0x0
> > > [  560.968127] page:ffffea043f40c340 count:2 mapcount:0 mapping:ffff880e2f2d8628 index:0x0
> > > [  560.976104] flags: 0x5fffffc0000006(referenced|uptodate)
> > > [  560.981401] raw: 005fffffc0000006 dead000000000100 dead000000000200 ffff880e2f2d8628
> > > [  560.989119] raw: 0000000000000000 0000000000000000 00000002ffffffff ffff88010a8f5000
> > > [  560.996833] page dumped because: hotplug
> > 
> > Sorry, last week I didn't test this patch with memory pressure adding.
> > Today use "stress -m 200 -t 2h" to add pressure, hot removing failed.
> > Will send you output log. W/o memory pressure, it sometimes succeed. I
> > saw one failure last night, it still show un-removable as 0 in
> > hotpluggable node one time, I worried it might be caused by my compiling
> > mistake, so compile and try again this morning.
> 
> In a private email you have sent this (let's assume this is correctly
> testing the patch I have posted):

Yeah, I recompiled and copied the bzImage to /boot to ensure the patch
is compiled in.

> 
> : [43283.914082] has_unmovable_pages: pfn:0x10e62600, found:0x1, count:0x0 
> : [43283.920669] page:ffffea0439898000 count:1 mapcount:1 mapping:ffff880e5639d3c9 index:0x7f2430400 compound_mapcount: 1
> : [43283.931219] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
> : [43283.937954] raw: 005fffffc0090034 ffffea043ffcb888 ffffea043f728008 ffff880e5639d3c9
> : [43283.945722] raw: 00000007f2430400 0000000000000000 00000001ffffffff ffff880e2baad000
> : [43283.955381] page dumped because: hotplug
> 
> The page is both LRU and SwapBacked which should hit the some of the
> checks in has_unmovable_pages. The fact it hasn't means we have clearly
> raced with the page being allocated and marked SwapBacked/LRU. This is
> surely possible and there is no universal way to prevent from that for
> all types of potentially migratedable pages. The race window should be
> relatively small. Maybe we can add a retry for movable zone pages.
> 
> How reproducible this is?

On bare metal with 8 nodes, where each node has 64 GB of memory, nodes
1~7 are hotpluggable and movable_node is added. After reboot, execute
"stress -m 200 -t 2h"; by default the 200 processes will malloc/free
256MB continuously. Then hot remove one memory board on nodes 1~7; it
always happens.

The progress is that all memory blocks on nodes 1~7 are shown as
removable now, and accessing and reading them won't trigger the old
trace any more.
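The offline step of the reproducer can be sketched as a small C helper: each memory block is offlined by writing "offline" to its sysfs state file (on a real system under /sys/devices/system/memory, which requires root). The `sysfs_base` parameter and the helper name are illustrative choices so the function can be exercised against a scratch directory; the memory pressure itself comes from the "stress -m 200 -t 2h" run above.

```c
#include <stdio.h>

/* Write "offline" to <sysfs_base>/memory<block>/state, the same
 * operation as `echo offline > /sys/devices/system/memory/memoryN/state`.
 * Returns 0 on success, -1 on error. */
static int offline_memory_block(const char *sysfs_base, int block)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "%s/memory%d/state", sysfs_base, block);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs("offline", f) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Under the stress load, this write is what returns EBUSY and produces the "Offline failed" / has_unmovable_pages dumps seen in the console log below.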

> 
> But, as I've said memory isolation resp. has_unmovable_pages begs for a
> complete redesign.

So how about using the patch I pasted before? It has an explanation and
the test result is good.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5..021e39d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7824,7 +7824,8 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
                if (__PageMovable(page))
                        continue;

-               if (!PageLRU(page))
+               if (!PageLRU(page) &&
+                       (get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
                        found++;
                /*
                 * If there are RECLAIMABLE pages, we need to check

Thanks
Baoquan
Michal Hocko Nov. 5, 2018, 9:28 a.m. UTC | #4
On Mon 05-11-18 10:14:07, Michal Hocko wrote:
> Maybe we can add a retry for movable zone pages.

Or something like this. Ugly as hell, no question about that. I also
have to think about this some more to convince myself that it will not
result in an endless loop in some situations.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48ceda313332..342d66eca0f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7779,12 +7779,16 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
+		unsigned long saved_flags;
 
 		if (!pfn_valid_within(check))
 			continue;
 
 		page = pfn_to_page(check);
 
+retry:
+		saved_flags = READ_ONCE(page->flags);
+
 		if (PageReserved(page))
 			goto unmovable;
 
@@ -7840,6 +7844,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 				page->mapping->a_ops->migratepage)
 			continue;
 
+		/*
+		 * We might race with the allocation of the page so retry
+		 * if flags have changed.
+		 */
+		if (saved_flags != READ_ONCE(page->flags))
+			goto retry;
+
 		/*
 		 * If there are RECLAIMABLE pages, we need to check
 		 * it.  But now, memory offline itself doesn't call
Michal Hocko Nov. 5, 2018, 9:35 a.m. UTC | #5
On Mon 05-11-18 17:26:18, Baoquan He wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a919ba5..021e39d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7824,7 +7824,8 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>                 if (__PageMovable(page))
>                         continue;
> 
> -               if (!PageLRU(page))
> +               if (!PageLRU(page) &&
> +                       (get_pageblock_migratetype(page) != MIGRATE_MOVABLE))
>                         found++;
>                 /*
>                  * If there are RECLAIMABLE pages, we need to check

As explained during the private conversation, I am not really thrilled
by this check. AFAIU this will be the case for basically all pages in
zone_movable, and as we have already seen, some unexpected ones can
easily lurk in.
Baoquan He Nov. 5, 2018, 9:45 a.m. UTC | #6
On 11/05/18 at 10:28am, Michal Hocko wrote:
> On Mon 05-11-18 10:14:07, Michal Hocko wrote:
> > Maybe we can add a retry for movable zone pages.
> 
> Or something like this. Ugly as hell, no question about that. I also
> have to think about this some more to convince myself this will not
> result in an endless loop under some situations.

Testing it now, will report the result.

> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48ceda313332..342d66eca0f3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7779,12 +7779,16 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  	pfn = page_to_pfn(page);
>  	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
>  		unsigned long check = pfn + iter;
> +		unsigned long saved_flags;
>  
>  		if (!pfn_valid_within(check))
>  			continue;
>  
>  		page = pfn_to_page(check);
>  
> +retry:
> +		saved_flags = READ_ONCE(page->flags);
> +
>  		if (PageReserved(page))
>  			goto unmovable;
>  
> @@ -7840,6 +7844,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  				page->mapping->a_ops->migratepage)
>  			continue;
>  
> +		/*
> +		 * We might race with the allocation of the page so retry
> +		 * if flags have changed.
> +		 */
> +		if (saved_flags != READ_ONCE(page->flags))
> +			goto retry;
> +
>  		/*
>  		 * If there are RECLAIMABLE pages, we need to check
>  		 * it.  But now, memory offline itself doesn't call
> -- 
> Michal Hocko
> SUSE Labs
Baoquan He Nov. 5, 2018, 10:25 a.m. UTC | #7
Hi Michal,

On 11/05/18 at 10:28am, Michal Hocko wrote:
> 
> Or something like this. Ugly as hell, no question about that. I also
> have to think about this some more to convince myself this will not
> result in an endless loop under some situations.

It failed. I am pasting the log and patch diff here; please help check
whether I made any mistake in the manual code change. The log is at the
bottom.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5cb3c8..cdcd923ec337 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7779,14 +7779,22 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	pfn = page_to_pfn(page);
 	for (found = 0, iter = 0; iter < pageblock_nr_pages; iter++) {
 		unsigned long check = pfn + iter;
+		unsigned long saved_flags;
 
 		if (!pfn_valid_within(check))
 			continue;
 
 		page = pfn_to_page(check);
 
-		if (PageReserved(page))
+retry:
+		saved_flags = READ_ONCE(page->flags);
+
+
+		if (PageReserved(page)) {
+			pr_info("has_unmovable_pages 000: pfn:0x%x\n", pfn+iter);
+			__dump_page(page, "hotplug");
 			goto unmovable;
+		}
 
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
@@ -7795,8 +7803,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 */
 		if (PageHuge(page)) {
 
-			if (!hugepage_migration_supported(page_hstate(page)))
+			if (!hugepage_migration_supported(page_hstate(page))) {
+				pr_info("has_unmovable_pages 111: pfn:0x%x\n", pfn+iter);
+				__dump_page(page, "hotplug");
 				goto unmovable;
+			}
 
 			iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
 			continue;
@@ -7824,8 +7835,29 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		if (__PageMovable(page))
 			continue;
 
-		if (!PageLRU(page))
+#if 0
+		if (!PageLRU(page) && (get_pageblock_migratetype(page)!=MIGRATE_MOVABLE) )
 			found++;
+#endif
+               if (PageLRU(page))
+                       continue;
+
+               if (PageSwapBacked(page))
+                       continue;
+
+
+               if (page->mapping && !page->mapping->a_ops)
+		       pr_info("page->mapping:%ps \n", page->mapping);
+
+               if (page->mapping && page->mapping->a_ops && page->mapping->a_ops->migratepage)
+                       continue;
+
+		/*
+		 * We might race with the allocation of the page so retry
+		 * if flags have changed.
+		 */
+		if (saved_flags != READ_ONCE(page->flags))
+			goto retry;
 		/*
 		 * If there are RECLAIMABLE pages, we need to check
 		 * it.  But now, memory offline itself doesn't call
@@ -7839,8 +7871,11 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * is set to both of a memory hole page and a _used_ kernel
 		 * page at boot.
 		 */
-		if (found > count)
+		if (++found > count) {
+			pr_info("has_unmovable_pages: pfn:0x%x, found:0x%x, count:0x%x \n", pfn+iter, found, count);
+			__dump_page(page, "hotplug");
 			goto unmovable;
+		}
 	}
 	return false;
 unmovable:


***********console log*******************
[  458.584711] Offlined Pages 524288
[  458.943655] Offlined Pages 524288
[  459.390757] Offlined Pages 524288
[  460.086409] Offlined Pages 524288
[  460.931868] Offlined Pages 524288
[  461.741327] Offlined Pages 524288
[  462.576653] Offlined Pages 524288
[  463.291947] Offlined Pages 524288
[  464.121980] Offlined Pages 524288
[  464.869983] Offlined Pages 524288
[  465.550254] Offlined Pages 524288
[  466.337934] Offlined Pages 524288
[  467.143416] Offlined Pages 524288
[  467.925108] Offlined Pages 524288
[  468.665318] Offlined Pages 524288
[  469.473999] Offlined Pages 524288
[  470.390116] Offlined Pages 524288
[  471.069104] Offlined Pages 524288
[  471.704154] Offlined Pages 524288
[  472.322466] Offlined Pages 524288
[  472.964513] Offlined Pages 524288
[  473.629328] Offlined Pages 524288
[  474.265908] Offlined Pages 524288
[  474.883829] Offlined Pages 524288
[  475.538700] Offlined Pages 524288
[  476.247451] Offlined Pages 524288
[  476.575516] has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0 
[  476.582103] page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
[  476.592645] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  476.599386] raw: 005fffffc0090034 ffffea043bd58008 ffffea0437fb8008 ffff880e05239841
[  476.607154] raw: 00000007f26e5000 0000000000000000 00000001ffffffff ffff880e74f5c000
[  476.616725] page dumped because: hotplug
[  476.620682] page->mem_cgroup:ffff880e74f5c000
[  476.625190] WARNING: CPU: 245 PID: 8 at mm/page_alloc.c:7882 has_unmovable_pages.cold.123+0x44/0xb6
[  476.634230] Modules linked in: vfat fat intel_rapl sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul iTCO_wdt crc32_pclmul iTCO_vendor_support ghash_clmulni_intel intel_cstate joydev ses ipmi_si enclosure ipmi_devintf scsi_transport_sas intel_uncore ipmi_msghandler pcspkr intel_rapl_perf sg mei_me i2c_i801 mei lpc_ich wmi xfs libcrc32c sd_mod ahci igb crc32c_intel libahci i2c_algo_bit dca libata megaraid_sas dm_mirror dm_region_hash dm_log dm_mod
[  476.678239] CPU: 245 PID: 8 Comm: kworker/u576:0 Not tainted 4.19.0+ #9
[  476.684871] Hardware name:  9008/IT91SMUB, BIOS BLXSV512 03/22/2018
[  476.691199] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[  476.696678] RIP: 0010:has_unmovable_pages.cold.123+0x44/0xb6
[  476.702369] Code: fe 0f 0e 82 4c 89 ff e8 0f a0 02 00 48 8b 44 24 10 48 2b 40 50 48 89 c2 b8 01 00 00 00 48 81 fa 40 11 00 00 0f 85 ec eb ff ff <0f> 0b e9 e5 eb ff ff 48 89 de 48 c7 c7 08 4d 0a 82 e8 79 1e f0 ff
[  476.721100] RSP: 0018:ffffc900000e3c70 EFLAGS: 00010046
[  476.726361] RAX: 0000000000000001 RBX: 0000000010dfec00 RCX: 0000000000000006
[  476.733543] RDX: 0000000000001140 RSI: 0000000000000096 RDI: ffff880e7cb55ad0
[  476.742768] RBP: 005fffffc0010000 R08: 0000000000000bbf R09: 0000000000000007
[  476.749926] R10: 0000000000000000 R11: ffffffff829f162d R12: 0000000010dfec00
[  476.757082] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea0437fb0000
[  476.764241] FS:  0000000000000000(0000) GS:ffff880e7cb40000(0000) knlGS:0000000000000000
[  476.772338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  476.778102] CR2: 00007fc3670f3000 CR3: 0000000e716c8003 CR4: 00000000003606e0
[  476.785249] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  476.792405] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  476.799562] Call Trace:
[  476.804039]  start_isolate_page_range+0x258/0x2f0
[  476.808823]  __offline_pages+0xcc/0x8e0
[  476.812753]  ? klist_next+0xf2/0x100
[  476.816402]  ? device_is_dependent+0x90/0x90
[  476.820759]  memory_subsys_offline+0x40/0x60
[  476.825127]  device_offline+0x81/0xb0
[  476.828920]  acpi_bus_offline+0xdb/0x140
[  476.832937]  acpi_device_hotplug+0x21c/0x460
[  476.837281]  acpi_hotplug_work_fn+0x1a/0x30
[  476.841562]  process_one_work+0x1a1/0x3a0
[  476.845647]  worker_thread+0x30/0x380
[  476.849381]  ? drain_workqueue+0x120/0x120
[  476.853549]  kthread+0x112/0x130
[  476.856866]  ? kthread_park+0x80/0x80
[  476.860588]  ret_from_fork+0x35/0x40
[  476.864204] ---[ end trace 08fb4fe25cf760b3 ]---
[  476.955547] has_unmovable_pages: pfn:0x10e07a00, found:0x1, count:0x0 
[  476.962126] page:ffffea04381e8000 count:1 mapcount:1 mapping:ffff880e0913d899 index:0x7f26ec600 compound_mapcount: 1
[  476.972673] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  476.979413] raw: 005fffffc0090034 ffffea043c338008 ffffea043f5b0008 ffff880e0913d899
[  476.987192] raw: 00000007f26ec600 0000000000000000 00000001ffffffff ffff880e74f5c000
[  476.996921] page dumped because: hotplug
[  477.000880] page->mem_cgroup:ffff880e74f5c000
[  477.110154] has_unmovable_pages: pfn:0x10e9ee00, found:0x1, count:0x0 
[  477.118626] page:ffffea043a7b8000 count:1 mapcount:1 mapping:ffff880e0c89c2c1 index:0x7f26e5000 compound_mapcount: 1
[  477.129176] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  477.135911] raw: 005fffffc0090034 ffffea043b0e0008 ffffea04383e8008 ffff880e0c89c2c1
[  477.143690] raw: 00000007f26e5000 0000000000000000 00000001ffffffff ffff880e74f5c000
[  477.151448] page dumped because: hotplug
[  477.155404] page->mem_cgroup:ffff880e74f5c000
[  477.224784] has_unmovable_pages: pfn:0x10f13600, found:0x1, count:0x0 
[  477.231368] page:ffffea043c4d8000 count:1 mapcount:1 mapping:ffff880e57b7adc1 index:0x7f26e8600 compound_mapcount: 1
[  477.241922] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  477.250324] raw: 005fffffc0090034 ffffea043af88008 ffffea043cf20508 ffff880e57b7adc1
[  477.258089] raw: 00000007f26e8600 0000000000000000 00000001ffffffff ffff880e74f5c000
[  477.265857] page dumped because: hotplug
[  477.269811] page->mem_cgroup:ffff880e74f5c000
[  477.307236] has_unmovable_pages: pfn:0x10f8da00, found:0x1, count:0x0 
[  477.313807] page:ffffea043e368000 count:1 mapcount:1 mapping:ffff880e75132529 index:0x7f26e1600 compound_mapcount: 1
[  477.324361] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  477.331096] raw: 005fffffc0090034 ffffea043d2c0008 ffffea043ba40008 ffff880e75132529
[  477.338875] raw: 00000007f26e1600 0000000000000000 00000001ffffffff ffff880e74f5c000
[  477.346635] page dumped because: hotplug
[  477.350590] page->mem_cgroup:ffff880e74f5c000
[  477.380478] has_unmovable_pages: pfn:0x10d87200, found:0x1, count:0x0 
[  477.387060] page:ffffea04361c8000 count:1 mapcount:1 mapping:ffff880e0913d899 index:0x7f26e2400 compound_mapcount: 1
[  477.397610] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
[  477.404340] raw: 005fffffc0090034 ffffea0437fb8008 ffffea043cd20008 ffff880e0913d899
[  477.412113] raw: 00000007f26e2400 0000000000000000 00000001ffffffff ffff880e74f5c000
[  477.419870] page dumped because: hotplug
[  477.423842] page->mem_cgroup:ffff880e74f5c000
[  477.435557] memory memory539: Offline failed.
[  489.171077] perf: interrupt took too long (2745 > 2500), lowering kernel.perf_event_max_sample_rate to 72000
[  501.332276] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.179 msecs
[  511.073564] perf: interrupt took too long (3593 > 3431), lowering kernel.perf_event_max_sample_rate to 55000
[  521.050208] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.836 msecs
[  521.058339] perf: interrupt took too long (16324 > 4491), lowering kernel.perf_event_max_sample_rate to 12000
Michal Hocko Nov. 5, 2018, 12:38 p.m. UTC | #8
On Mon 05-11-18 18:25:20, Baoquan He wrote:
> Hi Michal,
> 
> On 11/05/18 at 10:28am, Michal Hocko wrote:
> > 
> > Or something like this. Ugly as hell, no question about that. I also
> > have to think about this some more to convince myself this will not
> > result in an endless loop under some situations.
> 
> It failed. Paste the log and patch diff here, please help check if I made
> any mistake on manual code change. The log is at bottom.

The retry patch is obviously still racy; it just makes the race window
slightly smaller. I hoped it would catch most of those races, but this
is obviously not the case.

I was thinking about your MIGRATE_MOVABLE check some more and I still
do not like it much. We change the migrate type in many places, and I
have a hard time seeing that this is always safe wrt. what we need
here.

We should be able to restore the zone type check though. The
primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
has_unmovable_pages more robust") was that early allocations made it to
the zone_movable range. If we add the check _after_ the PageReserved()
check then we should be able to rule out all bootmem allocations.

So what about the following (on top of the previous patch which makes
sense on its own I believe).
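The check ordering this relies on can be modeled in a few lines of plain C. Assumptions: `PG_RESERVED`, the zone enum, and the function name are hypothetical simplified stand-ins; "other zone" pages are modeled as "potentially unmovable" since the real code falls through to the full (racy) per-page checks there.

```c
#include <stdbool.h>

#define PG_RESERVED (1UL << 0)  /* hypothetical simplified flag bit */

enum zone_idx { ZONE_NORMAL, ZONE_MOVABLE };

/* Returns true if the page would block offlining.  PageReserved is
 * tested first, so bootmem pages are still reported as unmovable even
 * inside ZONE_MOVABLE; only then does the movable-zone shortcut apply. */
static bool page_blocks_offline(unsigned long flags, enum zone_idx zone)
{
    if (flags & PG_RESERVED)
        return true;            /* bootmem etc.: always unmovable */
    if (zone == ZONE_MOVABLE)
        return false;           /* rest of a movable zone assumed movable */
    /* Other zones would fall through to the full per-page checks,
     * which remain prone to the allocation race discussed above. */
    return true;
}
```

Because the reserved check runs first, the shortcut cannot hide the bootmem pages that 15c30bc09085 was originally added to catch.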


From d7ffd1342529c892f1de8999c3a5609211599c9d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 5 Nov 2018 13:28:51 +0100
Subject: [PATCH] mm, memory_hotplug: check zone_movable in has_unmovable_pages

Page state checks are racy. Under a heavy memory workload (e.g. stress
-m 200 -t 2h) it is quite easy to hit a race window when the page is
allocated but its state is not fully populated yet. A debugging patch to
dump the struct page state shows
: [  476.575516] has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
: [  476.582103] page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
: [  476.592645] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)

Note that the state has been checked for both PageLRU and PageSwapBacked
already. Closing this race completely would require some sort of retry
logic. This can be tricky and error prone (think of potential endless
or very long loops).

Work around this problem for movable zones at least. Such a zone should
only contain movable pages. 15c30bc09085 ("mm, memory_hotplug: make
has_unmovable_pages more robust") has told us that this is not strictly
true though. Bootmem pages should be marked reserved, however, so we
can move the original check after the PageReserved check. Pages from
other zones are still prone to races, but we do not even pretend that
memory hotremove works for those, so a premature failure doesn't hurt
that much.

Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48ceda313332..5b64c5bc6ea0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7788,6 +7788,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		if (PageReserved(page))
 			goto unmovable;
 
+		/*
+		 * If the zone is movable and we have ruled out all reserved
+		 * pages then it should be reasonably safe to assume the rest
+		 * is movable.
+		 */
+		if (zone_idx(zone) == ZONE_MOVABLE)
+			continue;
+
 		/*
 		 * Hugepages are not in LRU lists, but they're movable.
 		 * We need not scan over tail pages bacause we don't
Baoquan He Nov. 5, 2018, 2:23 p.m. UTC | #9
On 11/05/18 at 01:38pm, Michal Hocko wrote:
> On Mon 05-11-18 18:25:20, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 11/05/18 at 10:28am, Michal Hocko wrote:
> > > 
> > > Or something like this. Ugly as hell, no question about that. I also
> > > have to think about this some more to convince myself this will not
> > > result in an endless loop under some situations.
> > 
> > It failed. Paste the log and patch diff here, please help check if I made
> > any mistake on manual code change. The log is at bottom.
> 
> The retry patch is obviously still racy, it just makes the race window
> slightly smaller and I hoped it would catch most of those races but this
> is obviously not the case.
> 
> I was thinking about your MIGRATE_MOVABLE check some more and I still do
> not like it much, we just change migrate type at many places and I have
> hard time to actually see this is always safe wrt. to what we need here.
> 
> We should be able to restore the zone type check though. The
> primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> has_unmovable_pages more robust") was that early allocations made it to
> the zone_movable range. If we add the check _after_ the PageReserved()
> check then we should be able to rule all bootmem allocation out.
> 
> So what about the following (on top of the previous patch which makes
> sense on its own I believe).

Yes, I think this looks very reasonable and should be robust.

Have tested it; hot removing 4 hotpluggable nodes continuously
succeeds, and then hot adding them back still works well.

So please feel free to add my Tested-by or Acked-by.

Tested-by: Baoquan He <bhe@redhat.com>
or
Acked-by: Baoquan He <bhe@redhat.com>

Thanks, Michal.
> 
> 
> From d7ffd1342529c892f1de8999c3a5609211599c9d Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 5 Nov 2018 13:28:51 +0100
> Subject: [PATCH] mm, memory_hotplug: check zone_movable in has_unmovable_pages
> 
> Page state checks are racy. Under a heavy memory workload (e.g. stress
> -m 200 -t 2h) it is quite easy to hit a race window when the page is
> allocated but its state is not fully populated yet. A debugging patch to
> dump the struct page state shows
> : [  476.575516] has_unmovable_pages: pfn:0x10dfec00, found:0x1, count:0x0
> : [  476.582103] page:ffffea0437fb0000 count:1 mapcount:1 mapping:ffff880e05239841 index:0x7f26e5000 compound_mapcount: 1
> : [  476.592645] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked)
> 
> Note that the state has been checked for both PageLRU and PageSwapBacked
> already. Closing this race completely would require some sort of retry
> logic. This can be tricky and error prone (think of potential endless
> or long taking loops).
> 
> Workaround this problem for movable zones at least. Such a zone should
> only contain movable pages. 15c30bc09085 ("mm, memory_hotplug: make
> has_unmovable_pages more robust") has told us that this is not strictly
> true though. Bootmem pages should be marked reserved though so we can
> move the original check after the PageReserved check. Pages from other
> zones are still prone to races but we even do not pretend that memory
> hotremove works for those so pre-mature failure doesn't hurt that much.
> 
> Reported-by: Baoquan He <bhe@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/page_alloc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 48ceda313332..5b64c5bc6ea0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7788,6 +7788,14 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  		if (PageReserved(page))
>  			goto unmovable;
>  
> +		/*
> +		 * If the zone is movable and we have ruled out all reserved
> +		 * pages then it should be reasonably safe to assume the rest
> +		 * is movable.
> +		 */
> +		if (zone_idx(zone) == ZONE_MOVABLE)
> +			continue;
> +
>  		/*
>  		 * Hugepages are not in LRU lists, but they're movable.
>  		 * We need not scan over tail pages bacause we don't
> -- 
> 2.19.1
> 
> -- 
> Michal Hocko
> SUSE Labs
Michal Hocko Nov. 5, 2018, 5:10 p.m. UTC | #10
On Mon 05-11-18 22:23:08, Baoquan He wrote:
> On 11/05/18 at 01:38pm, Michal Hocko wrote:
> > On Mon 05-11-18 18:25:20, Baoquan He wrote:
> > > Hi Michal,
> > > 
> > > On 11/05/18 at 10:28am, Michal Hocko wrote:
> > > > 
> > > > Or something like this. Ugly as hell, no question about that. I also
> > > > have to think about this some more to convince myself this will not
> > > > result in an endless loop under some situations.
> > > 
> > > It failed. Paste the log and patch diff here, please help check if I made
> > > any mistake on manual code change. The log is at bottom.
> > 
> > The retry patch is obviously still racy, it just makes the race window
> > slightly smaller and I hoped it would catch most of those races but this
> > is obviously not the case.
> > 
> > I was thinking about your MIGRATE_MOVABLE check some more and I still do
> > not like it much, we just change migrate type at many places and I have
> > hard time to actually see this is always safe wrt. to what we need here.
> > 
> > We should be able to restore the zone type check though. The
> > primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> > has_unmovable_pages more robust") was that early allocations made it to
> > the zone_movable range. If we add the check _after_ the PageReserved()
> > check then we should be able to rule all bootmem allocation out.
> > 
> > So what about the following (on top of the previous patch which makes
> > sense on its own I believe).
> 
> Yes, I think this looks very reasonable and should be robust.
> 
> Have tested it, hot removing 4 hotpluggable nodes continuously
> succeeds, and then hot adding them back, still works well.
> 
> So please feel free to add my Tested-by or Acked-by.
> 
> Tested-by: Baoquan He <bhe@redhat.com>
> or
> Acked-by: Baoquan He <bhe@redhat.com>

Thanks for retesting! Does this apply to both patches?
Baoquan He Nov. 6, 2018, 12:22 a.m. UTC | #11
On 11/05/18 at 06:10pm, Michal Hocko wrote:
> On Mon 05-11-18 22:23:08, Baoquan He wrote:
> > On 11/05/18 at 01:38pm, Michal Hocko wrote:
> > > On Mon 05-11-18 18:25:20, Baoquan He wrote:
> > > > Hi Michal,
> > > > 
> > > > On 11/05/18 at 10:28am, Michal Hocko wrote:
> > > > > 
> > > > > Or something like this. Ugly as hell, no question about that. I also
> > > > > have to think about this some more to convince myself this will not
> > > > > result in an endless loop under some situations.
> > > > 
> > > > It failed. Paste the log and patch diff here, please help check if I made
> > > > any mistake on manual code change. The log is at bottom.
> > > 
> > > The retry patch is obviously still racy, it just makes the race window
> > > slightly smaller and I hoped it would catch most of those races but this
> > > is obviously not the case.
> > > 
> > > I was thinking about your MIGRATE_MOVABLE check some more and I still do
> > > not like it much, we just change migrate type at many places and I have
> > > hard time to actually see this is always safe wrt. to what we need here.
> > > 
> > > We should be able to restore the zone type check though. The
> > > primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> > > has_unmovable_pages more robust") was that early allocations made it to
> > > the zone_movable range. If we add the check _after_ the PageReserved()
> > > check then we should be able to rule all bootmem allocation out.
> > > 
> > > So what about the following (on top of the previous patch which makes
> > > sense on its own I believe).
> > 
> > Yes, I think this looks very reasonable and should be robust.
> > 
> > Have tested it, hot removing 4 hotpluggable nodes continuously
> > succeeds, and then hot adding them back, still works well.
> > 
> > So please feel free to add my Tested-by or Acked-by.
> > 
> > Tested-by: Baoquan He <bhe@redhat.com>
> > or
> > Acked-by: Baoquan He <bhe@redhat.com>
> 
> Thanks for retesting! Does this apply to both patches?

Sorry, I don't get it. I just applied this on top of Linus's tree and
tested. Do you mean applying it on top of the previous code change?
Michal Hocko Nov. 6, 2018, 8:28 a.m. UTC | #12
On Tue 06-11-18 08:22:16, Baoquan He wrote:
> On 11/05/18 at 06:10pm, Michal Hocko wrote:
> > On Mon 05-11-18 22:23:08, Baoquan He wrote:
> > > On 11/05/18 at 01:38pm, Michal Hocko wrote:
> > > > On Mon 05-11-18 18:25:20, Baoquan He wrote:
> > > > > Hi Michal,
> > > > > 
> > > > > On 11/05/18 at 10:28am, Michal Hocko wrote:
> > > > > > 
> > > > > > Or something like this. Ugly as hell, no question about that. I also
> > > > > > have to think about this some more to convince myself this will not
> > > > > > result in an endless loop under some situations.
> > > > > 
> > > > > It failed. Paste the log and patch diff here, please help check if I made
> > > > > any mistake on manual code change. The log is at bottom.
> > > > 
> > > > The retry patch is obviously still racy, it just makes the race window
> > > > slightly smaller and I hoped it would catch most of those races but this
> > > > is obviously not the case.
> > > > 
> > > > I was thinking about your MIGRATE_MOVABLE check some more and I still do
> > > > not like it much, we just change migrate type at many places and I have
> > > > hard time to actually see this is always safe wrt. to what we need here.
> > > > 
> > > > We should be able to restore the zone type check though. The
> > > > primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> > > > has_unmovable_pages more robust") was that early allocations made it to
> > > > the zone_movable range. If we add the check _after_ the PageReserved()
> > > > check then we should be able to rule all bootmem allocation out.
> > > > 
> > > > So what about the following (on top of the previous patch which makes
> > > > sense on its own I believe).
> > > 
> > > Yes, I think this looks very reasonable and should be robust.
> > > 
> > > Have tested it, hot removing 4 hotpluggable nodes continuously
> > > succeeds, and then hot adding them back, still works well.
> > > 
> > > So please feel free to add my Tested-by or Acked-by.
> > > 
> > > Tested-by: Baoquan He <bhe@redhat.com>
> > > or
> > > Acked-by: Baoquan He <bhe@redhat.com>
> > 
> > Thanks for retesting! Does this apply to both patches?
> 
> Sorry, I don't get it. I just applied this on top of Linus's tree and
> tested. Do you mean applying it on top of the previous code change?

Yes. While the first patch will obviously not help for the movable zone,
because the movable check will override any later check, it still
seems useful to reduce false positives on normal zones.

Or do you think this is not worth it?
Baoquan He Nov. 6, 2018, 9:16 a.m. UTC | #13
On 11/06/18 at 09:28am, Michal Hocko wrote:
> > > > > > It failed. Paste the log and patch diff here, please help check if I made
> > > > > > any mistake on manual code change. The log is at bottom.
> > > > > 
> > > > > The retry patch is obviously still racy, it just makes the race window
> > > > > slightly smaller and I hoped it would catch most of those races but this
> > > > > is obviously not the case.
> > > > > 
> > > > > I was thinking about your MIGRATE_MOVABLE check some more and I still do
> > > > > not like it much, we just change migrate type at many places and I have
> > > > > hard time to actually see this is always safe wrt. to what we need here.
> > > > > 
> > > > > We should be able to restore the zone type check though. The
> > > > > primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> > > > > has_unmovable_pages more robust") was that early allocations made it to
> > > > > the zone_movable range. If we add the check _after_ the PageReserved()
> > > > > check then we should be able to rule all bootmem allocation out.
> > > > > 
> > > > > So what about the following (on top of the previous patch which makes
> > > > > sense on its own I believe).
> > > > 
> > > > Yes, I think this looks very reasonable and should be robust.
> > > > 
> > > > Have tested it, hot removing 4 hotpluggable nodes continuously
> > > > succeeds, and then hot adding them back, still works well.
> > > > 
> > > > So please feel free to add my Tested-by or Acked-by.
> > > > 
> > > > Tested-by: Baoquan He <bhe@redhat.com>
> > > > or
> > > > Acked-by: Baoquan He <bhe@redhat.com>
> > > 
> > > Thanks for retesting! Does this apply to both patches?
> > 
> > Sorry, I don't get it. I just applied this on top of Linus's tree and
> > tested. Do you mean applying it on top of the previous code change?
> 
> > Yes. While the first patch will obviously not help for the movable zone,
> > because the movable check will override any later check, it still
> > seems useful to reduce false positives on normal zones.

Hmm, I don't know whether it will bring a little confusion to code
understanding, since so far we have only recognized the movable zone
issue, and I can only reproduce and verify that issue with the movable
zone check added.

Not sure if there are any scenario or use cases to cover those newly added
checking other movable zone checking. Surely, I have no objection to
adding them. But the two patches are separate issues, they have no
dependency on each other.

I just tested the movable zone checking yesterday, will add your
previous check back, then test again. I believe the result will be
positive. Will update once done.

Thanks
Baoquan
Baoquan He Nov. 6, 2018, 9:36 a.m. UTC | #14
On 11/06/18 at 05:16pm, Baoquan He wrote:
> On 11/06/18 at 09:28am, Michal Hocko wrote:
> > > > > > > It failed. Paste the log and patch diff here, please help check if I made
> > > > > > > any mistake on manual code change. The log is at bottom.
> > > > > > 
> > > > > > The retry patch is obviously still racy, it just makes the race window
> > > > > > slightly smaller and I hoped it would catch most of those races but this
> > > > > > is obviously not the case.
> > > > > > 
> > > > > > I was thinking about your MIGRATE_MOVABLE check some more and I still do
> > > > > > not like it much, we just change migrate type at many places and I have
> > > > > > hard time to actually see this is always safe wrt. to what we need here.
> > > > > > 
> > > > > > We should be able to restore the zone type check though. The
> > > > > > primary problem fixed by 15c30bc09085 ("mm, memory_hotplug: make
> > > > > > has_unmovable_pages more robust") was that early allocations made it to
> > > > > > the zone_movable range. If we add the check _after_ the PageReserved()
> > > > > > check then we should be able to rule all bootmem allocation out.
> > > > > > 
> > > > > > So what about the following (on top of the previous patch which makes
> > > > > > sense on its own I believe).
> > > > > 
> > > > > Yes, I think this looks very reasonable and should be robust.
> > > > > 
> > > > > Have tested it, hot removing 4 hotpluggable nodes continuously
> > > > > succeeds, and then hot adding them back, still works well.
> > > > > 
> > > > > So please feel free to add my Tested-by or Acked-by.
> > > > > 
> > > > > Tested-by: Baoquan He <bhe@redhat.com>
> > > > > or
> > > > > Acked-by: Baoquan He <bhe@redhat.com>
> > > > 
> > > > Thanks for retesting! Does this apply to both patches?
> > > 
> > > Sorry, I don't get it. I just applied this on top of Linus's tree and
> > > tested. Do you mean applying it on top of the previous code change?
> > 
> > Yes. While the first patch will obviously not help for the movable zone,
> > because the movable check will override any later check, it still
> > seems useful to reduce false positives on normal zones.
> 
> Hmm, I don't know whether it will bring a little confusion to code
> understanding, since so far we have only recognized the movable zone
> issue, and I can only reproduce and verify that issue with the movable
> zone check added.
> 
> Not sure if there are any scenario or use cases to cover those newly added
> checking other movable zone checking. Surely, I have no objection to
		^ than
> adding them. But the two patches are separate issues, they have no
> dependency on each other.
> 
> I just tested the movable zone checking yesterday, will add your
> previous check back, then test again. I believe the result will be
> positive. Will update once done.
> 
> Thanks
> Baoquan
Michal Hocko Nov. 6, 2018, 9:51 a.m. UTC | #15
On Tue 06-11-18 17:16:24, Baoquan He wrote:
[...]
> Not sure if there are any scenario or use cases to cover those newly added
> checking other movable zone checking. Surely, I have no objection to
> adding them. But the two patches are separate issues, they have no
> dependency on each other.

Yes that is correct. I will drop those additional checks for now. Let's
see if we need them later.

> I just tested the movable zone checking yesterday, will add your
> previous check back, then test again. I believe the result will be
> positive. Will update once done.

There is no need to retest with that patch for your movable node setup.
Baoquan He Nov. 6, 2018, 10 a.m. UTC | #16
On 11/06/18 at 10:51am, Michal Hocko wrote:
> > I just tested the movable zone checking yesterday, will add your
> > previous check back, then test again. I believe the result will be
> > positive. Will update once done.
> 
> There is no need to retest with that patch for your movable node setup.

OK, thanks.
Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 863d46da6586..48ceda313332 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7824,8 +7824,22 @@  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		if (__PageMovable(page))
 			continue;
 
-		if (!PageLRU(page))
-			found++;
+		if (PageLRU(page))
+			continue;
+
+		/*
+		 * Some LRU pages might be temporarily off-LRU for all
+		 * sorts of different reasons - reclaim, migration,
+		 * per-cpu LRU caches etc.
+		 * Make sure we do not consider those pages to be unmovable.
+		 */
+		if (PageSwapBacked(page))
+			continue;
+
+		if (page->mapping && page->mapping->a_ops &&
+				page->mapping->a_ops->migratepage)
+			continue;
+
 		/*
 		 * If there are RECLAIMABLE pages, we need to check
 		 * it.  But now, memory offline itself doesn't call
@@ -7839,7 +7853,7 @@  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * is set to both of a memory hole page and a _used_ kernel
 		 * page at boot.
 		 */
-		if (found > count)
+		if (++found > count)
 			goto unmovable;
 	}
 	return false;
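Taken together with the later ZONE_MOVABLE change discussed above, the per-page decision the thread converges on can be sketched as a hypothetical userspace model. Everything below is illustrative only: the struct fields and function names are stand-ins, not the real `struct page` layout or kernel API, and the real `has_unmovable_pages()` additionally handles hugepages and bails out early in several places.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page state the checks look at; the field
 * names mirror the kernel predicates but this is not struct page. */
struct fake_page {
	bool reserved;        /* PageReserved(): bootmem and friends */
	bool lru;             /* PageLRU(): on an LRU list */
	bool swap_backed;     /* PageSwapBacked(): anon/shmem, migrateable */
	bool has_migratepage; /* mapping->a_ops->migratepage != NULL */
};

/* Sketch of the ordering both patches establish: reserved pages fail
 * immediately, a movable zone is trusted once reserved pages have been
 * ruled out, and an off-LRU page is only counted as unmovable when
 * nothing suggests it can be migrated (it may merely be sitting on a
 * per-cpu LRU cache). Returns true when the page would bump "found". */
static bool counts_as_unmovable(const struct fake_page *p, bool zone_is_movable)
{
	if (p->reserved)
		return true;	/* bootmem etc.: give up right away */
	if (zone_is_movable)
		return false;	/* ZONE_MOVABLE, checked after PageReserved() */
	if (p->lru || p->swap_backed || p->has_migratepage)
		return false;	/* migrateable even if temporarily off-LRU */
	return true;
}
```

With this ordering, a swap-backed page temporarily off-LRU on a pcp LRU cache (the case from Baoquan's report) no longer trips the check, while a reserved bootmem page that leaked into a movable zone still fails it, which is what the original 15c30bc09085 robustness fix was about.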