[v6,08/14] mm/gup: do not migrate zero page

Message ID 20210120014333.222547-9-pasha.tatashin@soleen.com (mailing list archive)
State: New, archived
Series: prohibit pinning pages in ZONE_MOVABLE

Commit Message

Pasha Tatashin Jan. 20, 2021, 1:43 a.m. UTC
On some platforms ZERO_PAGE(0) might end up in a movable zone. Do not
migrate the zero page in gup during longterm pinning, as migration of
the zero page is not allowed.

For example, in an x86 QEMU guest with 16G of memory and the
kernelcore=5G parameter, I see the following:

Boot#1: zero_pfn  0x48a8d zero_pfn zone: ZONE_DMA32
Boot#2: zero_pfn 0x20168d zero_pfn zone: ZONE_MOVABLE

On x86, empty_zero_page is declared in .bss and, depending on the
loader, may end up at different physical locations across boots.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/mmzone.h | 4 ++++
 mm/gup.c               | 2 ++
 2 files changed, 6 insertions(+)
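
A quick way to observe this boot-to-boot variation is a one-line debug
print. This is a minimal sketch, not part of the patch; it assumes the
existing zero_pfn variable from mm/memory.c and the standard
pfn_to_page()/page_zone() helpers:

	/*
	 * Hypothetical debug print: report which zone holds the zero
	 * page (zone->name prints as e.g. "DMA32" or "Movable").
	 */
	pr_info("zero_pfn %#lx zone: %s\n",
		zero_pfn, page_zone(pfn_to_page(zero_pfn))->name);

With kernelcore=5G, the remaining memory is placed in ZONE_MOVABLE, so
where the loader happens to put the kernel image determines whether
empty_zero_page falls inside that range.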

Comments

Jason Gunthorpe Jan. 20, 2021, 1:14 p.m. UTC | #1
On Tue, Jan 19, 2021 at 08:43:27PM -0500, Pavel Tatashin wrote:
> On some platforms ZERO_PAGE(0) might end up in a movable zone. Do not
> migrate the zero page in gup during longterm pinning, as migration of
> the zero page is not allowed.
> 
> For example, in an x86 QEMU guest with 16G of memory and the
> kernelcore=5G parameter, I see the following:
> 
> Boot#1: zero_pfn  0x48a8d zero_pfn zone: ZONE_DMA32
> Boot#2: zero_pfn 0x20168d zero_pfn zone: ZONE_MOVABLE
> 
> On x86, empty_zero_page is declared in .bss and, depending on the
> loader, may end up at different physical locations across boots.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  include/linux/mmzone.h | 4 ++++
>  mm/gup.c               | 2 ++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fc99e9241846..f67427a8f22b 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -427,6 +427,10 @@ enum zone_type {
>  	 *    techniques might use alloc_contig_range() to hide previously
>  	 *    exposed pages from the buddy again (e.g., to implement some sort
>  	 *    of memory unplug in virtio-mem).
> +	 * 6. ZERO_PAGE(0): kernelcore/movablecore setups might create
> +	 *    situations where ZERO_PAGE(0), which is allocated differently
> +	 *    on different platforms, may end up in a movable zone.
> +	 *    ZERO_PAGE(0) cannot be migrated.
>  	 *
>  	 * In general, no unmovable allocations that degrade memory offlining
>  	 * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
> diff --git a/mm/gup.c b/mm/gup.c
> index 857b273e32ac..fdd5cda30a07 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1580,6 +1580,8 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>  		 * of the CMA zone if possible.
>  		 */
>  		if (is_migrate_cma_page(head)) {
> +			if (is_zero_pfn(page_to_pfn(head)))
> +				continue;

I think you should put this logic in is_pinnable_page()

Jason
Pasha Tatashin Jan. 20, 2021, 2:26 p.m. UTC | #2
On Wed, Jan 20, 2021 at 8:14 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jan 19, 2021 at 08:43:27PM -0500, Pavel Tatashin wrote:
> > On some platforms ZERO_PAGE(0) might end up in a movable zone. Do not
> > migrate the zero page in gup during longterm pinning, as migration of
> > the zero page is not allowed.
> >
> > For example, in an x86 QEMU guest with 16G of memory and the
> > kernelcore=5G parameter, I see the following:
> >
> > Boot#1: zero_pfn  0x48a8d zero_pfn zone: ZONE_DMA32
> > Boot#2: zero_pfn 0x20168d zero_pfn zone: ZONE_MOVABLE
> >
> > On x86, empty_zero_page is declared in .bss and, depending on the
> > loader, may end up at different physical locations across boots.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  include/linux/mmzone.h | 4 ++++
> >  mm/gup.c               | 2 ++
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index fc99e9241846..f67427a8f22b 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -427,6 +427,10 @@ enum zone_type {
> >        *    techniques might use alloc_contig_range() to hide previously
> >        *    exposed pages from the buddy again (e.g., to implement some sort
> >        *    of memory unplug in virtio-mem).
> > +      * 6. ZERO_PAGE(0): kernelcore/movablecore setups might create
> > +      *    situations where ZERO_PAGE(0), which is allocated differently
> > +      *    on different platforms, may end up in a movable zone.
> > +      *    ZERO_PAGE(0) cannot be migrated.
> >        *
> >        * In general, no unmovable allocations that degrade memory offlining
> >        * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 857b273e32ac..fdd5cda30a07 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1580,6 +1580,8 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
> >                * of the CMA zone if possible.
> >                */
> >               if (is_migrate_cma_page(head)) {
> > +                     if (is_zero_pfn(page_to_pfn(head)))
> > +                             continue;
>
> I think you should put this logic in is_pinnable_page()

I thought about this, and the code would be a little cleaner. But the
reason I did not is that zero_page is perfectly pinnable; it is
unpinnable only when it is in a movable zone (and it should not be in
a movable zone for other reasons as well). That is another bug that
needs to be resolved, and once it is resolved this condition can be
removed from gup migration.

Pasha

>
> Jason
Jason Gunthorpe Jan. 25, 2021, 2:28 p.m. UTC | #3
On Wed, Jan 20, 2021 at 09:26:41AM -0500, Pavel Tatashin wrote:

> I thought about this, and the code would be a little cleaner. But the
> reason I did not is that zero_page is perfectly pinnable; it is
> unpinnable only when it is in a movable zone (and it should not be in
> a movable zone for other reasons as well). That is another bug that
> needs to be resolved, and once it is resolved this condition can be
> removed from gup migration.

My point is that you've defined the zero page to be pinnable no matter
what zone it is in, so is_pinnable(zero_page) == true.

Jason
Pasha Tatashin Jan. 25, 2021, 3:38 p.m. UTC | #4
On Mon, Jan 25, 2021 at 9:28 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jan 20, 2021 at 09:26:41AM -0500, Pavel Tatashin wrote:
>
> > I thought about this, and the code would be a little cleaner. But the
> > reason I did not is that zero_page is perfectly pinnable; it is
> > unpinnable only when it is in a movable zone (and it should not be in
> > a movable zone for other reasons as well). That is another bug that
> > needs to be resolved, and once it is resolved this condition can be
> > removed from gup migration.
>
> My point is that you've defined the zero page to be pinnable no matter
> what zone it is in, so is_pinnable(zero_page) == true.

Sure, I will move it inside is_pinnable in the next update.

Thank you,
Pasha

>
> Jason
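
Folding the check into a pinnability helper, as agreed above, might
look roughly like the sketch below. This is an illustration of the
direction, not the code from this series; it is built only from
existing helpers (is_zero_pfn(), is_migrate_cma_page(), page_zone(),
zone_idx()):

	/*
	 * Sketch: a page is pinnable unless it sits in CMA or
	 * ZONE_MOVABLE; the zero page is never migrated, so treat it
	 * as pinnable regardless of which zone it landed in.
	 */
	static inline bool is_pinnable_page(struct page *page)
	{
		if (is_zero_pfn(page_to_pfn(page)))
			return true;
		return !is_migrate_cma_page(page) &&
		       zone_idx(page_zone(page)) != ZONE_MOVABLE;
	}

With such a helper, the open-coded is_zero_pfn() test added to
check_and_migrate_cma_pages() in the patch below would collapse into a
single is_pinnable_page() check.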

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fc99e9241846..f67427a8f22b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -427,6 +427,10 @@ enum zone_type {
 	 *    techniques might use alloc_contig_range() to hide previously
 	 *    exposed pages from the buddy again (e.g., to implement some sort
 	 *    of memory unplug in virtio-mem).
+	 * 6. ZERO_PAGE(0): kernelcore/movablecore setups might create
+	 *    situations where ZERO_PAGE(0), which is allocated differently
+	 *    on different platforms, may end up in a movable zone.
+	 *    ZERO_PAGE(0) cannot be migrated.
 	 *
 	 * In general, no unmovable allocations that degrade memory offlining
 	 * should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
diff --git a/mm/gup.c b/mm/gup.c
index 857b273e32ac..fdd5cda30a07 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1580,6 +1580,8 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
 		 * of the CMA zone if possible.
 		 */
 		if (is_migrate_cma_page(head)) {
+			if (is_zero_pfn(page_to_pfn(head)))
+				continue;
 			if (PageHuge(head)) {
 				if (!isolate_huge_page(head, &cma_page_list))
 					isolation_error_count++;