diff mbox series

[v5,08/14] mm/gup: do not allow zero page for pinned pages

Message ID 20210119043920.155044-9-pasha.tatashin@soleen.com (mailing list archive)
State New
Headers show
Series prohibit pinning pages in ZONE_MOVABLE | expand

Commit Message

Pavel Tatashin Jan. 19, 2021, 4:39 a.m. UTC
Zero page should not be used for long term pinned pages. Once pages
are pinned their physical addresses cannot changed until they are unpinned.

Guarantee to always return real pages when they are pinned by adding
FOLL_WRITE.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/gup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe Jan. 19, 2021, 6:30 p.m. UTC | #1
On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote:
> Zero page should not be used for long term pinned pages. Once pages
> are pinned their physical addresses cannot changed until they are unpinned.
> 
> Guarantee to always return real pages when they are pinned by adding
> FOLL_WRITE.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  mm/gup.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)

No, this will definitely break things

Why does the zero page have to be movable?

Jason
Pavel Tatashin Jan. 19, 2021, 6:34 p.m. UTC | #2
On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote:
> > Zero page should not be used for long term pinned pages. Once pages
> > are pinned their physical addresses cannot changed until they are unpinned.
> >
> > Guarantee to always return real pages when they are pinned by adding
> > FOLL_WRITE.
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > ---
> >  mm/gup.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
>
> No, this will definitely break things

What will break

>
> Why does the zero page have to be movable?

It is not even about being movable, we can't cow pinned pages returned
by GUP call, how can we use zero page for that?

>
> Jason
Jason Gunthorpe Jan. 19, 2021, 6:47 p.m. UTC | #3
On Tue, Jan 19, 2021 at 01:34:26PM -0500, Pavel Tatashin wrote:
> On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote:
> > > Zero page should not be used for long term pinned pages. Once pages
> > > are pinned their physical addresses cannot changed until they are unpinned.
> > >
> > > Guarantee to always return real pages when they are pinned by adding
> > > FOLL_WRITE.
> > >
> > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > >  mm/gup.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > No, this will definitely break things
> 
> What will break

Things assuming GUP doesn't break COW, making all GUP WRITE was
already tried and revered for some other reason

> > Why does the zero page have to be movable?
> 
> It is not even about being movable, we can't cow pinned pages returned
> by GUP call, how can we use zero page for that?

The zero page is always zero, it is never written to. What does cow
matter?

Jason
Pavel Tatashin Jan. 19, 2021, 8:14 p.m. UTC | #4
On Tue, Jan 19, 2021 at 1:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jan 19, 2021 at 01:34:26PM -0500, Pavel Tatashin wrote:
> > On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote:
> > > > Zero page should not be used for long term pinned pages. Once pages
> > > > are pinned their physical addresses cannot changed until they are unpinned.
> > > >
> > > > Guarantee to always return real pages when they are pinned by adding
> > > > FOLL_WRITE.
> > > >
> > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > > >  mm/gup.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > No, this will definitely break things
> >
> > What will break
>
> Things assuming GUP doesn't break COW, making all GUP WRITE was
> already tried and revered for some other reason
>
> > > Why does the zero page have to be movable?
> >
> > It is not even about being movable, we can't cow pinned pages returned
> > by GUP call, how can we use zero page for that?
>
> The zero page is always zero, it is never written to. What does cow
> matter?

Hi Jason,

I was thinking about a use case where userland would pin an address
without FOLL_WRITE, because the PTE for that address is not going to
be writable, but some device via DMA will write to it. Now, if we got
a zero page we have a problem... If this usecase is not valid then the
fix for movable zero page is make the zero page always come from a
non-movable zone so we do not need to isolate it during migration, and
so the memory can be offlined later.

Pasha

>
> Jason
Pavel Tatashin Jan. 19, 2021, 8:48 p.m. UTC | #5
> I was thinking about a use case where userland would pin an address
> without FOLL_WRITE, because the PTE for that address is not going to
> be writable, but some device via DMA will write to it. Now, if we got
> a zero page we have a problem... If this usecase is not valid then the
> fix for movable zero page is make the zero page always come from a
> non-movable zone so we do not need to isolate it during migration, and
> so the memory can be offlined later.

I looked into making zero_page non-movable, and I am confused here.

huge zero page is already not movable:
get_huge_zero_page()
   zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE, ...

Base zero page can be in a movable zone, which is a bug: if there are
references to zero page, that page cannot be migrated, and we won't be
hot-remove memory area where that page is located. On x86, zero page
should always come from the bottom 4G of physical memory / DMA32 ZONE.

However, I see that sometimes it is not (I reproduce in QEMU emulator):
QEMU instance with 16G of memory and kernelcore=5G

Boot#1:
zero_pfn 48a8d
zero_pfn zone: ZONE_DMA32

Boot#2:
zero_pfn 20168d
zero_pfn zone: ZONE_MOVABLE (???)

The problem is that the x86 zero page comes from the .bss segment:
https://soleen.com/source/xref/linux/arch/x86/kernel/head_64.S?r=31d85460#583

Which, I thought would always be set within the first 4G of physical
memory. What is going on here?

Pasha
Jason Gunthorpe Jan. 19, 2021, 11:35 p.m. UTC | #6
On Tue, Jan 19, 2021 at 03:14:04PM -0500, Pavel Tatashin wrote:

> I was thinking about a use case where userland would pin an address
> without FOLL_WRITE, because the PTE for that address is not going to
> be writable, but some device via DMA will write to it.

That would be a serious bug in the get_user_pages caller to write to a
page without using FOLL_WRITE

Jason
diff mbox series

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 857b273e32ac..9a817652f501 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1668,8 +1668,16 @@  static long __gup_longterm_locked(struct mm_struct *mm,
 	unsigned long flags = 0;
 	long rc;
 
-	if (gup_flags & FOLL_LONGTERM)
+	if (gup_flags & FOLL_LONGTERM) {
+		/*
+		 * We are long term pinning pages and their PA's should not
+		 * change until unpinned. Without FOLL_WRITE we might get zero
+		 * page which we do not want. Force creating normal
+		 * pages by adding FOLL_WRITE.
+		 */
+		gup_flags |= FOLL_WRITE;
 		flags = memalloc_pin_save();
+	}
 
 	rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, NULL,
 				     gup_flags);