
[3/5] userfaultfd: Replace lru_cache functions with folio_add functions

Message ID 20221101175326.13265-4-vishal.moola@gmail.com (mailing list archive)
State New
Headers show
Series Removing the lru_cache_add() wrapper

Commit Message

Vishal Moola Nov. 1, 2022, 5:53 p.m. UTC
Replaces lru_cache_add() and lru_cache_add_inactive_or_unevictable()
with folio_add_lru() and folio_add_lru_vma(). This is in preparation for
the removal of lru_cache_add().

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 mm/userfaultfd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox Nov. 1, 2022, 6:31 p.m. UTC | #1
On Tue, Nov 01, 2022 at 10:53:24AM -0700, Vishal Moola (Oracle) wrote:
> Replaces lru_cache_add() and lru_cache_add_inactive_or_unevictable()
> with folio_add_lru() and folio_add_lru_vma(). This is in preparation for
> the removal of lru_cache_add().

Ummmmm.  Reviewing this patch reveals a bug (not introduced by your
patch).  Look:

mfill_atomic_install_pte:
        bool page_in_cache = page->mapping;

mcontinue_atomic_pte:
        ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
...
        page = folio_file_page(folio, pgoff);
        ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
                                       page, false, wp_copy);

That says pretty plainly that mfill_atomic_install_pte() can be passed
a tail page from shmem, and if it is ...

        if (page_in_cache) {
...
        } else {
                page_add_new_anon_rmap(page, dst_vma, dst_addr);
                lru_cache_add_inactive_or_unevictable(page, dst_vma);
        }

it'll get put on the rmap as an anon page!
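
Spelling the hazard out, the check being relied on is simply the line below
(restated here with an illustrative comment; the comment is not in the
kernel source):

        /*
         * If @page is a tail page of a shmem THP, page->mapping does not
         * point at the file's address_space, so this test cannot reliably
         * tell a page-cache page apart from an anonymous one.
         */
        bool page_in_cache = page->mapping;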

> Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> ---
>  mm/userfaultfd.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index e24e8a47ce8a..2560973b00d8 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -66,6 +66,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
>  	bool page_in_cache = page->mapping;
>  	spinlock_t *ptl;
> +	struct folio *folio;
>  	struct inode *inode;
>  	pgoff_t offset, max_off;
>  
> @@ -113,14 +114,15 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  	if (!pte_none_mostly(*dst_pte))
>  		goto out_unlock;
>  
> +	folio = page_folio(page);
>  	if (page_in_cache) {
>  		/* Usually, cache pages are already added to LRU */
>  		if (newly_allocated)
> -			lru_cache_add(page);
> +			folio_add_lru(folio);
>  		page_add_file_rmap(page, dst_vma, false);
>  	} else {
>  		page_add_new_anon_rmap(page, dst_vma, dst_addr);
> -		lru_cache_add_inactive_or_unevictable(page, dst_vma);
> +		folio_add_lru_vma(folio, dst_vma);
>  	}
>  
>  	/*
> -- 
> 2.38.1
> 
>
Peter Xu Nov. 2, 2022, 7:02 p.m. UTC | #2
On Tue, Nov 01, 2022 at 06:31:26PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 01, 2022 at 10:53:24AM -0700, Vishal Moola (Oracle) wrote:
> > Replaces lru_cache_add() and lru_cache_add_inactive_or_unevictable()
> > with folio_add_lru() and folio_add_lru_vma(). This is in preparation for
> > the removal of lru_cache_add().
> 
> Ummmmm.  Reviewing this patch reveals a bug (not introduced by your
> patch).  Look:
> 
> mfill_atomic_install_pte:
>         bool page_in_cache = page->mapping;
> 
> mcontinue_atomic_pte:
>         ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
> ...
>         page = folio_file_page(folio, pgoff);
>         ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
>                                        page, false, wp_copy);
> 
> That says pretty plainly that mfill_atomic_install_pte() can be passed
> a tail page from shmem, and if it is ...
> 
>         if (page_in_cache) {
> ...
>         } else {
>                 page_add_new_anon_rmap(page, dst_vma, dst_addr);
>                 lru_cache_add_inactive_or_unevictable(page, dst_vma);
>         }
> 
> it'll get put on the rmap as an anon page!

Hmm yeah.. thanks Matthew!

Does the patch attached look reasonable to you?

Copying Axel too.

> 
> > Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
> > ---
> >  mm/userfaultfd.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index e24e8a47ce8a..2560973b00d8 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -66,6 +66,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> >  	bool page_in_cache = page->mapping;
> >  	spinlock_t *ptl;
> > +	struct folio *folio;
> >  	struct inode *inode;
> >  	pgoff_t offset, max_off;
> >  
> > @@ -113,14 +114,15 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >  	if (!pte_none_mostly(*dst_pte))
> >  		goto out_unlock;
> >  
> > +	folio = page_folio(page);
> >  	if (page_in_cache) {
> >  		/* Usually, cache pages are already added to LRU */
> >  		if (newly_allocated)
> > -			lru_cache_add(page);
> > +			folio_add_lru(folio);
> >  		page_add_file_rmap(page, dst_vma, false);
> >  	} else {
> >  		page_add_new_anon_rmap(page, dst_vma, dst_addr);
> > -		lru_cache_add_inactive_or_unevictable(page, dst_vma);
> > +		folio_add_lru_vma(folio, dst_vma);
> >  	}
> >  
> >  	/*
> > -- 
> > 2.38.1
> > 
> > 
>
Matthew Wilcox Nov. 2, 2022, 7:21 p.m. UTC | #3
On Wed, Nov 02, 2022 at 03:02:35PM -0400, Peter Xu wrote:
> Does the patch attached look reasonable to you?

Mmm, no.  If the page is in the swap cache, this will be "true".

> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 3d0fef3980b3..650ab6cfd5f4 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -64,7 +64,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  	pte_t _dst_pte, *dst_pte;
>  	bool writable = dst_vma->vm_flags & VM_WRITE;
>  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> -	bool page_in_cache = page->mapping;
> +	bool page_in_cache = page_mapping(page);

We could do:

	struct page *head = compound_head(page);
	bool page_in_cache = head->mapping && !PageMappingFlags(head);
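
Annotated, that suggestion reads roughly as follows (sketch; the comments
are added here for illustration and are not part of any posted patch):

        /* Resolve a possible tail page to its head so ->mapping is usable. */
        struct page *head = compound_head(page);
        /*
         * Treat the page as page cache only if ->mapping is set and its low
         * bits do not mark an anon/KSM/movable mapping -- effectively an
         * open-coded page_mapping() that skips the swap-cache lookup.
         */
        bool page_in_cache = head->mapping && !PageMappingFlags(head);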
Peter Xu Nov. 2, 2022, 8:44 p.m. UTC | #4
On Wed, Nov 02, 2022 at 07:21:19PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 02, 2022 at 03:02:35PM -0400, Peter Xu wrote:
> > Does the patch attached look reasonable to you?
> 
> Mmm, no.  If the page is in the swap cache, this will be "true".

It will not happen in practice, right?

I mean, shmem_get_folio() should have done the swap-in, and we should have
the page lock held in the meantime.

For anon, mcopy_atomic_pte() is the only user and it's passing in a newly
allocated page here.

> 
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 3d0fef3980b3..650ab6cfd5f4 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -64,7 +64,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> >  	pte_t _dst_pte, *dst_pte;
> >  	bool writable = dst_vma->vm_flags & VM_WRITE;
> >  	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> > -	bool page_in_cache = page->mapping;
> > +	bool page_in_cache = page_mapping(page);
> 
> We could do:
> 
> 	struct page *head = compound_head(page);
> 	bool page_in_cache = head->mapping && !PageMappingFlags(head);

Sounds good to me, but it just gets a bit complicated.

If page_mapping() doesn't sound good, how about we just pass that over from
callers?  We only have three, so quite doable too.
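
For comparison, the page_mapping() option is a one-line check (sketch only;
the swap-cache caveat Matthew raised above still applies):

        /*
         * page_mapping() resolves a tail page to its head and returns NULL
         * for anonymous pages.  It does return the swap address_space for a
         * swap-cache page -- the case Matthew flags above -- which Peter
         * argues cannot be hit from these callers in practice.
         */
        bool page_in_cache = page_mapping(page);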
Andrew Morton Nov. 2, 2022, 8:47 p.m. UTC | #5
On Wed, 2 Nov 2022 15:02:35 -0400 Peter Xu <peterx@redhat.com> wrote:

> mfill_atomic_install_pte() checks page->mapping to detect whether a page
> is in the page cache.  However, as pointed out by Matthew, in the case of
> uffd minor mode with UFFDIO_CONTINUE the page can be a tail page rather
> than always the head.  It means we could wrongly install a pte for a shmem
> THP tail page while assuming it is an anonymous page.
> 
> It's not that clear even for anonymous pages, since anonymous pages
> normally also have page->mapping set up, pointing at the anon_vma.  It's
> safe here only because the only such caller of mfill_atomic_install_pte(),
> mcopy_atomic_pte(), always passes in a newly allocated page whose
> page->mapping is not yet set up.  However, that's not obvious either.
> 
> For both of the above, use page_mapping() instead.
> 
> And this should be stable material.

I added

Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Axel Rasmussen Nov. 3, 2022, 5:34 p.m. UTC | #6
On Wed, Nov 2, 2022 at 1:44 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Nov 02, 2022 at 07:21:19PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 02, 2022 at 03:02:35PM -0400, Peter Xu wrote:
> > > Does the patch attached look reasonable to you?
> >
> > Mmm, no.  If the page is in the swap cache, this will be "true".
>
> It will not happen in practice, right?
> 
> I mean, shmem_get_folio() should have done the swap-in, and we should have
> the page lock held in the meantime.
>
> For anon, mcopy_atomic_pte() is the only user and it's passing in a newly
> allocated page here.
>
> >
> > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > index 3d0fef3980b3..650ab6cfd5f4 100644
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
> > > @@ -64,7 +64,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > >     pte_t _dst_pte, *dst_pte;
> > >     bool writable = dst_vma->vm_flags & VM_WRITE;
> > >     bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> > > -   bool page_in_cache = page->mapping;
> > > +   bool page_in_cache = page_mapping(page);
> >
> > We could do:
> >
> >       struct page *head = compound_head(page);
> >       bool page_in_cache = head->mapping && !PageMappingFlags(head);
>
> Sounds good to me, but it just gets a bit complicated.
>
> If page_mapping() doesn't sound good, how about we just pass that over from
> callers?  We only have three, so quite doable too.

For what it's worth, I like Matthew's version better than the original
patch. Although page_mapping() looks simpler here, looking into its
definition I feel it handles several cases, not all of which are relevant
here (and some of which, as Matthew points out, would actually be wrong if
it were possible to reach them here).

It's not clear to me what is meant by "pass that over from callers"?
Do you mean, have callers pass in true/false for page_in_cache
directly?

That could work, but I still slightly prefer Matthew's version, if only
because this function already takes a lot of arguments.
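
For concreteness, the "pass it in from the callers" option would amount to
something like the signature below (purely hypothetical sketch; the extra
parameter and its placement are made up for illustration):

        int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
                                     struct vm_area_struct *dst_vma,
                                     unsigned long dst_addr, struct page *page,
                                     bool newly_allocated, bool page_in_cache,
                                     bool wp_copy);

Each of the three callers would then compute page_in_cache where the page's
origin is still known, instead of inferring it from ->mapping.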

>
> --
> Peter Xu
>
Peter Xu Nov. 3, 2022, 5:56 p.m. UTC | #7
On Thu, Nov 03, 2022 at 10:34:38AM -0700, Axel Rasmussen wrote:
> On Wed, Nov 2, 2022 at 1:44 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Nov 02, 2022 at 07:21:19PM +0000, Matthew Wilcox wrote:
> > > On Wed, Nov 02, 2022 at 03:02:35PM -0400, Peter Xu wrote:
> > > > Does the patch attached look reasonable to you?
> > >
> > > Mmm, no.  If the page is in the swap cache, this will be "true".
> >
> > It will not happen in practice, right?
> >
> > I mean, shmem_get_folio() should have done the swap-in, and we should have
> > the page lock held in the meantime.
> >
> > For anon, mcopy_atomic_pte() is the only user and it's passing in a newly
> > allocated page here.
> >
> > >
> > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > index 3d0fef3980b3..650ab6cfd5f4 100644
> > > > --- a/mm/userfaultfd.c
> > > > +++ b/mm/userfaultfd.c
> > > > @@ -64,7 +64,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > > >     pte_t _dst_pte, *dst_pte;
> > > >     bool writable = dst_vma->vm_flags & VM_WRITE;
> > > >     bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> > > > -   bool page_in_cache = page->mapping;
> > > > +   bool page_in_cache = page_mapping(page);
> > >
> > > We could do:
> > >
> > >       struct page *head = compound_head(page);
> > >       bool page_in_cache = head->mapping && !PageMappingFlags(head);
> >
> > Sounds good to me, but it just gets a bit complicated.
> >
> > If page_mapping() doesn't sound good, how about we just pass that over from
> > callers?  We only have three, so quite doable too.
> 
> For what it's worth, I think I like Matthew's version better than the
> original patch. This is because, although page_mapping() looks simpler
> here, looking into the definition of page_mapping() I feel it's
> handling several cases, not all of which are relevant here (or, as
> Matthew points out, would actually be wrong if it were possible to
> reach those cases here).
> 
> It's not clear to me what is meant by "pass that over from callers"?
> Do you mean, have callers pass in true/false for page_in_cache
> directly?

Yes.

> 
> That could work, but I still think I prefer Matthew's version slightly
> better, if only because this function already takes a lot of
> arguments.

IMHO that's not an issue; we can merge them into a flags argument, cleaning
things up along the way.

To me, the simplest option so far is still just to use page_mapping(), but
I have no strong opinion here.

If we go with Matthew's patch, it would be great to add a comment
explaining what we're doing (something like "Unwrapped page_mapping() but
avoid looking into swap cache" would be good enough for me).

Thanks,

Patch

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e24e8a47ce8a..2560973b00d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -66,6 +66,7 @@  int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	bool vm_shared = dst_vma->vm_flags & VM_SHARED;
 	bool page_in_cache = page->mapping;
 	spinlock_t *ptl;
+	struct folio *folio;
 	struct inode *inode;
 	pgoff_t offset, max_off;
 
@@ -113,14 +114,15 @@  int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	if (!pte_none_mostly(*dst_pte))
 		goto out_unlock;
 
+	folio = page_folio(page);
 	if (page_in_cache) {
 		/* Usually, cache pages are already added to LRU */
 		if (newly_allocated)
-			lru_cache_add(page);
+			folio_add_lru(folio);
 		page_add_file_rmap(page, dst_vma, false);
 	} else {
 		page_add_new_anon_rmap(page, dst_vma, dst_addr);
-		lru_cache_add_inactive_or_unevictable(page, dst_vma);
+		folio_add_lru_vma(folio, dst_vma);
 	}
 
 	/*