
[mmotm] mm/munlock: mlock_vma_folio() check against VM_SPECIAL

Message ID 9b95d366-1719-f8e2-a5a3-429f9e808288@google.com
State New

Commit Message

Hugh Dickins March 3, 2022, 1:35 a.m. UTC
Although mmap_region() and mlock_fixup() take care that VM_LOCKED
is never left set on a VM_SPECIAL vma, there is an interval while
file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
still be set while VM_SPECIAL bits are added: so mlock_vma_folio()
should ignore VM_LOCKED while any VM_SPECIAL bits are set.

This showed up as a "Bad page" still mlocked, when vfree()ing pages
which had been vm_inserted by remap_vmalloc_range_partial(): while
release_pages() and __page_cache_release(), and so put_page(), catch
pages still mlocked when freeing (and clear_page_mlock() caught them
when unmapping), the vfree() path is unprepared for them: fix it?
but these pages should not have been mlocked in the first place.

I assume that an mlockall(MCL_FUTURE) had been done in the past; or
maybe the user got to specify MAP_LOCKED on a vmalloc'ing driver mmap.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Diffed against top of next-20220301 or mmotm 2022-02-28-14-45.
This patch really belongs as a fix to the mm/munlock series in
Matthew's tree, so he might like to take it in there (but the patch
here is the foliated version, so easiest to place it after foliation).

 mm/internal.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
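
To make the window concrete, here is a minimal sketch of an f_op->mmap()
handler that could hit it.  It is not taken from any real driver:
example_mmap, struct example_dev and its vmalloc_buf field are hypothetical,
but remap_vmalloc_range() and vm_insert_page() are real kernel interfaces
and behave as described in the comments.

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

struct example_dev {			/* hypothetical driver state */
	void *vmalloc_buf;		/* buffer from vmalloc_user() */
};

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct example_dev *dev = file->private_data;

	/*
	 * By the time this handler runs, mmap_region() has already copied
	 * VM_LOCKED into vma->vm_flags (from mm->def_flags after
	 * mlockall(MCL_FUTURE), or because the caller passed MAP_LOCKED),
	 * and it only clears VM_LOCKED from a VM_SPECIAL vma after
	 * ->mmap() returns.
	 *
	 * remap_vmalloc_range() inserts each page with vm_insert_page(),
	 * which itself marks the vma VM_MIXEDMAP (one of the VM_SPECIAL
	 * bits).  With the old VM_IO-only test, pages inserted in this
	 * interval were still mlocked; with the VM_SPECIAL test,
	 * mlock_vma_folio() skips them, so vfree() never sees an mlocked
	 * page later.
	 */
	return remap_vmalloc_range(vma, dev->vmalloc_buf, 0);
}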

Comments

Matthew Wilcox March 3, 2022, 2:23 p.m. UTC | #1
On Wed, Mar 02, 2022 at 05:35:30PM -0800, Hugh Dickins wrote:
> Although mmap_region() and mlock_fixup() take care that VM_LOCKED
> is never left set on a VM_SPECIAL vma, there is an interval while
> file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
> still be set while VM_SPECIAL bits are added: so mlock_vma_folio()
> should ignore VM_LOCKED while any VM_SPECIAL bits are set.
> 
> This showed up as a "Bad page" still mlocked, when vfree()ing pages
> which had been vm_inserted by remap_vmalloc_range_partial(): while
> release_pages() and __page_cache_release(), and so put_page(), catch
> pages still mlocked when freeing (and clear_page_mlock() caught them
> when unmapping), the vfree() path is unprepared for them: fix it?
> but these pages should not have been mlocked in the first place.
> 
> I assume that an mlockall(MCL_FUTURE) had been done in the past; or
> maybe the user got to specify MAP_LOCKED on a vmalloc'ing driver mmap.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Diffed against top of next-20220301 or mmotm 2022-02-28-14-45.
> This patch really belongs as a fix to the mm/munlock series in
> Matthew's tree, so he might like to take it in there (but the patch
> here is the foliated version, so easiest to place it after foliation).

It looks like it fixes "mm/munlock: mlock_pte_range() when mlocking or
munlocking", so I'll fold it into that patch?

>  mm/internal.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -421,8 +421,15 @@ extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
>  static inline void mlock_vma_folio(struct folio *folio,
>  			struct vm_area_struct *vma, bool compound)
>  {
> -	/* VM_IO check prevents migration from double-counting during mlock */
> -	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_IO)) == VM_LOCKED) &&
> +	/*
> +	 * The VM_SPECIAL check here serves two purposes.
> +	 * 1) VM_IO check prevents migration from double-counting during mlock.
> +	 * 2) Although mmap_region() and mlock_fixup() take care that VM_LOCKED
> +	 *    is never left set on a VM_SPECIAL vma, there is an interval while
> +	 *    file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
> +	 *    still be set while VM_SPECIAL bits are added: so ignore it then.
> +	 */
> +	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
>  	    (compound || !folio_test_large(folio)))
>  		mlock_folio(folio);
>  }
Hugh Dickins March 3, 2022, 5:25 p.m. UTC | #2
On Thu, 3 Mar 2022, Matthew Wilcox wrote:
> On Wed, Mar 02, 2022 at 05:35:30PM -0800, Hugh Dickins wrote:
> > Although mmap_region() and mlock_fixup() take care that VM_LOCKED
> > is never left set on a VM_SPECIAL vma, there is an interval while
> > file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
> > still be set while VM_SPECIAL bits are added: so mlock_vma_folio()
> > should ignore VM_LOCKED while any VM_SPECIAL bits are set.
> > 
> > This showed up as a "Bad page" still mlocked, when vfree()ing pages
> > which had been vm_inserted by remap_vmalloc_range_partial(): while
> > release_pages() and __page_cache_release(), and so put_page(), catch
> > pages still mlocked when freeing (and clear_page_mlock() caught them
> > when unmapping), the vfree() path is unprepared for them: fix it?
> > but these pages should not have been mlocked in the first place.
> > 
> > I assume that an mlockall(MCL_FUTURE) had been done in the past; or
> > maybe the user got to specify MAP_LOCKED on a vmalloc'ing driver mmap.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > Diffed against top of next-20220301 or mmotm 2022-02-28-14-45.
> > This patch really belongs as a fix to the mm/munlock series in
> > Matthew's tree, so he might like to take it in there (but the patch
> > here is the foliated version, so easiest to place it after foliation).
> 
> It looks like it fixes "mm/munlock: mlock_pte_range() when mlocking or
> munlocking", so I'll fold it into that patch?

No and yes.

That's great if you're prepared to move it back before the foliation.
I think that just involves editing every "folio" to "page", including
in the title - I very nearly sent it out with mlock_vma_page() in the title.

But I would prefer it to remain as a separate fix at the end of the
mm/munlock series: this case is too unusual, and only a "Bad page",
to mess with bisection prospects; and it's addressing an entirely
different issue from what the "mlock_pte_range()..." is dealing with.
Each of them needs its own explanation.

So I would prefer it as a separate fix about "page"s, on top of the
mm/munlock series, and you then adjust your foliation commit accordingly.

Thank you: this is what I really wanted, but was afraid to ask of you
(and of course, other fixes may turn out to be required, too late to
adjust across the page<->folio barrier in this way: so it's nice to
be able to do it this way, but rather beyond the call of duty).

Hugh
Matthew Wilcox March 3, 2022, 5:52 p.m. UTC | #3
On Thu, Mar 03, 2022 at 09:25:47AM -0800, Hugh Dickins wrote:
> On Thu, 3 Mar 2022, Matthew Wilcox wrote:
> > It looks like it fixes "mm/munlock: mlock_pte_range() when mlocking or
> > munlocking", so I'll fold it into that patch?
> 
> No and yes.
> 
> That's great if you're prepared to move it back before the foliation.
> I think that just involves editing every "folio" to "page", including
> in the title - I very nearly sent it out with mlock_vma_page() in the title.

Thanks for the reminder to do the title and changelog ;-)

> But I would prefer it to remain as a separate fix at the end of the
> mm/munlock series: this case is too unusual, and only a "Bad page",
> to mess with bisection prospects; and it's addressing an entirely
> different issue from what the "mlock_pte_range()..." is dealing with.
> Each of them needs its own explanation.
> 
> So I would prefer it as a separate fix about "page"s, on top of the
> mm/munlock series, and you then adjust your foliation commit accordingly.

Done!  Pushed out.

> Thank you: this is what I really wanted, but was afraid to ask of you
> (and of course, other fixes may turn out to be required, too late to
> adjust across the page<->folio barrier in this way: so it's nice to
> be able to do it this way, but rather beyond the call of duty).

Yes, once it's landed in Linus' tree, it's too late to edit, but I'm
willing to insert patches like this.  It's a minor adjustment to my
patch on top of it.

Patch

--- a/mm/internal.h
+++ b/mm/internal.h
@@ -421,8 +421,15 @@ extern int mlock_future_check(struct mm_struct *mm, unsigned long flags,
 static inline void mlock_vma_folio(struct folio *folio,
 			struct vm_area_struct *vma, bool compound)
 {
-	/* VM_IO check prevents migration from double-counting during mlock */
-	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_IO)) == VM_LOCKED) &&
+	/*
+	 * The VM_SPECIAL check here serves two purposes.
+	 * 1) VM_IO check prevents migration from double-counting during mlock.
+	 * 2) Although mmap_region() and mlock_fixup() take care that VM_LOCKED
+	 *    is never left set on a VM_SPECIAL vma, there is an interval while
+	 *    file->f_op->mmap() is using vm_insert_page(s), when VM_LOCKED may
+	 *    still be set while VM_SPECIAL bits are added: so ignore it then.
+	 */
+	if (unlikely((vma->vm_flags & (VM_LOCKED|VM_SPECIAL)) == VM_LOCKED) &&
 	    (compound || !folio_test_large(folio)))
 		mlock_folio(folio);
 }
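
For reference, VM_SPECIAL is defined in include/linux/mm.h (in kernels of
this era) as the union of the "special" vma flag bits, so the widened test
still covers the original VM_IO case:

#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)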