
mm/rmap: fix the handling of !private device page in try_to_unmap_one()

Message ID 1585750678-5840-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived
Series mm/rmap: fix the handling of !private device page in try_to_unmap_one()

Commit Message

Pingfan Liu April 1, 2020, 2:17 p.m. UTC
This patch is a pure code refinement without any functional changes.

try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
unmap, if try_to_unmap_one() returns true, it means the pte has been torn
down and the mapcount decremented. Apparently the current code

        if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
            is_zone_device_page(page) && !is_device_private_page(page))
               return true;

conflicts with this logic.

Furthermore, as for zone_device pages, migration can only happen when
is_device_private_page(page) is true. For other zone_device pages,
memmap_init_zone_device() raises an extra _refcount on all of the zone's
pages, and this extra _refcount blocks migration. So try_to_unmap_one()
can just return false for those other zone_device pages.
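
For reference, this is roughly where migration trips over that extra
reference (a simplified sketch modeled on migrate_page_move_mapping() and
expected_page_refs() in mm/migrate.c of this era, not the exact code):

        /*
         * Sketch: migration only proceeds if the page's _refcount matches
         * what the migration path expects to see.  The extra reference
         * left on ZONE_DEVICE pages by memmap_init_zone_device() makes
         * this comparison fail, so migration bails out with -EAGAIN.
         */
        int expected_count = expected_page_refs(mapping, page) + extra_count;

        if (page_count(page) != expected_count)
                return -EAGAIN;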

The reason why the original code happens to work on !private zone_device
pages:
-1. If the page is mapped, try_to_unmap_one() returns early, so
page_remove_rmap() is skipped and finally try_to_unmap(){ return
!page_mapcount(page) ? true : false;} will return false.
-2. If the page is not mapped, the extra _refcount will prevent the
migration.
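
For context, the tail of try_to_unmap() (abridged from mm/rmap.c of this
era) shows that success is judged by the final mapcount, not by what
rmap_one returned:

        bool try_to_unmap(struct page *page, enum ttu_flags flags)
        {
                struct rmap_walk_control rwc = {
                        .rmap_one = try_to_unmap_one,
                        .arg = (void *)flags,
                        .done = page_mapcount_is_zero,
                        .anon_lock = page_lock_anon_vma_read,
                };
                ...
                rmap_walk(page, &rwc);

                /* success == the page ended up unmapped */
                return !page_mapcount(page) ? true : false;
        }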

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-mm@kvack.org
---
v1 -> v2: improve commit log and note in code
 mm/rmap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--
2.7.5

Comments

Michal Hocko April 1, 2020, 3:58 p.m. UTC | #1
On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> This patch is a pure code refinement without any functional changes.
> 
> try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> down and the mapcount decremented.

I haven't really checked the full history of the rmap walk but this is
certainly not the currently implemented semantic of this callback.
Returning true only tells the caller that it should continue with other
VMAs which map the given page. It doesn't really mean that the pte has
been torn down. The munlock case is a nice example of how that is used
properly, while the migration path for device pages shows how it is used
incorrectly: it doesn't make any sense to walk other VMAs, because
is_device_private_page() is a property of the page, not the VMA. And that
is the only reason to drop that.
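
To illustrate, here is the walk loop, simplified from rmap_walk_anon() in
mm/rmap.c (locking and checks elided):

        anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
                        pgoff_start, pgoff_end) {
                struct vm_area_struct *vma = avc->vma;
                unsigned long address = vma_address(page, vma);
                ...
                /* the return value only decides whether to visit the next VMA */
                if (!rwc->rmap_one(page, vma, address, rwc->arg))
                        break;
                if (rwc->done && rwc->done(page))
                        break;
        }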

> Apparently the current code
> 
>         if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>             is_zone_device_page(page) && !is_device_private_page(page))
>                return true;
> 
> conflicts with this logic.
> 
> > Furthermore, as for zone_device pages, migration can only happen when
> > is_device_private_page(page) is true. For other zone_device pages,
> > memmap_init_zone_device() raises an extra _refcount on all of the zone's
> > pages, and this extra _refcount blocks migration. So try_to_unmap_one()
> > can just return false for those other zone_device pages.
> 
> The reason why the original code happens to work on !private zone_device
> pages:
> -1. If the page is mapped, try_to_unmap_one() returns early, so
> page_remove_rmap() is skipped and finally try_to_unmap(){ return
> !page_mapcount(page) ? true : false;} will return false.
> -2. If the page is not mapped, the extra _refcount will prevent the
> migration.

And this sounds to me like a separate change, worth a patch of its own.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> To: linux-mm@kvack.org
> ---
> v1 -> v2: improve commit log and note in code
>  mm/rmap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b838647..723af4f 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1358,6 +1358,10 @@ void page_remove_rmap(struct page *page, bool compound)
> 
>  /*
>   * @arg: enum ttu_flags will be passed to this argument
> + *
> + * For munlock, return true if @page is not mlocked by @vma without killing pte
> + * For unmap, return true after tearing down pte.
> + * For both cases, return false if rmap_walk should be stopped.
>   */
>  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		     unsigned long address, void *arg)
> @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
> -		return true;
> +		return false;
> 
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
> @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  		if (IS_ENABLED(CONFIG_MIGRATION) &&
>  		    (flags & TTU_MIGRATION) &&
> -		    is_zone_device_page(page)) {
> +		    is_device_private_page(page)) {
>  			swp_entry_t entry;
>  			pte_t swp_pte;
> 
> --
> 2.7.5
Pingfan Liu April 2, 2020, 7:40 a.m. UTC | #2
On Wed, Apr 1, 2020 at 11:58 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> > This patch is a pure code refinement without any functional changes.
> >
> > try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> > unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> > down and the mapcount decremented.
>
> I haven't really checked the full history of the rmap walk but this is
> certainly not the currently implemented semantic of this callback.
> Returning true only tells the caller that it should continue with other
> VMAs which map the given page. It doesn't really mean that the pte has
> been torn down. The munlock case is a nice example of how that is used
I did not paste the whole story in the commit log, but noted it in the code.
For munlock, we only care that the page will be put back on the correct LRU.
But as the commit log says ("As for unmap ..."), unmap should tear down the
pte, otherwise the page may still be accessed through an old mapping. Also,
an assumption is omitted here: for !private device pages, e.g. fs-dax, there
is no need to mlock them.
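
For the munlock side, the walk is set up like this (abridged from
try_to_munlock() in mm/rmap.c of this era):

        void try_to_munlock(struct page *page)
        {
                struct rmap_walk_control rwc = {
                        .rmap_one = try_to_unmap_one,
                        .arg = (void *)TTU_MUNLOCK,
                        .done = page_not_mapped,
                        .anon_lock = page_lock_anon_vma_read,
                };
                ...
                rmap_walk(page, &rwc);
        }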
> properly, while the migration path for device pages shows how it is used
> incorrectly: it doesn't make any sense to walk other VMAs, because
> is_device_private_page() is a property of the page, not the VMA. And that
> is the only reason to drop that.
>
> > Apparently the current code
> >
> >         if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >             is_zone_device_page(page) && !is_device_private_page(page))
> >                return true;
> >
> > conflicts with this logic.
> >
[...]
> >  /*
> >   * @arg: enum ttu_flags will be passed to this argument
> > + *
> > + * For munlock, return true if @page is not mlocked by @vma without killing pte
Here is the note for the munlock case.

Thanks,
Pingfan
> > + * For unmap, return true after tearing down pte.
> > + * For both cases, return false if rmap_walk should be stopped.
> >   */
> >  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >                    unsigned long address, void *arg)
> > @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >       if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >           is_zone_device_page(page) && !is_device_private_page(page))
> > -             return true;
> > +             return false;
> >
> >       if (flags & TTU_SPLIT_HUGE_PMD) {
> >               split_huge_pmd_address(vma, address,
> > @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >               if (IS_ENABLED(CONFIG_MIGRATION) &&
> >                   (flags & TTU_MIGRATION) &&
> > -                 is_zone_device_page(page)) {
> > +                 is_device_private_page(page)) {
> >                       swp_entry_t entry;
> >                       pte_t swp_pte;
> >
> > --
> > 2.7.5
>
> --
> Michal Hocko
> SUSE Labs

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index b838647..723af4f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1358,6 +1358,10 @@  void page_remove_rmap(struct page *page, bool compound)

 /*
  * @arg: enum ttu_flags will be passed to this argument
+ *
+ * For munlock, return true if @page is not mlocked by @vma without killing pte
+ * For unmap, return true after tearing down pte.
+ * For both cases, return false if rmap_walk should be stopped.
  */
 static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
@@ -1380,7 +1384,7 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
+		return false;

 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
@@ -1487,7 +1491,7 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 		if (IS_ENABLED(CONFIG_MIGRATION) &&
 		    (flags & TTU_MIGRATION) &&
-		    is_zone_device_page(page)) {
+		    is_device_private_page(page)) {
 			swp_entry_t entry;
 			pte_t swp_pte;