diff mbox series

mm/rmap: fix the handling of device private page in try_to_unmap_one()

Message ID 1584885427-4952-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm/rmap: fix the handling of device private page in try_to_unmap_one() | expand

Commit Message

Pingfan Liu March 22, 2020, 1:57 p.m. UTC
For zone_device, migration can only happen on is_device_private_page(page).
Correct the logic in try_to_unmap_one().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-mm@kvack.org
---
 mm/rmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.7.5

Comments

Michal Hocko March 23, 2020, 7:34 a.m. UTC | #1
On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> For zone_device, migration can only happen on is_device_private_page(page).
> Correct the logic in try_to_unmap_one().

Maybe it is just me lacking knowledge in the zone_device ZOO. But
this really deserves a much more detailed explanation IMHO. It seems
a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
migration") deliberately made the decision to allow unmapping these
pages? Is the check just wrong, inncomplete? Why?

What is the real user visible problem here?
 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> To: linux-mm@kvack.org
> ---
>  mm/rmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b838647..ffadf3e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
> -		return true;
> +		return false;
> 
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  		if (IS_ENABLED(CONFIG_MIGRATION) &&
>  		    (flags & TTU_MIGRATION) &&
> -		    is_zone_device_page(page)) {
> +		    is_device_private_page(page)) {
>  			swp_entry_t entry;
>  			pte_t swp_pte;
> 
> --
> 2.7.5
John Hubbard March 23, 2020, 11:32 p.m. UTC | #2
On 3/23/20 12:34 AM, Michal Hocko wrote:
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
>> For zone_device, migration can only happen on is_device_private_page(page).
>> Correct the logic in try_to_unmap_one().
> 
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, inncomplete? Why?
> 
> What is the real user visible problem here?
>   

I was going to guess that someone was having trouble with the behavior on
non-device-private ZONE_DEVICE pages, but...I am also at a loss as to
what triggered this patch. So, I have the exact same questions as Michal,
plus really wondering what tests you are running, and on what hardware config.

It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell
should also be added, if you are sure you need this. (I'm adding them now.)

thanks,
Education Directorate March 24, 2020, 12:04 a.m. UTC | #3
On 23/3/20 12:57 am, Pingfan Liu wrote:
> For zone_device, migration can only happen on is_device_private_page(page).
> Correct the logic in try_to_unmap_one().

!DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail
out - Could you clarify what issue your fixing?

Balbir Singh.
Ralph Campbell March 24, 2020, 12:20 a.m. UTC | #4
On 3/23/20 12:34 AM, Michal Hocko wrote:
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
>> For zone_device, migration can only happen on is_device_private_page(page).
>> Correct the logic in try_to_unmap_one().
> 
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, inncomplete? Why?
> 
> What is the real user visible problem here?
>   
>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>> Cc: Jérôme Glisse <jglisse@redhat.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> To: linux-mm@kvack.org
>> ---
>>   mm/rmap.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b838647..ffadf3e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>>   	    is_zone_device_page(page) && !is_device_private_page(page))
>> -		return true;
>> +		return false;

Pages can be mapped in multiple vmas. Returning false here will only break out
of the loop and skip any other vmas mapping this page which is a minor optimization
but shouldn't really affect what try_to_unmap_one() does.

>>   	if (flags & TTU_SPLIT_HUGE_PMD) {
>>   		split_huge_pmd_address(vma, address,
>> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   		if (IS_ENABLED(CONFIG_MIGRATION) &&
>>   		    (flags & TTU_MIGRATION) &&
>> -		    is_zone_device_page(page)) {
>> +		    is_device_private_page(page)) {
>>   			swp_entry_t entry;
>>   			pte_t swp_pte;

Since the page was checked for !device private, this is more clear but shouldn't
change anything.
Pingfan Liu March 24, 2020, 3:47 a.m. UTC | #5
On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > For zone_device, migration can only happen on is_device_private_page(page).
> > Correct the logic in try_to_unmap_one().
>
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, inncomplete? Why?
I am not quite sure about zone_device, but I will try to explain it later.

But first of all, I think the code conflicts with the logic behind it.
If try_to_unmap_one() success to unmap a page, then it should kill the
pte, and return true. But the original code return true before the
code like "ptep_clear_flush()"

Now, I try to say about !device_private zone device. (Please pardon
and correct me if I make a mistake)
memmap_init_zone_device() raises an extra _refcount on all zone
device. And private-device should lifts the count later, otherwise it
can not migrate. But I did not find the exact place yet.

While this extra _refcount will block migration, it is not the whole
reason if a zone device page is mapped.

If  a zone device page is mapped, then I think the original code
happen to work due to it skip the call of page_remove_rmap(), and in
try_to_unmap(){ return !page_mapcount(page) ? true : false;}.
>
> What is the real user visible problem here?
As explained, the original code happens to work, but it conflicts with
the logic.

Thanks,
Pingfan
Pingfan Liu March 24, 2020, 3:50 a.m. UTC | #6
On Tue, Mar 24, 2020 at 7:32 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 3/23/20 12:34 AM, Michal Hocko wrote:
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> >> For zone_device, migration can only happen on is_device_private_page(page).
> >> Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, inncomplete? Why?
> >
> > What is the real user visible problem here?
> >
>
> I was going to guess that someone was having trouble with the behavior on
> non-device-private ZONE_DEVICE pages, but...I am also at a loss as to
> what triggered this patch. So, I have the exact same questions as Michal,
> plus really wondering what tests you are running, and on what hardware config.
:), please see my reply to Michal.
>
> It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell
> should also be added, if you are sure you need this. (I'm adding them now.)
Appreciate you to do that. In fact, I am not sure about some detail,
and hope someone can correct me if I make mistake.

Thanks,
Pingfan
Pingfan Liu March 24, 2020, 3:55 a.m. UTC | #7
On Tue, Mar 24, 2020 at 8:04 AM Balbir Singh <bsingharora@gmail.com> wrote:
>
> On 23/3/20 12:57 am, Pingfan Liu wrote:
> > For zone_device, migration can only happen on is_device_private_page(page).
> > Correct the logic in try_to_unmap_one().
>
> !DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail
> out - Could you clarify what issue your fixing?
As I replied to Michal, I tried to fix the code logic, but not really hit a bug.

Thanks,
Pingfan
Pingfan Liu March 24, 2020, 4:21 a.m. UTC | #8
Thanks for your explanation.

On Tue, Mar 24, 2020 at 8:20 AM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
>
> On 3/23/20 12:34 AM, Michal Hocko wrote:
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> >> For zone_device, migration can only happen on is_device_private_page(page).
> >> Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, inncomplete? Why?
> >
> > What is the real user visible problem here?
> >
> >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> >> Cc: Jérôme Glisse <jglisse@redhat.com>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: John Hubbard <jhubbard@nvidia.com>
> >> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> To: linux-mm@kvack.org
> >> ---
> >>   mm/rmap.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index b838647..ffadf3e 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>
> >>      if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >>          is_zone_device_page(page) && !is_device_private_page(page))
> >> -            return true;
> >> +            return false;
>
> Pages can be mapped in multiple vmas. Returning false here will only break out
> of the loop and skip any other vmas mapping this page which is a minor optimization
> but shouldn't really affect what try_to_unmap_one() does.
Yes, it returns false to terminate further iteration. And I think
fs-dax page would not go through this path.

The unmap of fs-dax should go through: umount fs, and arch_remove_memory().
>
> >>      if (flags & TTU_SPLIT_HUGE_PMD) {
> >>              split_huge_pmd_address(vma, address,
> >> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>
> >>              if (IS_ENABLED(CONFIG_MIGRATION) &&
> >>                  (flags & TTU_MIGRATION) &&
> >> -                is_zone_device_page(page)) {
> >> +                is_device_private_page(page)) {
> >>                      swp_entry_t entry;
> >>                      pte_t swp_pte;
>
> Since the page was checked for !device private, this is more clear but shouldn't
> change anything.
Yes. It just makes things clear.

Thanks,
Pingfan
Michal Hocko March 24, 2020, 9:14 a.m. UTC | #9
On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > For zone_device, migration can only happen on is_device_private_page(page).
> > > Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, inncomplete? Why?
> I am not quite sure about zone_device, but I will try to explain it later.
> 
> But first of all, I think the code conflicts with the logic behind it.
> If try_to_unmap_one() success to unmap a page, then it should kill the
> pte, and return true. But the original code return true before the
> code like "ptep_clear_flush()"
> 
> Now, I try to say about !device_private zone device. (Please pardon
> and correct me if I make a mistake)
> memmap_init_zone_device() raises an extra _refcount on all zone
> device. And private-device should lifts the count later, otherwise it
> can not migrate. But I did not find the exact place yet.
> 
> While this extra _refcount will block migration, it is not the whole
> reason if a zone device page is mapped.
> 
> If  a zone device page is mapped, then I think the original code
> happen to work due to it skip the call of page_remove_rmap(), and in
> try_to_unmap(){ return !page_mapcount(page) ? true : false;}.

OK, you made me look more closely.

The lack of documentation and therefore the expected semantic doesn't
really help. So we can only deduce from the existing code which is a
recipe for cargo cult programming :/

The only difference betweena rmap_one returning true and false is that
the VMA walk stops for the later and done() callback is not called.
Does rmap_one success means the mapping for the vma has been torn down?
No. As we can see for the munlock case which just shows hot vague the
semantic of this callback might be.

I believe the zone device path was just copying it. Is that wrong?
Well, it is less optimal than necessary because the property we are
checking is not VMA specific so all other VMAs (if there are any at all)
will have the same to say.

So the only last remaining point is the done() callback. That one is
documented as a check. There is no note about potential side effects but
the current implementation is really only a check so skipping it doesn't
make any real difference.

> > What is the real user visible problem here?
> As explained, the original code happens to work, but it conflicts with
> the logic.

Your changelog should be explicit about this being a pure code
refinement/cleanup without any functional changes.

The rmap walk and callbacks would benefit from a proper documentation.
Hint...
Pingfan Liu March 25, 2020, 10:54 a.m. UTC | #10
On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > > For zone_device, migration can only happen on is_device_private_page(page).
> > > > Correct the logic in try_to_unmap_one().
> > >
> > > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > > this really deserves a much more detailed explanation IMHO. It seems
> > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > > migration") deliberately made the decision to allow unmapping these
> > > pages? Is the check just wrong, inncomplete? Why?
> > I am not quite sure about zone_device, but I will try to explain it later.
> >
> > But first of all, I think the code conflicts with the logic behind it.
> > If try_to_unmap_one() success to unmap a page, then it should kill the
> > pte, and return true. But the original code return true before the
> > code like "ptep_clear_flush()"
> >
> > Now, I try to say about !device_private zone device. (Please pardon
> > and correct me if I make a mistake)
> > memmap_init_zone_device() raises an extra _refcount on all zone
> > device. And private-device should lifts the count later, otherwise it
> > can not migrate. But I did not find the exact place yet.
> >
> > While this extra _refcount will block migration, it is not the whole
> > reason if a zone device page is mapped.
> >
> > If  a zone device page is mapped, then I think the original code
> > happen to work due to it skip the call of page_remove_rmap(), and in
> > try_to_unmap(){ return !page_mapcount(page) ? true : false;}.
>
> OK, you made me look more closely.
>
> The lack of documentation and therefore the expected semantic doesn't
> really help. So we can only deduce from the existing code which is a
> recipe for cargo cult programming :/
>
> The only difference betweena rmap_one returning true and false is that
> the VMA walk stops for the later and done() callback is not called.
> Does rmap_one success means the mapping for the vma has been torn down?
> No. As we can see for the munlock case which just shows hot vague the
> semantic of this callback might be.
>
> I believe the zone device path was just copying it. Is that wrong?
> Well, it is less optimal than necessary because the property we are
> checking is not VMA specific so all other VMAs (if there are any at all)
> will have the same to say.
>
> So the only last remaining point is the done() callback. That one is
> documented as a check. There is no note about potential side effects but
> the current implementation is really only a check so skipping it doesn't
> make any real difference.
>
> > > What is the real user visible problem here?
> > As explained, the original code happens to work, but it conflicts with
> > the logic.
>
> Your changelog should be explicit about this being a pure code
> refinement/cleanup without any functional changes.
OK, I will do that.
>
> The rmap walk and callbacks would benefit from a proper documentation.
> Hint...
I will add some note around try_to_unmap_one(), and update the commit
log. (Hope my understanding of zone device is right, and will cc more
people for comment)

Thanks,
Pingfan
Pingfan Liu April 1, 2020, 2:10 p.m. UTC | #11
On Wed, Mar 25, 2020 at 6:54 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> > > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > > > For zone_device, migration can only happen on is_device_private_page(page).
> > > > > Correct the logic in try_to_unmap_one().
> > > >
> > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > > > this really deserves a much more detailed explanation IMHO. It seems
> > > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > > > migration") deliberately made the decision to allow unmapping these
> > > > pages? Is the check just wrong, inncomplete? Why?
> > > I am not quite sure about zone_device, but I will try to explain it later.
> > >
> > > But first of all, I think the code conflicts with the logic behind it.
> > > If try_to_unmap_one() success to unmap a page, then it should kill the
> > > pte, and return true. But the original code return true before the
> > > code like "ptep_clear_flush()"
> > >
> > > Now, I try to say about !device_private zone device. (Please pardon
> > > and correct me if I make a mistake)
> > > memmap_init_zone_device() raises an extra _refcount on all zone
> > > device. And private-device should lifts the count later, otherwise it
> > > can not migrate. But I did not find the exact place yet.
> > >
> > > While this extra _refcount will block migration, it is not the whole
> > > reason if a zone device page is mapped.
> > >
> > > If  a zone device page is mapped, then I think the original code
> > > happen to work due to it skip the call of page_remove_rmap(), and in
> > > try_to_unmap(){ return !page_mapcount(page) ? true : false;}.
> >
> > OK, you made me look more closely.
> >
> > The lack of documentation and therefore the expected semantic doesn't
> > really help. So we can only deduce from the existing code which is a
> > recipe for cargo cult programming :/
> >
> > The only difference betweena rmap_one returning true and false is that
> > the VMA walk stops for the later and done() callback is not called.
> > Does rmap_one success means the mapping for the vma has been torn down?
> > No. As we can see for the munlock case which just shows hot vague the
> > semantic of this callback might be.
> >
> > I believe the zone device path was just copying it. Is that wrong?
> > Well, it is less optimal than necessary because the property we are
> > checking is not VMA specific so all other VMAs (if there are any at all)
> > will have the same to say.
> >
> > So the only last remaining point is the done() callback. That one is
> > documented as a check. There is no note about potential side effects but
> > the current implementation is really only a check so skipping it doesn't
> > make any real difference.
> >
> > > > What is the real user visible problem here?
> > > As explained, the original code happens to work, but it conflicts with
> > > the logic.
> >
> > Your changelog should be explicit about this being a pure code
> > refinement/cleanup without any functional changes.
> OK, I will do that.
It took me some time to make clear try_to_munlock(). And now I can
make some notes. I will send out V2 soon.

Thanks,
Pingfan
diff mbox series

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index b838647..ffadf3e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1380,7 +1380,7 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
+		return false;

 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
@@ -1487,7 +1487,7 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 		if (IS_ENABLED(CONFIG_MIGRATION) &&
 		    (flags & TTU_MIGRATION) &&
-		    is_zone_device_page(page)) {
+		    is_device_private_page(page)) {
 			swp_entry_t entry;
 			pte_t swp_pte;