diff mbox series

mm/migrate_device: Do not access pgmap for non zone device pages

Message ID 20250219231337.364133-1-balbirs@nvidia.com (mailing list archive)
State New
Headers show
Series mm/migrate_device: Do not access pgmap for non zone device pages | expand

Commit Message

Balbir Singh Feb. 19, 2025, 11:13 p.m. UTC
page_pgmap() is referenced before checking if the page is a zone
device page and this triggers the warning in page_pgmap(). Refactor
the code to use the helper function after relevant checks.

Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Balbir Singh <balbirs@nvidia.com>
---

Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
mm-unstable

 mm/migrate_device.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

David Hildenbrand Feb. 20, 2025, 11:48 a.m. UTC | #1
On 20.02.25 00:13, Balbir Singh wrote:
> page_pgmap() is referenced before checking if the page is a zone
> device page and this triggers the warning in page_pgmap(). Refactor
> the code to use the helper function after relevant checks.
> 
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> 
> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
> ---
> 
> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
> mm-unstable

Is there actually something broken? At least for now, reading 
folio->pgmap should just work, although it might be garbage.

> 
>   mm/migrate_device.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 6771893d4601..e0bf771edb6f 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -153,14 +153,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>   				goto next;
>   			}
>   			page = vm_normal_page(migrate->vma, addr, pte);
> -			pgmap = page_pgmap(page);
>   			if (page && !is_zone_device_page(page) &&
>   			    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
>   				goto next;
> -			else if (page && is_device_coherent_page(page) &&
> -			    (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
> -			     pgmap->owner != migrate->pgmap_owner))
> -				goto next;
> +			else if (page && is_device_coherent_page(page)) {
> +				pgmap = page_pgmap(page);
> +
> +				if (!(migrate->flags &
> +					MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
> +					pgmap->owner != migrate->pgmap_owner)
> +					goto next;
> +			}

Coding style wants you to use

if () {

} else if {

}

Not

if ()
else if {

}


Something simpler might be

page_pgmap(page)->owner != migrate->pgmap_owner

>   			mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
>   			mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
>   		}
Balbir Singh Feb. 20, 2025, 11:58 a.m. UTC | #2
On 2/20/25 22:48, David Hildenbrand wrote:
> On 20.02.25 00:13, Balbir Singh wrote:
>> page_pgmap() is referenced before checking if the page is a zone
>> device page and this triggers the warning in page_pgmap(). Refactor
>> the code to use the helper function after relevant checks.
>>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>>
>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>> ---
>>
>> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
>> mm-unstable
> 
> Is there actually something broken? At least for now, reading folio->pgmap should just work, although it might be garbage.
> 

It triggers the VM_WARN_ON_ONCE_PAGE

static inline struct dev_pagemap *page_pgmap(const struct page *page)
{
	VM_WARN_ON_ONCE_PAGE(!is_zone_device_page(page), page);
	return page_folio(page)->pgmap;
}

Nothing is broken, because the code below has checks for is_device_coherent_page(),
but in general I think the WARN_ON is correct because it warns us against garbage
and it's propagation if the correct checks are not in place.


>>
>>   mm/migrate_device.c | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index 6771893d4601..e0bf771edb6f 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -153,14 +153,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>                   goto next;
>>               }
>>               page = vm_normal_page(migrate->vma, addr, pte);
>> -            pgmap = page_pgmap(page);
>>               if (page && !is_zone_device_page(page) &&
>>                   !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
>>                   goto next;
>> -            else if (page && is_device_coherent_page(page) &&
>> -                (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>> -                 pgmap->owner != migrate->pgmap_owner))
>> -                goto next;
>> +            else if (page && is_device_coherent_page(page)) {
>> +                pgmap = page_pgmap(page);
>> +
>> +                if (!(migrate->flags &
>> +                    MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>> +                    pgmap->owner != migrate->pgmap_owner)
>> +                    goto next;
>> +            }
> 
> Coding style wants you to use
> 
> if () {
> 
> } else if {
> 
> }
> 
> Not
> 
> if ()
> else if {
> 
> }
> 

Ack, checkpatch.pl missed it, but agreed

> 
> Something simpler might be
> 
> page_pgmap(page)->owner != migrate->pgmap_owner
> 

Yep, I had that and dropped it, the four clauses made it feel that it might
benefit from a split.

>>               mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
>>               mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
>>           }
> 
> 

Thanks for the review!
Balbir
David Hildenbrand Feb. 20, 2025, 12:05 p.m. UTC | #3
On 20.02.25 12:58, Balbir Singh wrote:
> On 2/20/25 22:48, David Hildenbrand wrote:
>> On 20.02.25 00:13, Balbir Singh wrote:
>>> page_pgmap() is referenced before checking if the page is a zone
>>> device page and this triggers the warning in page_pgmap(). Refactor
>>> the code to use the helper function after relevant checks.
>>>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>
>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>> ---
>>>
>>> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
>>> mm-unstable
>>
>> Is there actually something broken? At least for now, reading folio->pgmap should just work, although it might be garbage.
>>
> 
> It triggers the VM_WARN_ON_ONCE_PAGE
> 
> static inline struct dev_pagemap *page_pgmap(const struct page *page)
> {
> 	VM_WARN_ON_ONCE_PAGE(!is_zone_device_page(page), page);
> 	return page_folio(page)->pgmap;
> }
> 
> Nothing is broken, because the code below has checks for is_device_coherent_page(),
> but in general I think the WARN_ON is correct because it warns us against garbage
> and it's propagation if the correct checks are not in place.

Ah! Now I read your "triggers the warning in page_pgmap()" in the 
description.

It's usually a good idea to just include the splat you observed, if you 
did, and call it "triggers the VM_WARN_ON_ONCE_PAGE".

The "Fixes:" should go above the "---" in that case.

> 
> 
>>>
>>>    mm/migrate_device.c | 13 ++++++++-----
>>>    1 file changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>> index 6771893d4601..e0bf771edb6f 100644
>>> --- a/mm/migrate_device.c
>>> +++ b/mm/migrate_device.c
>>> @@ -153,14 +153,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>                    goto next;
>>>                }
>>>                page = vm_normal_page(migrate->vma, addr, pte);
>>> -            pgmap = page_pgmap(page);
>>>                if (page && !is_zone_device_page(page) &&
>>>                    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
>>>                    goto next;
>>> -            else if (page && is_device_coherent_page(page) &&
>>> -                (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>>> -                 pgmap->owner != migrate->pgmap_owner))
>>> -                goto next;
>>> +            else if (page && is_device_coherent_page(page)) {
>>> +                pgmap = page_pgmap(page);
>>> +
>>> +                if (!(migrate->flags &
>>> +                    MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>>> +                    pgmap->owner != migrate->pgmap_owner)
>>> +                    goto next;
>>> +            }
>>
>> Coding style wants you to use
>>
>> if () {
>>
>> } else if {
>>
>> }
>>
>> Not
>>
>> if ()
>> else if {
>>
>> }
>>
> 
> Ack, checkpatch.pl missed it, but agreed
> 
>>
>> Something simpler might be
>>
>> page_pgmap(page)->owner != migrate->pgmap_owner
>>
> 
> Yep, I had that and dropped it, the four clauses made it feel that it might
> benefit from a split.

Right, the mixture of && and || is confusing.


So with the {}

Acked-by: David Hildenbrand <david@redhat.com>

Thanks!
Balbir Singh Feb. 20, 2025, 12:38 p.m. UTC | #4
On 2/20/25 23:05, David Hildenbrand wrote:
> On 20.02.25 12:58, Balbir Singh wrote:
>> On 2/20/25 22:48, David Hildenbrand wrote:
>>> On 20.02.25 00:13, Balbir Singh wrote:
>>>> page_pgmap() is referenced before checking if the page is a zone
>>>> device page and this triggers the warning in page_pgmap(). Refactor
>>>> the code to use the helper function after relevant checks.
>>>>
>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>
>>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>>> ---
>>>>
>>>> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
>>>> mm-unstable
>>>
>>> Is there actually something broken? At least for now, reading folio->pgmap should just work, although it might be garbage.
>>>
>>
>> It triggers the VM_WARN_ON_ONCE_PAGE
>>
>> static inline struct dev_pagemap *page_pgmap(const struct page *page)
>> {
>>     VM_WARN_ON_ONCE_PAGE(!is_zone_device_page(page), page);
>>     return page_folio(page)->pgmap;
>> }
>>
>> Nothing is broken, because the code below has checks for is_device_coherent_page(),
>> but in general I think the WARN_ON is correct because it warns us against garbage
>> and it's propagation if the correct checks are not in place.
> 
> Ah! Now I read your "triggers the warning in page_pgmap()" in the description.
> 
> It's usually a good idea to just include the splat you observed, if you did, and call it "triggers the VM_WARN_ON_ONCE_PAGE".

Sure

> 
> The "Fixes:" should go above the "---" in that case.

I was not sure if the commit-ids in mm-unstable are persistent, so I put it
below the --- to avoid confusion

> 
>>
>>
>>>>
>>>>    mm/migrate_device.c | 13 ++++++++-----
>>>>    1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>>>> index 6771893d4601..e0bf771edb6f 100644
>>>> --- a/mm/migrate_device.c
>>>> +++ b/mm/migrate_device.c
>>>> @@ -153,14 +153,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>>>>                    goto next;
>>>>                }
>>>>                page = vm_normal_page(migrate->vma, addr, pte);
>>>> -            pgmap = page_pgmap(page);
>>>>                if (page && !is_zone_device_page(page) &&
>>>>                    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
>>>>                    goto next;
>>>> -            else if (page && is_device_coherent_page(page) &&
>>>> -                (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>>>> -                 pgmap->owner != migrate->pgmap_owner))
>>>> -                goto next;
>>>> +            else if (page && is_device_coherent_page(page)) {
>>>> +                pgmap = page_pgmap(page);
>>>> +
>>>> +                if (!(migrate->flags &
>>>> +                    MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
>>>> +                    pgmap->owner != migrate->pgmap_owner)
>>>> +                    goto next;
>>>> +            }
>>>
>>> Coding style wants you to use
>>>
>>> if () {
>>>
>>> } else if {
>>>
>>> }
>>>
>>> Not
>>>
>>> if ()
>>> else if {
>>>
>>> }
>>>
>>
>> Ack, checkpatch.pl missed it, but agreed
>>
>>>
>>> Something simpler might be
>>>
>>> page_pgmap(page)->owner != migrate->pgmap_owner
>>>
>>
>> Yep, I had that and dropped it, the four clauses made it feel that it might
>> benefit from a split.
> 
> Right, the mixture of && and || is confusing.
> 
> 
> So with the {}
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 

Thanks, I can respin it with the {} as needed.

Balbir
Alistair Popple Feb. 20, 2025, 10:45 p.m. UTC | #5
On Thu, Feb 20, 2025 at 11:38:10PM +1100, Balbir Singh wrote:
> On 2/20/25 23:05, David Hildenbrand wrote:
> > On 20.02.25 12:58, Balbir Singh wrote:
> >> On 2/20/25 22:48, David Hildenbrand wrote:
> >>> On 20.02.25 00:13, Balbir Singh wrote:
> >>>> page_pgmap() is referenced before checking if the page is a zone
> >>>> device page and this triggers the warning in page_pgmap(). Refactor
> >>>> the code to use the helper function after relevant checks.
> >>>>
> >>>> Cc: Alistair Popple <apopple@nvidia.com>
> >>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> >>>> Cc: David Hildenbrand <david@redhat.com>
> >>>> Cc: Dan Williams <dan.j.williams@intel.com>
> >>>>
> >>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
> >>>> ---
> >>>>
> >>>> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
> >>>> mm-unstable
> >>>
> >>> Is there actually something broken? At least for now, reading folio->pgmap should just work, although it might be garbage.
> >>>
> >>
> >> It triggers the VM_WARN_ON_ONCE_PAGE
> >>
> >> static inline struct dev_pagemap *page_pgmap(const struct page *page)
> >> {
> >>     VM_WARN_ON_ONCE_PAGE(!is_zone_device_page(page), page);
> >>     return page_folio(page)->pgmap;
> >> }
> >>
> >> Nothing is broken, because the code below has checks for is_device_coherent_page(),
> >> but in general I think the WARN_ON is correct because it warns us against garbage
> >> and it's propagation if the correct checks are not in place.
> > 
> > Ah! Now I read your "triggers the warning in page_pgmap()" in the description.
> > 
> > It's usually a good idea to just include the splat you observed, if you did, and call it "triggers the VM_WARN_ON_ONCE_PAGE".
> 
> Sure
> 
> > 
> > The "Fixes:" should go above the "---" in that case.
> 
> I was not sure if the commit-ids in mm-unstable are persistent, so I put it
> below the --- to avoid confusion

They are not, but I'm unsure what the process is here. I guess either I roll
this fix into my series and repost or maybe Andrew takes it as is a separate fix
on top of my series?

Either way the fix looks good, so with the code style comment below fixed feel
free to add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> > 
> >>
> >>
> >>>>
> >>>>    mm/migrate_device.c | 13 ++++++++-----
> >>>>    1 file changed, 8 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> >>>> index 6771893d4601..e0bf771edb6f 100644
> >>>> --- a/mm/migrate_device.c
> >>>> +++ b/mm/migrate_device.c
> >>>> @@ -153,14 +153,17 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >>>>                    goto next;
> >>>>                }
> >>>>                page = vm_normal_page(migrate->vma, addr, pte);
> >>>> -            pgmap = page_pgmap(page);
> >>>>                if (page && !is_zone_device_page(page) &&
> >>>>                    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
> >>>>                    goto next;
> >>>> -            else if (page && is_device_coherent_page(page) &&
> >>>> -                (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
> >>>> -                 pgmap->owner != migrate->pgmap_owner))
> >>>> -                goto next;
> >>>> +            else if (page && is_device_coherent_page(page)) {
> >>>> +                pgmap = page_pgmap(page);
> >>>> +
> >>>> +                if (!(migrate->flags &
> >>>> +                    MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
> >>>> +                    pgmap->owner != migrate->pgmap_owner)
> >>>> +                    goto next;
> >>>> +            }
> >>>
> >>> Coding style wants you to use
> >>>
> >>> if () {
> >>>
> >>> } else if {
> >>>
> >>> }
> >>>
> >>> Not
> >>>
> >>> if ()
> >>> else if {
> >>>
> >>> }
> >>>
> >>
> >> Ack, checkpatch.pl missed it, but agreed
> >>
> >>>
> >>> Something simpler might be
> >>>
> >>> page_pgmap(page)->owner != migrate->pgmap_owner
> >>>
> >>
> >> Yep, I had that and dropped it, the four clauses made it feel that it might
> >> benefit from a split.
> > 
> > Right, the mixture of && and || is confusing.
> > 
> > 
> > So with the {}
> > 
> > Acked-by: David Hildenbrand <david@redhat.com>
> > 
> > Thanks!
> > 
> 
> Thanks, I can respin it with the {} as needed.
> 
> Balbir
David Hildenbrand Feb. 21, 2025, 8:13 a.m. UTC | #6
On 20.02.25 23:45, Alistair Popple wrote:
> On Thu, Feb 20, 2025 at 11:38:10PM +1100, Balbir Singh wrote:
>> On 2/20/25 23:05, David Hildenbrand wrote:
>>> On 20.02.25 12:58, Balbir Singh wrote:
>>>> On 2/20/25 22:48, David Hildenbrand wrote:
>>>>> On 20.02.25 00:13, Balbir Singh wrote:
>>>>>> page_pgmap() is referenced before checking if the page is a zone
>>>>>> device page and this triggers the warning in page_pgmap(). Refactor
>>>>>> the code to use the helper function after relevant checks.
>>>>>>
>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>>>>> Cc: David Hildenbrand <david@redhat.com>
>>>>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>>>>>
>>>>>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>>>>>> ---
>>>>>>
>>>>>> Fixes: 7f1cfd71153b ("mm: allow compound zone device pages") on
>>>>>> mm-unstable
>>>>>
>>>>> Is there actually something broken? At least for now, reading folio->pgmap should just work, although it might be garbage.
>>>>>
>>>>
>>>> It triggers the VM_WARN_ON_ONCE_PAGE
>>>>
>>>> static inline struct dev_pagemap *page_pgmap(const struct page *page)
>>>> {
>>>>      VM_WARN_ON_ONCE_PAGE(!is_zone_device_page(page), page);
>>>>      return page_folio(page)->pgmap;
>>>> }
>>>>
>>>> Nothing is broken, because the code below has checks for is_device_coherent_page(),
>>>> but in general I think the WARN_ON is correct because it warns us against garbage
>>>> and it's propagation if the correct checks are not in place.
>>>
>>> Ah! Now I read your "triggers the warning in page_pgmap()" in the description.
>>>
>>> It's usually a good idea to just include the splat you observed, if you did, and call it "triggers the VM_WARN_ON_ONCE_PAGE".
>>
>> Sure
>>
>>>
>>> The "Fixes:" should go above the "---" in that case.
>>
>> I was not sure if the commit-ids in mm-unstable are persistent, so I put it
>> below the --- to avoid confusion
> 
> They are not, but I'm unsure what the process is here. I guess either I roll
> this fix into my series and repost or maybe Andrew takes it as is a separate fix
> on top of my series?

Ah, I completely ignored that this is not in mm/mm-stable yet.

Yes, that should best just be squashed into the original commit as long 
as it is not in mm/mm-stable.
diff mbox series

Patch

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6771893d4601..e0bf771edb6f 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -153,14 +153,17 @@  static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				goto next;
 			}
 			page = vm_normal_page(migrate->vma, addr, pte);
-			pgmap = page_pgmap(page);
 			if (page && !is_zone_device_page(page) &&
 			    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
 				goto next;
-			else if (page && is_device_coherent_page(page) &&
-			    (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
-			     pgmap->owner != migrate->pgmap_owner))
-				goto next;
+			else if (page && is_device_coherent_page(page)) {
+				pgmap = page_pgmap(page);
+
+				if (!(migrate->flags &
+					MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
+					pgmap->owner != migrate->pgmap_owner)
+					goto next;
+			}
 			mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
 			mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
 		}