diff mbox series

[v2,5/5] mm/migrate.c: fix potential deadlock in NUMA balancing shared exec THP case

Message ID 20210323135405.65059-6-linmiaohe@huawei.com (mailing list archive)
State New, archived
Headers show
Series Cleanup and fixup for mm/migrate.c | expand

Commit Message

Miaohe Lin March 23, 2021, 1:54 p.m. UTC
Since commit c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA
balancing"), the NUMA balancing would skip shared exec transhuge page.
But this enhancement is not suitable for transhuge page. Because it's
required that page_mapcount() must be 1 due to no migration pte dance
is done here. On the other hand, the shared exec transhuge page will
leave the migrate_misplaced_page() with pte entry untouched and page
locked. Thus pagefault for NUMA will be triggered again and deadlock
occurs when we start waiting for the page lock held by ourselves.

Fixes: c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA balancing")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/migrate.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Yang Shi March 23, 2021, 5:17 p.m. UTC | #1
On Tue, Mar 23, 2021 at 6:55 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Since commit c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA
> balancing"), the NUMA balancing would skip shared exec transhuge page.
> But this enhancement is not suitable for transhuge page. Because it's
> required that page_mapcount() must be 1 due to no migration pte dance
> is done here. On the other hand, the shared exec transhuge page will
> leave the migrate_misplaced_page() with pte entry untouched and page
> locked. Thus pagefault for NUMA will be triggered again and deadlock
> occurs when we start waiting for the page lock held by ourselves.

Thanks for catching this. By relooking the code I think the other
important reason for removing this is
migrate_misplaced_transhuge_page() actually can't see shared exec file
THP at all since page_lock_anon_vma_read() is called before and if
page is not anonymous page it will just restore the PMD without
migrating anything.

The pages for private mapped file vma may be anonymous pages due to
COW but they can't be THP so it won't trigger THP numa fault at all. I
think this is why no bug was reported. I overlooked this in the first
place.

Your fix is correct, and please add the above justification to your commit log.

Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Fixes: c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA balancing")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/migrate.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5357a8527ca2..68bfa1625898 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2192,9 +2192,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>         int page_lru = page_is_file_lru(page);
>         unsigned long start = address & HPAGE_PMD_MASK;
>
> -       if (is_shared_exec_page(vma, page))
> -               goto out;
> -
>         new_page = alloc_pages_node(node,
>                 (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>                 HPAGE_PMD_ORDER);
> @@ -2306,7 +2303,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>
>  out_unlock:
>         unlock_page(page);
> -out:
>         put_page(page);
>         return 0;
>  }
> --
> 2.19.1
>
Yang Shi March 24, 2021, 1:16 a.m. UTC | #2
On Tue, Mar 23, 2021 at 10:17 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Mar 23, 2021 at 6:55 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
> >
> > Since commit c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA
> > balancing"), the NUMA balancing would skip shared exec transhuge page.
> > But this enhancement is not suitable for transhuge page. Because it's
> > required that page_mapcount() must be 1 due to no migration pte dance
> > is done here. On the other hand, the shared exec transhuge page will
> > leave the migrate_misplaced_page() with pte entry untouched and page
> > locked. Thus pagefault for NUMA will be triggered again and deadlock
> > occurs when we start waiting for the page lock held by ourselves.
>
> Thanks for catching this. By relooking the code I think the other
> important reason for removing this is
> migrate_misplaced_transhuge_page() actually can't see shared exec file
> THP at all since page_lock_anon_vma_read() is called before and if
> page is not anonymous page it will just restore the PMD without
> migrating anything.
>
> The pages for private mapped file vma may be anonymous pages due to
> COW but they can't be THP so it won't trigger THP numa fault at all. I
> think this is why no bug was reported. I overlooked this in the first
> place.
>
> Your fix is correct, and please add the above justification to your commit log.

BTW, I think you can just undo or revert commit c77c5cbafe54 ("mm:
migrate: skip shared exec THP for NUMA balancing").

Thanks,
Yang

>
> Reviewed-by: Yang Shi <shy828301@gmail.com>
>
> >
> > Fixes: c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA balancing")
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/migrate.c | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 5357a8527ca2..68bfa1625898 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2192,9 +2192,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >         int page_lru = page_is_file_lru(page);
> >         unsigned long start = address & HPAGE_PMD_MASK;
> >
> > -       if (is_shared_exec_page(vma, page))
> > -               goto out;
> > -
> >         new_page = alloc_pages_node(node,
> >                 (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
> >                 HPAGE_PMD_ORDER);
> > @@ -2306,7 +2303,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >
> >  out_unlock:
> >         unlock_page(page);
> > -out:
> >         put_page(page);
> >         return 0;
> >  }
> > --
> > 2.19.1
> >
Miaohe Lin March 24, 2021, 2:03 a.m. UTC | #3
On 2021/3/24 1:17, Yang Shi wrote:
> On Tue, Mar 23, 2021 at 6:55 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> Since commit c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA
>> balancing"), the NUMA balancing would skip shared exec transhuge page.
>> But this enhancement is not suitable for transhuge page. Because it's
>> required that page_mapcount() must be 1 due to no migration pte dance
>> is done here. On the other hand, the shared exec transhuge page will
>> leave the migrate_misplaced_page() with pte entry untouched and page
>> locked. Thus pagefault for NUMA will be triggered again and deadlock
>> occurs when we start waiting for the page lock held by ourselves.
> 
> Thanks for catching this. By relooking the code I think the other
> important reason for removing this is
> migrate_misplaced_transhuge_page() actually can't see shared exec file
> THP at all since page_lock_anon_vma_read() is called before and if
> page is not anonymous page it will just restore the PMD without
> migrating anything.
> 
> The pages for private mapped file vma may be anonymous pages due to
> COW but they can't be THP so it won't trigger THP numa fault at all. I
> think this is why no bug was reported. I overlooked this in the first
> place.
> 
> Your fix is correct, and please add the above justification to your commit log.
> 

Will do. Many thanks for your review!

> Reviewed-by: Yang Shi <shy828301@gmail.com>
> 
>>
>> Fixes: c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA balancing")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/migrate.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 5357a8527ca2..68bfa1625898 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2192,9 +2192,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>         int page_lru = page_is_file_lru(page);
>>         unsigned long start = address & HPAGE_PMD_MASK;
>>
>> -       if (is_shared_exec_page(vma, page))
>> -               goto out;
>> -
>>         new_page = alloc_pages_node(node,
>>                 (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>>                 HPAGE_PMD_ORDER);
>> @@ -2306,7 +2303,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>
>>  out_unlock:
>>         unlock_page(page);
>> -out:
>>         put_page(page);
>>         return 0;
>>  }
>> --
>> 2.19.1
>>
> .
>
Miaohe Lin March 24, 2021, 2:14 a.m. UTC | #4
On 2021/3/24 9:16, Yang Shi wrote:
> On Tue, Mar 23, 2021 at 10:17 AM Yang Shi <shy828301@gmail.com> wrote:
>>
>> On Tue, Mar 23, 2021 at 6:55 AM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>
>>> Since commit c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA
>>> balancing"), the NUMA balancing would skip shared exec transhuge page.
>>> But this enhancement is not suitable for transhuge page. Because it's
>>> required that page_mapcount() must be 1 due to no migration pte dance
>>> is done here. On the other hand, the shared exec transhuge page will
>>> leave the migrate_misplaced_page() with pte entry untouched and page
>>> locked. Thus pagefault for NUMA will be triggered again and deadlock
>>> occurs when we start waiting for the page lock held by ourselves.
>>
>> Thanks for catching this. By relooking the code I think the other
>> important reason for removing this is
>> migrate_misplaced_transhuge_page() actually can't see shared exec file
>> THP at all since page_lock_anon_vma_read() is called before and if
>> page is not anonymous page it will just restore the PMD without
>> migrating anything.
>>
>> The pages for private mapped file vma may be anonymous pages due to
>> COW but they can't be THP so it won't trigger THP numa fault at all. I
>> think this is why no bug was reported. I overlooked this in the first
>> place.
>>
>> Your fix is correct, and please add the above justification to your commit log.
> 
> BTW, I think you can just undo or revert commit c77c5cbafe54 ("mm:
> migrate: skip shared exec THP for NUMA balancing").
> 

Yep, we can revert this commit. I thought it handle the shared exec base page too.
Will do it and with the above justification to the commit log.
Many Thanks!

> Thanks,
> Yang
> 
>>
>> Reviewed-by: Yang Shi <shy828301@gmail.com>
>>
>>>
>>> Fixes: c77c5cbafe54 ("mm: migrate: skip shared exec THP for NUMA balancing")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/migrate.c | 4 ----
>>>  1 file changed, 4 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 5357a8527ca2..68bfa1625898 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -2192,9 +2192,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>         int page_lru = page_is_file_lru(page);
>>>         unsigned long start = address & HPAGE_PMD_MASK;
>>>
>>> -       if (is_shared_exec_page(vma, page))
>>> -               goto out;
>>> -
>>>         new_page = alloc_pages_node(node,
>>>                 (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
>>>                 HPAGE_PMD_ORDER);
>>> @@ -2306,7 +2303,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>
>>>  out_unlock:
>>>         unlock_page(page);
>>> -out:
>>>         put_page(page);
>>>         return 0;
>>>  }
>>> --
>>> 2.19.1
>>>
> .
>
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 5357a8527ca2..68bfa1625898 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2192,9 +2192,6 @@  int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int page_lru = page_is_file_lru(page);
 	unsigned long start = address & HPAGE_PMD_MASK;
 
-	if (is_shared_exec_page(vma, page))
-		goto out;
-
 	new_page = alloc_pages_node(node,
 		(GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
 		HPAGE_PMD_ORDER);
@@ -2306,7 +2303,6 @@  int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 
 out_unlock:
 	unlock_page(page);
-out:
 	put_page(page);
 	return 0;
 }