
mm: thp: don't have to lock page anymore when splitting PMD

Message ID 20220303222014.517033-1-shy828301@gmail.com (mailing list archive)
State New
Series mm: thp: don't have to lock page anymore when splitting PMD

Commit Message

Yang Shi March 3, 2022, 10:20 p.m. UTC
Commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
__split_huge_pmd_locked()") locked the page for PMD split to make
mapcount stable for reuse_swap_page(), then commit 1c2f67308af4 ("mm:
thp: fix MADV_REMOVE deadlock on shmem THP") reduced the scope to
anonymous pages only.

However, since the COW fixes [1] from David Hildenbrand, COW no longer
uses mapcount to determine whether the page is shared, and
reuse_swap_page() has been removed as well.  So PMD split doesn't have
to lock the page anymore.  This patch basically reverts the above two
commits.

[1] https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/

Cc: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/huge_memory.c | 44 +++++---------------------------------------
 1 file changed, 5 insertions(+), 39 deletions(-)

Comments

Miaohe Lin March 4, 2022, 2:25 a.m. UTC | #1
On 2022/3/4 6:20, Yang Shi wrote:
> The commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
> __split_huge_pmd_locked()") locked the page for PMD split to make
> mapcount stable for reuse_swap_page(), then commit 1c2f67308af4 ("mm:
> thp: fix MADV_REMOVE deadlock on shmem THP") reduce the scope to
> anonymous page only.
> 
> However COW has not used mapcount to determine if the page is shared or
> not anymore due to the COW fixes [1] from David Hildenbrand and the
> reuse_swap_page() was removed as well.  So PMD split doesn't have to
> lock the page anymore.  This patch basically reverted the above two
> commits.
> 

Sounds reasonable. Since mapcount is not used to determine if we need to COW
the page, PMD split doesn't have to lock the page anymore.

> [1] https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/huge_memory.c | 44 +++++---------------------------------------
>  1 file changed, 5 insertions(+), 39 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b49e1a11df2e..daaa698bd273 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2134,8 +2134,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  {
>  	spinlock_t *ptl;
>  	struct mmu_notifier_range range;
> -	bool do_unlock_folio = false;
> -	pmd_t _pmd;
>  
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>  				address & HPAGE_PMD_MASK,
> @@ -2148,48 +2146,16 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  	 * pmd against. Otherwise we can end up replacing wrong folio.
>  	 */
>  	VM_BUG_ON(freeze && !folio);
> -	if (folio) {
> -		VM_WARN_ON_ONCE(!folio_test_locked(folio));
> -		if (folio != page_folio(pmd_page(*pmd)))
> -			goto out;
> -	}
> +	if (folio && folio != page_folio(pmd_page(*pmd)))
> +		goto out;
>  
> -repeat:
> -	if (pmd_trans_huge(*pmd)) {
> -		if (!folio) {
> -			folio = page_folio(pmd_page(*pmd));
> -			/*
> -			 * An anonymous page must be locked, to ensure that a
> -			 * concurrent reuse_swap_page() sees stable mapcount;
> -			 * but reuse_swap_page() is not used on shmem or file,
> -			 * and page lock must not be taken when zap_pmd_range()
> -			 * calls __split_huge_pmd() while i_mmap_lock is held.
> -			 */
> -			if (folio_test_anon(folio)) {
> -				if (unlikely(!folio_trylock(folio))) {
> -					folio_get(folio);
> -					_pmd = *pmd;
> -					spin_unlock(ptl);
> -					folio_lock(folio);
> -					spin_lock(ptl);
> -					if (unlikely(!pmd_same(*pmd, _pmd))) {
> -						folio_unlock(folio);
> -						folio_put(folio);
> -						folio = NULL;
> -						goto repeat;
> -					}
> -					folio_put(folio);
> -				}
> -				do_unlock_folio = true;
> -			}
> -		}
> -	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> +	if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>  		goto out;
> +
>  	__split_huge_pmd_locked(vma, pmd, range.start, freeze);

IIUC, this should be something like below:
if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))
        __split_huge_pmd_locked(vma, pmd, range.start, freeze);

i.e. the pmd_trans_huge case is missing in the above change. Or am I missing something?

Thanks for the patch. This really simplifies the code and avoids the unneeded overhead.

>  out:
>  	spin_unlock(ptl);
> -	if (do_unlock_folio)
> -		folio_unlock(folio);
> +
>  	/*
>  	 * No need to double call mmu_notifier->invalidate_range() callback.
>  	 * They are 3 cases to consider inside __split_huge_pmd_locked():
>
Yang Shi March 4, 2022, 3:12 a.m. UTC | #2
On Thu, Mar 3, 2022 at 6:25 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2022/3/4 6:20, Yang Shi wrote:
> > The commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
> > __split_huge_pmd_locked()") locked the page for PMD split to make
> > mapcount stable for reuse_swap_page(), then commit 1c2f67308af4 ("mm:
> > thp: fix MADV_REMOVE deadlock on shmem THP") reduce the scope to
> > anonymous page only.
> >
> > However COW has not used mapcount to determine if the page is shared or
> > not anymore due to the COW fixes [1] from David Hildenbrand and the
> > reuse_swap_page() was removed as well.  So PMD split doesn't have to
> > lock the page anymore.  This patch basically reverted the above two
> > commits.
> >
>
> Sounds reasonable. Since mapcount is not used to determine if we need to COW
> the page, PMD split doesn't have to lock the page anymore.
>
> > [1] https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/
> >
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  mm/huge_memory.c | 44 +++++---------------------------------------
> >  1 file changed, 5 insertions(+), 39 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index b49e1a11df2e..daaa698bd273 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2134,8 +2134,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >  {
> >       spinlock_t *ptl;
> >       struct mmu_notifier_range range;
> > -     bool do_unlock_folio = false;
> > -     pmd_t _pmd;
> >
> >       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> >                               address & HPAGE_PMD_MASK,
> > @@ -2148,48 +2146,16 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >        * pmd against. Otherwise we can end up replacing wrong folio.
> >        */
> >       VM_BUG_ON(freeze && !folio);
> > -     if (folio) {
> > -             VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > -             if (folio != page_folio(pmd_page(*pmd)))
> > -                     goto out;
> > -     }
> > +     if (folio && folio != page_folio(pmd_page(*pmd)))
> > +             goto out;
> >
> > -repeat:
> > -     if (pmd_trans_huge(*pmd)) {
> > -             if (!folio) {
> > -                     folio = page_folio(pmd_page(*pmd));
> > -                     /*
> > -                      * An anonymous page must be locked, to ensure that a
> > -                      * concurrent reuse_swap_page() sees stable mapcount;
> > -                      * but reuse_swap_page() is not used on shmem or file,
> > -                      * and page lock must not be taken when zap_pmd_range()
> > -                      * calls __split_huge_pmd() while i_mmap_lock is held.
> > -                      */
> > -                     if (folio_test_anon(folio)) {
> > -                             if (unlikely(!folio_trylock(folio))) {
> > -                                     folio_get(folio);
> > -                                     _pmd = *pmd;
> > -                                     spin_unlock(ptl);
> > -                                     folio_lock(folio);
> > -                                     spin_lock(ptl);
> > -                                     if (unlikely(!pmd_same(*pmd, _pmd))) {
> > -                                             folio_unlock(folio);
> > -                                             folio_put(folio);
> > -                                             folio = NULL;
> > -                                             goto repeat;
> > -                                     }
> > -                                     folio_put(folio);
> > -                             }
> > -                             do_unlock_folio = true;
> > -                     }
> > -             }
> > -     } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> > +     if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> >               goto out;
> > +
> >       __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>
> IUUC, here should be something like below:
> if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))
>         __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>
> i.e. the pmd_trans_huge case is missing in the above change. Or am I miss something ?

Yes, you are definitely right. The check must include pmd_trans_huge(*pmd).

>
> Thanks for the patch. This really simplify the code and avoid the unneeded overhead.
>
> >  out:
> >       spin_unlock(ptl);
> > -     if (do_unlock_folio)
> > -             folio_unlock(folio);
> > +
> >       /*
> >        * No need to double call mmu_notifier->invalidate_range() callback.
> >        * They are 3 cases to consider inside __split_huge_pmd_locked():
> >
>
David Hildenbrand March 4, 2022, 5:06 a.m. UTC | #3
Hi,

This probably bounces on the list due to html junk from the gmail app.

What happened to

https://lore.kernel.org/linux-mm/20220131162940.210846-10-david@redhat.com/

Included in the very series mentioned below?

Was this silently dropped due to folio conversion collisions? :/

Yang Shi <shy828301@gmail.com> schrieb am Do. 3. März 2022 um 23:20:

> The commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
> __split_huge_pmd_locked()") locked the page for PMD split to make
> mapcount stable for reuse_swap_page(), then commit 1c2f67308af4 ("mm:
> thp: fix MADV_REMOVE deadlock on shmem THP") reduce the scope to
> anonymous page only.
>
> However COW has not used mapcount to determine if the page is shared or
> not anymore due to the COW fixes [1] from David Hildenbrand and the
> reuse_swap_page() was removed as well.  So PMD split doesn't have to
> lock the page anymore.  This patch basically reverted the above two
> commits.
>
> [1]
> https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/huge_memory.c | 44 +++++---------------------------------------
>  1 file changed, 5 insertions(+), 39 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b49e1a11df2e..daaa698bd273 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2134,8 +2134,6 @@ void __split_huge_pmd(struct vm_area_struct *vma,
> pmd_t *pmd,
>  {
>         spinlock_t *ptl;
>         struct mmu_notifier_range range;
> -       bool do_unlock_folio = false;
> -       pmd_t _pmd;
>
>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma,
> vma->vm_mm,
>                                 address & HPAGE_PMD_MASK,
> @@ -2148,48 +2146,16 @@ void __split_huge_pmd(struct vm_area_struct *vma,
> pmd_t *pmd,
>          * pmd against. Otherwise we can end up replacing wrong folio.
>          */
>         VM_BUG_ON(freeze && !folio);
> -       if (folio) {
> -               VM_WARN_ON_ONCE(!folio_test_locked(folio));
> -               if (folio != page_folio(pmd_page(*pmd)))
> -                       goto out;
> -       }
> +       if (folio && folio != page_folio(pmd_page(*pmd)))
> +               goto out;
>
> -repeat:
> -       if (pmd_trans_huge(*pmd)) {
> -               if (!folio) {
> -                       folio = page_folio(pmd_page(*pmd));
> -                       /*
> -                        * An anonymous page must be locked, to ensure
> that a
> -                        * concurrent reuse_swap_page() sees stable
> mapcount;
> -                        * but reuse_swap_page() is not used on shmem or
> file,
> -                        * and page lock must not be taken when
> zap_pmd_range()
> -                        * calls __split_huge_pmd() while i_mmap_lock is
> held.
> -                        */
> -                       if (folio_test_anon(folio)) {
> -                               if (unlikely(!folio_trylock(folio))) {
> -                                       folio_get(folio);
> -                                       _pmd = *pmd;
> -                                       spin_unlock(ptl);
> -                                       folio_lock(folio);
> -                                       spin_lock(ptl);
> -                                       if (unlikely(!pmd_same(*pmd,
> _pmd))) {
> -                                               folio_unlock(folio);
> -                                               folio_put(folio);
> -                                               folio = NULL;
> -                                               goto repeat;
> -                                       }
> -                                       folio_put(folio);
> -                               }
> -                               do_unlock_folio = true;
> -                       }
> -               }
> -       } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> +       if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>                 goto out;
> +
>         __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>  out:
>         spin_unlock(ptl);
> -       if (do_unlock_folio)
> -               folio_unlock(folio);
> +
>         /*
>          * No need to double call mmu_notifier->invalidate_range()
> callback.
>          * They are 3 cases to consider inside __split_huge_pmd_locked():
> --
> 2.26.3
>
>
Yang Shi March 4, 2022, 6:30 p.m. UTC | #4
On Thu, Mar 3, 2022 at 9:06 PM David Hildenbrand <david@redhat.com> wrote:
>
> Hi,
>
> This probably bounces on the list due to html junk from the gmail app.
>
> What happened to
>
> https://lore.kernel.org/linux-mm/20220131162940.210846-10-david@redhat.com/
>
> Included in the very series mentioned below?
>
> Was this silently dropped due to folio conversion collisions? :/

I really didn't notice you already proposed this. Maybe it was the folio
conversion, maybe the mlock cleanup, I can't tell. But anyway this patch
needs to be rebased. I will submit v2 to address the comment, and will
add your Signed-off-by.

>
> Yang Shi <shy828301@gmail.com> schrieb am Do. 3. März 2022 um 23:20:
>>
>> The commit c444eb564fb1 ("mm: thp: make the THP mapcount atomic against
>> __split_huge_pmd_locked()") locked the page for PMD split to make
>> mapcount stable for reuse_swap_page(), then commit 1c2f67308af4 ("mm:
>> thp: fix MADV_REMOVE deadlock on shmem THP") reduce the scope to
>> anonymous page only.
>>
>> However COW has not used mapcount to determine if the page is shared or
>> not anymore due to the COW fixes [1] from David Hildenbrand and the
>> reuse_swap_page() was removed as well.  So PMD split doesn't have to
>> lock the page anymore.  This patch basically reverted the above two
>> commits.
>>
>> [1] https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/
>>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
>> Signed-off-by: Yang Shi <shy828301@gmail.com>
>> ---
>>  mm/huge_memory.c | 44 +++++---------------------------------------
>>  1 file changed, 5 insertions(+), 39 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b49e1a11df2e..daaa698bd273 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2134,8 +2134,6 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>  {
>>         spinlock_t *ptl;
>>         struct mmu_notifier_range range;
>> -       bool do_unlock_folio = false;
>> -       pmd_t _pmd;
>>
>>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>>                                 address & HPAGE_PMD_MASK,
>> @@ -2148,48 +2146,16 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>          * pmd against. Otherwise we can end up replacing wrong folio.
>>          */
>>         VM_BUG_ON(freeze && !folio);
>> -       if (folio) {
>> -               VM_WARN_ON_ONCE(!folio_test_locked(folio));
>> -               if (folio != page_folio(pmd_page(*pmd)))
>> -                       goto out;
>> -       }
>> +       if (folio && folio != page_folio(pmd_page(*pmd)))
>> +               goto out;
>>
>> -repeat:
>> -       if (pmd_trans_huge(*pmd)) {
>> -               if (!folio) {
>> -                       folio = page_folio(pmd_page(*pmd));
>> -                       /*
>> -                        * An anonymous page must be locked, to ensure that a
>> -                        * concurrent reuse_swap_page() sees stable mapcount;
>> -                        * but reuse_swap_page() is not used on shmem or file,
>> -                        * and page lock must not be taken when zap_pmd_range()
>> -                        * calls __split_huge_pmd() while i_mmap_lock is held.
>> -                        */
>> -                       if (folio_test_anon(folio)) {
>> -                               if (unlikely(!folio_trylock(folio))) {
>> -                                       folio_get(folio);
>> -                                       _pmd = *pmd;
>> -                                       spin_unlock(ptl);
>> -                                       folio_lock(folio);
>> -                                       spin_lock(ptl);
>> -                                       if (unlikely(!pmd_same(*pmd, _pmd))) {
>> -                                               folio_unlock(folio);
>> -                                               folio_put(folio);
>> -                                               folio = NULL;
>> -                                               goto repeat;
>> -                                       }
>> -                                       folio_put(folio);
>> -                               }
>> -                               do_unlock_folio = true;
>> -                       }
>> -               }
>> -       } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>> +       if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>>                 goto out;
>> +
>>         __split_huge_pmd_locked(vma, pmd, range.start, freeze);
>>  out:
>>         spin_unlock(ptl);
>> -       if (do_unlock_folio)
>> -               folio_unlock(folio);
>> +
>>         /*
>>          * No need to double call mmu_notifier->invalidate_range() callback.
>>          * They are 3 cases to consider inside __split_huge_pmd_locked():
>> --
>> 2.26.3
>>
David Hildenbrand March 4, 2022, 6:50 p.m. UTC | #5
On 04.03.22 19:30, Yang Shi wrote:
> On Thu, Mar 3, 2022 at 9:06 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> Hi,
>>
>> This probably bounces on the list due to html junk from the gmail app.
>>
>> What happened to
>>
>> https://lore.kernel.org/linux-mm/20220131162940.210846-10-david@redhat.com/
>>
>> Included in the very series mentioned below?
>>
>> Was this silently dropped due to folio conversion collisions? :/
> 
> I really didn't notice you already proposed this. Maybe folio
> conversion, maybe mlock cleanup, I can't tell. But anyway this patch
> needs to get rebased. I will submit v2 to solve the comment, will add
> your signed-off-by.
> 

Why a rebase? The folio change comes via another tree (unfortunately not
Andrew's tree; I wish we had a single MM tree for MM patches).

@Andrew, the last mail I received was

+ mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
added to -mm tree

The patch shows up in mmotm as

#[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch

... which shouldn't be true.
Yang Shi March 4, 2022, 7:01 p.m. UTC | #6
On Fri, Mar 4, 2022 at 10:50 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.03.22 19:30, Yang Shi wrote:
> > On Thu, Mar 3, 2022 at 9:06 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> This probably bounces on the list due to html junk from the gmail app.
> >>
> >> What happened to
> >>
> >> https://lore.kernel.org/linux-mm/20220131162940.210846-10-david@redhat.com/
> >>
> >> Included in the very series mentioned below?
> >>
> >> Was this silently dropped due to folio conversion collisions? :/
> >
> > I really didn't notice you already proposed this. Maybe folio
> > conversion, maybe mlock cleanup, I can't tell. But anyway this patch
> > needs to get rebased. I will submit v2 to solve the comment, will add
> > your signed-off-by.
> >
>
> Why a rebase? The folio change comes via another tree (unfortunately not
> Andrews tree, I wish we would have a single MM tree for MM patches).
>
> @Andrew, the last mail I received was
>
> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> added to -mm tree
>
> The patch shows up in mmotm as
>
> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>
> ... which shouldn't be true.

Yes, otherwise my patch can't be applied. What I saw on linux-next for
mm/huge_memory.c looks like:

bfede97b8d6d mm/huge_memory: remove stale page_trans_huge_mapcount()
6c127ac2ff1d mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
cb63c1eb2a3c mm/migration: add trace events for THP migrations
e65f964148fe Merge branch 'akpm-current/current'
61435e1e267c mm/readahead: Align file mappings for non-DAX
80ef527bf929 mm: Turn can_split_huge_page() into can_split_folio()
8339af1d0a18 mm/rmap: Convert rmap_walk() to take a folio
16f06327291e mm/migrate: Convert remove_migration_ptes() to folios
bd23d3f12232 memory tiering: skip to scan fast memory
47a3e10abb78 mm: thp: fix wrong cache flush in remove_migration_pmd()
1de8566cca0a mm/rmap: Convert try_to_migrate() to folios
5a470d51cb2b mm/rmap: Convert try_to_unmap() to take a folio
1c760ad73a13 mm/huge_memory: Convert __split_huge_pmd() to take a folio
82865a9e1187 mm: Add folio_mapcount()
07ca76067308 mm/munlock: maintain page->mlock_count while unevictable
cea86fe246b6 mm/munlock: rmap call mlock_vma_page() munlock_vma_page()
b67bf49ce7aa mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE
f56caedaf94f Merge branch 'akpm' (patches from Andrew)

It looks like your series was rebased on top of the mlock cleanup and
the folio work. Anyway, I will wait for Andrew. If he thinks a new patch
rebased on top of both the mlock cleanup and folio would make his life
easier, I will submit v2.

>
> --
> Thanks,
>
> David / dhildenb
>
Andrew Morton March 7, 2022, 2:07 a.m. UTC | #7
On Fri, 4 Mar 2022 19:50:08 +0100 David Hildenbrand <david@redhat.com> wrote:

> @Andrew, the last mail I received was
> 
> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> added to -mm tree
> 
> The patch shows up in mmotm as
> 
> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> 
> ... which shouldn't be true.

I guess I mislabelled the reason for dropping it.  Should have been to-be-updated, 
due to https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
David Hildenbrand March 7, 2022, 8:24 a.m. UTC | #8
On 07.03.22 03:07, Andrew Morton wrote:
> On Fri, 4 Mar 2022 19:50:08 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
>> @Andrew, the last mail I received was
>>
>> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>> added to -mm tree
>>
>> The patch shows up in mmotm as
>>
>> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>>
>> ... which shouldn't be true.
> 
> I guess I mislabelled the reason for dropping it.  Should have been to-be-updated, 
> due to https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
> 

Let me clarify.

1. I sent [1] (9 patches)

2. You queued the 9 patches

E.g., in "mmotm 2022-02-15-20-22 uploaded"

* mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
* mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
* mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
* mm-streamline-cow-logic-in-do_swap_page.patch
* mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
* mm-khugepaged-remove-reuse_swap_page-usage.patch
* mm-swapfile-remove-stale-reuse_swap_page.patch
* mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
* mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch

3. The last patch in the series was dropped. What remains are 8 patches.

E.g., in "mmotm 2022-02-24-22-38 uploaded"

* mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
* mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
* mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
* mm-streamline-cow-logic-in-do_swap_page.patch
* mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
* mm-khugepaged-remove-reuse_swap_page-usage.patch
* mm-swapfile-remove-stale-reuse_swap_page.patch
* mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch

4. Yang Shi sent his patch (the one we're replying to)

5. You picked his patch and dropped it again due to [2]


I'm wondering why 3 happened and why
https://www.ozlabs.org/~akpm/mmotm/series contains:


mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
mm-streamline-cow-logic-in-do_swap_page.patch
mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
mm-khugepaged-remove-reuse_swap_page-usage.patch
mm-swapfile-remove-stale-reuse_swap_page.patch
mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
...
#[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch


[1]
https://lore.kernel.org/linux-mm/20220131162940.210846-1-david@redhat.com/

[2]
https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
Andrew Morton March 7, 2022, 11:43 p.m. UTC | #9
On Mon, 7 Mar 2022 09:24:58 +0100 David Hildenbrand <david@redhat.com> wrote:

> On 07.03.22 03:07, Andrew Morton wrote:
> > On Fri, 4 Mar 2022 19:50:08 +0100 David Hildenbrand <david@redhat.com> wrote:
> > 
> >> @Andrew, the last mail I received was
> >>
> >> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> >> added to -mm tree
> >>
> >> The patch shows up in mmotm as
> >>
> >> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> >>
> >> ... which shouldn't be true.
> > 
> > I guess I mislabelled the reason for dropping it.  Should have been to-be-updated, 
> > due to https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
> > 
> 
> Let me clarify.
> 
> 1. I sent [1] (9 patches)
> 
> 2. You queued the 9 patches
> 
> E.g., in "mmotm 2022-02-15-20-22 uploaded"
> 
> * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> * mm-streamline-cow-logic-in-do_swap_page.patch
> * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> * mm-khugepaged-remove-reuse_swap_page-usage.patch
> * mm-swapfile-remove-stale-reuse_swap_page.patch
> * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> * mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> 
> 3. The last patch in the series was dropped. What remains are 8 patches.
> 
> E.g., in "mmotm 2022-02-24-22-38 uploaded"
> 
> * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> * mm-streamline-cow-logic-in-do_swap_page.patch
> * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> * mm-khugepaged-remove-reuse_swap_page-usage.patch
> * mm-swapfile-remove-stale-reuse_swap_page.patch
> * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> 
> 4. Yang Shi sent his patch (the one we're replying to)
> 
> 5. You picked his patch and dropped it again due to [2]
> 
> 
> I'm wondering why 3 happened and why
> https://www.ozlabs.org/~akpm/mmotm/series contains:
> 
> 
> mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> mm-streamline-cow-logic-in-do_swap_page.patch
> mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> mm-khugepaged-remove-reuse_swap_page-usage.patch
> mm-swapfile-remove-stale-reuse_swap_page.patch
> mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> ...
> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch

OK, thanks.  I guess it was me seeing 100% rejects when merging onto
the folio changes, then incorrectly deciding the patch was now in
linux-next via some other tree.

I restored it and fixed things up.  Please check.


--- a/mm/huge_memory.c~mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd
+++ a/mm/huge_memory.c
@@ -2133,8 +2133,6 @@ void __split_huge_pmd(struct vm_area_str
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool do_unlock_folio = false;
-	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				address & HPAGE_PMD_MASK,
@@ -2153,42 +2151,14 @@ void __split_huge_pmd(struct vm_area_str
 			goto out;
 	}
 
-repeat:
 	if (pmd_trans_huge(*pmd)) {
-		if (!folio) {
+		if (!folio)
 			folio = page_folio(pmd_page(*pmd));
-			/*
-			 * An anonymous page must be locked, to ensure that a
-			 * concurrent reuse_swap_page() sees stable mapcount;
-			 * but reuse_swap_page() is not used on shmem or file,
-			 * and page lock must not be taken when zap_pmd_range()
-			 * calls __split_huge_pmd() while i_mmap_lock is held.
-			 */
-			if (folio_test_anon(folio)) {
-				if (unlikely(!folio_trylock(folio))) {
-					folio_get(folio);
-					_pmd = *pmd;
-					spin_unlock(ptl);
-					folio_lock(folio);
-					spin_lock(ptl);
-					if (unlikely(!pmd_same(*pmd, _pmd))) {
-						folio_unlock(folio);
-						folio_put(folio);
-						folio = NULL;
-						goto repeat;
-					}
-					folio_put(folio);
-				}
-				do_unlock_folio = true;
-			}
-		}
 	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (do_unlock_folio)
-		folio_unlock(folio);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.
 	 * They are 3 cases to consider inside __split_huge_pmd_locked():
Yang Shi March 8, 2022, 12:03 a.m. UTC | #10
On Mon, Mar 7, 2022 at 3:43 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 7 Mar 2022 09:24:58 +0100 David Hildenbrand <david@redhat.com> wrote:
>
> > On 07.03.22 03:07, Andrew Morton wrote:
> > > On Fri, 4 Mar 2022 19:50:08 +0100 David Hildenbrand <david@redhat.com> wrote:
> > >
> > >> @Andrew, the last mail I received was
> > >>
> > >> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> > >> added to -mm tree
> > >>
> > >> The patch shows up in mmotm as
> > >>
> > >> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> > >>
> > >> ... which shouldn't be true.
> > >
> > > I guess I mislabelled the reason for dropping it.  Should have been to-be-updated,
> > > due to https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
> > >
> >
> > Let me clarify.
> >
> > 1. I sent [1] (9 patches)
> >
> > 2. You queued the 9 patches
> >
> > E.g., in "mmotm 2022-02-15-20-22 uploaded"
> >
> > * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> > * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> > * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> > * mm-streamline-cow-logic-in-do_swap_page.patch
> > * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> > * mm-khugepaged-remove-reuse_swap_page-usage.patch
> > * mm-swapfile-remove-stale-reuse_swap_page.patch
> > * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> > * mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> >
> > 3. The last patch in the series was dropped. What remains are 8 patches.
> >
> > E.g., in "mmotm 2022-02-24-22-38 uploaded"
> >
> > * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> > * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> > * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> > * mm-streamline-cow-logic-in-do_swap_page.patch
> > * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> > * mm-khugepaged-remove-reuse_swap_page-usage.patch
> > * mm-swapfile-remove-stale-reuse_swap_page.patch
> > * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> >
> > 4. Yang Shi sent his patch (the one we're replying to)
> >
> > 5. You picked his patch and dropped it again due to [2]
> >
> >
> > I'm wondering why 3 happened and why
> > https://www.ozlabs.org/~akpm/mmotm/series contains:
> >
> >
> > mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
> > mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
> > mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
> > mm-streamline-cow-logic-in-do_swap_page.patch
> > mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
> > mm-khugepaged-remove-reuse_swap_page-usage.patch
> > mm-swapfile-remove-stale-reuse_swap_page.patch
> > mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
> > ...
> > #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>
> OK, thanks.  I guess it was me seeing 100% rejects when merging onto
> the folio changes then incorrectly deciding the patch was now in
> linux-next via some other tree.
>
> I restored it and fixed things up.  Please check.

Thanks, Andrew. I think we could clean it up a little bit further.

>
>
> --- a/mm/huge_memory.c~mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd
> +++ a/mm/huge_memory.c
> @@ -2133,8 +2133,6 @@ void __split_huge_pmd(struct vm_area_str
>  {
>         spinlock_t *ptl;
>         struct mmu_notifier_range range;
> -       bool do_unlock_folio = false;
> -       pmd_t _pmd;
>
>         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
>                                 address & HPAGE_PMD_MASK,
> @@ -2153,42 +2151,14 @@ void __split_huge_pmd(struct vm_area_str
>                         goto out;
>         }
>
> -repeat:
>         if (pmd_trans_huge(*pmd)) {
> -               if (!folio) {
> +               if (!folio)
>                         folio = page_folio(pmd_page(*pmd));

We could remove the "if (pmd_trans_huge(*pmd))" section since folio is
actually not used afterward at all.

> -                       /*
> -                        * An anonymous page must be locked, to ensure that a
> -                        * concurrent reuse_swap_page() sees stable mapcount;
> -                        * but reuse_swap_page() is not used on shmem or file,
> -                        * and page lock must not be taken when zap_pmd_range()
> -                        * calls __split_huge_pmd() while i_mmap_lock is held.
> -                        */
> -                       if (folio_test_anon(folio)) {
> -                               if (unlikely(!folio_trylock(folio))) {
> -                                       folio_get(folio);
> -                                       _pmd = *pmd;
> -                                       spin_unlock(ptl);
> -                                       folio_lock(folio);
> -                                       spin_lock(ptl);
> -                                       if (unlikely(!pmd_same(*pmd, _pmd))) {
> -                                               folio_unlock(folio);
> -                                               folio_put(folio);
> -                                               folio = NULL;
> -                                               goto repeat;
> -                                       }
> -                                       folio_put(folio);
> -                               }
> -                               do_unlock_folio = true;
> -                       }
> -               }
>         } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
>                 goto out;
>         __split_huge_pmd_locked(vma, pmd, range.start, freeze);

With the above if removed, this could be changed to:

if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
            is_pmd_migration_entry(*pmd))
                __split_huge_pmd_locked(vma, pmd, range.start, freeze);

>  out:
>         spin_unlock(ptl);
> -       if (do_unlock_folio)
> -               folio_unlock(folio);
>         /*
>          * No need to double call mmu_notifier->invalidate_range() callback.
>          * They are 3 cases to consider inside __split_huge_pmd_locked():
> _
>
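[Editorial note: the equivalence Yang Shi asserts above — that the flattened condition admits exactly the PMD states the old if/else-if ladder did — can be checked in userspace. The sketch below is a hedged model, not kernel code: `struct pmd_state` and both helper functions are hypothetical stand-ins for the three pmd predicates, and the folio lookup dropped by the cleanup is represented only by a comment.]

```c
#include <stdbool.h>

/* Hypothetical stand-ins for pmd_trans_huge(), pmd_devmap() and
 * is_pmd_migration_entry() evaluated on one PMD value. */
struct pmd_state {
	bool trans_huge;
	bool devmap;
	bool migration;
};

/* Old control flow in __split_huge_pmd(): the "else if" branch is the
 * only path to "goto out"; everything else reaches the split. */
static bool old_reaches_split(struct pmd_state p)
{
	if (p.trans_huge) {
		/* original branch only resolved the folio; falls through */
	} else if (!(p.devmap || p.migration)) {
		return false;	/* goto out */
	}
	return true;		/* __split_huge_pmd_locked() */
}

/* Yang Shi's proposed flattened condition. */
static bool new_reaches_split(struct pmd_state p)
{
	return p.trans_huge || p.devmap || p.migration;
}
```

Enumerating all eight truth assignments shows the two guards agree everywhere, which backs the claim that the flattening is behavior-preserving.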
Andrew Morton March 8, 2022, 12:50 a.m. UTC | #11
On Mon, 7 Mar 2022 16:03:12 -0800 Yang Shi <shy828301@gmail.com> wrote:

> On Mon, Mar 7, 2022 at 3:43 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > @@ -2133,8 +2133,6 @@ void __split_huge_pmd(struct vm_area_str
> >  {
> >         spinlock_t *ptl;
> >         struct mmu_notifier_range range;
> > -       bool do_unlock_folio = false;
> > -       pmd_t _pmd;
> >
> >         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> >                                 address & HPAGE_PMD_MASK,
> > @@ -2153,42 +2151,14 @@ void __split_huge_pmd(struct vm_area_str
> >                         goto out;
> >         }
> >
> > -repeat:
> >         if (pmd_trans_huge(*pmd)) {
> > -               if (!folio) {
> > +               if (!folio)
> >                         folio = page_folio(pmd_page(*pmd));
> 
> We could remove the "if (pmd_trans_huge(*pmd))" section since folio is
> actually not used afterward at all.

>
> ...
>
> 
> With the above if removed, this could be changed to:
> 
> if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
>             is_pmd_migration_entry(*pmd))
>                 __split_huge_pmd_locked(vma, pmd, range.start, freeze);
> 

OK, looks sane.  Can someone please test all this?

--- a/mm/huge_memory.c~mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd-fix
+++ a/mm/huge_memory.c
@@ -2151,12 +2151,10 @@ void __split_huge_pmd(struct vm_area_str
 			goto out;
 	}
 
-	if (pmd_trans_huge(*pmd)) {
-		if (!folio)
-			folio = page_folio(pmd_page(*pmd));
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
-		goto out;
-	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
+	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
+	    is_pmd_migration_entry(*pmd))
+		__split_huge_pmd_locked(vma, pmd, range.start, freeze);
+
 out:
 	spin_unlock(ptl);
 	/*
Yang Shi March 8, 2022, 12:59 a.m. UTC | #12
On Mon, Mar 7, 2022 at 4:50 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 7 Mar 2022 16:03:12 -0800 Yang Shi <shy828301@gmail.com> wrote:
>
> > On Mon, Mar 7, 2022 at 3:43 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > @@ -2133,8 +2133,6 @@ void __split_huge_pmd(struct vm_area_str
> > >  {
> > >         spinlock_t *ptl;
> > >         struct mmu_notifier_range range;
> > > -       bool do_unlock_folio = false;
> > > -       pmd_t _pmd;
> > >
> > >         mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> > >                                 address & HPAGE_PMD_MASK,
> > > @@ -2153,42 +2151,14 @@ void __split_huge_pmd(struct vm_area_str
> > >                         goto out;
> > >         }
> > >
> > > -repeat:
> > >         if (pmd_trans_huge(*pmd)) {
> > > -               if (!folio) {
> > > +               if (!folio)
> > >                         folio = page_folio(pmd_page(*pmd));
> >
> > We could remove the "if (pmd_trans_huge(*pmd))" section since folio is
> > actually not used afterward at all.
>
> >
> > ...
> >
> >
> > With the above if removed, this could be changed to:
> >
> > if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
> >             is_pmd_migration_entry(*pmd))
> >                 __split_huge_pmd_locked(vma, pmd, range.start, freeze);
> >
>
> OK, looks sane.  Can someone please test all this?

Build test? I had the exact same change in my tree; it worked for me.


>
> --- a/mm/huge_memory.c~mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd-fix
> +++ a/mm/huge_memory.c
> @@ -2151,12 +2151,10 @@ void __split_huge_pmd(struct vm_area_str
>                         goto out;
>         }
>
> -       if (pmd_trans_huge(*pmd)) {
> -               if (!folio)
> -                       folio = page_folio(pmd_page(*pmd));
> -       } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
> -               goto out;
> -       __split_huge_pmd_locked(vma, pmd, range.start, freeze);
> +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
> +           is_pmd_migration_entry(*pmd))
> +               __split_huge_pmd_locked(vma, pmd, range.start, freeze);
> +
>  out:
>         spin_unlock(ptl);
>         /*
> _
>
Hugh Dickins March 8, 2022, 2:36 a.m. UTC | #13
On Mon, 7 Mar 2022, Andrew Morton wrote:
> 
> OK, looks sane.  Can someone please test all this?

Only the briefest of testing, but

mmotm 2022-03-06-20-33
minus mm-thp-dont-have-to-lock-page-anymore-when-splitting-pmd.patch
plus mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
plus mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd-fix.patch

is infinitely better than mmotm 2022-03-06-20-33.

I could just about reach a login on yesterday's mmotm, but graphics no.
Couldn't spend time to investigate, but Naresh's mail today that LTP
thp04 timeouted (hey, I'm English, we say timed out!) inspired me to
try booting with transparent_hugepage=never on cmdline, and that worked.

Again, no time to investigate, but the combination above has booted
and is running load, including transparent hugepages.  And before
setting that load going, I did try LTP thp04, which passed.

(There is an unrelated console printk lockdep spew, and am I the only
one to see these mm/workingset.c:567 shadow_lru_isolate warnings that
started up a week or three ago?)

Must dash,
Hugh
Yang Shi March 8, 2022, 3:12 a.m. UTC | #14
On Mon, Mar 7, 2022 at 6:36 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Mon, 7 Mar 2022, Andrew Morton wrote:
> >
> > OK, looks sane.  Can someone please test all this?
>
> Only the briefest of testing, but
>
> mmotm 2022-03-06-20-33
> minus mm-thp-dont-have-to-lock-page-anymore-when-splitting-pmd.patch
> plus mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> plus mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd-fix.patch
>
> is infinitely better than mmotm 2022-03-06-20-33.
>
> I could just about reach a login on yesterday's mmotm, but graphics no.
> Couldn't spend time to investigate, but Naresh's mail today that LTP
> thp04 timeouted (hey, I'm English, we say timed out!) inspired me to
> try booting with transparent_hugepage=never on cmdline, and that worked.

I think it was because of
mm-thp-dont-have-to-lock-page-anymore-when-splitting-pmd.patch. It was
buggy (missed pmd_trans_huge(*pmd) check) and was dropped by Andrew.

>
> Again, no time to investigate, but the combination above has booted
> and is running load, including transparent hugepages.  And before
> setting that load going, I did try LTP thp04, which passed.
>
> (There is an unrelated console printk lockdep spew, and am I the only
> one to see these mm/workingset.c:567 shadow_lru_isolate warnings that
> started up a week or three ago?)
>
> Must dash,
> Hugh
David Hildenbrand March 8, 2022, 8:53 a.m. UTC | #15
On 08.03.22 00:43, Andrew Morton wrote:
> On Mon, 7 Mar 2022 09:24:58 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
>> On 07.03.22 03:07, Andrew Morton wrote:
>>> On Fri, 4 Mar 2022 19:50:08 +0100 David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> @Andrew, the last mail I received was
>>>>
>>>> + mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>>>> added to -mm tree
>>>>
>>>> The patch shows up in mmotm as
>>>>
>>>> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>>>>
>>>> ... which shouldn't be true.
>>>
>>> I guess I mislabelled the reason for dropping it.  Should have been to-be-updated, 
>>> due to https://lkml.kernel.org/r/CAHbLzkpbnQyHRckoRtbZoaLvANu92MY4kEsbKudaQ8MDUA3nVg@mail.gmail.com
>>>
>>
>> Let me clarify.
>>
>> 1. I sent [1] (9 patches)
>>
>> 2. You queued the 9 patches
>>
>> E.g., in "mmotm 2022-02-15-20-22 uploaded"
>>
>> * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
>> * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
>> * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
>> * mm-streamline-cow-logic-in-do_swap_page.patch
>> * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
>> * mm-khugepaged-remove-reuse_swap_page-usage.patch
>> * mm-swapfile-remove-stale-reuse_swap_page.patch
>> * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
>> * mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
>>
>> 3. The last patch in the series was dropped. What remains are 8 patches.
>>
>> E.g., in "mmotm 2022-02-24-22-38 uploaded"
>>
>> * mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
>> * mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
>> * mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
>> * mm-streamline-cow-logic-in-do_swap_page.patch
>> * mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
>> * mm-khugepaged-remove-reuse_swap_page-usage.patch
>> * mm-swapfile-remove-stale-reuse_swap_page.patch
>> * mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
>>
>> 4. Yang Shi sent his patch (the one we're replying to)
>>
>> 5. You picked his patch and dropped it again due to [2]
>>
>>
>> I'm wondering why 3 happened and why
>> https://www.ozlabs.org/~akpm/mmotm/series contains:
>>
>>
>> mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
>> mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
>> mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
>> mm-streamline-cow-logic-in-do_swap_page.patch
>> mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
>> mm-khugepaged-remove-reuse_swap_page-usage.patch
>> mm-swapfile-remove-stale-reuse_swap_page.patch
>> mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
>> ...
>> #[merged]mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
> 
> OK, thanks.  I guess it was me seeing 100% rejects when merging onto
> the folio changes then incorrectly deciding the patch was now in
> linux-next via some other tree.
> 

Thanks Andrew, my 2 cents would have been that my series, which fixes
actual CVEs, should go in before folio cleanups. But that's a different
discussion (and the patch in question is just a cleanup part of the same
series, so I don't particularly care).

> I restored it and fixed things up.  Please check.
> 

That change looks good to me. I'd even say that we do the second cleanup
separately, with Yang Shi being the author. But whatever you+others prefer.
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b49e1a11df2e..daaa698bd273 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2134,8 +2134,6 @@  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool do_unlock_folio = false;
-	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
 				address & HPAGE_PMD_MASK,
@@ -2148,48 +2146,16 @@  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	 * pmd against. Otherwise we can end up replacing wrong folio.
 	 */
 	VM_BUG_ON(freeze && !folio);
-	if (folio) {
-		VM_WARN_ON_ONCE(!folio_test_locked(folio));
-		if (folio != page_folio(pmd_page(*pmd)))
-			goto out;
-	}
+	if (folio && folio != page_folio(pmd_page(*pmd)))
+		goto out;
 
-repeat:
-	if (pmd_trans_huge(*pmd)) {
-		if (!folio) {
-			folio = page_folio(pmd_page(*pmd));
-			/*
-			 * An anonymous page must be locked, to ensure that a
-			 * concurrent reuse_swap_page() sees stable mapcount;
-			 * but reuse_swap_page() is not used on shmem or file,
-			 * and page lock must not be taken when zap_pmd_range()
-			 * calls __split_huge_pmd() while i_mmap_lock is held.
-			 */
-			if (folio_test_anon(folio)) {
-				if (unlikely(!folio_trylock(folio))) {
-					folio_get(folio);
-					_pmd = *pmd;
-					spin_unlock(ptl);
-					folio_lock(folio);
-					spin_lock(ptl);
-					if (unlikely(!pmd_same(*pmd, _pmd))) {
-						folio_unlock(folio);
-						folio_put(folio);
-						folio = NULL;
-						goto repeat;
-					}
-					folio_put(folio);
-				}
-				do_unlock_folio = true;
-			}
-		}
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
 		goto out;
+
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (do_unlock_folio)
-		folio_unlock(folio);
+
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.
 	 * They are 3 cases to consider inside __split_huge_pmd_locked():
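[Editorial note: the hunk above is Yang Shi's original posting, which drops the pmd_trans_huge() check entirely — the bug acknowledged upthread ("missed pmd_trans_huge(*pmd) check"). A small userspace model — hypothetical struct and helpers, not kernel code — shows the one PMD state where the posted guard and the corrected guard diverge.]

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the pmd predicates used in __split_huge_pmd(). */
struct pmd_state {
	bool trans_huge;	/* pmd_trans_huge(*pmd) */
	bool devmap;		/* pmd_devmap(*pmd) */
	bool migration;		/* is_pmd_migration_entry(*pmd) */
};

/* Guard as posted in the diff above: a plain trans-huge PMD takes
 * "goto out" and is never split. */
static bool posted_reaches_split(struct pmd_state p)
{
	if (!(p.devmap || p.migration))
		return false;	/* goto out -- loses the trans-huge case */
	return true;
}

/* Guard after the -fix agreed on in the thread. */
static bool fixed_reaches_split(struct pmd_state p)
{
	return p.trans_huge || p.devmap || p.migration;
}
```

An ordinary anonymous trans-huge PMD is the common case, so the posted guard would have skipped essentially every real split — plausibly consistent with the boot trouble and LTP thp04 timeout reported upthread once transparent hugepages were enabled.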