[3/6] mm: free zapped tail pages when splitting isolated thp

Message ID 20240730125346.1580150-4-usamaarif642@gmail.com
State New
Series mm: split underutilized THPs

Commit Message

Usama Arif July 30, 2024, 12:46 p.m. UTC
From: Yu Zhao <yuzhao@google.com>

If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Shuang Zhai <zhais@google.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
 mm/huge_memory.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

David Hildenbrand July 30, 2024, 3:14 p.m. UTC | #1
On 30.07.24 14:46, Usama Arif wrote:
> From: Yu Zhao <yuzhao@google.com>
> 
> If a tail page has only two references left, one inherited from the
> isolation of its head and the other from lru_add_page_tail() which we
> are about to drop, it means this tail page was concurrently zapped.
> Then we can safely free it and save page reclaim or migration the
> trouble of trying it.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> Tested-by: Shuang Zhai <zhais@google.com>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
>   mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0167dc27e365..76a3b6a2b796 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>   	unsigned int new_nr = 1 << new_order;
>   	int order = folio_order(folio);
>   	unsigned int nr = 1 << order;
> +	LIST_HEAD(pages_to_free);
> +	int nr_pages_to_free = 0;
>   
>   	/* complete memcg works before add pages to LRU */
>   	split_page_memcg(head, order, new_order);
> @@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>   		if (subpage == page)
>   			continue;
>   		folio_unlock(new_folio);
> +		/*
> +		 * If a tail page has only two references left, one inherited
> +		 * from the isolation of its head and the other from
> +		 * lru_add_page_tail() which we are about to drop, it means this
> +		 * tail page was concurrently zapped. Then we can safely free it
> +		 * and save page reclaim or migration the trouble of trying it.
> +		 */
> +		if (list && page_ref_freeze(subpage, 2)) {
> +			VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
> +			VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
> +			VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
> +

No VM_BUG_*, VM_WARN is good enough.

> +			ClearPageActive(subpage);
> +			ClearPageUnevictable(subpage);
> +			list_move(&subpage->lru, &pages_to_free);

Most checks here should operate on new_folio instead of subpage.

Usama Arif Aug. 4, 2024, 7:02 p.m. UTC | #2
On 30/07/2024 16:14, David Hildenbrand wrote:
> On 30.07.24 14:46, Usama Arif wrote:
>> From: Yu Zhao <yuzhao@google.com>
>>
>> If a tail page has only two references left, one inherited from the
>> isolation of its head and the other from lru_add_page_tail() which we
>> are about to drop, it means this tail page was concurrently zapped.
>> Then we can safely free it and save page reclaim or migration the
>> trouble of trying it.
>>
>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>> Tested-by: Shuang Zhai <zhais@google.com>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>> ---
>>   mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 0167dc27e365..76a3b6a2b796 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>       unsigned int new_nr = 1 << new_order;
>>       int order = folio_order(folio);
>>       unsigned int nr = 1 << order;
>> +    LIST_HEAD(pages_to_free);
>> +    int nr_pages_to_free = 0;
>>         /* complete memcg works before add pages to LRU */
>>       split_page_memcg(head, order, new_order);
>> @@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>           if (subpage == page)
>>               continue;
>>           folio_unlock(new_folio);
>> +        /*
>> +         * If a tail page has only two references left, one inherited
>> +         * from the isolation of its head and the other from
>> +         * lru_add_page_tail() which we are about to drop, it means this
>> +         * tail page was concurrently zapped. Then we can safely free it
>> +         * and save page reclaim or migration the trouble of trying it.
>> +         */
>> +        if (list && page_ref_freeze(subpage, 2)) {
>> +            VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
>> +            VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
>> +            VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
>> +
> 
> No VM_BUG_*, VM_WARN is good enough.
> 
>> +            ClearPageActive(subpage);
>> +            ClearPageUnevictable(subpage);
>> +            list_move(&subpage->lru, &pages_to_free);
> 
> Most checks here should operate on new_folio instead of subpage.
> 
> 
Do you mean that instead of doing the PageLRU, PageCompound and page_mapped checks on the subpage, there should be checks on new_folio?
If new_folio is a large folio, then it could be that only some of the subpages were zapped?

Could do the below if subpage makes sense:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3305e6d0b90e..abfcd4b7cbba 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3041,9 +3041,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
                 * and save page reclaim or migration the trouble of trying it.
                 */
                if (list && page_ref_freeze(subpage, 2)) {
-                       VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
-                       VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
-                       VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
+                       VM_WARN_ON_ONCE_PAGE(PageLRU(subpage), subpage);
+                       VM_WARN_ON_ONCE_PAGE(PageCompound(subpage), subpage);
+                       VM_WARN_ON_ONCE_PAGE(page_mapped(subpage), subpage);
 
                        ClearPageActive(subpage);
                        ClearPageUnevictable(subpage);

David Hildenbrand Aug. 5, 2024, 9 a.m. UTC | #3
On 04.08.24 21:02, Usama Arif wrote:
> 
> 
> On 30/07/2024 16:14, David Hildenbrand wrote:
>> On 30.07.24 14:46, Usama Arif wrote:
>>> From: Yu Zhao <yuzhao@google.com>
>>>
>>> If a tail page has only two references left, one inherited from the
>>> isolation of its head and the other from lru_add_page_tail() which we
>>> are about to drop, it means this tail page was concurrently zapped.
>>> Then we can safely free it and save page reclaim or migration the
>>> trouble of trying it.
>>>
>>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>>> Tested-by: Shuang Zhai <zhais@google.com>
>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>> ---
>>>    mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>>>    1 file changed, 26 insertions(+)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 0167dc27e365..76a3b6a2b796 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>        unsigned int new_nr = 1 << new_order;
>>>        int order = folio_order(folio);
>>>        unsigned int nr = 1 << order;
>>> +    LIST_HEAD(pages_to_free);
>>> +    int nr_pages_to_free = 0;
>>>          /* complete memcg works before add pages to LRU */
>>>        split_page_memcg(head, order, new_order);
>>> @@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>            if (subpage == page)
>>>                continue;
>>>            folio_unlock(new_folio);
>>> +        /*
>>> +         * If a tail page has only two references left, one inherited
>>> +         * from the isolation of its head and the other from
>>> +         * lru_add_page_tail() which we are about to drop, it means this
>>> +         * tail page was concurrently zapped. Then we can safely free it
>>> +         * and save page reclaim or migration the trouble of trying it.
>>> +         */
>>> +        if (list && page_ref_freeze(subpage, 2)) {
>>> +            VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
>>> +            VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
>>> +            VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
>>> +
>>
>> No VM_BUG_*, VM_WARN is good enough.
>>
>>> +            ClearPageActive(subpage);
>>> +            ClearPageUnevictable(subpage);
>>> +            list_move(&subpage->lru, &pages_to_free);
>>
>> Most checks here should operate on new_folio instead of subpage.
>>
>>
> Do you mean that instead of doing the PageLRU, PageCompound and page_mapped checks on the subpage, there should be checks on new_folio?
> If new_folio is a large folio, then it could be that only some of the subpages were zapped?

We do a:

struct folio *new_folio = page_folio(subpage);

Then:

PageLRU() will end up getting translated to folio_test_lru(page_folio(subpage))

page_mapped() will end up getting translated to
folio_mapped(page_folio(subpage))

PageCompound() is essentially a folio_test_large() check.

So what stops us from doing

VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);

folio_clear_active(new_folio);
folio_clear_unevictable(new_folio);
...

?

The page_ref_freeze() should make sure that we don't have a tail page of
a large folio. Tail pages would have a refcount of 0.
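
page_ref_freeze() is roughly the following (a simplified sketch of the
helper in include/linux/page_ref.h, with the tracepoint hook omitted).
It only succeeds if the refcount is exactly @count, atomically freezing
it to 0:

static inline int page_ref_freeze(struct page *page, int count)
{
	/* Succeeds only when _refcount == count; sets it to 0 atomically. */
	return atomic_cmpxchg(&page->_refcount, count, 0) == count;
}

Since tail pages of a large folio keep a refcount of 0,
page_ref_freeze(subpage, 2) can never succeed on one of them.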

Or what am I missing?

> 
> Could do the below if subpage makes sense:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3305e6d0b90e..abfcd4b7cbba 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3041,9 +3041,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>                   * and save page reclaim or migration the trouble of trying it.
>                   */
>                  if (list && page_ref_freeze(subpage, 2)) {
> -                       VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
> -                       VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
> -                       VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
> +                       VM_WARN_ON_ONCE_PAGE(PageLRU(subpage), subpage);
> +                       VM_WARN_ON_ONCE_PAGE(PageCompound(subpage), subpage);
> +                       VM_WARN_ON_ONCE_PAGE(page_mapped(subpage), subpage);
>   
>                          ClearPageActive(subpage);
>                          ClearPageUnevictable(subpage);
>

Usama Arif Aug. 6, 2024, 9:58 a.m. UTC | #4
On 05/08/2024 10:00, David Hildenbrand wrote:
> On 04.08.24 21:02, Usama Arif wrote:
>>
>>
>> On 30/07/2024 16:14, David Hildenbrand wrote:
>>> On 30.07.24 14:46, Usama Arif wrote:
>>>> From: Yu Zhao <yuzhao@google.com>
>>>>
>>>> If a tail page has only two references left, one inherited from the
>>>> isolation of its head and the other from lru_add_page_tail() which we
>>>> are about to drop, it means this tail page was concurrently zapped.
>>>> Then we can safely free it and save page reclaim or migration the
>>>> trouble of trying it.
>>>>
>>>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>>>> Tested-by: Shuang Zhai <zhais@google.com>
>>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>>> ---
>>>>    mm/huge_memory.c | 26 ++++++++++++++++++++++++++
>>>>    1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 0167dc27e365..76a3b6a2b796 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>>        unsigned int new_nr = 1 << new_order;
>>>>        int order = folio_order(folio);
>>>>        unsigned int nr = 1 << order;
>>>> +    LIST_HEAD(pages_to_free);
>>>> +    int nr_pages_to_free = 0;
>>>>          /* complete memcg works before add pages to LRU */
>>>>        split_page_memcg(head, order, new_order);
>>>> @@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>>            if (subpage == page)
>>>>                continue;
>>>>            folio_unlock(new_folio);
>>>> +        /*
>>>> +         * If a tail page has only two references left, one inherited
>>>> +         * from the isolation of its head and the other from
>>>> +         * lru_add_page_tail() which we are about to drop, it means this
>>>> +         * tail page was concurrently zapped. Then we can safely free it
>>>> +         * and save page reclaim or migration the trouble of trying it.
>>>> +         */
>>>> +        if (list && page_ref_freeze(subpage, 2)) {
>>>> +            VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
>>>> +            VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
>>>> +            VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
>>>> +
>>>
>>> No VM_BUG_*, VM_WARN is good enough.
>>>
>>>> +            ClearPageActive(subpage);
>>>> +            ClearPageUnevictable(subpage);
>>>> +            list_move(&subpage->lru, &pages_to_free);
>>>
>>> Most checks here should operate on new_folio instead of subpage.
>>>
>>>
>> Do you mean that instead of doing the PageLRU, PageCompound and page_mapped checks on the subpage, there should be checks on new_folio?
>> If new_folio is a large folio, then it could be that only some of the subpages were zapped?
> 
> We do a:
> 
> struct folio *new_folio = page_folio(subpage);
> 
> Then:
> 
> PageLRU() will end up getting translated to folio_test_lru(page_folio(subpage))
> 
> page_mapped() will end up getting translated to
> folio_mapped(page_folio(subpage))
> 
> PageCompound() is essentially a folio_test_large() check.
> 
> So what stops us from doing
> 
> VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
> VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
> VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
> 
> folio_clear_active(new_folio);
> folio_clear_unevictable(new_folio);
> ...
> 
> ?
> 
> The page_ref_freeze() should make sure that we don't have a tail page of
> a large folio. Tail pages would have a refcount of 0.
> 
> Or what am I missing?
> 

Yes, you are right. For some reason I was thinking tail pages would be able to reach this path.
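
FWIW, with your suggestions applied the hunk would look something like
the below (untested sketch for the next version -- folio helpers plus
VM_WARN_ON_ONCE_FOLIO instead of the page variants):

		if (list && page_ref_freeze(subpage, 2)) {
			/* A frozen order-0 folio: not on the LRU, not large, not mapped. */
			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);

			folio_clear_active(new_folio);
			folio_clear_unevictable(new_folio);
			list_move(&new_folio->lru, &pages_to_free);
			nr_pages_to_free++;
			continue;
		}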

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0167dc27e365..76a3b6a2b796 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int new_nr = 1 << new_order;
 	int order = folio_order(folio);
 	unsigned int nr = 1 << order;
+	LIST_HEAD(pages_to_free);
+	int nr_pages_to_free = 0;
 
 	/* complete memcg works before add pages to LRU */
 	split_page_memcg(head, order, new_order);
@@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		if (subpage == page)
 			continue;
 		folio_unlock(new_folio);
+		/*
+		 * If a tail page has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * tail page was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && page_ref_freeze(subpage, 2)) {
+			VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
+			VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
+			VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
+
+			ClearPageActive(subpage);
+			ClearPageUnevictable(subpage);
+			list_move(&subpage->lru, &pages_to_free);
+			nr_pages_to_free++;
+			continue;
+		}
 
 		/*
 		 * Subpages may be freed if there wasn't any mapping
@@ -3017,6 +3037,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (!nr_pages_to_free)
+		return;
+
+	mem_cgroup_uncharge_list(&pages_to_free);
+	free_unref_page_list(&pages_to_free);
 }
 
 /* Racy check whether the huge page can be split */