Message ID | 20240321142448.1645400-2-willy@infradead.org (mailing list archive)
---|---
State | New
Series | Various significant MM patches
On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/huge_memory.c | 2 --
>  mm/hugetlb.c     | 3 ++-
>  mm/internal.h    | 2 ++
>  mm/memcontrol.c  | 2 ++
>  mm/page_alloc.c  | 9 +++++----
>  5 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..04fb994a7b0b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..7e9a766059aa 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>  		free_gigantic_folio(folio, huge_page_order(h));
>  	} else {
> -		__free_pages(&folio->page, huge_page_order(h));
> +		INIT_LIST_HEAD(&folio->_deferred_list);

Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

> +		folio_put(folio);

Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

Thanks.
On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list. We recently found this useful to
> rule out a source of corruption.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>
On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
> > +++ b/mm/hugetlb.c
> > @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > -		__free_pages(&folio->page, huge_page_order(h));
> > +		INIT_LIST_HEAD(&folio->_deferred_list);
>
> Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

Maybe? Something like
	/* We reused this space for our own purposes */

> > +		folio_put(folio);
>
> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

There's only one left, and indeed it can!

I'll drop this into my tree and send it as a proper patch later.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 333f6278ef63..43cc7e6bc374 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	int order = huge_page_order(h);
-	struct page *page;
+	struct folio *folio;
 	bool alloc_try_hard = true;
 	bool retry = true;
 
 	/*
-	 * By default we always try hard to allocate the page with
-	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
+	 * By default we always try hard to allocate the folio with
+	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
 	 * a loop (to adjust global huge page counts) and previous allocation
 	 * failed, do not continue to try hard on the same node. Use the
 	 * node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 retry:
-	page = __alloc_pages(gfp_mask, order, nid, nmask);
+	folio = __folio_alloc(gfp_mask, order, nid, nmask);
 
-	/* Freeze head page */
-	if (page && !page_ref_freeze(page, 1)) {
-		__free_pages(page, order);
+	if (folio && !folio_ref_freeze(folio, 1)) {
+		folio_put(folio);
 		if (retry) {	/* retry once */
 			retry = false;
 			goto retry;
 		}
 		/* WOW! twice in a row. */
-		pr_warn("HugeTLB head page unexpected inflated ref count\n");
-		page = NULL;
+		pr_warn("HugeTLB unexpected inflated folio ref count\n");
+		folio = NULL;
 	}
 
 	/*
-	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
-	 * indicates an overall state change. Clear bit so that we resume
-	 * normal 'try hard' allocations.
+	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+	 * folio this indicates an overall state change. Clear bit so
+	 * that we resume normal 'try hard' allocations.
 	 */
-	if (node_alloc_noretry && page && !alloc_try_hard)
+	if (node_alloc_noretry && folio && !alloc_try_hard)
 		node_clear(nid, *node_alloc_noretry);
 
 	/*
-	 * If we tried hard to get a page but failed, set bit so that
+	 * If we tried hard to get a folio but failed, set bit so that
 	 * subsequent attempts will not try as hard until there is an
 	 * overall state change.
 	 */
-	if (node_alloc_noretry && !page && alloc_try_hard)
+	if (node_alloc_noretry && !folio && alloc_try_hard)
 		node_set(nid, *node_alloc_noretry);
 
-	if (!page) {
+	if (!folio) {
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 		return NULL;
 	}
 
 	__count_vm_event(HTLB_BUDDY_PGALLOC);
-	return page_folio(page);
+	return folio;
 }
 
 static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
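As an aside, the "freeze the refcount, retry once" pattern in the hunk above can be illustrated outside the kernel. The following is a minimal userspace sketch under C11 atomics; struct obj, ref_freeze() and alloc_frozen_obj() are invented names standing in for the folio, folio_ref_freeze() and alloc_buddy_hugetlb_folio(), not real kernel interfaces.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct obj { atomic_int refcount; };

/* Freeze succeeds only if the count is exactly 'expected' (here 1 -> 0). */
static bool ref_freeze(struct obj *o, int expected)
{
	return atomic_compare_exchange_strong(&o->refcount, &expected, 0);
}

static struct obj *alloc_obj(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcount, 1);
	return o;
}

/* Allocate an object and freeze its refcount, retrying once on failure. */
static struct obj *alloc_frozen_obj(void)
{
	bool retry = true;
	struct obj *o;

again:
	o = alloc_obj();
	/*
	 * In this toy the freshly allocated count is exactly 1, so the
	 * freeze always succeeds; in the kernel a concurrent speculative
	 * reference can inflate the count and make it fail.
	 */
	if (o && !ref_freeze(o, 1)) {
		free(o);		/* plays the role of folio_put() */
		if (retry) {		/* retry once, as in the hunk above */
			retry = false;
			goto again;
		}
		fprintf(stderr, "unexpected inflated ref count\n");
		o = NULL;
	}
	return o;
}

int main(void)
{
	struct obj *o = alloc_frozen_obj();

	printf("frozen object: %p\n", (void *)o);
	free(o);
	return 0;
}
```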
On 2024/3/22 21:00, Matthew Wilcox wrote:
> On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
>>> +++ b/mm/hugetlb.c
>>> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>>>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>>>  		free_gigantic_folio(folio, huge_page_order(h));
>>>  	} else {
>>> -		__free_pages(&folio->page, huge_page_order(h));
>>> +		INIT_LIST_HEAD(&folio->_deferred_list);
>>
>> Will it be better to add a comment to explain why INIT_LIST_HEAD is needed ?

Sorry for the late reply, I was on off-the-job training last week. It was really tiring. :(

>
> Maybe? Something like
> 	/* We reused this space for our own purposes */

This one looks good to me.

>
>>> +		folio_put(folio);
>>
>> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?
>
> There's only one left, and indeed it can!
>
> I'll drop this into my tree and send it as a proper patch later.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 333f6278ef63..43cc7e6bc374 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  		nodemask_t *node_alloc_noretry)
>  {
>  	int order = huge_page_order(h);
> -	struct page *page;
> +	struct folio *folio;
>  	bool alloc_try_hard = true;
>  	bool retry = true;
> 
>  	/*
> -	 * By default we always try hard to allocate the page with
> -	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
> +	 * By default we always try hard to allocate the folio with
> +	 * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
>  	 * a loop (to adjust global huge page counts) and previous allocation
>  	 * failed, do not continue to try hard on the same node. Use the
>  	 * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>  retry:
> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
> 
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);
>  		if (retry) {	/* retry once */
>  			retry = false;
>  			goto retry;
>  		}
>  		/* WOW! twice in a row. */
> -		pr_warn("HugeTLB head page unexpected inflated ref count\n");
> -		page = NULL;
> +		pr_warn("HugeTLB unexpected inflated folio ref count\n");
> +		folio = NULL;
>  	}
> 
>  	/*
> -	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> -	 * indicates an overall state change. Clear bit so that we resume
> -	 * normal 'try hard' allocations.
> +	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> +	 * folio this indicates an overall state change. Clear bit so
> +	 * that we resume normal 'try hard' allocations.
>  	 */
> -	if (node_alloc_noretry && page && !alloc_try_hard)
> +	if (node_alloc_noretry && folio && !alloc_try_hard)
>  		node_clear(nid, *node_alloc_noretry);
> 
>  	/*
> -	 * If we tried hard to get a page but failed, set bit so that
> +	 * If we tried hard to get a folio but failed, set bit so that
>  	 * subsequent attempts will not try as hard until there is an
>  	 * overall state change.
>  	 */
> -	if (node_alloc_noretry && !page && alloc_try_hard)
> +	if (node_alloc_noretry && !folio && alloc_try_hard)
>  		node_set(nid, *node_alloc_noretry);
> 
> -	if (!page) {
> +	if (!folio) {
>  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>  		return NULL;
>  	}
> 
>  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> -	return page_folio(page);
> +	return folio;
>  }
> 
>  static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
> .

This also looks good to me.
Thanks for your work.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..04fb994a7b0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
 		return;
-	if (folio_order(folio) > 1)
-		INIT_LIST_HEAD(&folio->_deferred_list);
 	folio_set_large_rmappable(folio);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23ef240ba48a..7e9a766059aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		destroy_compound_gigantic_folio(folio, huge_page_order(h));
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
-		__free_pages(&folio->page, huge_page_order(h));
+		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio_put(folio);
 	}
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..10895ec52546 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -525,6 +525,8 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
+	if (order > 1)
+		INIT_LIST_HEAD(&folio->_deferred_list);
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..a2a74d4ca0b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7448,6 +7448,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..4301146a5bf4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1006,10 +1006,11 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		}
 		break;
 	case 2:
-		/*
-		 * the second tail page: ->mapping is
-		 * deferred_list.next -- ignore value.
-		 */
+		/* the second tail page: deferred_list overlaps ->mapping */
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
+			goto out;
+		}
 		break;
 	default:
 		if (page->mapping != TAIL_MAPPING) {
For compound pages which are at least order-2 (and hence have a
deferred_list), initialise it and then we can check at free that the
page is not part of a deferred list. We recently found this useful to
rule out a source of corruption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 2 --
 mm/hugetlb.c     | 3 ++-
 mm/internal.h    | 2 ++
 mm/memcontrol.c  | 2 ++
 mm/page_alloc.c  | 9 +++++----
 5 files changed, 11 insertions(+), 7 deletions(-)
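The idea the commit message describes — initialise the deferred list when the compound page is prepared, then treat a non-empty list at free time as evidence of corruption — can be sketched in a few lines of plain C. Everything below (fake_folio, prep_folio, free_folio) is an illustrative userspace stand-in, not the kernel code touched by the patch.

```c
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list->prev = list;
}

static int list_empty(const struct list_head *list)
{
	return list->next == list;
}

/* Stand-in for a compound page's head; only order > 1 has the list. */
struct fake_folio {
	unsigned int order;
	struct list_head _deferred_list;
};

/* Mirrors prep_compound_head(): initialise the list at prep time. */
static void prep_folio(struct fake_folio *folio, unsigned int order)
{
	folio->order = order;
	if (order > 1)
		INIT_LIST_HEAD(&folio->_deferred_list);
}

/* Mirrors the new free_tail_page_prepare() check: complain if queued. */
static void free_folio(struct fake_folio *folio)
{
	if (folio->order > 1 && !list_empty(&folio->_deferred_list)) {
		fprintf(stderr, "bad page: on deferred list\n");
		return;
	}
	printf("freed cleanly\n");
}

int main(void)
{
	struct fake_folio folio;

	prep_folio(&folio, 2);
	free_folio(&folio);		/* list is empty: frees cleanly */

	/* Simulate corruption, or a free while still queued for splitting. */
	folio._deferred_list.next = (struct list_head *)&folio;
	free_folio(&folio);		/* now reported as a bad page */
	return 0;
}
```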