
[1/9] mm: Always initialise folio->_deferred_list

Message ID 20240321142448.1645400-2-willy@infradead.org (mailing list archive)
State New
Series Various significant MM patches

Commit Message

Matthew Wilcox March 21, 2024, 2:24 p.m. UTC
For compound pages which are at least order-2 (and hence have a
deferred_list), initialise it and then we can check at free that the
page is not part of a deferred list.  We recently found this useful to
rule out a source of corruption.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/huge_memory.c | 2 --
 mm/hugetlb.c     | 3 ++-
 mm/internal.h    | 2 ++
 mm/memcontrol.c  | 2 ++
 mm/page_alloc.c  | 9 +++++----
 5 files changed, 11 insertions(+), 7 deletions(-)

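The idea, in a minimal userspace sketch (the struct and helpers below only mimic <linux/list.h>; they are not the kernel implementations): after INIT_LIST_HEAD() a list node points at itself, so list_empty() stays true until the folio is actually queued for deferred split, which is exactly the property the new free-path check relies on.

#include <assert.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Mimics INIT_LIST_HEAD(): an empty list head points at itself. */
static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Mimics list_empty(): true only while the node is unlinked. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Mimics list_add(): link @new right after @head. */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

int main(void)
{
	struct list_head split_queue, deferred_list;

	init_list_head(&split_queue);
	init_list_head(&deferred_list);

	/* Freshly prepared compound page: list head is empty, freeing is fine. */
	assert(list_empty(&deferred_list));

	/* Queued for deferred split: a free now should be flagged as a bad page. */
	list_add(&deferred_list, &split_queue);
	assert(!list_empty(&deferred_list));

	puts("list_empty() distinguishes queued from unqueued folios");
	return 0;
}

This is also why the hugetlb hunk below reinitialises the head before calling folio_put(): as noted in the discussion, hugetlb reuses that space for its own purposes, so the head must be made to point at itself again before the free-time check runs.
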
Comments

Miaohe Lin March 22, 2024, 8:23 a.m. UTC | #1
On 2024/3/21 22:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/huge_memory.c | 2 --
>  mm/hugetlb.c     | 3 ++-
>  mm/internal.h    | 2 ++
>  mm/memcontrol.c  | 2 ++
>  mm/page_alloc.c  | 9 +++++----
>  5 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..04fb994a7b0b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -792,8 +792,6 @@ void folio_prep_large_rmappable(struct folio *folio)
>  {
>  	if (!folio || !folio_test_large(folio))
>  		return;
> -	if (folio_order(folio) > 1)
> -		INIT_LIST_HEAD(&folio->_deferred_list);
>  	folio_set_large_rmappable(folio);
>  }
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 23ef240ba48a..7e9a766059aa 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>  		free_gigantic_folio(folio, huge_page_order(h));
>  	} else {
> -		__free_pages(&folio->page, huge_page_order(h));
> +		INIT_LIST_HEAD(&folio->_deferred_list);

Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

> +		folio_put(folio);

Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

Thanks.
Vlastimil Babka March 22, 2024, 9:30 a.m. UTC | #2
On 3/21/24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>
David Hildenbrand March 22, 2024, 12:49 p.m. UTC | #3
On 21.03.24 15:24, Matthew Wilcox (Oracle) wrote:
> For compound pages which are at least order-2 (and hence have a
> deferred_list), initialise it and then we can check at free that the
> page is not part of a deferred list.  We recently found this useful to
> rule out a source of corruption.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>
Matthew Wilcox March 22, 2024, 1 p.m. UTC | #4
On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
> > +++ b/mm/hugetlb.c
> > @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
> >  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
> >  		free_gigantic_folio(folio, huge_page_order(h));
> >  	} else {
> > -		__free_pages(&folio->page, huge_page_order(h));
> > +		INIT_LIST_HEAD(&folio->_deferred_list);
> 
> Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

Maybe?  Something like
		/* We reused this space for our own purposes */

> > +		folio_put(folio);
> 
> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?

There's only one left, and indeed it can!

I'll drop this into my tree and send it as a proper patch later.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 333f6278ef63..43cc7e6bc374 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 		nodemask_t *node_alloc_noretry)
 {
 	int order = huge_page_order(h);
-	struct page *page;
+	struct folio *folio;
 	bool alloc_try_hard = true;
 	bool retry = true;
 
 	/*
-	 * By default we always try hard to allocate the page with
-	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating pages in
+	 * By default we always try hard to allocate the folio with
+	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating folios in
 	 * a loop (to adjust global huge page counts) and previous allocation
 	 * failed, do not continue to try hard on the same node.  Use the
 	 * node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
 retry:
-	page = __alloc_pages(gfp_mask, order, nid, nmask);
+	folio = __folio_alloc(gfp_mask, order, nid, nmask);
 
-	/* Freeze head page */
-	if (page && !page_ref_freeze(page, 1)) {
-		__free_pages(page, order);
+	if (folio && !folio_ref_freeze(folio, 1)) {
+		folio_put(folio);
 		if (retry) {	/* retry once */
 			retry = false;
 			goto retry;
 		}
 		/* WOW!  twice in a row. */
-		pr_warn("HugeTLB head page unexpected inflated ref count\n");
-		page = NULL;
+		pr_warn("HugeTLB unexpected inflated folio ref count\n");
+		folio = NULL;
 	}
 
 	/*
-	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
-	 * indicates an overall state change.  Clear bit so that we resume
-	 * normal 'try hard' allocations.
+	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+	 * folio this indicates an overall state change.  Clear bit so
+	 * that we resume normal 'try hard' allocations.
 	 */
-	if (node_alloc_noretry && page && !alloc_try_hard)
+	if (node_alloc_noretry && folio && !alloc_try_hard)
 		node_clear(nid, *node_alloc_noretry);
 
 	/*
-	 * If we tried hard to get a page but failed, set bit so that
+	 * If we tried hard to get a folio but failed, set bit so that
 	 * subsequent attempts will not try as hard until there is an
 	 * overall state change.
 	 */
-	if (node_alloc_noretry && !page && alloc_try_hard)
+	if (node_alloc_noretry && !folio && alloc_try_hard)
 		node_set(nid, *node_alloc_noretry);
 
-	if (!page) {
+	if (!folio) {
 		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
 		return NULL;
 	}
 
 	__count_vm_event(HTLB_BUDDY_PGALLOC);
-	return page_folio(page);
+	return folio;
 }
 
 static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
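The conversion above keeps the allocate/freeze/retry-once pattern. A hedged userspace sketch of just that control flow (fake_folio_alloc(), fake_ref_freeze(), fake_folio_put() and alloc_frozen_folio() are stand-ins invented for illustration, not kernel APIs):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct fake_folio {
	atomic_int refcount;
};

/* Stand-in for __folio_alloc(): a new folio starts with one reference. */
static struct fake_folio *fake_folio_alloc(void)
{
	struct fake_folio *folio = malloc(sizeof(*folio));

	if (folio)
		atomic_init(&folio->refcount, 1);
	return folio;
}

/* Stand-in for folio_ref_freeze(): succeed only if the count is exactly @expected. */
static bool fake_ref_freeze(struct fake_folio *folio, int expected)
{
	return atomic_compare_exchange_strong(&folio->refcount, &expected, 0);
}

/* Stand-in for folio_put() on a folio we no longer want. */
static void fake_folio_put(struct fake_folio *folio)
{
	free(folio);
}

static struct fake_folio *alloc_frozen_folio(void)
{
	struct fake_folio *folio;
	bool retry = true;

again:
	folio = fake_folio_alloc();
	if (folio && !fake_ref_freeze(folio, 1)) {
		/* Freeze failed: drop our reference and retry exactly once. */
		fake_folio_put(folio);
		if (retry) {
			retry = false;
			goto again;
		}
		fprintf(stderr, "unexpected inflated ref count\n");
		folio = NULL;
	}
	return folio;
}

int main(void)
{
	struct fake_folio *folio = alloc_frozen_folio();

	printf("frozen folio allocated: %s\n", folio ? "yes" : "no");
	free(folio);
	return 0;
}

In this sketch the freeze always succeeds because nothing else holds a reference; in the kernel a transient speculative reference can make folio_ref_freeze() fail, which is what the single retry is for.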
Miaohe Lin April 1, 2024, 3:14 a.m. UTC | #5
On 2024/3/22 21:00, Matthew Wilcox wrote:
> On Fri, Mar 22, 2024 at 04:23:59PM +0800, Miaohe Lin wrote:
>>> +++ b/mm/hugetlb.c
>>> @@ -1796,7 +1796,8 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
>>>  		destroy_compound_gigantic_folio(folio, huge_page_order(h));
>>>  		free_gigantic_folio(folio, huge_page_order(h));
>>>  	} else {
>>> -		__free_pages(&folio->page, huge_page_order(h));
>>> +		INIT_LIST_HEAD(&folio->_deferred_list);
>>
>> Would it be better to add a comment explaining why INIT_LIST_HEAD is needed?

Sorry for the late reply, I was on off-the-job training last week. It was really tiring. :(

> 
> Maybe?  Something like
> 		/* We reused this space for our own purposes */

This one looks good to me.

> 
>>> +		folio_put(folio);
>>
>> Can all __free_pages be replaced with folio_put in mm/hugetlb.c?
> 
> There's only one left, and indeed it can!
> 
> I'll drop this into my tree and send it as a proper patch later.
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 333f6278ef63..43cc7e6bc374 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  		nodemask_t *node_alloc_noretry)
>  {
>  	int order = huge_page_order(h);
> -	struct page *page;
> +	struct folio *folio;
>  	bool alloc_try_hard = true;
>  	bool retry = true;
>  
>  	/*
> -	 * By default we always try hard to allocate the page with
> -	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating pages in
> +	 * By default we always try hard to allocate the folio with
> +	 * __GFP_RETRY_MAYFAIL flag.  However, if we are allocating folios in
>  	 * a loop (to adjust global huge page counts) and previous allocation
>  	 * failed, do not continue to try hard on the same node.  Use the
>  	 * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
>  	if (nid == NUMA_NO_NODE)
>  		nid = numa_mem_id();
>  retry:
> -	page = __alloc_pages(gfp_mask, order, nid, nmask);
> +	folio = __folio_alloc(gfp_mask, order, nid, nmask);
>  
> -	/* Freeze head page */
> -	if (page && !page_ref_freeze(page, 1)) {
> -		__free_pages(page, order);
> +	if (folio && !folio_ref_freeze(folio, 1)) {
> +		folio_put(folio);
>  		if (retry) {	/* retry once */
>  			retry = false;
>  			goto retry;
>  		}
>  		/* WOW!  twice in a row. */
> -		pr_warn("HugeTLB head page unexpected inflated ref count\n");
> -		page = NULL;
> +		pr_warn("HugeTLB unexpected inflated folio ref count\n");
> +		folio = NULL;
>  	}
>  
>  	/*
> -	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> -	 * indicates an overall state change.  Clear bit so that we resume
> -	 * normal 'try hard' allocations.
> +	 * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> +	 * folio this indicates an overall state change.  Clear bit so
> +	 * that we resume normal 'try hard' allocations.
>  	 */
> -	if (node_alloc_noretry && page && !alloc_try_hard)
> +	if (node_alloc_noretry && folio && !alloc_try_hard)
>  		node_clear(nid, *node_alloc_noretry);
>  
>  	/*
> -	 * If we tried hard to get a page but failed, set bit so that
> +	 * If we tried hard to get a folio but failed, set bit so that
>  	 * subsequent attempts will not try as hard until there is an
>  	 * overall state change.
>  	 */
> -	if (node_alloc_noretry && !page && alloc_try_hard)
> +	if (node_alloc_noretry && !folio && alloc_try_hard)
>  		node_set(nid, *node_alloc_noretry);
>  
> -	if (!page) {
> +	if (!folio) {
>  		__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
>  		return NULL;
>  	}
>  
>  	__count_vm_event(HTLB_BUDDY_PGALLOC);
> -	return page_folio(page);
> +	return folio;
>  }
>  
>  static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
> .

This also looks good to me. Thanks for your work.

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..04fb994a7b0b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -792,8 +792,6 @@  void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
 		return;
-	if (folio_order(folio) > 1)
-		INIT_LIST_HEAD(&folio->_deferred_list);
 	folio_set_large_rmappable(folio);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 23ef240ba48a..7e9a766059aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1796,7 +1796,8 @@  static void __update_and_free_hugetlb_folio(struct hstate *h,
 		destroy_compound_gigantic_folio(folio, huge_page_order(h));
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
-		__free_pages(&folio->page, huge_page_order(h));
+		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio_put(folio);
 	}
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..10895ec52546 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -525,6 +525,8 @@  static inline void prep_compound_head(struct page *page, unsigned int order)
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
+	if (order > 1)
+		INIT_LIST_HEAD(&folio->_deferred_list);
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fabce2b50c69..a2a74d4ca0b1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7448,6 +7448,8 @@  static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	struct obj_cgroup *objcg;
 
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..4301146a5bf4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1006,10 +1006,11 @@  static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		}
 		break;
 	case 2:
-		/*
-		 * the second tail page: ->mapping is
-		 * deferred_list.next -- ignore value.
-		 */
+		/* the second tail page: deferred_list overlaps ->mapping */
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
+			goto out;
+		}
 		break;
 	default:
 		if (page->mapping != TAIL_MAPPING) {