[v3,5/5] hugetlb: add hugetlb demote page support

Message ID 20211001175210.45968-6-mike.kravetz@oracle.com
Series hugetlb: add demote/split page functionality

Commit Message

Mike Kravetz Oct. 1, 2021, 5:52 p.m. UTC
Demote page functionality will split a huge page into a number of huge
pages of a smaller size.  For example, on x86 a 1GB huge page can be
demoted into 512 2M huge pages.  Demotion is done 'in place' by simply
splitting the huge page.
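
For illustration, the number of resulting pages is just the ratio of the two
page sizes; a minimal sketch of the arithmetic (helper name hypothetical, not
part of this patch):

	/*
	 * Pages produced when demoting one huge page.  On x86 with 4K base
	 * pages, a 1GB page has order 18 and a 2MB page has order 9, so
	 * 1UL << (18 - 9) == 512.
	 */
	static unsigned long pages_per_demote(unsigned int src_order,
					      unsigned int dst_order)
	{
		return 1UL << (src_order - dst_order);
	}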

Added '*_for_demote' wrappers for remove_hugetlb_page,
destroy_compound_gigantic_page and prep_compound_gigantic_page for use
by demote code.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 8 deletions(-)

Comments

Oscar Salvador Oct. 6, 2021, 8:41 a.m. UTC | #1
On Fri, Oct 01, 2021 at 10:52:10AM -0700, Mike Kravetz wrote:
> Demote page functionality will split a huge page into a number of huge
> pages of a smaller size.  For example, on x86 a 1GB huge page can be
> demoted into 512 2M huge pages.  Demotion is done 'in place' by simply
> splitting the huge page.
> 
> Added '*_for_demote' wrappers for remove_hugetlb_page,
> destroy_compound_gigantic_page and prep_compound_gigantic_page for use
> by demote code.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/hugetlb.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 74 insertions(+), 8 deletions(-)
> 
...  
> +static int demote_free_huge_page(struct hstate *h, struct page *page)
> +{
> +	int i, nid = page_to_nid(page);
> +	struct hstate *target_hstate;
> +	int rc = 0;
> +
> +	target_hstate = size_to_hstate(PAGE_SIZE << h->demote_order);
> +
> +	remove_hugetlb_page_for_demote(h, page, false);
> +	spin_unlock_irq(&hugetlb_lock);
> +
> +	rc = alloc_huge_page_vmemmap(h, page);
> +	if (rc) {
> +		/* Allocation of vmemmap failed, we cannot demote the page */
> +		spin_lock_irq(&hugetlb_lock);
> +		set_page_refcounted(page);
> +		add_hugetlb_page(h, page, false);
> +		return rc;
> +	}

Question: You keep the original error code returned by alloc_huge_page_vmemmap()
here, but then you lose it in demote_pool_huge_page() when doing
!demote_free_huge_page(). Would it not make more sense to keep it all the way
down to demote_store(), in case you want to return the actual error code?

> +
> +	/*
> +	 * Use destroy_compound_gigantic_page_for_demote for all huge page
> +	 * sizes as it will not ref count pages.
> +	 */
> +	destroy_compound_gigantic_page_for_demote(page, huge_page_order(h));

It seems that for now we only allow gigantic pages to be demoted, but
destroy_compound_gigantic_page_for_demote feels kind of wrong, even
if it is only a wrapper that ends up calling _*gigantic_ functions.

We want a routine that destroys a hugetlb page so it can be demoted into
smaller hugetlb pages, so having 'gigantic' in the name makes little sense,
in my opinion.

>  static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>  	__must_hold(&hugetlb_lock)
>  {
> +	int nr_nodes, node;
> +	struct page *page;
>  	int rc = 0;
>  
>  	lockdep_assert_held(&hugetlb_lock);
> @@ -3313,9 +3377,15 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>  	if (!h->demote_order)
>  		return rc;
>  
> -	/*
> -	 * TODO - demote functionality will be added in subsequent patch
> -	 */
> +	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
> +		if (!list_empty(&h->hugepage_freelists[node])) {
> +			page = list_entry(h->hugepage_freelists[node].next,
> +					struct page, lru);
> +			rc = !demote_free_huge_page(h, page);

I kinda dislike this as I pointed out.
Mike Kravetz Oct. 6, 2021, 6:52 p.m. UTC | #2
On 10/6/21 1:41 AM, Oscar Salvador wrote:
> On Fri, Oct 01, 2021 at 10:52:10AM -0700, Mike Kravetz wrote:
>> Demote page functionality will split a huge page into a number of huge
>> pages of a smaller size.  For example, on x86 a 1GB huge page can be
>> demoted into 512 2M huge pages.  Demotion is done 'in place' by simply
>> splitting the huge page.
>>
>> Added '*_for_demote' wrappers for remove_hugetlb_page,
>> destroy_compound_gigantic_page and prep_compound_gigantic_page for use
>> by demote code.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  mm/hugetlb.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 74 insertions(+), 8 deletions(-)
>>
> ...  
>> +static int demote_free_huge_page(struct hstate *h, struct page *page)
>> +{
>> +	int i, nid = page_to_nid(page);
>> +	struct hstate *target_hstate;
>> +	int rc = 0;
>> +
>> +	target_hstate = size_to_hstate(PAGE_SIZE << h->demote_order);
>> +
>> +	remove_hugetlb_page_for_demote(h, page, false);
>> +	spin_unlock_irq(&hugetlb_lock);
>> +
>> +	rc = alloc_huge_page_vmemmap(h, page);
>> +	if (rc) {
>> +		/* Allocation of vmemmap failed, we cannot demote the page */
>> +		spin_lock_irq(&hugetlb_lock);
>> +		set_page_refcounted(page);
>> +		add_hugetlb_page(h, page, false);
>> +		return rc;
>> +	}
> 
> Question: You keep the original error code returned by alloc_huge_page_vmemmap()
> here, but then you lose it in demote_pool_huge_page() when doing
> !demote_free_huge_page(). Would it not make more sense to keep it all the way
> down to demote_store(), in case you want to return the actual error code?
> 

Yes, I will return it all the way to demote_store (and the user).
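
Something along these lines, i.e. propagating the value instead of negating
it (a sketch only; the exact return convention in the next version may
differ):

	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
		if (!list_empty(&h->hugepage_freelists[node])) {
			page = list_entry(h->hugepage_freelists[node].next,
					struct page, lru);
			/* 0 on success, or the errno from vmemmap allocation */
			rc = demote_free_huge_page(h, page);
			break;
		}
	}
	return rc;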

>> +
>> +	/*
>> +	 * Use destroy_compound_gigantic_page_for_demote for all huge page
>> +	 * sizes as it will not ref count pages.
>> +	 */
>> +	destroy_compound_gigantic_page_for_demote(page, huge_page_order(h));
> 
> It seems that for now we only allow gigantic pages to be demoted, but
> destroy_compound_gigantic_page_for_demote feels kind of wrong, even
> if it is only a wrapper that ends up calling _*gigantic_ functions.
> 
> We want a routine that destroys a hugetlb page so it can be demoted into
> smaller hugetlb pages, so having 'gigantic' in the name makes little sense,
> in my opinion.
> 

Agree, I do not love the name.  Since it is only a wrapper, how about
destroy_hugetlb_page_for_demote?  And, change those other *_for_demote
wrappers to similarly not have gigantic in their names.
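
For example (names from this discussion, not from the posted patch), the
wrapper would become:

	static void destroy_hugetlb_page_for_demote(struct page *page,
						unsigned int order)
	{
		__destroy_compound_gigantic_page(page, order, true);
	}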

>>  static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>>  	__must_hold(&hugetlb_lock)
>>  {
>> +	int nr_nodes, node;
>> +	struct page *page;
>>  	int rc = 0;
>>  
>>  	lockdep_assert_held(&hugetlb_lock);
>> @@ -3313,9 +3377,15 @@ static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
>>  	if (!h->demote_order)
>>  		return rc;
>>  
>> -	/*
>> -	 * TODO - demote functionality will be added in subsequent patch
>> -	 */
>> +	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
>> +		if (!list_empty(&h->hugepage_freelists[node])) {
>> +			page = list_entry(h->hugepage_freelists[node].next,
>> +					struct page, lru);
>> +			rc = !demote_free_huge_page(h, page);
> 
> I kinda dislike this as I pointed out.
> 

Will change.

Thanks for all your comments!
Oscar Salvador Oct. 7, 2021, 7:52 a.m. UTC | #3
On Wed, Oct 06, 2021 at 11:52:33AM -0700, Mike Kravetz wrote:
> Agree, I do not love the name.  Since it is only a wrapper, how about
> destroy_hugetlb_page_for_demote?  And, change those other *_for_demote
> wrappers to similarly not have gigantic in their names.

Yes, makes sense to me. We want the names to be generic and not tied to
gigantic stuff.

Thanks Mike

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ccbe323c992b..aec2e737d0cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1270,7 +1270,7 @@  static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 		((node = hstate_next_node_to_free(hs, mask)) || 1);	\
 		nr_nodes--)
 
-#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
+/* used to demote non-gigantic huge pages as well */
 static void __destroy_compound_gigantic_page(struct page *page,
 					unsigned int order, bool demote)
 {
@@ -1293,6 +1293,13 @@  static void __destroy_compound_gigantic_page(struct page *page,
 	__ClearPageHead(page);
 }
 
+static void destroy_compound_gigantic_page_for_demote(struct page *page,
+					unsigned int order)
+{
+	__destroy_compound_gigantic_page(page, order, true);
+}
+
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
 static void destroy_compound_gigantic_page(struct page *page,
 					unsigned int order)
 {
@@ -1438,6 +1445,12 @@  static void remove_hugetlb_page(struct hstate *h, struct page *page,
 	__remove_hugetlb_page(h, page, adjust_surplus, false);
 }
 
+static void remove_hugetlb_page_for_demote(struct hstate *h, struct page *page,
+							bool adjust_surplus)
+{
+	__remove_hugetlb_page(h, page, adjust_surplus, true);
+}
+
 static void add_hugetlb_page(struct hstate *h, struct page *page,
 			     bool adjust_surplus)
 {
@@ -1779,6 +1792,12 @@  static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
 	return __prep_compound_gigantic_page(page, order, false);
 }
 
+static bool prep_compound_gigantic_page_for_demote(struct page *page,
+							unsigned int order)
+{
+	return __prep_compound_gigantic_page(page, order, true);
+}
+
 /*
  * PageHuge() only returns true for hugetlbfs pages, but not for normal or
  * transparent huge pages.  See the PageTransHuge() documentation for more
@@ -3302,9 +3321,54 @@  static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	return 0;
 }
 
+static int demote_free_huge_page(struct hstate *h, struct page *page)
+{
+	int i, nid = page_to_nid(page);
+	struct hstate *target_hstate;
+	int rc = 0;
+
+	target_hstate = size_to_hstate(PAGE_SIZE << h->demote_order);
+
+	remove_hugetlb_page_for_demote(h, page, false);
+	spin_unlock_irq(&hugetlb_lock);
+
+	rc = alloc_huge_page_vmemmap(h, page);
+	if (rc) {
+		/* Allocation of vmemmap failed, we cannot demote the page */
+		spin_lock_irq(&hugetlb_lock);
+		set_page_refcounted(page);
+		add_hugetlb_page(h, page, false);
+		return rc;
+	}
+
+	/*
+	 * Use destroy_compound_gigantic_page_for_demote for all huge page
+	 * sizes as it will not ref count pages.
+	 */
+	destroy_compound_gigantic_page_for_demote(page, huge_page_order(h));
+
+	for (i = 0; i < pages_per_huge_page(h);
+				i += pages_per_huge_page(target_hstate)) {
+		if (hstate_is_gigantic(target_hstate))
+			prep_compound_gigantic_page_for_demote(page + i,
+							target_hstate->order);
+		else
+			prep_compound_page(page + i, target_hstate->order);
+		set_page_private(page + i, 0);
+		set_page_refcounted(page + i);
+		prep_new_huge_page(target_hstate, page + i, nid);
+		put_page(page + i);
+	}
+
+	spin_lock_irq(&hugetlb_lock);
+	return rc;
+}
+
 static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
 	__must_hold(&hugetlb_lock)
 {
+	int nr_nodes, node;
+	struct page *page;
 	int rc = 0;
 
 	lockdep_assert_held(&hugetlb_lock);
@@ -3313,9 +3377,15 @@  static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
 	if (!h->demote_order)
 		return rc;
 
-	/*
-	 * TODO - demote functionality will be added in subsequent patch
-	 */
+	for_each_node_mask_to_free(h, nr_nodes, node, nodes_allowed) {
+		if (!list_empty(&h->hugepage_freelists[node])) {
+			page = list_entry(h->hugepage_freelists[node].next,
+					struct page, lru);
+			rc = !demote_free_huge_page(h, page);
+			break;
+		}
+	}
+
 	return rc;
 }
 
@@ -3550,10 +3620,6 @@  static ssize_t demote_store(struct kobject *kobj,
 		/*
 		 * Check for available pages to demote each time through the
 		 * loop as demote_pool_huge_page will drop hugetlb_lock.
-		 *
-		 * NOTE: demote_pool_huge_page does not yet drop hugetlb_lock
-		 * but will when full demote functionality is added in a later
-		 * patch.
 		 */
 		if (nid != NUMA_NO_NODE)
 			nr_available = h->free_huge_pages_node[nid];
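
For context on how this code is reached: demotion is triggered through the
sysfs interface added earlier in this series; writing a page count to a huge
page size's demote file invokes demote_store(), which calls
demote_pool_huge_page() in a loop under hugetlb_lock.  A hedged userspace
sketch (the sysfs path assumes x86 defaults and the per-size demote file
from patch 1 of this series):

	#include <fcntl.h>
	#include <unistd.h>

	/* Ask the kernel to demote two 1GB huge pages to the demote_size. */
	int request_demote(void)
	{
		int fd = open("/sys/kernel/mm/hugepages/hugepages-1048576kB/demote",
			      O_WRONLY);
		if (fd < 0)
			return -1;
		if (write(fd, "2", 1) < 0) {
			close(fd);
			return -1;
		}
		return close(fd);
	}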