
mm/hugetlb: avoid hardcoding while checking if cma is reserved

Message ID 20200706084405.14236-1-song.bao.hua@hisilicon.com (mailing list archive)
State New, archived
Headers show
Series mm/hugetlb: avoid hardcoding while checking if cma is reserved | expand

Commit Message

Song Bao Hua (Barry Song) July 6, 2020, 8:44 a.m. UTC
hugetlb_cma[0] can be NULL for various reasons; for example, node 0 may have
no memory. Thus, a NULL hugetlb_cma[0] doesn't necessarily mean CMA is not
enabled: gigantic pages might have been reserved on other nodes.

Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Cc: Roman Gushchin <guro@fb.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 mm/hugetlb.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
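
As an illustration of the scenario described above (a hypothetical
configuration, not taken from the patch): booting a 4-node machine whose
node 0 is memoryless with something like

	hugetlb_cma=2G

can leave hugetlb_cma[0] == NULL while hugetlb_cma[1], hugetlb_cma[2] and
hugetlb_cma[3] hold the reserved areas, so a check that only looks at
hugetlb_cma[0] wrongly concludes that no CMA was reserved.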

Comments

Roman Gushchin July 6, 2020, 9:48 p.m. UTC | #1
On Mon, Jul 06, 2020 at 08:44:05PM +1200, Barry Song wrote:

Hello, Barry!

> hugetlb_cma[0] can be NULL due to various reasons, for example, node0 has
> no memory. Thus, NULL hugetlb_cma[0] doesn't necessarily mean cma is not
> enabled. gigantic pages might have been reserved on other nodes.

Just curious, is it a real-life problem you've seen? If so, I wonder how
you're using the hugetlb_cma option, and what's the outcome?

> 
> Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
>  mm/hugetlb.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 57ece74e3aae..603aa854aa89 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2571,9 +2571,21 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>  
>  	for (i = 0; i < h->max_huge_pages; ++i) {
>  		if (hstate_is_gigantic(h)) {
> -			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
> -				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
> -				break;
> +			if (IS_ENABLED(CONFIG_CMA)) {
> +				int nid;
> +				bool cma_reserved = false;
> +
> +				for_each_node_state(nid, N_ONLINE) {
> +					if (hugetlb_cma[nid]) {
> +						pr_warn_once("HugeTLB: hugetlb_cma is reserved, "
> +								"skip boot time allocation\n");
> +						cma_reserved = true;
> +						break;
> +					}
> +				}
> +
> +				if (cma_reserved)
> +					break;

It's a valid problem, and I'd like to see it fixed. But I wonder if it would be better
to introduce a new helper, bool hugetlb_cma_enabled(), and move both the
IS_ENABLED(CONFIG_CMA) and hugetlb_cma[nid] checks there?
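
For reference, a minimal sketch of what such a helper could look like (only an
illustration of the suggestion, not the posted patch; the final form may differ):

static bool hugetlb_cma_enabled(void)
{
#ifdef CONFIG_CMA
	int nid;

	/* CMA has been reserved if any node got a hugetlb_cma area */
	for_each_node_state(nid, N_ONLINE)
		if (hugetlb_cma[nid])
			return true;
#endif
	return false;
}

The loop in hugetlb_hstate_alloc_pages() could then simply check
hugetlb_cma_enabled(), print the warning and break, without open-coding the
per-node scan.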

Thank you!
Song Bao Hua (Barry Song) July 6, 2020, 10:14 p.m. UTC | #2
> -----Original Message-----
> From: Roman Gushchin [mailto:guro@fb.com]
> Sent: Tuesday, July 7, 2020 9:48 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> Cc: akpm@linux-foundation.org; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Mike
> Kravetz <mike.kravetz@oracle.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Subject: Re: [PATCH] mm/hugetlb: avoid hardcoding while checking if cma is
> reserved
> 
> On Mon, Jul 06, 2020 at 08:44:05PM +1200, Barry Song wrote:
> 
> Hello, Barry!
> 
> > hugetlb_cma[0] can be NULL due to various reasons, for example, node0 has
> > no memory. Thus, NULL hugetlb_cma[0] doesn't necessarily mean cma is not
> > enabled. gigantic pages might have been reserved on other nodes.
> 
> Just curious, is it a real-life problem you've seen? If so, I wonder how
> you're using the hugetlb_cma option, and what's the outcome?

Yes. It is kind of an odd setup, but I once got a board on which node 0 has no DDR
while node 1 and node 3 have memory.

I would actually prefer to derive the per-node CMA size as:
cma size of one node = hugetlb_cma / (number of nodes with memory)
rather than:
cma size of one node = hugetlb_cma / (number of online nodes)

but unfortunately the N_MEMORY infrastructure is not ready yet at the point where
hugetlb_cma_reserve() runs. I mean:

for_each_node_state(nid, N_MEMORY) {
		int res;

		/* split what is left of hugetlb_cma_size across nodes with memory */
		size = min(per_node, hugetlb_cma_size - reserved);
		size = round_up(size, PAGE_SIZE << order);

		res = cma_declare_contiguous_nid(0, size, 0, PAGE_SIZE << order,
						 0, false, "hugetlb",
						 &hugetlb_cma[nid], nid);
		...
	}
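
and, correspondingly, split the requested size over nodes that actually have
memory (a hedged sketch, reusing the per_node/hugetlb_cma_size names from
hugetlb_cma_reserve(); the exact rounding is an assumption):

	per_node = DIV_ROUND_UP(hugetlb_cma_size, num_node_state(N_MEMORY));

instead of dividing by the number of online nodes.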

> 
> >
> > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages
> using cma")
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > ---
> >  mm/hugetlb.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 57ece74e3aae..603aa854aa89 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2571,9 +2571,21 @@ static void __init
> hugetlb_hstate_alloc_pages(struct hstate *h)
> >
> >  	for (i = 0; i < h->max_huge_pages; ++i) {
> >  		if (hstate_is_gigantic(h)) {
> > -			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
> > -				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip
> boot time allocation\n");
> > -				break;
> > +			if (IS_ENABLED(CONFIG_CMA)) {
> > +				int nid;
> > +				bool cma_reserved = false;
> > +
> > +				for_each_node_state(nid, N_ONLINE) {
> > +					if (hugetlb_cma[nid]) {
> > +						pr_warn_once("HugeTLB: hugetlb_cma is
> reserved,"
> > +								"skip boot time allocation\n");
> > +						cma_reserved = true;
> > +						break;
> > +					}
> > +				}
> > +
> > +				if (cma_reserved)
> > +					break;
> 
> It's a valid problem, and I like to see it fixed. But I wonder if it would be better
> to introduce a new helper bool hugetlb_cma_enabled()? And move both
> IS_ENABLED(CONFIG_CMA)
> and hugetlb_cma[nid] checks there?

Yep, that would be more readable.

> 
> Thank you!

Thanks
Barry
Song Bao Hua (Barry Song) July 6, 2020, 10:30 p.m. UTC | #3
> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, July 7, 2020 10:12 AM
> To: 'Roman Gushchin' <guro@fb.com>
> Cc: akpm@linux-foundation.org; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Mike
> Kravetz <mike.kravetz@oracle.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Subject: RE: [PATCH] mm/hugetlb: avoid hardcoding while checking if cma is
> reserved
> 
> 
> 
> > -----Original Message-----
> > From: Roman Gushchin [mailto:guro@fb.com]
> > Sent: Tuesday, July 7, 2020 9:48 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: akpm@linux-foundation.org; linux-mm@kvack.org;
> > linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Mike
> > Kravetz <mike.kravetz@oracle.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>
> > Subject: Re: [PATCH] mm/hugetlb: avoid hardcoding while checking if
> > cma is reserved
> >
> > On Mon, Jul 06, 2020 at 08:44:05PM +1200, Barry Song wrote:
> >
> > Hello, Barry!
> >
> > > hugetlb_cma[0] can be NULL due to various reasons, for example,
> > > node0 has no memory. Thus, NULL hugetlb_cma[0] doesn't necessarily
> > > mean cma is not enabled. gigantic pages might have been reserved on
> other nodes.
> >
> > Just curious, is it a real-life problem you've seen? If so, I wonder
> > how you're using the hugetlb_cma option, and what's the outcome?
> 
> Yes. It is kind of stupid but I once got a board on which node0 has no DDR
> though node1 and node3 have memory.
> 
> I actually prefer we get cma size of per node by:
> cma size of one node = hugetlb_cma/ (nodes with memory) rather than:
> cma size of one node = hugetlb_cma/ (all online nodes)
> 
> but unfortunately, or the N_MEMORY infrastructures are not ready yet. I
> mean:
> 
> for_each_node_state(nid, N_MEMORY) {
> 		int res;
> 
> 		size = min(per_node, hugetlb_cma_size - reserved);
> 		size = round_up(size, PAGE_SIZE << order);
> 
> 		res = cma_declare_contiguous_nid(0, size, 0, PAGE_SIZE << order,
> 						 0, false, "hugetlb",
> 						 &hugetlb_cma[nid], nid);
> 		...
> 	}
> 

And a server has many memory slots. The best configuration would be to give
every node at least one DDR DIMM, but that isn't necessarily the case; it is
totally up to the users.

If we move hugetlb_cma_reserve() a bit later, we can probably make the
hugetlb_cma size completely consistent by splitting it across nodes with memory
rather than across nodes which are online:

void __init bootmem_init(void)
{
	...

	arm64_numa_init();

	/*
	 * must be done after arm64_numa_init() which calls numa_init() to
	 * initialize node_online_map that gets used in hugetlb_cma_reserve()
	 * while allocating required CMA size across online nodes.
	 */
- #ifdef CONFIG_ARM64_4K_PAGES
-	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
- #endif

	...

	sparse_init();
	zone_sizes_init(min, max);

+ #ifdef CONFIG_ARM64_4K_PAGES
+	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
+ #endif
	memblock_dump_all();
}

For x86, it could be done in a similar way. Do you think it is worth trying?
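
A rough sketch of the analogous x86 change, assuming hugetlb_cma_reserve() is
currently called from setup_arch() in arch/x86/kernel/setup.c under the
X86_FEATURE_GBPAGES check (treat the exact placement as an assumption; whether
N_MEMORY is fully populated by then would need checking):

-	if (boot_cpu_has(X86_FEATURE_GBPAGES))
-		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);

	...

	x86_init.paging.pagetable_init();	/* zones get initialized here */

+	if (boot_cpu_has(X86_FEATURE_GBPAGES))
+		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);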

> >
> > >
> > > Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
> > > hugepages
> > using cma")
> > > Cc: Roman Gushchin <guro@fb.com>
> > > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > > Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> > > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> > > ---
> > >  mm/hugetlb.c | 18 +++++++++++++++---
> > >  1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c index
> > > 57ece74e3aae..603aa854aa89 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -2571,9 +2571,21 @@ static void __init
> > hugetlb_hstate_alloc_pages(struct hstate *h)
> > >
> > >  	for (i = 0; i < h->max_huge_pages; ++i) {
> > >  		if (hstate_is_gigantic(h)) {
> > > -			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
> > > -				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip
> > boot time allocation\n");
> > > -				break;
> > > +			if (IS_ENABLED(CONFIG_CMA)) {
> > > +				int nid;
> > > +				bool cma_reserved = false;
> > > +
> > > +				for_each_node_state(nid, N_ONLINE) {
> > > +					if (hugetlb_cma[nid]) {
> > > +						pr_warn_once("HugeTLB: hugetlb_cma is
> > reserved,"
> > > +								"skip boot time allocation\n");
> > > +						cma_reserved = true;
> > > +						break;
> > > +					}
> > > +				}
> > > +
> > > +				if (cma_reserved)
> > > +					break;
> >
> > It's a valid problem, and I like to see it fixed. But I wonder if it
> > would be better to introduce a new helper bool hugetlb_cma_enabled()?
> > And move both
> > IS_ENABLED(CONFIG_CMA)
> > and hugetlb_cma[nid] checks there?
> 
> Yep. that would be more readable.
> 
> >
> > Thank you!
> 
> Thanks
> Barry

Thanks
Barry
Roman Gushchin July 6, 2020, 11:26 p.m. UTC | #4
On Mon, Jul 06, 2020 at 10:30:40PM +0000, Song Bao Hua (Barry Song) wrote:
> 
> 
> > -----Original Message-----
> > From: Song Bao Hua (Barry Song)
> > Sent: Tuesday, July 7, 2020 10:12 AM
> > To: 'Roman Gushchin' <guro@fb.com>
> > Cc: akpm@linux-foundation.org; linux-mm@kvack.org;
> > linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Mike
> > Kravetz <mike.kravetz@oracle.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>
> > Subject: RE: [PATCH] mm/hugetlb: avoid hardcoding while checking if cma is
> > reserved
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Roman Gushchin [mailto:guro@fb.com]
> > > Sent: Tuesday, July 7, 2020 9:48 AM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > Cc: akpm@linux-foundation.org; linux-mm@kvack.org;
> > > linux-kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; Mike
> > > Kravetz <mike.kravetz@oracle.com>; Jonathan Cameron
> > > <jonathan.cameron@huawei.com>
> > > Subject: Re: [PATCH] mm/hugetlb: avoid hardcoding while checking if
> > > cma is reserved
> > >
> > > On Mon, Jul 06, 2020 at 08:44:05PM +1200, Barry Song wrote:
> > >
> > > Hello, Barry!
> > >
> > > > hugetlb_cma[0] can be NULL due to various reasons, for example,
> > > > node0 has no memory. Thus, NULL hugetlb_cma[0] doesn't necessarily
> > > > mean cma is not enabled. gigantic pages might have been reserved on
> > other nodes.
> > >
> > > Just curious, is it a real-life problem you've seen? If so, I wonder
> > > how you're using the hugetlb_cma option, and what's the outcome?
> > 
> > Yes. It is kind of stupid but I once got a board on which node0 has no DDR
> > though node1 and node3 have memory.
> > 
> > I actually prefer we get cma size of per node by:
> > cma size of one node = hugetlb_cma/ (nodes with memory) rather than:
> > cma size of one node = hugetlb_cma/ (all online nodes)
> > 
> > but unfortunately, or the N_MEMORY infrastructures are not ready yet. I
> > mean:
> > 
> > for_each_node_state(nid, N_MEMORY) {
> > 		int res;
> > 
> > 		size = min(per_node, hugetlb_cma_size - reserved);
> > 		size = round_up(size, PAGE_SIZE << order);
> > 
> > 		res = cma_declare_contiguous_nid(0, size, 0, PAGE_SIZE << order,
> > 						 0, false, "hugetlb",
> > 						 &hugetlb_cma[nid], nid);
> > 		...
> > 	}
> > 
> 
> And for a server, there are many memory slots. The best config would be
> making every node have at least one DDR. But it isn't necessarily true, it
> is totally up to the users.
> 
> If we move hugetlb_cma_reserve() a bit later, we probably make hugetlb_cma size
> completely consistent by splitting it to nodes with memory rather than nodes 
> which are online:
> 
> void __init bootmem_init(void)
> {
> 	...
> 
> 	arm64_numa_init();
> 
> 	/*
> 	 * must be done after arm64_numa_init() which calls numa_init() to
> 	 * initialize node_online_map that gets used in hugetlb_cma_reserve()
> 	 * while allocating required CMA size across online nodes.
> 	 */
> - #ifdef CONFIG_ARM64_4K_PAGES
> -	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> - #endif
> 
> 	...
> 
> 	sparse_init();
> 	zone_sizes_init(min, max);
> 
> + #ifdef CONFIG_ARM64_4K_PAGES
> +	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> + #endif
> 	memblock_dump_all();
> }
> 
> For x86, it could be done in similar way. Do you think it is worth to try?

It sounds like a good idea to me!

Thanks.

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74e3aae..603aa854aa89 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2571,9 +2571,21 @@  static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 
 	for (i = 0; i < h->max_huge_pages; ++i) {
 		if (hstate_is_gigantic(h)) {
-			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
-				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
-				break;
+			if (IS_ENABLED(CONFIG_CMA)) {
+				int nid;
+				bool cma_reserved = false;
+
+				for_each_node_state(nid, N_ONLINE) {
+					if (hugetlb_cma[nid]) {
+						pr_warn_once("HugeTLB: hugetlb_cma is reserved, "
+								"skip boot time allocation\n");
+						cma_reserved = true;
+						break;
+					}
+				}
+
+				if (cma_reserved)
+					break;
 			}
 			if (!alloc_bootmem_huge_page(h))
 				break;