diff mbox series

[v2,1/2] mm/vmalloc: Use batched page requests in bulk-allocator

Message ID 20210705170537.43060-1-urezki@gmail.com (mailing list archive)
State New
Headers show
Series [v2,1/2] mm/vmalloc: Use batched page requests in bulk-allocator | expand

Commit Message

Uladzislau Rezki July 5, 2021, 5:05 p.m. UTC
In case of simultaneous vmalloc allocations, for example it is 1GB and
12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
kernel.

<snip>
[   62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
[   62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
[   62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
[   62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
[   62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
[   62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
[   62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
[   62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
[   62.512637] FS:  0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
[   62.512638] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
[   62.512641] Call Trace:
[   62.512646]  __vmalloc_node_range+0x11c/0x2d0
[   62.512649]  ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
[   62.512654]  __vmalloc_node+0x4b/0x70
[   62.512656]  ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[   62.512659]  fix_size_alloc_test+0x44/0x60 [test_vmalloc]
[   62.512662]  test_func+0xe7/0x1f0 [test_vmalloc]
[   62.512666]  ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
[   62.512668]  kthread+0x11a/0x140
[   62.512671]  ? set_kthread_struct+0x40/0x40
[   62.512672]  ret_from_fork+0x22/0x30
<snip>

To address this issue invoke a bulk-allocator many times until all pages
are obtained, i.e. do batched page requests adding cond_resched() meanwhile
to reschedule. Batched value is hard-coded and is 100 pages per call.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

Comments

Andrew Morton July 6, 2021, 8:26 p.m. UTC | #1
On Mon,  5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:

> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
> 
> <snip>
> ...
>
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Can we please have a Fixes: for this?

Is this fix important enough for 4.14-rcx?  I think so...

> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * to fails, fallback to a single page allocator that is
>  	 * more permissive.
>  	 */
> -	if (!order)
> -		nr_allocated = alloc_pages_bulk_array_node(
> -			gfp, nid, nr_pages, pages);
> -	else
> +	if (!order) {
> +		while (nr_allocated < nr_pages) {
> +			int nr, nr_pages_request;
> +
> +			/*
> +			 * A maximum allowed request is hard-coded and is 100
> +			 * pages per call. That is done in order to prevent a
> +			 * long preemption off scenario in the bulk-allocator
> +			 * so the range is [1:100].
> +			 */
> +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));

Yes, they types are all over the place.

nr_pages: unsigned long
nr_allocated: unsigned int
nr, nr_pages_request: int

Can we please choose the most appropriate type and use that
consistently?
Michal Hocko July 7, 2021, 8:42 a.m. UTC | #2
On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> On Mon,  5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
> 
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> > 
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Can we please have a Fixes: for this?

Is this a fix for any actual real life problem? I mean allocating 1GB of
vmalloc space back and forth sounds like a stretch to me.
 
> Is this fix important enough for 4.14-rcx?  I think so...

I do not think so. This is an improvement so that vmalloc behaves more
sanely for those abusers...
Michal Hocko July 7, 2021, 8:50 a.m. UTC | #3
On Mon 05-07-21 19:05:36, Uladzislau Rezki (Sony) wrote:
> In case of simultaneous vmalloc allocations, for example it is 1GB and
> 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> kernel.
> 
> <snip>
> [   62.512621] RIP: 0010:__alloc_pages_bulk+0xa9f/0xbb0
> [   62.512628] Code: ff 8b 44 24 48 44 29 f8 83 f8 01 0f 84 ea fe ff ff e9 07 f6 ff ff 48 8b 44 24 60 48 89 28 e9 00 f9 ff ff fb 66 0f 1f 44 00 00 <e9> e8 fd ff ff 65 48 01 51 10 e9 3e fe ff ff 48 8b 44 24 78 4d 89
> [   62.512629] RSP: 0018:ffffa7bfc29ffd20 EFLAGS: 00000206
> [   62.512631] RAX: 0000000000000200 RBX: ffffcd5405421888 RCX: ffff8c36ffdeb928
> [   62.512632] RDX: 0000000000040000 RSI: ffffa896f06b2ff8 RDI: ffffcd5405421880
> [   62.512633] RBP: ffffcd5405421880 R08: 000000000000007d R09: ffffffffffffffff
> [   62.512634] R10: ffffffff9d63c084 R11: 00000000ffffffff R12: ffff8c373ffaeb80
> [   62.512635] R13: ffff8c36ffdf65f8 R14: ffff8c373ffaeb80 R15: 0000000000040000
> [   62.512637] FS:  0000000000000000(0000) GS:ffff8c36ffdc0000(0000) knlGS:0000000000000000
> [   62.512638] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   62.512639] CR2: 000055c8e2fe8610 CR3: 0000000c13e10000 CR4: 00000000000006e0
> [   62.512641] Call Trace:
> [   62.512646]  __vmalloc_node_range+0x11c/0x2d0
> [   62.512649]  ? full_fit_alloc_test+0x140/0x140 [test_vmalloc]
> [   62.512654]  __vmalloc_node+0x4b/0x70
> [   62.512656]  ? fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [   62.512659]  fix_size_alloc_test+0x44/0x60 [test_vmalloc]
> [   62.512662]  test_func+0xe7/0x1f0 [test_vmalloc]
> [   62.512666]  ? fix_align_alloc_test+0x50/0x50 [test_vmalloc]
> [   62.512668]  kthread+0x11a/0x140
> [   62.512671]  ? set_kthread_struct+0x40/0x40
> [   62.512672]  ret_from_fork+0x22/0x30
> <snip>
> 
> To address this issue invoke a bulk-allocator many times until all pages
> are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> to reschedule. Batched value is hard-coded and is 100 pages per call.

Yes, this makes perfect sense to me. I would just be more explicit that
this is an artificially created problem likely not being a problem at
the moment but why not to prepare for a future.

> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>


Thanks!
> ---
>  mm/vmalloc.c | 30 ++++++++++++++++++++++++++----
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index aaad569e8963..5297958ac7c5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	 * to fails, fallback to a single page allocator that is
>  	 * more permissive.
>  	 */
> -	if (!order)
> -		nr_allocated = alloc_pages_bulk_array_node(
> -			gfp, nid, nr_pages, pages);
> -	else
> +	if (!order) {
> +		while (nr_allocated < nr_pages) {
> +			int nr, nr_pages_request;
> +
> +			/*
> +			 * A maximum allowed request is hard-coded and is 100
> +			 * pages per call. That is done in order to prevent a
> +			 * long preemption off scenario in the bulk-allocator
> +			 * so the range is [1:100].
> +			 */
> +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
> +
> +			nr = alloc_pages_bulk_array_node(gfp, nid,
> +				nr_pages_request, pages + nr_allocated);
> +
> +			nr_allocated += nr;
> +			cond_resched();
> +
> +			/*
> +			 * If zero or pages were obtained partly,
> +			 * fallback to a single page allocator.
> +			 */
> +			if (nr != nr_pages_request)
> +				break;
> +		}
> +	} else
>  		/*
>  		 * Compound pages required for remap_vmalloc_page if
>  		 * high-order pages.
> -- 
> 2.20.1
Uladzislau Rezki July 7, 2021, 2:47 p.m. UTC | #4
On Wed, Jul 07, 2021 at 10:42:29AM +0200, Michal Hocko wrote:
> On Tue 06-07-21 13:26:53, Andrew Morton wrote:
> > On Mon,  5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
> > 
> > > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > > kernel.
> > > 
> > > <snip>
> > > ...
> > >
> > > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > > to reschedule. Batched value is hard-coded and is 100 pages per call.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > Can we please have a Fixes: for this?
> 
> Is this a fix for any actual real life problem? I mean allocating 1GB of
> vmalloc space back and forth sounds like a stretch to me.
>  
It is not a real scenario. I simulated it by the stress-suite tests. So the
Fixes tag is not needed, IMHO.

> > Is this fix important enough for 4.14-rcx?  I think so...
> 
> I do not think so. This is an improvement so that vmalloc behaves more
> sanely for those abusers...
>
A bulk-allocator has recently been introduced, so 4.x does not have it,
i.e. this change is not applicable and 4.x kernel does not suffer from
it.

--
Vlad Rezki
Uladzislau Rezki July 7, 2021, 2:49 p.m. UTC | #5
On Tue, Jul 06, 2021 at 01:26:53PM -0700, Andrew Morton wrote:
> On Mon,  5 Jul 2021 19:05:36 +0200 "Uladzislau Rezki (Sony)" <urezki@gmail.com> wrote:
> 
> > In case of simultaneous vmalloc allocations, for example it is 1GB and
> > 12 CPUs my system is able to hit "BUG: soft lockup" for !CONFIG_PREEMPT
> > kernel.
> > 
> > <snip>
> > ...
> >
> > are obtained, i.e. do batched page requests adding cond_resched() meanwhile
> > to reschedule. Batched value is hard-coded and is 100 pages per call.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Can we please have a Fixes: for this?
> 
> Is this fix important enough for 4.14-rcx?  I think so...
> 
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2785,10 +2785,32 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
> >  	 * to fails, fallback to a single page allocator that is
> >  	 * more permissive.
> >  	 */
> > -	if (!order)
> > -		nr_allocated = alloc_pages_bulk_array_node(
> > -			gfp, nid, nr_pages, pages);
> > -	else
> > +	if (!order) {
> > +		while (nr_allocated < nr_pages) {
> > +			int nr, nr_pages_request;
> > +
> > +			/*
> > +			 * A maximum allowed request is hard-coded and is 100
> > +			 * pages per call. That is done in order to prevent a
> > +			 * long preemption off scenario in the bulk-allocator
> > +			 * so the range is [1:100].
> > +			 */
> > +			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
> 
> Yes, they types are all over the place.
> 
> nr_pages: unsigned long
> nr_allocated: unsigned int
> nr, nr_pages_request: int
> 
> Can we please choose the most appropriate type and use that
> consistently?
> 
Let me think over it to see what i can do.

--
Vlad Rezki
diff mbox series

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aaad569e8963..5297958ac7c5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2785,10 +2785,32 @@  vm_area_alloc_pages(gfp_t gfp, int nid,
 	 * to fails, fallback to a single page allocator that is
 	 * more permissive.
 	 */
-	if (!order)
-		nr_allocated = alloc_pages_bulk_array_node(
-			gfp, nid, nr_pages, pages);
-	else
+	if (!order) {
+		while (nr_allocated < nr_pages) {
+			int nr, nr_pages_request;
+
+			/*
+			 * A maximum allowed request is hard-coded and is 100
+			 * pages per call. That is done in order to prevent a
+			 * long preemption off scenario in the bulk-allocator
+			 * so the range is [1:100].
+			 */
+			nr_pages_request = min(100, (int)(nr_pages - nr_allocated));
+
+			nr = alloc_pages_bulk_array_node(gfp, nid,
+				nr_pages_request, pages + nr_allocated);
+
+			nr_allocated += nr;
+			cond_resched();
+
+			/*
+			 * If zero or pages were obtained partly,
+			 * fallback to a single page allocator.
+			 */
+			if (nr != nr_pages_request)
+				break;
+		}
+	} else
 		/*
 		 * Compound pages required for remap_vmalloc_page if
 		 * high-order pages.