diff mbox series

[v3] mm/page_owner: use kvmalloc instead of kmalloc

Message ID 1540790176-32339-1-git-send-email-miles.chen@mediatek.com (mailing list archive)
State New, archived
Headers show
Series [v3] mm/page_owner: use kvmalloc instead of kmalloc | expand

Commit Message

Miles Chen Oct. 29, 2018, 5:16 a.m. UTC
From: Miles Chen <miles.chen@mediatek.com>

The kbuf used by page owner is allocated by kmalloc(), which means it
can use only normal memory and there might be a "out of memory"
issue when we're out of normal memory.

To solve this problem, use kvmalloc() to allocate kbuf
from normal/highmem. But there is one problem here: kvmalloc()
does not fallback to vmalloc for sub page allocations. So sub
page allocation fails due to out of normal memory cannot fallback
to vmalloc.

Modify kvmalloc() to allow sub page allocations fallback to
vmalloc when CONFIG_HIGHMEM=y and use kvmalloc() to allocate
kbuf.

Clamp buffer size to PAGE_SIZE to avoid arbitrary size allocation.

Change since v2:
  - improve kvmalloc, allow sub page allocations fallback to
    vmalloc when CONFIG_HIGHMEM=y

Change since v1:
  - use kvmalloc()
  - clamp buffer size to PAGE_SIZE

Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 mm/page_owner.c | 8 ++++----
 mm/util.c       | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

Comments

Michal Hocko Oct. 29, 2018, 8:07 a.m. UTC | #1
On Mon 29-10-18 13:16:16, miles.chen@mediatek.com wrote:
> From: Miles Chen <miles.chen@mediatek.com>
> 
> The kbuf used by page owner is allocated by kmalloc(), which means it
> can use only normal memory and there might be a "out of memory"
> issue when we're out of normal memory.
> 
> To solve this problem, use kvmalloc() to allocate kbuf
> from normal/highmem. But there is one problem here: kvmalloc()
> does not fallback to vmalloc for sub page allocations. So sub
> page allocation fails due to out of normal memory cannot fallback
> to vmalloc.
> 
> Modify kvmalloc() to allow sub page allocations fallback to
> vmalloc when CONFIG_HIGHMEM=y and use kvmalloc() to allocate
> kbuf.
> 
> Clamp buffer size to PAGE_SIZE to avoid arbitrary size allocation.
> 
> Change since v2:
>   - improve kvmalloc, allow sub page allocations fallback to
>     vmalloc when CONFIG_HIGHMEM=y

Matthew has suggested a much more viable way to go around this. We do
not really want to allow an unbound kernel allocation - even if the
interface is root only.

Besides that, the following doesn't make much sense to me. It simply
makes no sense to use vmalloc for sub page allocation regardless of
HIGHMEM.

> diff --git a/mm/util.c b/mm/util.c
> index 8bf08b5b5760..7b1c59b9bfbf 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  	ret = kmalloc_node(size, kmalloc_flags, node);
>  
>  	/*
> -	 * It doesn't really make sense to fallback to vmalloc for sub page
> -	 * requests
> +	 * It only makes sense to fallback to vmalloc for sub page
> +	 * requests if we might be able to allocate highmem pages.
>  	 */
> -	if (ret || size <= PAGE_SIZE)
> +	if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
>  		return ret;
>  
>  	return __vmalloc_node_flags_caller(size, node, flags,
> -- 
> 2.18.0
>
Michal Hocko Oct. 29, 2018, 8:17 a.m. UTC | #2
On Mon 29-10-18 09:07:08, Michal Hocko wrote:
[...]
> Besides that, the following doesn't make much sense to me. It simply
> makes no sense to use vmalloc for sub page allocation regardless of
> HIGHMEM.

OK, it is still early morning here. Now I get the point of the patch.
You just want to (ab)use highmeme for smaller requests. I do not like
this, to be honest. It causes an internal fragmentation and more
importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
have any 64b with HIGHMEM btw?) is quite small to be wasted like that.

In any case such a changes should come with some numbers and as a
separate patch for sure.

> > diff --git a/mm/util.c b/mm/util.c
> > index 8bf08b5b5760..7b1c59b9bfbf 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> >  	ret = kmalloc_node(size, kmalloc_flags, node);
> >  
> >  	/*
> > -	 * It doesn't really make sense to fallback to vmalloc for sub page
> > -	 * requests
> > +	 * It only makes sense to fallback to vmalloc for sub page
> > +	 * requests if we might be able to allocate highmem pages.
> >  	 */
> > -	if (ret || size <= PAGE_SIZE)
> > +	if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
> >  		return ret;
> >  
> >  	return __vmalloc_node_flags_caller(size, node, flags,
> > -- 
> > 2.18.0
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs
Miles Chen Oct. 30, 2018, 1:29 a.m. UTC | #3
On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote:
> On Mon 29-10-18 09:07:08, Michal Hocko wrote:
> [...]
> > Besides that, the following doesn't make much sense to me. It simply
> > makes no sense to use vmalloc for sub page allocation regardless of
> > HIGHMEM.
> 
> OK, it is still early morning here. Now I get the point of the patch.
> You just want to (ab)use highmeme for smaller requests. I do not like
> this, to be honest. It causes an internal fragmentation and more
> importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
> have any 64b with HIGHMEM btw?) is quite small to be wasted like that.
> 
thanks for your comment. It looks like that using vmalloc fallback for
sub page allocation is not good here.

Your comment gave another idea:

1. force kbuf to PAGE_SIZE
2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can
get a highmem page if possible
3. use kmap/kunmap pair to create mapping for this page. No vmalloc
space is used.
4. do not change kvmalloc logic.


> In any case such a changes should come with some numbers and as a
> separate patch for sure.
> 
> > > diff --git a/mm/util.c b/mm/util.c
> > > index 8bf08b5b5760..7b1c59b9bfbf 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
> > >  	ret = kmalloc_node(size, kmalloc_flags, node);
> > >  
> > >  	/*
> > > -	 * It doesn't really make sense to fallback to vmalloc for sub page
> > > -	 * requests
> > > +	 * It only makes sense to fallback to vmalloc for sub page
> > > +	 * requests if we might be able to allocate highmem pages.
> > >  	 */
> > > -	if (ret || size <= PAGE_SIZE)
> > > +	if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
> > >  		return ret;
> > >  
> > >  	return __vmalloc_node_flags_caller(size, node, flags,
> > > -- 
> > > 2.18.0
> > > 
> > 
> > -- 
> > Michal Hocko
> > SUSE Labs
>
Michal Hocko Oct. 30, 2018, 6:06 a.m. UTC | #4
On Tue 30-10-18 09:29:10, Miles Chen wrote:
> On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote:
> > On Mon 29-10-18 09:07:08, Michal Hocko wrote:
> > [...]
> > > Besides that, the following doesn't make much sense to me. It simply
> > > makes no sense to use vmalloc for sub page allocation regardless of
> > > HIGHMEM.
> > 
> > OK, it is still early morning here. Now I get the point of the patch.
> > You just want to (ab)use highmeme for smaller requests. I do not like
> > this, to be honest. It causes an internal fragmentation and more
> > importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
> > have any 64b with HIGHMEM btw?) is quite small to be wasted like that.
> > 
> thanks for your comment. It looks like that using vmalloc fallback for
> sub page allocation is not good here.
> 
> Your comment gave another idea:
> 
> 1. force kbuf to PAGE_SIZE
> 2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can
> get a highmem page if possible
> 3. use kmap/kunmap pair to create mapping for this page. No vmalloc
> space is used.
> 4. do not change kvmalloc logic.

If you mean for this particular situation then is this really worth
it? I mean this is a short term allocation for root only so you do not
have to worry about low mem depletion.

If you are thiking in more generic terms to allow kmalloc to use highmem
then I am not really sure this will work out.
Miles Chen Oct. 30, 2018, 6:55 a.m. UTC | #5
On Tue, 2018-10-30 at 07:06 +0100, Michal Hocko wrote:
> On Tue 30-10-18 09:29:10, Miles Chen wrote:
> > On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote:
> > > On Mon 29-10-18 09:07:08, Michal Hocko wrote:
> > > [...]
> > > > Besides that, the following doesn't make much sense to me. It simply
> > > > makes no sense to use vmalloc for sub page allocation regardless of
> > > > HIGHMEM.
> > > 
> > > OK, it is still early morning here. Now I get the point of the patch.
> > > You just want to (ab)use highmeme for smaller requests. I do not like
> > > this, to be honest. It causes an internal fragmentation and more
> > > importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we
> > > have any 64b with HIGHMEM btw?) is quite small to be wasted like that.
> > > 
> > thanks for your comment. It looks like that using vmalloc fallback for
> > sub page allocation is not good here.
> > 
> > Your comment gave another idea:
> > 
> > 1. force kbuf to PAGE_SIZE
> > 2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can
> > get a highmem page if possible
> > 3. use kmap/kunmap pair to create mapping for this page. No vmalloc
> > space is used.
> > 4. do not change kvmalloc logic.
> 
> If you mean for this particular situation then is this really worth
> it? I mean this is a short term allocation for root only so you do not
> have to worry about low mem depletion.

The 1...3 are applied to print_page_owner(), not in kmalloc() or
kvmalloc() logic. 


It's a real problem when using page_owner.
I found this issue recently: I'm not able to read page_owner information
during a overnight test. (error: read failed: Out of memory). I replace
kmalloc() with vmalloc() and it worked well.

> 
> If you are thiking in more generic terms to allow kmalloc to use highmem
> then I am not really sure this will work out.

I'm thinking about modify print_page_owner().
Michal Hocko Oct. 30, 2018, 8:15 a.m. UTC | #6
On Tue 30-10-18 14:55:51, Miles Chen wrote:
[...]
> It's a real problem when using page_owner.
> I found this issue recently: I'm not able to read page_owner information
> during a overnight test. (error: read failed: Out of memory). I replace
> kmalloc() with vmalloc() and it worked well.

Is this with trimming the allocation to a single page and doing shorter
than requested reads?
Miles Chen Oct. 31, 2018, 8:47 a.m. UTC | #7
On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote:
> On Tue 30-10-18 14:55:51, Miles Chen wrote:
> [...]
> > It's a real problem when using page_owner.
> > I found this issue recently: I'm not able to read page_owner information
> > during a overnight test. (error: read failed: Out of memory). I replace
> > kmalloc() with vmalloc() and it worked well.
> 
> Is this with trimming the allocation to a single page and doing shorter
> than requested reads?


I printed out the allocate count on my device the request count is <=
4096. So I tested this scenario by trimming the count to from 4096 to
1024 bytes and it works fine. 

count = count > 1024? 1024: count;

It tested it on both 32bit and 64bit kernel.
Michal Hocko Oct. 31, 2018, 10:15 a.m. UTC | #8
On Wed 31-10-18 16:47:17, Miles Chen wrote:
> On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote:
> > On Tue 30-10-18 14:55:51, Miles Chen wrote:
> > [...]
> > > It's a real problem when using page_owner.
> > > I found this issue recently: I'm not able to read page_owner information
> > > during a overnight test. (error: read failed: Out of memory). I replace
> > > kmalloc() with vmalloc() and it worked well.
> > 
> > Is this with trimming the allocation to a single page and doing shorter
> > than requested reads?
> 
> 
> I printed out the allocate count on my device the request count is <=
> 4096. So I tested this scenario by trimming the count to from 4096 to
> 1024 bytes and it works fine. 
> 
> count = count > 1024? 1024: count;
> 
> It tested it on both 32bit and 64bit kernel.

Are you saying that you see OOMs for 4k size?
Miles Chen Oct. 31, 2018, 10:19 a.m. UTC | #9
On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote:
> On Wed 31-10-18 16:47:17, Miles Chen wrote:
> > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote:
> > > On Tue 30-10-18 14:55:51, Miles Chen wrote:
> > > [...]
> > > > It's a real problem when using page_owner.
> > > > I found this issue recently: I'm not able to read page_owner information
> > > > during a overnight test. (error: read failed: Out of memory). I replace
> > > > kmalloc() with vmalloc() and it worked well.
> > > 
> > > Is this with trimming the allocation to a single page and doing shorter
> > > than requested reads?
> > 
> > 
> > I printed out the allocate count on my device the request count is <=
> > 4096. So I tested this scenario by trimming the count to from 4096 to
> > 1024 bytes and it works fine. 
> > 
> > count = count > 1024? 1024: count;
> > 
> > It tested it on both 32bit and 64bit kernel.
> 
> Are you saying that you see OOMs for 4k size?
> 
yes, because kmalloc only use normal memor, not highmem + normal memory
I think that's why vmalloc() works.
Michal Hocko Oct. 31, 2018, 11:41 a.m. UTC | #10
On Wed 31-10-18 18:19:42, Miles Chen wrote:
> On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote:
> > On Wed 31-10-18 16:47:17, Miles Chen wrote:
> > > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote:
> > > > On Tue 30-10-18 14:55:51, Miles Chen wrote:
> > > > [...]
> > > > > It's a real problem when using page_owner.
> > > > > I found this issue recently: I'm not able to read page_owner information
> > > > > during a overnight test. (error: read failed: Out of memory). I replace
> > > > > kmalloc() with vmalloc() and it worked well.
> > > > 
> > > > Is this with trimming the allocation to a single page and doing shorter
> > > > than requested reads?
> > > 
> > > 
> > > I printed out the allocate count on my device the request count is <=
> > > 4096. So I tested this scenario by trimming the count to from 4096 to
> > > 1024 bytes and it works fine. 
> > > 
> > > count = count > 1024? 1024: count;
> > > 
> > > It tested it on both 32bit and 64bit kernel.
> > 
> > Are you saying that you see OOMs for 4k size?
> > 
> yes, because kmalloc only use normal memor, not highmem + normal memory
> I think that's why vmalloc() works.

Can I see an OOM report please? I am especially interested that 1k
doesn't cause the problem because there shouldn't be that much of a
difference between the two. Larger allocations could be a result of
memory fragmentation but 1k vs. 4k to make a difference really seems
unexpected.
Miles Chen Nov. 1, 2018, 10 a.m. UTC | #11
On Wed, 2018-10-31 at 12:41 +0100, Michal Hocko wrote:
> On Wed 31-10-18 18:19:42, Miles Chen wrote:
> > On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote:
> > > On Wed 31-10-18 16:47:17, Miles Chen wrote:
> > > > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote:
> > > > > On Tue 30-10-18 14:55:51, Miles Chen wrote:
> > > > > [...]
> > > > > > It's a real problem when using page_owner.
> > > > > > I found this issue recently: I'm not able to read page_owner information
> > > > > > during a overnight test. (error: read failed: Out of memory). I replace
> > > > > > kmalloc() with vmalloc() and it worked well.
> > > > > 
> > > > > Is this with trimming the allocation to a single page and doing shorter
> > > > > than requested reads?
> > > > 
> > > > 
> > > > I printed out the allocate count on my device the request count is <=
> > > > 4096. So I tested this scenario by trimming the count to from 4096 to
> > > > 1024 bytes and it works fine. 
> > > > 
> > > > count = count > 1024? 1024: count;
> > > > 
> > > > It tested it on both 32bit and 64bit kernel.
> > > 
> > > Are you saying that you see OOMs for 4k size?
> > > 
> > yes, because kmalloc only use normal memor, not highmem + normal memory
> > I think that's why vmalloc() works.
> 
> Can I see an OOM report please? I am especially interested that 1k
> doesn't cause the problem because there shouldn't be that much of a
> difference between the two. Larger allocations could be a result of
> memory fragmentation but 1k vs. 4k to make a difference really seems
> unexpected.
> 
You're right.

I pulled out the log and found  that the allocation fail is for order=4.

I found that the if I do the read on the device, the read count is <=
4096; if I do the read by 'adb pull' from my host PC, the read count
becomes 65532. (I'm working on a android device)

The overnight test used 'adb pull' command, so allocation fail occurred
because of the large read count and the arbitrary size allocation design
of page_owner. That also explains why vmalloc() works.

I did a test today, the only code changed is to clamp to read count to
PAGE_SIZE and it worked well. Maybe we can solve this issue by just
clamping the read count.

count = count > PAGE_SIZE ? PAGE_SIZE : count;


Here is the log:

<4>[  261.841770] (0)[2880:sync svc 43]sync svc 43: page allocation
failure: order:4, mode:0x24040c0
<4>[  261.841815]-(0)[2880:sync svc 43]CPU: 0 PID: 2880 Comm: sync svc
43 Tainted: G        W  O    4.4.146+ #16
<4>[  261.841825]-(0)[2880:sync svc 43]Hardware name: Generic DT based
system
<4>[  261.841834]-(0)[2880:sync svc 43]Backtrace:
<4>[  261.841844]-(0)[2880:sync svc 43][<c010d57c>] (dump_backtrace)
from [<c010d7a4>] (show_stack+0x18/0x1c)
<4>[  261.841866]-(0)[2880:sync svc 43] r6:60030013 r5:c123d488
r4:00000000 r3:dc8ba692
<4>[  261.841880]-(0)[2880:sync svc 43][<c010d78c>] (show_stack) from
[<c0470b84>] (dump_stack+0x94/0xa8)
<4>[  261.841892]-(0)[2880:sync svc 43][<c0470af0>] (dump_stack) from
[<c0236060>] (warn_alloc_failed+0x108/0x148)
<4>[  261.841905]-(0)[2880:sync svc 43] r6:00000000 r5:024040c0
r4:c1204948 r3:dc8ba692
<4>[  261.841919]-(0)[2880:sync svc 43][<c0235f5c>] (warn_alloc_failed)
from [<c023a284>] (__alloc_pages_nodemask+0xa08/0xbd8)
<4>[  261.841929]-(0)[2880:sync svc 43] r3:0000000f r2:00000000
<4>[  261.841939]-(0)[2880:sync svc 43] r8:0000002f r7:00000004
r6:dbb7a000 r5:024040c0
<4>[  261.841953]-(0)[2880:sync svc 43][<c023987c>]
(__alloc_pages_nodemask) from [<c023a5fc>] (alloc_kmem_pages+0x18/0x20)
<4>[  261.841963]-(0)[2880:sync svc 43] r10:c0286560 r9:c027b348
r8:0000fff8 r7:00000004
<4>[  261.841978]-(0)[2880:sync svc 43][<c023a5e4>] (alloc_kmem_pages)
from [<c02573c0>] (kmalloc_order_trace+0x2c/0xec)
Michal Hocko Nov. 1, 2018, 10:27 a.m. UTC | #12
On Thu 01-11-18 18:00:12, Miles Chen wrote:
[...]
> I did a test today, the only code changed is to clamp to read count to
> PAGE_SIZE and it worked well. Maybe we can solve this issue by just
> clamping the read count.
> 
> count = count > PAGE_SIZE ? PAGE_SIZE : count;

This i what Matthew was proposing AFAIR. At least as a stop gap
solution. Maybe we want to extend this to a more standard implementation
later on (e.g. seq_file).
diff mbox series

Patch

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d80adfe702d3..a064cd046361 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -1,7 +1,6 @@ 
 // SPDX-License-Identifier: GPL-2.0
 #include <linux/debugfs.h>
 #include <linux/mm.h>
-#include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/bootmem.h>
 #include <linux/stacktrace.h>
@@ -351,7 +350,8 @@  print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 		.skip = 0
 	};
 
-	kbuf = kmalloc(count, GFP_KERNEL);
+	count = count > PAGE_SIZE ? PAGE_SIZE : count;
+	kbuf = kvmalloc(count, GFP_KERNEL);
 	if (!kbuf)
 		return -ENOMEM;
 
@@ -397,11 +397,11 @@  print_page_owner(char __user *buf, size_t count, unsigned long pfn,
 	if (copy_to_user(buf, kbuf, ret))
 		ret = -EFAULT;
 
-	kfree(kbuf);
+	kvfree(kbuf);
 	return ret;
 
 err:
-	kfree(kbuf);
+	kvfree(kbuf);
 	return -ENOMEM;
 }
 
diff --git a/mm/util.c b/mm/util.c
index 8bf08b5b5760..7b1c59b9bfbf 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -416,10 +416,10 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	ret = kmalloc_node(size, kmalloc_flags, node);
 
 	/*
-	 * It doesn't really make sense to fallback to vmalloc for sub page
-	 * requests
+	 * It only makes sense to fallback to vmalloc for sub page
+	 * requests if we might be able to allocate highmem pages.
 	 */
-	if (ret || size <= PAGE_SIZE)
+	if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE))
 		return ret;
 
 	return __vmalloc_node_flags_caller(size, node, flags,