Message ID | 1540790176-32339-1-git-send-email-miles.chen@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] mm/page_owner: use kvmalloc instead of kmalloc | expand |
On Mon 29-10-18 13:16:16, miles.chen@mediatek.com wrote: > From: Miles Chen <miles.chen@mediatek.com> > > The kbuf used by page owner is allocated by kmalloc(), which means it > can use only normal memory and there might be a "out of memory" > issue when we're out of normal memory. > > To solve this problem, use kvmalloc() to allocate kbuf > from normal/highmem. But there is one problem here: kvmalloc() > does not fallback to vmalloc for sub page allocations. So sub > page allocation fails due to out of normal memory cannot fallback > to vmalloc. > > Modify kvmalloc() to allow sub page allocations fallback to > vmalloc when CONFIG_HIGHMEM=y and use kvmalloc() to allocate > kbuf. > > Clamp buffer size to PAGE_SIZE to avoid arbitrary size allocation. > > Change since v2: > - improve kvmalloc, allow sub page allocations fallback to > vmalloc when CONFIG_HIGHMEM=y Matthew has suggested a much more viable way to go around this. We do not really want to allow an unbound kernel allocation - even if the interface is root only. Besides that, the following doesn't make much sense to me. It simply makes no sense to use vmalloc for sub page allocation regardless of HIGHMEM. > diff --git a/mm/util.c b/mm/util.c > index 8bf08b5b5760..7b1c59b9bfbf 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > ret = kmalloc_node(size, kmalloc_flags, node); > > /* > - * It doesn't really make sense to fallback to vmalloc for sub page > - * requests > + * It only makes sense to fallback to vmalloc for sub page > + * requests if we might be able to allocate highmem pages. > */ > - if (ret || size <= PAGE_SIZE) > + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE)) > return ret; > > return __vmalloc_node_flags_caller(size, node, flags, > -- > 2.18.0 >
On Mon 29-10-18 09:07:08, Michal Hocko wrote: [...] > Besides that, the following doesn't make much sense to me. It simply > makes no sense to use vmalloc for sub page allocation regardless of > HIGHMEM. OK, it is still early morning here. Now I get the point of the patch. You just want to (ab)use highmeme for smaller requests. I do not like this, to be honest. It causes an internal fragmentation and more importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we have any 64b with HIGHMEM btw?) is quite small to be wasted like that. In any case such a changes should come with some numbers and as a separate patch for sure. > > diff --git a/mm/util.c b/mm/util.c > > index 8bf08b5b5760..7b1c59b9bfbf 100644 > > --- a/mm/util.c > > +++ b/mm/util.c > > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > > ret = kmalloc_node(size, kmalloc_flags, node); > > > > /* > > - * It doesn't really make sense to fallback to vmalloc for sub page > > - * requests > > + * It only makes sense to fallback to vmalloc for sub page > > + * requests if we might be able to allocate highmem pages. > > */ > > - if (ret || size <= PAGE_SIZE) > > + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE)) > > return ret; > > > > return __vmalloc_node_flags_caller(size, node, flags, > > -- > > 2.18.0 > > > > -- > Michal Hocko > SUSE Labs
On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote: > On Mon 29-10-18 09:07:08, Michal Hocko wrote: > [...] > > Besides that, the following doesn't make much sense to me. It simply > > makes no sense to use vmalloc for sub page allocation regardless of > > HIGHMEM. > > OK, it is still early morning here. Now I get the point of the patch. > You just want to (ab)use highmeme for smaller requests. I do not like > this, to be honest. It causes an internal fragmentation and more > importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we > have any 64b with HIGHMEM btw?) is quite small to be wasted like that. > thanks for your comment. It looks like that using vmalloc fallback for sub page allocation is not good here. Your comment gave another idea: 1. force kbuf to PAGE_SIZE 2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can get a highmem page if possible 3. use kmap/kunmap pair to create mapping for this page. No vmalloc space is used. 4. do not change kvmalloc logic. > In any case such a changes should come with some numbers and as a > separate patch for sure. > > > > diff --git a/mm/util.c b/mm/util.c > > > index 8bf08b5b5760..7b1c59b9bfbf 100644 > > > --- a/mm/util.c > > > +++ b/mm/util.c > > > @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > > > ret = kmalloc_node(size, kmalloc_flags, node); > > > > > > /* > > > - * It doesn't really make sense to fallback to vmalloc for sub page > > > - * requests > > > + * It only makes sense to fallback to vmalloc for sub page > > > + * requests if we might be able to allocate highmem pages. > > > */ > > > - if (ret || size <= PAGE_SIZE) > > > + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE)) > > > return ret; > > > > > > return __vmalloc_node_flags_caller(size, node, flags, > > > -- > > > 2.18.0 > > > > > > > -- > > Michal Hocko > > SUSE Labs >
On Tue 30-10-18 09:29:10, Miles Chen wrote: > On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote: > > On Mon 29-10-18 09:07:08, Michal Hocko wrote: > > [...] > > > Besides that, the following doesn't make much sense to me. It simply > > > makes no sense to use vmalloc for sub page allocation regardless of > > > HIGHMEM. > > > > OK, it is still early morning here. Now I get the point of the patch. > > You just want to (ab)use highmeme for smaller requests. I do not like > > this, to be honest. It causes an internal fragmentation and more > > importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we > > have any 64b with HIGHMEM btw?) is quite small to be wasted like that. > > > thanks for your comment. It looks like that using vmalloc fallback for > sub page allocation is not good here. > > Your comment gave another idea: > > 1. force kbuf to PAGE_SIZE > 2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can > get a highmem page if possible > 3. use kmap/kunmap pair to create mapping for this page. No vmalloc > space is used. > 4. do not change kvmalloc logic. If you mean for this particular situation then is this really worth it? I mean this is a short term allocation for root only so you do not have to worry about low mem depletion. If you are thiking in more generic terms to allow kmalloc to use highmem then I am not really sure this will work out.
On Tue, 2018-10-30 at 07:06 +0100, Michal Hocko wrote: > On Tue 30-10-18 09:29:10, Miles Chen wrote: > > On Mon, 2018-10-29 at 09:17 +0100, Michal Hocko wrote: > > > On Mon 29-10-18 09:07:08, Michal Hocko wrote: > > > [...] > > > > Besides that, the following doesn't make much sense to me. It simply > > > > makes no sense to use vmalloc for sub page allocation regardless of > > > > HIGHMEM. > > > > > > OK, it is still early morning here. Now I get the point of the patch. > > > You just want to (ab)use highmeme for smaller requests. I do not like > > > this, to be honest. It causes an internal fragmentation and more > > > importantly the VMALLOC space on 32b where HIGHMEM is enabled (do we > > > have any 64b with HIGHMEM btw?) is quite small to be wasted like that. > > > > > thanks for your comment. It looks like that using vmalloc fallback for > > sub page allocation is not good here. > > > > Your comment gave another idea: > > > > 1. force kbuf to PAGE_SIZE > > 2. allocate a page by alloc_page(GFP_KERNEL | __GFP_HIGHMEM); so we can > > get a highmem page if possible > > 3. use kmap/kunmap pair to create mapping for this page. No vmalloc > > space is used. > > 4. do not change kvmalloc logic. > > If you mean for this particular situation then is this really worth > it? I mean this is a short term allocation for root only so you do not > have to worry about low mem depletion. The 1...3 are applied to print_page_owner(), not in kmalloc() or kvmalloc() logic. It's a real problem when using page_owner. I found this issue recently: I'm not able to read page_owner information during a overnight test. (error: read failed: Out of memory). I replace kmalloc() with vmalloc() and it worked well. > > If you are thiking in more generic terms to allow kmalloc to use highmem > then I am not really sure this will work out. I'm thinking about modify print_page_owner().
On Tue 30-10-18 14:55:51, Miles Chen wrote: [...] > It's a real problem when using page_owner. > I found this issue recently: I'm not able to read page_owner information > during a overnight test. (error: read failed: Out of memory). I replace > kmalloc() with vmalloc() and it worked well. Is this with trimming the allocation to a single page and doing shorter than requested reads?
On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote: > On Tue 30-10-18 14:55:51, Miles Chen wrote: > [...] > > It's a real problem when using page_owner. > > I found this issue recently: I'm not able to read page_owner information > > during a overnight test. (error: read failed: Out of memory). I replace > > kmalloc() with vmalloc() and it worked well. > > Is this with trimming the allocation to a single page and doing shorter > than requested reads? I printed out the allocate count on my device the request count is <= 4096. So I tested this scenario by trimming the count to from 4096 to 1024 bytes and it works fine. count = count > 1024? 1024: count; It tested it on both 32bit and 64bit kernel.
On Wed 31-10-18 16:47:17, Miles Chen wrote: > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote: > > On Tue 30-10-18 14:55:51, Miles Chen wrote: > > [...] > > > It's a real problem when using page_owner. > > > I found this issue recently: I'm not able to read page_owner information > > > during a overnight test. (error: read failed: Out of memory). I replace > > > kmalloc() with vmalloc() and it worked well. > > > > Is this with trimming the allocation to a single page and doing shorter > > than requested reads? > > > I printed out the allocate count on my device the request count is <= > 4096. So I tested this scenario by trimming the count to from 4096 to > 1024 bytes and it works fine. > > count = count > 1024? 1024: count; > > It tested it on both 32bit and 64bit kernel. Are you saying that you see OOMs for 4k size?
On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote: > On Wed 31-10-18 16:47:17, Miles Chen wrote: > > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote: > > > On Tue 30-10-18 14:55:51, Miles Chen wrote: > > > [...] > > > > It's a real problem when using page_owner. > > > > I found this issue recently: I'm not able to read page_owner information > > > > during a overnight test. (error: read failed: Out of memory). I replace > > > > kmalloc() with vmalloc() and it worked well. > > > > > > Is this with trimming the allocation to a single page and doing shorter > > > than requested reads? > > > > > > I printed out the allocate count on my device the request count is <= > > 4096. So I tested this scenario by trimming the count to from 4096 to > > 1024 bytes and it works fine. > > > > count = count > 1024? 1024: count; > > > > It tested it on both 32bit and 64bit kernel. > > Are you saying that you see OOMs for 4k size? > yes, because kmalloc only use normal memor, not highmem + normal memory I think that's why vmalloc() works.
On Wed 31-10-18 18:19:42, Miles Chen wrote: > On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote: > > On Wed 31-10-18 16:47:17, Miles Chen wrote: > > > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote: > > > > On Tue 30-10-18 14:55:51, Miles Chen wrote: > > > > [...] > > > > > It's a real problem when using page_owner. > > > > > I found this issue recently: I'm not able to read page_owner information > > > > > during a overnight test. (error: read failed: Out of memory). I replace > > > > > kmalloc() with vmalloc() and it worked well. > > > > > > > > Is this with trimming the allocation to a single page and doing shorter > > > > than requested reads? > > > > > > > > > I printed out the allocate count on my device the request count is <= > > > 4096. So I tested this scenario by trimming the count to from 4096 to > > > 1024 bytes and it works fine. > > > > > > count = count > 1024? 1024: count; > > > > > > It tested it on both 32bit and 64bit kernel. > > > > Are you saying that you see OOMs for 4k size? > > > yes, because kmalloc only use normal memor, not highmem + normal memory > I think that's why vmalloc() works. Can I see an OOM report please? I am especially interested that 1k doesn't cause the problem because there shouldn't be that much of a difference between the two. Larger allocations could be a result of memory fragmentation but 1k vs. 4k to make a difference really seems unexpected.
On Wed, 2018-10-31 at 12:41 +0100, Michal Hocko wrote: > On Wed 31-10-18 18:19:42, Miles Chen wrote: > > On Wed, 2018-10-31 at 11:15 +0100, Michal Hocko wrote: > > > On Wed 31-10-18 16:47:17, Miles Chen wrote: > > > > On Tue, 2018-10-30 at 09:15 +0100, Michal Hocko wrote: > > > > > On Tue 30-10-18 14:55:51, Miles Chen wrote: > > > > > [...] > > > > > > It's a real problem when using page_owner. > > > > > > I found this issue recently: I'm not able to read page_owner information > > > > > > during a overnight test. (error: read failed: Out of memory). I replace > > > > > > kmalloc() with vmalloc() and it worked well. > > > > > > > > > > Is this with trimming the allocation to a single page and doing shorter > > > > > than requested reads? > > > > > > > > > > > > I printed out the allocate count on my device the request count is <= > > > > 4096. So I tested this scenario by trimming the count to from 4096 to > > > > 1024 bytes and it works fine. > > > > > > > > count = count > 1024? 1024: count; > > > > > > > > It tested it on both 32bit and 64bit kernel. > > > > > > Are you saying that you see OOMs for 4k size? > > > > > yes, because kmalloc only use normal memor, not highmem + normal memory > > I think that's why vmalloc() works. > > Can I see an OOM report please? I am especially interested that 1k > doesn't cause the problem because there shouldn't be that much of a > difference between the two. Larger allocations could be a result of > memory fragmentation but 1k vs. 4k to make a difference really seems > unexpected. > You're right. I pulled out the log and found that the allocation fail is for order=4. I found that the if I do the read on the device, the read count is <= 4096; if I do the read by 'adb pull' from my host PC, the read count becomes 65532. (I'm working on a android device) The overnight test used 'adb pull' command, so allocation fail occurred because of the large read count and the arbitrary size allocation design of page_owner. That also explains why vmalloc() works. I did a test today, the only code changed is to clamp to read count to PAGE_SIZE and it worked well. Maybe we can solve this issue by just clamping the read count. count = count > PAGE_SIZE ? PAGE_SIZE : count; Here is the log: <4>[ 261.841770] (0)[2880:sync svc 43]sync svc 43: page allocation failure: order:4, mode:0x24040c0 <4>[ 261.841815]-(0)[2880:sync svc 43]CPU: 0 PID: 2880 Comm: sync svc 43 Tainted: G W O 4.4.146+ #16 <4>[ 261.841825]-(0)[2880:sync svc 43]Hardware name: Generic DT based system <4>[ 261.841834]-(0)[2880:sync svc 43]Backtrace: <4>[ 261.841844]-(0)[2880:sync svc 43][<c010d57c>] (dump_backtrace) from [<c010d7a4>] (show_stack+0x18/0x1c) <4>[ 261.841866]-(0)[2880:sync svc 43] r6:60030013 r5:c123d488 r4:00000000 r3:dc8ba692 <4>[ 261.841880]-(0)[2880:sync svc 43][<c010d78c>] (show_stack) from [<c0470b84>] (dump_stack+0x94/0xa8) <4>[ 261.841892]-(0)[2880:sync svc 43][<c0470af0>] (dump_stack) from [<c0236060>] (warn_alloc_failed+0x108/0x148) <4>[ 261.841905]-(0)[2880:sync svc 43] r6:00000000 r5:024040c0 r4:c1204948 r3:dc8ba692 <4>[ 261.841919]-(0)[2880:sync svc 43][<c0235f5c>] (warn_alloc_failed) from [<c023a284>] (__alloc_pages_nodemask+0xa08/0xbd8) <4>[ 261.841929]-(0)[2880:sync svc 43] r3:0000000f r2:00000000 <4>[ 261.841939]-(0)[2880:sync svc 43] r8:0000002f r7:00000004 r6:dbb7a000 r5:024040c0 <4>[ 261.841953]-(0)[2880:sync svc 43][<c023987c>] (__alloc_pages_nodemask) from [<c023a5fc>] (alloc_kmem_pages+0x18/0x20) <4>[ 261.841963]-(0)[2880:sync svc 43] r10:c0286560 r9:c027b348 r8:0000fff8 r7:00000004 <4>[ 261.841978]-(0)[2880:sync svc 43][<c023a5e4>] (alloc_kmem_pages) from [<c02573c0>] (kmalloc_order_trace+0x2c/0xec)
On Thu 01-11-18 18:00:12, Miles Chen wrote: [...] > I did a test today, the only code changed is to clamp to read count to > PAGE_SIZE and it worked well. Maybe we can solve this issue by just > clamping the read count. > > count = count > PAGE_SIZE ? PAGE_SIZE : count; This i what Matthew was proposing AFAIR. At least as a stop gap solution. Maybe we want to extend this to a more standard implementation later on (e.g. seq_file).
diff --git a/mm/page_owner.c b/mm/page_owner.c index d80adfe702d3..a064cd046361 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -1,7 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/debugfs.h> #include <linux/mm.h> -#include <linux/slab.h> #include <linux/uaccess.h> #include <linux/bootmem.h> #include <linux/stacktrace.h> @@ -351,7 +350,8 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, .skip = 0 }; - kbuf = kmalloc(count, GFP_KERNEL); + count = count > PAGE_SIZE ? PAGE_SIZE : count; + kbuf = kvmalloc(count, GFP_KERNEL); if (!kbuf) return -ENOMEM; @@ -397,11 +397,11 @@ print_page_owner(char __user *buf, size_t count, unsigned long pfn, if (copy_to_user(buf, kbuf, ret)) ret = -EFAULT; - kfree(kbuf); + kvfree(kbuf); return ret; err: - kfree(kbuf); + kvfree(kbuf); return -ENOMEM; } diff --git a/mm/util.c b/mm/util.c index 8bf08b5b5760..7b1c59b9bfbf 100644 --- a/mm/util.c +++ b/mm/util.c @@ -416,10 +416,10 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) ret = kmalloc_node(size, kmalloc_flags, node); /* - * It doesn't really make sense to fallback to vmalloc for sub page - * requests + * It only makes sense to fallback to vmalloc for sub page + * requests if we might be able to allocate highmem pages. */ - if (ret || size <= PAGE_SIZE) + if (ret || (!IS_ENABLED(CONFIG_HIGHMEM) && size <= PAGE_SIZE)) return ret; return __vmalloc_node_flags_caller(size, node, flags,