[v2] alloc_tag: handle incomplete bulk allocations in vm_module_tags_populate

Message ID 20250409225111.3770347-1-tjmercier@google.com (mailing list archive)
State New

Commit Message

T.J. Mercier April 9, 2025, 10:51 p.m. UTC
alloc_pages_bulk_node() may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is
enabled, causing all bulk allocations to always fall back to single page
allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
allocator recursion with pagesets.lock held").

Currently vm_module_tags_populate() fails immediately when
alloc_pages_bulk_node() returns fewer than the requested number of pages.
When this happens, memory allocation profiling gets disabled, for example:

[   14.297583] [9:       modprobe:  465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled!
[   14.299339] [9:       modprobe:  465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory

This patch causes vm_module_tags_populate() to retry bulk allocations for
the remaining memory instead of failing immediately, which avoids
disabling memory allocation profiling.

Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
Fixes: 0f9b685626da ("alloc_tag: populate memory for module tags as needed")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
---
 lib/alloc_tag.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
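
The retry behaviour described in the commit message can be tried out in
userspace. The following is only a minimal sketch, not the kernel code:
fake_bulk_alloc(), chunk, calls and dummy_page are hypothetical stand-ins
for alloc_pages_bulk_node() and real pages, and the stand-in is assumed to
fill at most chunk slots per call to model a partial success.

```c
#include <assert.h>
#include <stddef.h>

static char dummy_page;		/* hypothetical stand-in for a struct page */
static size_t chunk = 1;	/* max slots filled per call; 0 models total failure */
static unsigned int calls;	/* number of bulk calls issued so far */

/*
 * Hypothetical stand-in for alloc_pages_bulk_node(): fills at most
 * `chunk` slots starting at page_array[0] and returns how many it filled.
 */
static size_t fake_bulk_alloc(size_t nr_pages, void **page_array)
{
	size_t n = nr_pages < chunk ? nr_pages : chunk;

	calls++;
	for (size_t i = 0; i < n; i++)
		page_array[i] = &dummy_page;
	return n;
}

/*
 * Keep asking for the remaining pages until the request is satisfied
 * or a call makes no progress at all. Returns the number of slots filled.
 */
static size_t populate_with_retry(size_t more_pages, void **next_page)
{
	size_t nr = 0;

	while (nr < more_pages) {
		size_t allocated = fake_bulk_alloc(more_pages - nr,
						   next_page + nr);

		if (!allocated)
			break;		/* no progress, give up */
		nr += allocated;
		assert(nr <= more_pages);
	}
	return nr;
}
```

With chunk set to 1 the loop degenerates to one page per call, which is
roughly the situation reported with CONFIG_PAGE_OWNER; the caller still
compares the returned count against more_pages to detect a real failure.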

Comments

Andrew Morton April 10, 2025, 12:12 a.m. UTC | #1
Thanks.  I'm assuming we want cc:stable on this?

btw, it looks like the "Clean up and error out" code in
vm_module_tags_populate() could use release_pages().

Suren Baghdasaryan April 10, 2025, 1:44 a.m. UTC | #2
On Thu, Apr 10, 2025 at 12:12 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> Thanks.  I'm assuming we want cc:stable on this?
>
> btw, it looks like the "Clean up and error out" code in
> vm_module_tags_populate() could use release_pages().

True. I'll add that into my TODO list. Thanks!

Yunsheng Lin April 10, 2025, 2:52 a.m. UTC | #3
On 2025/4/10 9:44, Suren Baghdasaryan wrote:
> On Thu, Apr 10, 2025 at 12:12 AM Andrew Morton
> <akpm@linux-foundation.org> wrote:
>>
>> Thanks.  I'm assuming we want cc:stable on this?
>>
>> btw, it looks like the "Clean up and error out" code in
>> vm_module_tags_populate() could use release_pages().

For the 'Clean up and error out' part:
The next_page[] array might need to be reset to NULL if the user is able
to re-enable memory allocation profiling after the above happens, as the
current page bulk alloc API only populates NULL elements.
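
The concern can be illustrated with a small userspace sketch. It is only a
model under stated assumptions: fill_null_slots(), cleanup() and dummy_page
are hypothetical names, and the stand-in returns the number of newly filled
slots, which is a simplification of the kernel API's return value.

```c
#include <assert.h>
#include <stddef.h>

static char dummy_page;		/* hypothetical stand-in for a struct page */

/*
 * Hypothetical model of a bulk allocator that only populates NULL
 * slots. Returns the number of slots it newly filled.
 */
static size_t fill_null_slots(size_t nr_pages, void **page_array)
{
	size_t filled = 0;

	assert(page_array != NULL);
	for (size_t i = 0; i < nr_pages; i++) {
		if (page_array[i])
			continue;	/* slot looks populated, skip it */
		page_array[i] = &dummy_page;
		filled++;
	}
	return filled;
}

/*
 * Model of an error path: a real one would free each page here.
 * With reset == 0 the stale pointers are left in the array.
 */
static void cleanup(size_t nr, void **page_array, int reset)
{
	for (size_t i = 0; i < nr; i++) {
		if (reset)
			page_array[i] = NULL;
	}
}
```

In this model, calling cleanup() without the reset leaves every slot
non-NULL, so a later fill_null_slots() call on the same array fills
nothing and the array keeps pointing at pages that were already freed.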

Suren Baghdasaryan April 10, 2025, 10:20 p.m. UTC | #4
On Wed, Apr 9, 2025 at 7:52 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2025/4/10 9:44, Suren Baghdasaryan wrote:
> > On Thu, Apr 10, 2025 at 12:12 AM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> >>
> >> Thanks.  I'm assuming we want cc:stable on this?
> >>
> >> btw, it looks like the "Clean up and error out" code in
> >> vm_module_tags_populate() could use release_pages().
>
> For the 'Clean up and error out' part:
> The next_page[] array might need to be reset to NULL if the user is able
> to re-enable memory allocation profiling after the above happens, as the
> current page bulk alloc API only populates NULL elements.

We shouldn't be able to re-enable memory allocation profiling once
vm_module_tags_populate() fails. In that case the shutdown_mem_profiling()
call disables memory allocation profiling and sets
mem_profiling_support=false. I might have to modify
memory_allocation_profiling_sysctls to prevent users from trying to
re-enable profiling via sysfs if mem_profiling_support is set to
false. I'll take a closer look at that, but regarding your comment,
re-enabling profiling once it's shut down is not a valid use case.

Patch

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 1d893e313614..25ecc1334b67 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -422,11 +422,20 @@  static int vm_module_tags_populate(void)
 		unsigned long old_shadow_end = ALIGN(phys_end, MODULE_ALIGN);
 		unsigned long new_shadow_end = ALIGN(new_end, MODULE_ALIGN);
 		unsigned long more_pages;
-		unsigned long nr;
+		unsigned long nr = 0;
 
 		more_pages = ALIGN(new_end - phys_end, PAGE_SIZE) >> PAGE_SHIFT;
-		nr = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
-					   NUMA_NO_NODE, more_pages, next_page);
+		while (nr < more_pages) {
+			unsigned long allocated;
+
+			allocated = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
+				NUMA_NO_NODE, more_pages - nr, next_page + nr);
+
+			if (!allocated)
+				break;
+			nr += allocated;
+		}
+
 		if (nr < more_pages ||
 		    vmap_pages_range(phys_end, phys_end + (nr << PAGE_SHIFT), PAGE_KERNEL,
 				     next_page, PAGE_SHIFT) < 0) {