alloc_tag: Handle incomplete bulk allocations in vm_module_tags_populate

Message ID 20250409195448.3697351-1-tjmercier@google.com (mailing list archive)
State New
Series alloc_tag: Handle incomplete bulk allocations in vm_module_tags_populate

Commit Message

T.J. Mercier April 9, 2025, 7:54 p.m. UTC
alloc_pages_bulk_node may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is
enabled, causing all bulk allocations to always fall back to single-page
allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
allocator recursion with pagesets.lock held").

Currently vm_module_tags_populate immediately fails when
alloc_pages_bulk_node returns fewer than the requested number of pages.
This patch causes vm_module_tags_populate to retry bulk allocations for
the remaining memory instead.

Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 lib/alloc_tag.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
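
The retry behaviour described in the commit message can be illustrated outside the kernel. The following is only a userspace sketch, not the kernel code: mock_allocator, mock_bulk_alloc and populate_with_retry are hypothetical names, and the mock merely imitates an allocator that may hand back fewer pages than requested (for instance one page per call, as reported with CONFIG_PAGE_OWNER enabled).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bulk page allocator that may partially succeed. */
struct mock_allocator {
	unsigned long max_per_call;	/* most pages handed out by one call */
	unsigned long budget;		/* pages left before allocation fails */
	unsigned long calls;		/* number of bulk calls made so far */
};

/*
 * Fill up to nr_pages slots of pages[] and return how many were filled.
 * The result can be smaller than nr_pages, or 0 once the budget is spent.
 */
static unsigned long mock_bulk_alloc(struct mock_allocator *a,
				     unsigned long nr_pages, void **pages)
{
	static char dummy_page;	/* its address marks a slot as populated */
	unsigned long n = nr_pages;

	a->calls++;
	if (n > a->max_per_call)
		n = a->max_per_call;
	if (n > a->budget)
		n = a->budget;
	for (unsigned long i = 0; i < n; i++)
		pages[i] = &dummy_page;
	a->budget -= n;
	return n;
}

/*
 * Keep requesting the remaining pages until everything is allocated or a
 * call makes no progress. Returns the total number of pages obtained; the
 * caller treats a result below more_pages as failure.
 */
static unsigned long populate_with_retry(struct mock_allocator *a,
					 unsigned long more_pages,
					 void **next_page)
{
	unsigned long nr = 0;

	while (nr < more_pages) {
		unsigned long allocated;

		allocated = mock_bulk_alloc(a, more_pages - nr, next_page + nr);
		assert(allocated <= more_pages - nr);
		if (!allocated)
			break;	/* no progress: give up instead of spinning */
		nr += allocated;
	}
	return nr;
}
```

With max_per_call set to 1 and a budget of 8, a request for 5 pages completes after five calls. With a budget of only 3, the loop stops on the fourth call and returns 3, which the caller would still treat as an allocation failure.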

Comments

Andrew Morton April 9, 2025, 9:08 p.m. UTC | #1
On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:

> alloc_pages_bulk_node may partially succeed and allocate fewer than the
> requested nr_pages. There are several conditions under which this can
> occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> enabled causing all bulk allocations to always fallback to single page
> allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> allocator recursion with pagesets.lock held").
> 
> Currently vm_module_tags_populate immediately fails when
> alloc_pages_bulk_node returns fewer than the requested number of pages.
> This patch causes vm_module_tags_populate to retry bulk allocations for
> the remaining memory instead.

Please describe the userspace-visible runtime effects of this change, in a
way which permits a user who is experiencing some problem to recognize that
this patch will address that problem.
Andrew Morton April 9, 2025, 9:11 p.m. UTC | #2
On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> 
> > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > requested nr_pages. There are several conditions under which this can
> > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > enabled causing all bulk allocations to always fallback to single page
> > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > allocator recursion with pagesets.lock held").
> > 
> > Currently vm_module_tags_populate immediately fails when
> > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > This patch causes vm_module_tags_populate to retry bulk allocations for
> > the remaining memory instead.
> 
> Please describe the userspace-visible runtime effects of this change.  In a way
> which permits a user who is experiencing some problem can recognize that this
> patch will address that problem.
>
> ...
>
> Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>

A Closes: link will presumably help with the above info.  checkpatch
now warns about the absence of a Closes:
T.J. Mercier April 9, 2025, 9:48 p.m. UTC | #3
On Wed, Apr 9, 2025 at 2:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
>
> > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > requested nr_pages. There are several conditions under which this can
> > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > enabled causing all bulk allocations to always fallback to single page
> > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > allocator recursion with pagesets.lock held").
> >
> > Currently vm_module_tags_populate immediately fails when
> > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > This patch causes vm_module_tags_populate to retry bulk allocations for
> > the remaining memory instead.
>
> Please describe the userspace-visible runtime effects of this change.  In a way
> which permits a user who is experiencing some problem can recognize that this
> patch will address that problem.

The userspace visible effect is that memory allocation profiling will
get disabled when the bulk allocation is incomplete, for example:
[   14.297583] [9:       modprobe:  465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled!
[   14.299339] [9:       modprobe:  465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory
T.J. Mercier April 9, 2025, 9:51 p.m. UTC | #4
On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> >
> > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > requested nr_pages. There are several conditions under which this can
> > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > enabled causing all bulk allocations to always fallback to single page
> > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > allocator recursion with pagesets.lock held").
> > >
> > > Currently vm_module_tags_populate immediately fails when
> > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > the remaining memory instead.
> >
> > Please describe the userspace-visible runtime effects of this change.  In a way
> > which permits a user who is experiencing some problem can recognize that this
> > patch will address that problem.
> >
> > ...
> >
> > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
>
> A Closes: link will presumably help with the above info.  checkpatch
> now warns about the absence of a Closes:

Hi Andrew, This was reported on our internal bug tracker so there is
no public link I can provide here. If it's better not to add a
Reported-by in this case, then I will do that in the future.
Kent Overstreet April 9, 2025, 9:57 p.m. UTC | #5
On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > >
> > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > requested nr_pages. There are several conditions under which this can
> > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > enabled causing all bulk allocations to always fallback to single page
> > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > allocator recursion with pagesets.lock held").
> > > >
> > > > Currently vm_module_tags_populate immediately fails when
> > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > the remaining memory instead.
> > >
> > > Please describe the userspace-visible runtime effects of this change.  In a way
> > > which permits a user who is experiencing some problem can recognize that this
> > > patch will address that problem.
> > >
> > > ...
> > >
> > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> >
> > A Closes: link will presumably help with the above info.  checkpatch
> > now warns about the absence of a Closes:
> 
> Hi Andrew, This was reported on our internal bug tracker so there is
> no public link I can provide here. If it's better not to add a
> Reported-by in this case, then I will do that in the future.

In that case perhaps cut and paste the info from your internal bug
tracker?

Commit messages can include quite a bit more than just a short
description of the commit, when it's relevant - e.g. I try to include
the literal log of the oops being fixed when appropriate.

It really helps when looking at things weeks or months later and trying
to remember "ok, exactly what was that code path I need to watch out
for?"
T.J. Mercier April 9, 2025, 10:10 p.m. UTC | #6
On Wed, Apr 9, 2025 at 2:57 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> > On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > > On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > > >
> > > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > > requested nr_pages. There are several conditions under which this can
> > > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > > enabled causing all bulk allocations to always fallback to single page
> > > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > > allocator recursion with pagesets.lock held").
> > > > >
> > > > > Currently vm_module_tags_populate immediately fails when
> > > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > > the remaining memory instead.
> > > >
> > > > Please describe the userspace-visible runtime effects of this change.  In a way
> > > > which permits a user who is experiencing some problem can recognize that this
> > > > patch will address that problem.
> > > >
> > > > ...
> > > >
> > > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> > >
> > > A Closes: link will presumably help with the above info.  checkpatch
> > > now warns about the absence of a Closes:
> >
> > Hi Andrew, This was reported on our internal bug tracker so there is
> > no public link I can provide here. If it's better not to add a
> > Reported-by in this case, then I will do that in the future.
>
> In that case perhaps cut and paste the info from your internal bug
> tracker?
>
> Commit messages can include quite a bit more than just a short
> description of the commit, when it's relevant - e.g. I try to include
> the literal log of the oops being fixed when appropriate.
>
> It really helps when looking at things weeks or months later and trying
> to remember "ok, exactly what was that code path I need to watch out
> for?"

Agreed, it would have been better to include this. I think the
modprobe errors I followed up with would be good to append to the
commit message.

Shall I send a v2?
Suren Baghdasaryan April 9, 2025, 10:24 p.m. UTC | #7
On Wed, Apr 9, 2025 at 3:11 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Wed, Apr 9, 2025 at 2:57 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> > > On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > > On Wed,  9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > > > >
> > > > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > > > requested nr_pages. There are several conditions under which this can
> > > > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > > > enabled causing all bulk allocations to always fallback to single page
> > > > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > > > allocator recursion with pagesets.lock held").
> > > > > >
> > > > > > Currently vm_module_tags_populate immediately fails when
> > > > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > > > the remaining memory instead.
> > > > >
> > > > > Please describe the userspace-visible runtime effects of this change.  In a way
> > > > > which permits a user who is experiencing some problem can recognize that this
> > > > > patch will address that problem.
> > > > >
> > > > > ...
> > > > >
> > > > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> > > >
> > > > A Closes: link will presumably help with the above info.  checkpatch
> > > > now warns about the absence of a Closes:
> > >
> > > Hi Andrew, This was reported on our internal bug tracker so there is
> > > no public link I can provide here. If it's better not to add a
> > > Reported-by in this case, then I will do that in the future.
> >
> > In that case perhaps cut and paste the info from your internal bug
> > tracker?
> >
> > Commit messages can include quite a bit more than just a short
> > description of the commit, when it's relevant - e.g. I try to include
> > the literal log of the oops being fixed when appropriate.
> >
> > It really helps when looking at things weeks or months later and trying
> > to remember "ok, exactly what was that code path I need to watch out
> > for?"
>
> Agreed, it would have been better to include this. I think the
> modprobe errors I followed up with would be good to append to the
> commit message.
>
> Shall I send a v2?

Yes please and add the userspace visible effect you posted earlier along with:

Fixes: 0f9b685626da ("alloc_tag: populate memory for module tags as needed")

With that added:

Acked-by: Suren Baghdasaryan <surenb@google.com>

Patch

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 1d893e313614..25ecc1334b67 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -422,11 +422,20 @@  static int vm_module_tags_populate(void)
 		unsigned long old_shadow_end = ALIGN(phys_end, MODULE_ALIGN);
 		unsigned long new_shadow_end = ALIGN(new_end, MODULE_ALIGN);
 		unsigned long more_pages;
-		unsigned long nr;
+		unsigned long nr = 0;
 
 		more_pages = ALIGN(new_end - phys_end, PAGE_SIZE) >> PAGE_SHIFT;
-		nr = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
-					   NUMA_NO_NODE, more_pages, next_page);
+		while (nr < more_pages) {
+			unsigned long allocated;
+
+			allocated = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
+				NUMA_NO_NODE, more_pages - nr, next_page + nr);
+
+			if (!allocated)
+				break;
+			nr += allocated;
+		}
+
 		if (nr < more_pages ||
 		    vmap_pages_range(phys_end, phys_end + (nr << PAGE_SHIFT), PAGE_KERNEL,
 				     next_page, PAGE_SHIFT) < 0) {