Message ID | 20250409195448.3697351-1-tjmercier@google.com (mailing list archive) |
---|---|
State | New |
Series | alloc_tag: Handle incomplete bulk allocations in vm_module_tags_populate |
On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:

> alloc_pages_bulk_node may partially succeed and allocate fewer than the
> requested nr_pages. There are several conditions under which this can
> occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> enabled causing all bulk allocations to always fallback to single page
> allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> allocator recursion with pagesets.lock held").
>
> Currently vm_module_tags_populate immediately fails when
> alloc_pages_bulk_node returns fewer than the requested number of pages.
> This patch causes vm_module_tags_populate to retry bulk allocations for
> the remaining memory instead.

Please describe the userspace-visible runtime effects of this change. In a way
which permits a user who is experiencing some problem can recognize that this
patch will address that problem.
On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
>
> > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > requested nr_pages. There are several conditions under which this can
> > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > enabled causing all bulk allocations to always fallback to single page
> > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > allocator recursion with pagesets.lock held").
> >
> > Currently vm_module_tags_populate immediately fails when
> > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > This patch causes vm_module_tags_populate to retry bulk allocations for
> > the remaining memory instead.
>
> Please describe the userspace-visible runtime effects of this change. In a way
> which permits a user who is experiencing some problem can recognize that this
> patch will address that problem.
>
> ...
>
> Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>

A Closes: link will presumably help with the above info. checkpatch
now warns about the absence of a Closes:
On Wed, Apr 9, 2025 at 2:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
>
> > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > requested nr_pages. There are several conditions under which this can
> > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > enabled causing all bulk allocations to always fallback to single page
> > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > allocator recursion with pagesets.lock held").
> >
> > Currently vm_module_tags_populate immediately fails when
> > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > This patch causes vm_module_tags_populate to retry bulk allocations for
> > the remaining memory instead.
>
> Please describe the userspace-visible runtime effects of this change. In a way
> which permits a user who is experiencing some problem can recognize that this
> patch will address that problem.

The userspace visible effect is that memory allocation profiling will get
disabled when the bulk allocation is incomplete, for example:

[ 14.297583] [9: modprobe: 465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled!
[ 14.299339] [9: modprobe: 465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory
On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> >
> > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > requested nr_pages. There are several conditions under which this can
> > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > enabled causing all bulk allocations to always fallback to single page
> > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > allocator recursion with pagesets.lock held").
> > >
> > > Currently vm_module_tags_populate immediately fails when
> > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > the remaining memory instead.
> >
> > Please describe the userspace-visible runtime effects of this change. In a way
> > which permits a user who is experiencing some problem can recognize that this
> > patch will address that problem.
> >
> > ...
> >
> > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
>
> A Closes: link will presumably help with the above info. checkpatch
> now warns about the absence of a Closes:

Hi Andrew, This was reported on our internal bug tracker so there is
no public link I can provide here. If it's better not to add a
Reported-by in this case, then I will do that in the future.
On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > > On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > >
> > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > requested nr_pages. There are several conditions under which this can
> > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > enabled causing all bulk allocations to always fallback to single page
> > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > allocator recursion with pagesets.lock held").
> > > >
> > > > Currently vm_module_tags_populate immediately fails when
> > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > the remaining memory instead.
> > >
> > > Please describe the userspace-visible runtime effects of this change. In a way
> > > which permits a user who is experiencing some problem can recognize that this
> > > patch will address that problem.
> > >
> > > ...
> > >
> > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> >
> > A Closes: link will presumably help with the above info. checkpatch
> > now warns about the absence of a Closes:
>
> Hi Andrew, This was reported on our internal bug tracker so there is
> no public link I can provide here. If it's better not to add a
> Reported-by in this case, then I will do that in the future.

In that case perhaps cut and paste the info from your internal bug
tracker?

Commit messages can include quite a bit more than just a short
description of the commit, when it's relevant - e.g. I try to include
the literal log of the oops being fixed when appropriate.

It really helps when looking at things weeks or months later and trying
to remember "ok, exactly what was that code path I need to watch out
for?"
On Wed, Apr 9, 2025 at 2:57 PM Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> > On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > > On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > > >
> > > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > > requested nr_pages. There are several conditions under which this can
> > > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > > enabled causing all bulk allocations to always fallback to single page
> > > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > > allocator recursion with pagesets.lock held").
> > > > >
> > > > > Currently vm_module_tags_populate immediately fails when
> > > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > > the remaining memory instead.
> > > >
> > > > Please describe the userspace-visible runtime effects of this change. In a way
> > > > which permits a user who is experiencing some problem can recognize that this
> > > > patch will address that problem.
> > > >
> > > > ...
> > > >
> > > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> > >
> > > A Closes: link will presumably help with the above info. checkpatch
> > > now warns about the absence of a Closes:
> >
> > Hi Andrew, This was reported on our internal bug tracker so there is
> > no public link I can provide here. If it's better not to add a
> > Reported-by in this case, then I will do that in the future.
>
> In that case perhaps cut and paste the info from your internal bug
> tracker?
>
> Commit messages can include quite a bit more than just a short
> description of the commit, when it's relevant - e.g. I try to include
> the literal log of the oops being fixed when appropriate.
>
> It really helps when looking at things weeks or months later and trying
> to remember "ok, exactly what was that code path I need to watch out
> for?"

Agreed, it would have been better to include this. I think the
modprobe errors I followed up with would be good to append to the
commit message.

Shall I send a v2?
On Wed, Apr 9, 2025 at 3:11 PM T.J. Mercier <tjmercier@google.com> wrote:
>
> On Wed, Apr 9, 2025 at 2:57 PM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Wed, Apr 09, 2025 at 02:51:18PM -0700, T.J. Mercier wrote:
> > > On Wed, Apr 9, 2025 at 2:11 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Wed, 9 Apr 2025 14:08:48 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > > On Wed, 9 Apr 2025 19:54:47 +0000 "T.J. Mercier" <tjmercier@google.com> wrote:
> > > > >
> > > > > > alloc_pages_bulk_node may partially succeed and allocate fewer than the
> > > > > > requested nr_pages. There are several conditions under which this can
> > > > > > occur, but we have encountered the case where CONFIG_PAGE_OWNER is
> > > > > > enabled causing all bulk allocations to always fallback to single page
> > > > > > allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
> > > > > > allocator recursion with pagesets.lock held").
> > > > > >
> > > > > > Currently vm_module_tags_populate immediately fails when
> > > > > > alloc_pages_bulk_node returns fewer than the requested number of pages.
> > > > > > This patch causes vm_module_tags_populate to retry bulk allocations for
> > > > > > the remaining memory instead.
> > > > >
> > > > > Please describe the userspace-visible runtime effects of this change. In a way
> > > > > which permits a user who is experiencing some problem can recognize that this
> > > > > patch will address that problem.
> > > > >
> > > > > ...
> > > > >
> > > > > Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
> > > >
> > > > A Closes: link will presumably help with the above info. checkpatch
> > > > now warns about the absence of a Closes:
> > >
> > > Hi Andrew, This was reported on our internal bug tracker so there is
> > > no public link I can provide here. If it's better not to add a
> > > Reported-by in this case, then I will do that in the future.
> >
> > In that case perhaps cut and paste the info from your internal bug
> > tracker?
> >
> > Commit messages can include quite a bit more than just a short
> > description of the commit, when it's relevant - e.g. I try to include
> > the literal log of the oops being fixed when appropriate.
> >
> > It really helps when looking at things weeks or months later and trying
> > to remember "ok, exactly what was that code path I need to watch out
> > for?"
>
> Agreed, it would have been better to include this. I think the
> modprobe errors I followed up with would be good to append to the
> commit message.
>
> Shall I send a v2?

Yes please and add the userspace visible effect you posted earlier along with:

Fixes: 0f9b685626da "alloc_tag: populate memory for module tags as needed"

With that added:

Acked-by: Suren Baghdasaryan <surenb@google.com>
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 1d893e313614..25ecc1334b67 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -422,11 +422,20 @@ static int vm_module_tags_populate(void)
 		unsigned long old_shadow_end = ALIGN(phys_end, MODULE_ALIGN);
 		unsigned long new_shadow_end = ALIGN(new_end, MODULE_ALIGN);
 		unsigned long more_pages;
-		unsigned long nr;
+		unsigned long nr = 0;
 
 		more_pages = ALIGN(new_end - phys_end, PAGE_SIZE) >> PAGE_SHIFT;
-		nr = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
-					   NUMA_NO_NODE, more_pages, next_page);
+		while (nr < more_pages) {
+			unsigned long allocated;
+
+			allocated = alloc_pages_bulk_node(GFP_KERNEL | __GFP_NOWARN,
+				NUMA_NO_NODE, more_pages - nr, next_page + nr);
+
+			if (!allocated)
+				break;
+			nr += allocated;
+		}
+
 		if (nr < more_pages ||
 		    vmap_pages_range(phys_end, phys_end + (nr << PAGE_SHIFT), PAGE_KERNEL,
 				     next_page, PAGE_SHIFT) < 0) {
alloc_pages_bulk_node may partially succeed and allocate fewer than the
requested nr_pages. There are several conditions under which this can
occur, but we have encountered the case where CONFIG_PAGE_OWNER is
enabled causing all bulk allocations to always fallback to single page
allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page
allocator recursion with pagesets.lock held").

Currently vm_module_tags_populate immediately fails when
alloc_pages_bulk_node returns fewer than the requested number of pages.
This patch causes vm_module_tags_populate to retry bulk allocations for
the remaining memory instead.

Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 lib/alloc_tag.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
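
For readers without a kernel tree at hand, the retry pattern described in the commit message can be sketched in plain user-space C. This is only an illustration under stated assumptions: `fake_bulk_alloc`, `max_per_call`, and `bulk_alloc_retry` are hypothetical names invented here, and `malloc` stands in for page allocation. It is not the kernel's `alloc_pages_bulk_node`, only a stand-in that may fill fewer slots than requested, which is the behaviour the patch has to tolerate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: upper bound on slots filled per call. Setting it to 1
 * mimics a bulk allocator that always falls back to single allocations. */
static size_t max_per_call = 1;

/* Hypothetical stand-in for a bulk allocator that may partially succeed.
 * Fills up to `want` entries of `slots` and returns how many were filled. */
static size_t fake_bulk_alloc(size_t want, void **slots)
{
	size_t n = want < max_per_call ? want : max_per_call;
	size_t i;

	for (i = 0; i < n; i++) {
		slots[i] = malloc(64);
		if (!slots[i])
			break;
	}
	return i;
}

/* Keep asking for the remainder until everything is allocated or a call
 * makes no progress. Returns the total number of slots filled. */
static size_t bulk_alloc_retry(size_t want, void **slots)
{
	size_t nr = 0;

	assert(slots != NULL);
	while (nr < want) {
		/* Request only what is still missing, starting at the first
		 * unfilled slot, as the patch does with next_page + nr. */
		size_t allocated = fake_bulk_alloc(want - nr, slots + nr);

		if (!allocated)
			break;	/* no progress: stop instead of spinning */
		nr += allocated;
	}
	return nr;
}
```

The caller still has to compare the returned count against the requested one, just as the patched code keeps its `nr < more_pages` check after the loop. Breaking out on a zero return is what prevents an endless loop when the allocator cannot make any progress.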