diff mbox series

[v5,20/37] mm: fix non-compound multi-order memory accounting in __free_pages

Message ID 20240306182440.2003814-21-surenb@google.com (mailing list archive)
State New, archived
Headers show
Series Memory allocation profiling | expand

Commit Message

Suren Baghdasaryan March 6, 2024, 6:24 p.m. UTC
When a non-compound multi-order page is freed, it is possible that a
speculative reference keeps the page pinned. In this case we free all
pages except for the first page, which will be freed later by the last
put_page(). However put_page() ignores the order of the page being freed,
treating it as a 0-order page. This creates a memory accounting imbalance
because the pages freed in __free_pages() do not have their own alloc_tag
and their memory was accounted to the first page. To fix this the first
page should adjust its allocation size counter when "tail" pages are freed.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/pgalloc_tag.h | 24 ++++++++++++++++++++++++
 mm/page_alloc.c             | 11 ++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

Comments

Vlastimil Babka March 13, 2024, 2:44 p.m. UTC | #1
On 3/6/24 19:24, Suren Baghdasaryan wrote:
> When a non-compound multi-order page is freed, it is possible that a
> speculative reference keeps the page pinned. In this case we free all
> pages except for the first page, which will be freed later by the last
> put_page(). However put_page() ignores the order of the page being freed,
> treating it as a 0-order page. This creates a memory accounting imbalance
> because the pages freed in __free_pages() do not have their own alloc_tag
> and their memory was accounted to the first page. To fix this the first
> page should adjust its allocation size counter when "tail" pages are freed.
> 
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Matthew Wilcox (Oracle) March 13, 2024, 3:04 p.m. UTC | #2
On Wed, Mar 06, 2024 at 10:24:18AM -0800, Suren Baghdasaryan wrote:
> When a non-compound multi-order page is freed, it is possible that a
> speculative reference keeps the page pinned. In this case we free all
> pages except for the first page, which will be freed later by the last
> put_page(). However put_page() ignores the order of the page being freed,
> treating it as a 0-order page. This creates a memory accounting imbalance
> because the pages freed in __free_pages() do not have their own alloc_tag
> and their memory was accounted to the first page. To fix this the first
> page should adjust its allocation size counter when "tail" pages are freed.

It's not "ignored".  It's not available!

Better wording:

However the page passed to put_page() is indisinguishable from an
order-0 page, so it cannot do the accounting, just as it cannot free
the subsequent pages.  Do the accounting here, where we free the pages.

(I'm sure further improvements are possible)

> +static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int order)
> +{
> +	if (mem_alloc_profiling_enabled() && tag)
> +		this_cpu_sub(tag->counters->bytes, PAGE_SIZE << order);
> +}

This is a terribly named function.  And it's not even good for what we
want to use it for.

static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
{
	if (mem_alloc_profiling_enabled() && tag)
		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
}

> +++ b/mm/page_alloc.c
> @@ -4697,12 +4697,21 @@ void __free_pages(struct page *page, unsigned int order)
>  {
>  	/* get PageHead before we drop reference */
>  	int head = PageHead(page);
> +	struct alloc_tag *tag = pgalloc_tag_get(page);
>  
>  	if (put_page_testzero(page))
>  		free_the_page(page, order);
>  	else if (!head)
> -		while (order-- > 0)
> +		while (order-- > 0) {
>  			free_the_page(page + (1 << order), order);
> +			/*
> +			 * non-compound multi-order page accounts all allocations
> +			 * to the first page (just like compound one), therefore
> +			 * we need to adjust the allocation size of the first
> +			 * page as its order is ignored when put_page() frees it.
> +			 */
> +			pgalloc_tag_sub_bytes(tag, order);

-	else if (!head
+	else if (!head) {
+		pgalloc_tag_sub_pages(1 << order - 1);
		while (order-- > 0)
			free_the_page(page + (1 << order), order);
+	}

It doesn't need a comment, it's obvious what you're doing.
Suren Baghdasaryan March 13, 2024, 3:31 p.m. UTC | #3
On Wed, Mar 13, 2024 at 3:04 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Mar 06, 2024 at 10:24:18AM -0800, Suren Baghdasaryan wrote:
> > When a non-compound multi-order page is freed, it is possible that a
> > speculative reference keeps the page pinned. In this case we free all
> > pages except for the first page, which will be freed later by the last
> > put_page(). However put_page() ignores the order of the page being freed,
> > treating it as a 0-order page. This creates a memory accounting imbalance
> > because the pages freed in __free_pages() do not have their own alloc_tag
> > and their memory was accounted to the first page. To fix this the first
> > page should adjust its allocation size counter when "tail" pages are freed.
>
> It's not "ignored".  It's not available!
>
> Better wording:
>
> However the page passed to put_page() is indisinguishable from an
> order-0 page, so it cannot do the accounting, just as it cannot free
> the subsequent pages.  Do the accounting here, where we free the pages.
>
> (I'm sure further improvements are possible)
>
> > +static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int order)
> > +{
> > +     if (mem_alloc_profiling_enabled() && tag)
> > +             this_cpu_sub(tag->counters->bytes, PAGE_SIZE << order);
> > +}
>
> This is a terribly named function.  And it's not even good for what we
> want to use it for.
>
> static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
> {
>         if (mem_alloc_profiling_enabled() && tag)
>                 this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
> }
>
> > +++ b/mm/page_alloc.c
> > @@ -4697,12 +4697,21 @@ void __free_pages(struct page *page, unsigned int order)
> >  {
> >       /* get PageHead before we drop reference */
> >       int head = PageHead(page);
> > +     struct alloc_tag *tag = pgalloc_tag_get(page);
> >
> >       if (put_page_testzero(page))
> >               free_the_page(page, order);
> >       else if (!head)
> > -             while (order-- > 0)
> > +             while (order-- > 0) {
> >                       free_the_page(page + (1 << order), order);
> > +                     /*
> > +                      * non-compound multi-order page accounts all allocations
> > +                      * to the first page (just like compound one), therefore
> > +                      * we need to adjust the allocation size of the first
> > +                      * page as its order is ignored when put_page() frees it.
> > +                      */
> > +                     pgalloc_tag_sub_bytes(tag, order);
>
> -       else if (!head
> +       else if (!head) {
> +               pgalloc_tag_sub_pages(1 << order - 1);
>                 while (order-- > 0)
>                         free_the_page(page + (1 << order), order);
> +       }
>
> It doesn't need a comment, it's obvious what you're doing.

All suggestions seem fine to me. I'll adjust the next version accordingly.
Thanks for reviewing and the feedback!

>
diff mbox series

Patch

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index 9e6ad8e0e4aa..59de43172cc2 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -96,12 +96,36 @@  static inline void pgalloc_tag_split(struct page *page, unsigned int nr)
 	page_ext_put(page_ext);
 }
 
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
+{
+	struct alloc_tag *tag = NULL;
+
+	if (mem_alloc_profiling_enabled()) {
+		union codetag_ref *ref = get_page_tag_ref(page);
+
+		alloc_tag_sub_check(ref);
+		if (ref && ref->ct)
+			tag = ct_to_alloc_tag(ref->ct);
+		put_page_tag_ref(ref);
+	}
+
+	return tag;
+}
+
+static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int order)
+{
+	if (mem_alloc_profiling_enabled() && tag)
+		this_cpu_sub(tag->counters->bytes, PAGE_SIZE << order);
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int order) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int order) {}
 static inline void pgalloc_tag_split(struct page *page, unsigned int nr) {}
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
+static inline void pgalloc_tag_sub_bytes(struct alloc_tag *tag, unsigned int order) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 39dc4dcf14f5..b402149a795f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4697,12 +4697,21 @@  void __free_pages(struct page *page, unsigned int order)
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
+	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
 		free_the_page(page, order);
 	else if (!head)
-		while (order-- > 0)
+		while (order-- > 0) {
 			free_the_page(page + (1 << order), order);
+			/*
+			 * non-compound multi-order page accounts all allocations
+			 * to the first page (just like compound one), therefore
+			 * we need to adjust the allocation size of the first
+			 * page as its order is ignored when put_page() frees it.
+			 */
+			pgalloc_tag_sub_bytes(tag, order);
+		}
 }
 EXPORT_SYMBOL(__free_pages);