diff mbox series

[1/3] mm, page_owner: record page owner for each subpage

Message ID 20190816101401.32382-2-vbabka@suse.cz (mailing list archive)
State New, archived
Headers show
Series debug_pagealloc improvements through page_owner | expand

Commit Message

Vlastimil Babka Aug. 16, 2019, 10:13 a.m. UTC
Currently, page owner info is only recorded for the first page of a high-order
allocation, and copied to tail pages in the event of a split page. With the
plan to keep previous owner info after freeing the page, it would be benefical
to record page owner for each subpage upon allocation. This increases the
overhead for high orders, but that should be acceptable for a debugging option.

The order stored for each subpage is the order of the whole allocation. This
makes it possible to calculate the "head" pfn and to recognize "tail" pages
(quoted because not all high-order allocations are compound pages with true
head and tail pages). When reading the page_owner debugfs file, keep skipping
the "tail" pages so that stats gathered by existing scripts don't get inflated.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_owner.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

Comments

Kirill A . Shutemov Aug. 16, 2019, 2:04 p.m. UTC | #1
On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
> Currently, page owner info is only recorded for the first page of a high-order
> allocation, and copied to tail pages in the event of a split page. With the
> plan to keep previous owner info after freeing the page, it would be benefical
> to record page owner for each subpage upon allocation. This increases the
> overhead for high orders, but that should be acceptable for a debugging option.
> 
> The order stored for each subpage is the order of the whole allocation. This
> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> (quoted because not all high-order allocations are compound pages with true
> head and tail pages). When reading the page_owner debugfs file, keep skipping
> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Hm. That's all reasonable, but I have a question: do you see how page
owner thing works for THP now?

I don't see anything in split_huge_page() path (do not confuse it with
split_page() path) that would copy the information to tail pages. Do you?
Vlastimil Babka Aug. 19, 2019, 11:46 a.m. UTC | #2
On 8/16/19 4:04 PM, Kirill A. Shutemov wrote:
> On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
>> Currently, page owner info is only recorded for the first page of a high-order
>> allocation, and copied to tail pages in the event of a split page. With the
>> plan to keep previous owner info after freeing the page, it would be benefical
>> to record page owner for each subpage upon allocation. This increases the
>> overhead for high orders, but that should be acceptable for a debugging option.
>>
>> The order stored for each subpage is the order of the whole allocation. This
>> makes it possible to calculate the "head" pfn and to recognize "tail" pages
>> (quoted because not all high-order allocations are compound pages with true
>> head and tail pages). When reading the page_owner debugfs file, keep skipping
>> the "tail" pages so that stats gathered by existing scripts don't get inflated.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Hm. That's all reasonable, but I have a question: do you see how page
> owner thing works for THP now?
> 
> I don't see anything in split_huge_page() path (do not confuse it with
> split_page() path) that would copy the information to tail pages. Do you?
 
You're right, it's missing. This patch fixes that and can be added e.g.
at the end of the series.

----8<----
From 56ac1b41559eecf52a2d453c49ce66dbbb227c64 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 19 Aug 2019 13:38:29 +0200
Subject: [PATCH] mm, page_owner: handle THP splits correctly

THP splitting path is missing the split_page_owner() call that split_page()
has. As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages. Furthermore when the former head page is freed, the remaining
former tail pages are not listed in the page_owner file at all. This patch
fixes that by adding the split_page_owner() call into __split_huge_page().

Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 738065f765ab..d727a0401484 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -32,6 +32,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/oom.h>
 #include <linux/numa.h>
+#include <linux/page_owner.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	remap_page(head);
 
+	split_page_owner(head, HPAGE_PMD_ORDER);
+
 	for (i = 0; i < HPAGE_PMD_NR; i++) {
 		struct page *subpage = head + i;
 		if (subpage == page)
kirill.shutemov@linux.intel.com Aug. 19, 2019, 11:55 a.m. UTC | #3
On Mon, Aug 19, 2019 at 11:46:37AM +0000, Vlastimil Babka wrote:
> 
> On 8/16/19 4:04 PM, Kirill A. Shutemov wrote:
> > On Fri, Aug 16, 2019 at 12:13:59PM +0200, Vlastimil Babka wrote:
> >> Currently, page owner info is only recorded for the first page of a high-order
> >> allocation, and copied to tail pages in the event of a split page. With the
> >> plan to keep previous owner info after freeing the page, it would be benefical
> >> to record page owner for each subpage upon allocation. This increases the
> >> overhead for high orders, but that should be acceptable for a debugging option.
> >>
> >> The order stored for each subpage is the order of the whole allocation. This
> >> makes it possible to calculate the "head" pfn and to recognize "tail" pages
> >> (quoted because not all high-order allocations are compound pages with true
> >> head and tail pages). When reading the page_owner debugfs file, keep skipping
> >> the "tail" pages so that stats gathered by existing scripts don't get inflated.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > Hm. That's all reasonable, but I have a question: do you see how page
> > owner thing works for THP now?
> > 
> > I don't see anything in split_huge_page() path (do not confuse it with
> > split_page() path) that would copy the information to tail pages. Do you?
>  
> You're right, it's missing. This patch fixes that and can be added e.g.
> at the end of the series.

I would rather put it the first. Possbily with stable@.

> ----8<----
> From 56ac1b41559eecf52a2d453c49ce66dbbb227c64 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Mon, 19 Aug 2019 13:38:29 +0200
> Subject: [PATCH] mm, page_owner: handle THP splits correctly
> 
> THP splitting path is missing the split_page_owner() call that split_page()
> has. As a result, split THP pages are wrongly reported in the page_owner file
> as order-9 pages. Furthermore when the former head page is freed, the remaining
> former tail pages are not listed in the page_owner file at all. This patch
> fixes that by adding the split_page_owner() call into __split_huge_page().
> 
> Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/huge_memory.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 738065f765ab..d727a0401484 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -32,6 +32,7 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/oom.h>
>  #include <linux/numa.h>
> +#include <linux/page_owner.h>
>  
>  #include <asm/tlb.h>
>  #include <asm/pgalloc.h>
> @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  
>  	remap_page(head);
>  
> +	split_page_owner(head, HPAGE_PMD_ORDER);
> +

I think it has to be before remap_page(). This way nobody would be able to
mess with the page until it has valid page_owner.

>  	for (i = 0; i < HPAGE_PMD_NR; i++) {
>  		struct page *subpage = head + i;
>  		if (subpage == page)
> -- 
> 2.22.0
>
kirill.shutemov@linux.intel.com Aug. 19, 2019, 11:57 a.m. UTC | #4
On Mon, Aug 19, 2019 at 11:55:51AM +0000, Kirill A. Shutemov wrote:
> > @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >  
> >  	remap_page(head);
> >  
> > +	split_page_owner(head, HPAGE_PMD_ORDER);
> > +
> 
> I think it has to be before remap_page(). This way nobody would be able to
> mess with the page until it has valid page_owner.

Or rather next to ClearPageCompound().
Vlastimil Babka Aug. 19, 2019, 1:11 p.m. UTC | #5
On 8/19/19 1:57 PM, Kirill A. Shutemov wrote:
> On Mon, Aug 19, 2019 at 11:55:51AM +0000, Kirill A. Shutemov wrote:
>>> @@ -2533,6 +2534,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>>  
>>>  	remap_page(head);
>>>  
>>> +	split_page_owner(head, HPAGE_PMD_ORDER);
>>> +
>>
>> I think it has to be before remap_page(). This way nobody would be able to
>> mess with the page until it has valid page_owner.
> 
> Or rather next to ClearPageCompound().

OK, will wait for more feedback and repost with that patch first, and as
you suggest.
diff mbox series

Patch

diff --git a/mm/page_owner.c b/mm/page_owner.c
index addcbb2ae4e4..813fcb70547b 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -154,18 +154,23 @@  static noinline depot_stack_handle_t save_stack(gfp_t flags)
 	return handle;
 }
 
-static inline void __set_page_owner_handle(struct page_ext *page_ext,
-	depot_stack_handle_t handle, unsigned int order, gfp_t gfp_mask)
+static inline void __set_page_owner_handle(struct page *page,
+	struct page_ext *page_ext, depot_stack_handle_t handle,
+	unsigned int order, gfp_t gfp_mask)
 {
 	struct page_owner *page_owner;
+	int i;
 
-	page_owner = get_page_owner(page_ext);
-	page_owner->handle = handle;
-	page_owner->order = order;
-	page_owner->gfp_mask = gfp_mask;
-	page_owner->last_migrate_reason = -1;
+	for (i = 0; i < (1 << order); i++) {
+		page_owner = get_page_owner(page_ext);
+		page_owner->handle = handle;
+		page_owner->order = order;
+		page_owner->gfp_mask = gfp_mask;
+		page_owner->last_migrate_reason = -1;
+		__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
 
-	__set_bit(PAGE_EXT_OWNER, &page_ext->flags);
+		page_ext = lookup_page_ext(page + i);
+	}
 }
 
 noinline void __set_page_owner(struct page *page, unsigned int order,
@@ -178,7 +183,7 @@  noinline void __set_page_owner(struct page *page, unsigned int order,
 		return;
 
 	handle = save_stack(gfp_mask);
-	__set_page_owner_handle(page_ext, handle, order, gfp_mask);
+	__set_page_owner_handle(page, page_ext, handle, order, gfp_mask);
 }
 
 void __set_page_owner_migrate_reason(struct page *page, int reason)
@@ -204,8 +209,11 @@  void __split_page_owner(struct page *page, unsigned int order)
 
 	page_owner = get_page_owner(page_ext);
 	page_owner->order = 0;
-	for (i = 1; i < (1 << order); i++)
-		__copy_page_owner(page, page + i);
+	for (i = 1; i < (1 << order); i++) {
+		page_ext = lookup_page_ext(page + i);
+		page_owner = get_page_owner(page_ext);
+		page_owner->order = 0;
+	}
 }
 
 void __copy_page_owner(struct page *oldpage, struct page *newpage)
@@ -483,6 +491,13 @@  read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
 
 		page_owner = get_page_owner(page_ext);
 
+		/*
+		 * Don't print "tail" pages of high-order allocations as that
+		 * would inflate the stats.
+		 */
+		if (!IS_ALIGNED(pfn, 1 << page_owner->order))
+			continue;
+
 		/*
 		 * Access to page_ext->handle isn't synchronous so we should
 		 * be careful to access it.
@@ -562,7 +577,8 @@  static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
 				continue;
 
 			/* Found early allocated page */
-			__set_page_owner_handle(page_ext, early_handle, 0, 0);
+			__set_page_owner_handle(page, page_ext, early_handle,
+						0, 0);
 			count++;
 		}
 		cond_resched();