
[v12,2/6] mm: Use zone and order instead of free area in free_list manipulators

Message ID 20191022222805.17338.3243.stgit@localhost.localdomain (mailing list archive)
State New, archived
Series mm / virtio: Provide support for unused page reporting

Commit Message

Alexander Duyck Oct. 22, 2019, 10:28 p.m. UTC
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

In order to enable the use of the zone from the list manipulator functions,
I will need access to the zone pointer. As it turns out, most of the
accessors were always just being passed &zone->free_area[order] directly
anyway, so it makes sense to fold that into the functions themselves and
pass the zone and order as arguments instead of the free area.

In order to be able to reference the zone, we need to move the definitions
of the functions down so that struct zone is defined before the list
manipulation functions. Since the functions are only used in
mm/page_alloc.c, we can simply move them there to reduce noise in the
header.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/linux/mmzone.h |   32 -----------------------
 mm/page_alloc.c        |   67 +++++++++++++++++++++++++++++++++++-------------
 2 files changed, 49 insertions(+), 50 deletions(-)
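
In short, the helpers stop taking a precomputed free_area pointer and
instead derive it from the zone and order. A minimal before/after sketch,
condensed from the diff below:

	/* Before: every caller passes &zone->free_area[order] */
	static inline void add_to_free_area(struct page *page,
					    struct free_area *area,
					    int migratetype)
	{
		list_add(&page->lru, &area->free_list[migratetype]);
		area->nr_free++;
	}

	/* After: the helper looks up the free area itself */
	static inline void add_to_free_list(struct page *page,
					    struct zone *zone,
					    unsigned int order,
					    int migratetype)
	{
		struct free_area *area = &zone->free_area[order];

		list_add(&page->lru, &area->free_list[migratetype]);
		area->nr_free++;
	}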

Comments

David Hildenbrand Oct. 23, 2019, 8:26 a.m. UTC | #1
On 23.10.19 00:28, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> In order to enable the use of the zone from the list manipulator functions,
> I will need access to the zone pointer. As it turns out, most of the
> accessors were always just being passed &zone->free_area[order] directly
> anyway, so it makes sense to fold that into the functions themselves and
> pass the zone and order as arguments instead of the free area.
> 
> In order to be able to reference the zone, we need to move the definitions
> of the functions down so that struct zone is defined before the list
> manipulation functions. Since the functions are only used in
> mm/page_alloc.c, we can simply move them there to reduce noise in the
> header.
> 
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>   include/linux/mmzone.h |   32 -----------------------
>   mm/page_alloc.c        |   67 +++++++++++++++++++++++++++++++++++-------------
>   2 files changed, 49 insertions(+), 50 deletions(-)

Did you see

https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9

this time?

And the only difference from the old patch is an empty line.
Alexander Duyck Oct. 23, 2019, 3:16 p.m. UTC | #2
On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote:
> On 23.10.19 00:28, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > 
> > In order to enable the use of the zone from the list manipulator functions,
> > I will need access to the zone pointer. As it turns out, most of the
> > accessors were always just being passed &zone->free_area[order] directly
> > anyway, so it makes sense to fold that into the functions themselves and
> > pass the zone and order as arguments instead of the free area.
> > 
> > In order to be able to reference the zone, we need to move the definitions
> > of the functions down so that struct zone is defined before the list
> > manipulation functions. Since the functions are only used in
> > mm/page_alloc.c, we can simply move them there to reduce noise in the
> > header.
> > 
> > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> > Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >   include/linux/mmzone.h |   32 -----------------------
> >   mm/page_alloc.c        |   67 +++++++++++++++++++++++++++++++++++-------------
> >   2 files changed, 49 insertions(+), 50 deletions(-)
> 
> Did you see
> 
> https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9
> 
> this time?
> 
> And the only difference from the old patch is an empty line.
> 

I saw the report. However, I have not had much luck reproducing it in order
to get to the root cause. Here are my results for linux-next 20191021 with
that patch, running page_fault2 averaged over 3 runs:

Baseline:   3734692.00
This patch: 3739878.67
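
(That is roughly a +0.14% delta, (3739878.67 - 3734692.00) / 3734692.00 ≈
0.0014, so if anything a slight improvement rather than a regression.)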

Also, I am not so sure about these results, as the same patch had passed
previously; instead, it was patch 3 that was reported as having a -1.2%
regression [1]. All I changed in response to that report was to add
page_is_reported() which just wrapped the bit test for the reported flag
in a #ifdef to avoid testing it for the blocks that were already #ifdef
wrapped anyway.

I am still trying to see if I can get access to a system that would be a
better match for the one that reported the issue. My working theory is
that maybe it requires a high core count per node to reproduce. Either
that, or some combination of the kernel being tested and the patch is
causing some loop to go out of alignment and become more expensive.

I also included the page_fault2 results in my cover page as that seems to
show a slight improvement with all of the patches applied.

Thanks.

- Alex

[1]: https://lore.kernel.org/lkml/20190921152522.GU15734@shao2-debian/
David Hildenbrand Oct. 24, 2019, 9:32 a.m. UTC | #3
On 23.10.19 17:16, Alexander Duyck wrote:
> On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote:
>> On 23.10.19 00:28, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> In order to enable the use of the zone from the list manipulator functions,
>>> I will need access to the zone pointer. As it turns out, most of the
>>> accessors were always just being passed &zone->free_area[order] directly
>>> anyway, so it makes sense to fold that into the functions themselves and
>>> pass the zone and order as arguments instead of the free area.
>>>
>>> In order to be able to reference the zone, we need to move the definitions
>>> of the functions down so that struct zone is defined before the list
>>> manipulation functions. Since the functions are only used in
>>> mm/page_alloc.c, we can simply move them there to reduce noise in the
>>> header.
>>>
>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>> ---
>>>    include/linux/mmzone.h |   32 -----------------------
>>>    mm/page_alloc.c        |   67 +++++++++++++++++++++++++++++++++++-------------
>>>    2 files changed, 49 insertions(+), 50 deletions(-)
>>
>> Did you see
>>
>> https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9
>>
>> this time?
>>
>> And the only difference from the old patch is an empty line.
>>
> 
> I saw the report. However, I have not had much luck reproducing it in order
> to get to the root cause. Here are my results for linux-next 20191021 with
> that patch, running page_fault2 averaged over 3 runs:

It would have been good if you'd replied to the report or something like
that. Then people (including me) would be aware that you looked into it
and what the results of your investigation were.

> 
> Baseline:   3734692.00
> This patch: 3739878.67
> 
> Also, I am not so sure about these results, as the same patch had passed
> previously; instead, it was patch 3 that was reported as having a -1.2%
> regression [1]. All I changed in response to that report was to add

Well, previously there was also a regression in the successor
PageReported() patch; I am not sure how they bisected in this case.

> page_is_reported() which just wrapped the bit test for the reported flag
> in a #ifdef to avoid testing it for the blocks that were already #ifdef
> wrapped anyway.
> 
> I am still trying to see if I can get access to a system that would be a
> better match for the one that reported the issue. My working theory is

I barely see false positives (well, I also barely see reports at all) on
MM; that's why I asked.

> that maybe it requires a high core count per node to reproduce. Either
> that, or some combination of the kernel being tested and the patch is
> causing some loop to go out of alignment and become more expensive.

Yes, double-check that the config and the setup roughly match what has
been reported.

> 
> I also included the page_fault2 results in my cover page as that seems to
> show a slight improvement with all of the patches applied.
> 
> Thanks.
> 
> - Alex
> 
> [1]: https://lore.kernel.org/lkml/20190921152522.GU15734@shao2-debian/
>
Alexander Duyck Oct. 24, 2019, 3:19 p.m. UTC | #4
On Thu, 2019-10-24 at 11:32 +0200, David Hildenbrand wrote:
> On 23.10.19 17:16, Alexander Duyck wrote:
> > On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote:
> > > On 23.10.19 00:28, Alexander Duyck wrote:
> > > > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > > > 
> > > > In order to enable the use of the zone from the list manipulator functions,
> > > > I will need access to the zone pointer. As it turns out, most of the
> > > > accessors were always just being passed &zone->free_area[order] directly
> > > > anyway, so it makes sense to fold that into the functions themselves and
> > > > pass the zone and order as arguments instead of the free area.
> > > > 
> > > > In order to be able to reference the zone, we need to move the definitions
> > > > of the functions down so that struct zone is defined before the list
> > > > manipulation functions. Since the functions are only used in
> > > > mm/page_alloc.c, we can simply move them there to reduce noise in the
> > > > header.
> > > > 
> > > > Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> > > > Reviewed-by: David Hildenbrand <david@redhat.com>
> > > > Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
> > > > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > > > ---
> > > >    include/linux/mmzone.h |   32 -----------------------
> > > >    mm/page_alloc.c        |   67 +++++++++++++++++++++++++++++++++++-------------
> > > >    2 files changed, 49 insertions(+), 50 deletions(-)
> > > 
> > > Did you see
> > > 
> > > https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9
> > > 
> > > this time?
> > > 
> > > And the only difference from the old patch is an empty line.
> > > 
> > 
> > I saw the report. However, I have not had much luck reproducing it in order
> > to get to the root cause. Here are my results for linux-next 20191021 with
> > that patch, running page_fault2 averaged over 3 runs:
> 
> It would have been good if you'd replied to the report or something like
> that. Then people (including me) would be aware that you looked into it
> and what the results of your investigation were.

I'll be more careful to do that in the future.

> > Baseline:   3734692.00
> > This patch: 3739878.67
> > 
> > Also, I am not so sure about these results, as the same patch had passed
> > previously; instead, it was patch 3 that was reported as having a -1.2%
> > regression [1]. All I changed in response to that report was to add
> 
> Well, previously there was also a regression in the successor
> PageReported() patch; I am not sure how they bisected in this case.

This is one of the things that has me thinking this is possibly a code
alignment issue. In the past, when chasing down network performance
issues, I would see this kind of thing when a loop went from being cache
line aligned to not being aligned.

> > page_is_reported() which just wrapped the bit test for the reported flag
> > in a #ifdef to avoid testing it for the blocks that were already #ifdef
> > wrapped anyway.
> > 
> > I am still trying to see if I can get access to a system that would be a
> > better match for the one that reported the issue. My working theory is
> 
> I barely see false positives (well, I also barely see reports at all) on
> MM; that's why I asked.

Like I said, I will dig into this.

> > that maybe it requires a high core count per node to reproduce. Either
> > that, or some combination of the kernel being tested and the patch is
> > causing some loop to go out of alignment and become more expensive.
> 
> Yes, double-check that the config and the setup roughly match what has
> been reported.

I've been testing with the same .config and version of gcc. The last thing
I am trying now is to test with the exact same kernel source. I figure if I
can reproduce the issue with that, then I can determine the exact root
cause, even if there isn't much we can do, since the issue doesn't appear
with the latest patch set.

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f1361dd79757..da289a3f8c5e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,29 +100,6 @@  struct free_area {
 	unsigned long		nr_free;
 };
 
-/* Used for pages not on another list */
-static inline void add_to_free_area(struct page *page, struct free_area *area,
-			     int migratetype)
-{
-	list_add(&page->lru, &area->free_list[migratetype]);
-	area->nr_free++;
-}
-
-/* Used for pages not on another list */
-static inline void add_to_free_area_tail(struct page *page, struct free_area *area,
-				  int migratetype)
-{
-	list_add_tail(&page->lru, &area->free_list[migratetype]);
-	area->nr_free++;
-}
-
-/* Used for pages which are on another list */
-static inline void move_to_free_area(struct page *page, struct free_area *area,
-			     int migratetype)
-{
-	list_move(&page->lru, &area->free_list[migratetype]);
-}
-
 static inline struct page *get_page_from_free_area(struct free_area *area,
 					    int migratetype)
 {
@@ -130,15 +107,6 @@  static inline struct page *get_page_from_free_area(struct free_area *area,
 					struct page, lru);
 }
 
-static inline void del_page_from_free_area(struct page *page,
-		struct free_area *area)
-{
-	list_del(&page->lru);
-	__ClearPageBuddy(page);
-	set_page_private(page, 0);
-	area->nr_free--;
-}
-
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
 	return list_empty(&area->free_list[migratetype]);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 02ca4e130985..ea11c6f65157 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -877,6 +877,44 @@  static inline struct capture_control *task_capc(struct zone *zone)
 }
 #endif /* CONFIG_COMPACTION */
 
+/* Used for pages not on another list */
+static inline void add_to_free_list(struct page *page, struct zone *zone,
+				    unsigned int order, int migratetype)
+{
+	struct free_area *area = &zone->free_area[order];
+
+	list_add(&page->lru, &area->free_list[migratetype]);
+	area->nr_free++;
+}
+
+/* Used for pages not on another list */
+static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
+					 unsigned int order, int migratetype)
+{
+	struct free_area *area = &zone->free_area[order];
+
+	list_add_tail(&page->lru, &area->free_list[migratetype]);
+	area->nr_free++;
+}
+
+/* Used for pages which are on another list */
+static inline void move_to_free_list(struct page *page, struct zone *zone,
+				     unsigned int order, int migratetype)
+{
+	struct free_area *area = &zone->free_area[order];
+
+	list_move(&page->lru, &area->free_list[migratetype]);
+}
+
+static inline void del_page_from_free_list(struct page *page, struct zone *zone,
+					   unsigned int order)
+{
+	list_del(&page->lru);
+	__ClearPageBuddy(page);
+	set_page_private(page, 0);
+	zone->free_area[order].nr_free--;
+}
+
 /*
  * If this is not the largest possible page, check if the buddy
  * of the next-highest order is free. If it is, it's possible
@@ -939,7 +977,6 @@  static inline void __free_one_page(struct page *page,
 	struct capture_control *capc = task_capc(zone);
 	unsigned long uninitialized_var(buddy_pfn);
 	unsigned long combined_pfn;
-	struct free_area *area;
 	unsigned int max_order;
 	struct page *buddy;
 	bool to_tail;
@@ -977,7 +1014,7 @@  static inline void __free_one_page(struct page *page,
 		if (page_is_guard(buddy))
 			clear_page_guard(zone, buddy, order, migratetype);
 		else
-			del_page_from_free_area(buddy, &zone->free_area[order]);
+			del_page_from_free_list(buddy, zone, order);
 		combined_pfn = buddy_pfn & pfn;
 		page = page + (combined_pfn - pfn);
 		pfn = combined_pfn;
@@ -1011,16 +1048,15 @@  static inline void __free_one_page(struct page *page,
 done_merging:
 	set_page_order(page, order);
 
-	area = &zone->free_area[order];
 	if (is_shuffle_order(order))
 		to_tail = shuffle_pick_tail();
 	else
 		to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order);
 
 	if (to_tail)
-		add_to_free_area_tail(page, area, migratetype);
+		add_to_free_list_tail(page, zone, order, migratetype);
 	else
-		add_to_free_area(page, area, migratetype);
+		add_to_free_list(page, zone, order, migratetype);
 }
 
 /*
@@ -2038,13 +2074,11 @@  void __init init_cma_reserved_pageblock(struct page *page)
  * -- nyc
  */
 static inline void expand(struct zone *zone, struct page *page,
-	int low, int high, struct free_area *area,
-	int migratetype)
+	int low, int high, int migratetype)
 {
 	unsigned long size = 1 << high;
 
 	while (high > low) {
-		area--;
 		high--;
 		size >>= 1;
 		VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]);
@@ -2058,7 +2092,7 @@  static inline void expand(struct zone *zone, struct page *page,
 		if (set_page_guard(zone, &page[size], high, migratetype))
 			continue;
 
-		add_to_free_area(&page[size], area, migratetype);
+		add_to_free_list(&page[size], zone, high, migratetype);
 		set_page_order(&page[size], high);
 	}
 }
@@ -2216,8 +2250,8 @@  struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		page = get_page_from_free_area(area, migratetype);
 		if (!page)
 			continue;
-		del_page_from_free_area(page, area);
-		expand(zone, page, order, current_order, area, migratetype);
+		del_page_from_free_list(page, zone, current_order);
+		expand(zone, page, order, current_order, migratetype);
 		set_pcppage_migratetype(page, migratetype);
 		return page;
 	}
@@ -2291,7 +2325,7 @@  static int move_freepages(struct zone *zone,
 		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
 
 		order = page_order(page);
-		move_to_free_area(page, &zone->free_area[order], migratetype);
+		move_to_free_list(page, zone, order, migratetype);
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -2407,7 +2441,6 @@  static void steal_suitable_fallback(struct zone *zone, struct page *page,
 		unsigned int alloc_flags, int start_type, bool whole_block)
 {
 	unsigned int current_order = page_order(page);
-	struct free_area *area;
 	int free_pages, movable_pages, alike_pages;
 	int old_block_type;
 
@@ -2478,8 +2511,7 @@  static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	return;
 
 single_page:
-	area = &zone->free_area[current_order];
-	move_to_free_area(page, area, start_type);
+	move_to_free_list(page, zone, current_order, start_type);
 }
 
 /*
@@ -3150,7 +3182,6 @@  void split_page(struct page *page, unsigned int order)
 
 int __isolate_free_page(struct page *page, unsigned int order)
 {
-	struct free_area *area = &page_zone(page)->free_area[order];
 	unsigned long watermark;
 	struct zone *zone;
 	int mt;
@@ -3176,7 +3207,7 @@  int __isolate_free_page(struct page *page, unsigned int order)
 
 	/* Remove page from free list */
 
-	del_page_from_free_area(page, area);
+	del_page_from_free_list(page, zone, order);
 
 	/*
 	 * Set the pageblock if the isolated page is at least half of a
@@ -8721,7 +8752,7 @@  void zone_pcp_reset(struct zone *zone)
 		pr_info("remove from free list %lx %d %lx\n",
 			pfn, 1 << order, end_pfn);
 #endif
-		del_page_from_free_area(page, &zone->free_area[order]);
+		del_page_from_free_list(page, zone, order);
 		for (i = 0; i < (1 << order); i++)
 			SetPageReserved((page+i));
 		pfn += (1 << order);