Message ID | 20191022222805.17338.3243.stgit@localhost.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm / virtio: Provide support for unused page reporting | expand |
On 23.10.19 00:28, Alexander Duyck wrote: > From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > In order to enable the use of the zone from the list manipulator functions > I will need access to the zone pointer. As it turns out most of the > accessors were always just being directly passed &zone->free_area[order] > anyway so it would make sense to just fold that into the function itself > and pass the zone and order as arguments instead of the free area. > > In order to be able to reference the zone we need to move the declaration > of the functions down so that we have the zone defined before we define the > list manipulation functions. Since the functions are only used in the file > mm/page_alloc.c we can just move them there to reduce noise in the header. > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > Reviewed-by: David Hildenbrand <david@redhat.com> > Reviewed-by: Pankaj Gupta <pagupta@redhat.com> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > --- > include/linux/mmzone.h | 32 ----------------------- > mm/page_alloc.c | 67 +++++++++++++++++++++++++++++++++++------------- > 2 files changed, 49 insertions(+), 50 deletions(-) Did you see https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9 this time? And the difference to the old patch is only an empty line.
On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote: > On 23.10.19 00:28, Alexander Duyck wrote: > > From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > > > In order to enable the use of the zone from the list manipulator functions > > I will need access to the zone pointer. As it turns out most of the > > accessors were always just being directly passed &zone->free_area[order] > > anyway so it would make sense to just fold that into the function itself > > and pass the zone and order as arguments instead of the free area. > > > > In order to be able to reference the zone we need to move the declaration > > of the functions down so that we have the zone defined before we define the > > list manipulation functions. Since the functions are only used in the file > > mm/page_alloc.c we can just move them there to reduce noise in the header. > > > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > > Reviewed-by: David Hildenbrand <david@redhat.com> > > Reviewed-by: Pankaj Gupta <pagupta@redhat.com> > > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > --- > > include/linux/mmzone.h | 32 ----------------------- > > mm/page_alloc.c | 67 +++++++++++++++++++++++++++++++++++------------- > > 2 files changed, 49 insertions(+), 50 deletions(-) > > Did you see > > https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9 > > this time? > > And the difference to the old patch is only an empty line. > I saw the report. However I have not had much luck reproducing it in order to get root cause. Here are my results for linux-next 20191021 with that patch running page_fault2 over an average of 3 runs: Baseline: 3734692.00 This patch: 3739878.67 Also I am not so sure about these results as the same patch had passed previously before and instead it was patch 3 that was reported as having a -1.2% regression[1]. All I changed in response to that report was to add page_is_reported() which just wrapped the bit test for the reported flag in a #ifdef to avoid testing it for the blocks that were already #ifdef wrapped anyway. I am still trying to see if I can get access to a system that would be a better match for the one that reported the issue. My working theory is that maybe it requires a high core count per node to reproduce. Either that or it is some combination of the kernel being tested on and the patch is causing some loop to go out of alignment and become more expensive. I also included the page_fault2 results in my cover page as that seems to show a slight improvement with all of the patches applied. Thanks. - Alex [1]: https://lore.kernel.org/lkml/20190921152522.GU15734@shao2-debian/
On 23.10.19 17:16, Alexander Duyck wrote: > On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote: >> On 23.10.19 00:28, Alexander Duyck wrote: >>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com> >>> >>> In order to enable the use of the zone from the list manipulator functions >>> I will need access to the zone pointer. As it turns out most of the >>> accessors were always just being directly passed &zone->free_area[order] >>> anyway so it would make sense to just fold that into the function itself >>> and pass the zone and order as arguments instead of the free area. >>> >>> In order to be able to reference the zone we need to move the declaration >>> of the functions down so that we have the zone defined before we define the >>> list manipulation functions. Since the functions are only used in the file >>> mm/page_alloc.c we can just move them there to reduce noise in the header. >>> >>> Reviewed-by: Dan Williams <dan.j.williams@intel.com> >>> Reviewed-by: David Hildenbrand <david@redhat.com> >>> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> >>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> >>> --- >>> include/linux/mmzone.h | 32 ----------------------- >>> mm/page_alloc.c | 67 +++++++++++++++++++++++++++++++++++------------- >>> 2 files changed, 49 insertions(+), 50 deletions(-) >> >> Did you see >> >> https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9 >> >> this time? >> >> And the difference to the old patch is only an empty line. >> > > I saw the report. However I have not had much luck reproducing it in order > to get root cause. Here are my results for linux-next 20191021 with that > patch running page_fault2 over an average of 3 runs: It would have been good if you'd reply to the report or sth. like that. Then people (including me) are aware that you looked into it and what your results of your investigation were. > > Baseline: 3734692.00 > This patch: 3739878.67 > > Also I am not so sure about these results as the same patch had passed > previously before and instead it was patch 3 that was reported as having a > -1.2% regression[1]. All I changed in response to that report was to add Well, previously there was also a regression in the successor PageReported() patch, not sure how they bisect in this case. > page_is_reported() which just wrapped the bit test for the reported flag > in a #ifdef to avoid testing it for the blocks that were already #ifdef > wrapped anyway. > > I am still trying to see if I can get access to a system that would be a > better match for the one that reported the issue. My working theory is I barely see false positives (well, I also barely see reports at all) on MM, that's why I asked. > that maybe it requires a high core count per node to reproduce. Either > that or it is some combination of the kernel being tested on and the patch > is causing some loop to go out of alignment and become more expensive. Yes, double check that the config and the setup roughly matches what has been reported. > > I also included the page_fault2 results in my cover page as that seems to > show a slight improvement with all of the patches applied. > > Thanks. > > - Alex > > [1]: https://lore.kernel.org/lkml/20190921152522.GU15734@shao2-debian/ >
On Thu, 2019-10-24 at 11:32 +0200, David Hildenbrand wrote: > On 23.10.19 17:16, Alexander Duyck wrote: > > On Wed, 2019-10-23 at 10:26 +0200, David Hildenbrand wrote: > > > On 23.10.19 00:28, Alexander Duyck wrote: > > > > From: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > > > > > > > In order to enable the use of the zone from the list manipulator functions > > > > I will need access to the zone pointer. As it turns out most of the > > > > accessors were always just being directly passed &zone->free_area[order] > > > > anyway so it would make sense to just fold that into the function itself > > > > and pass the zone and order as arguments instead of the free area. > > > > > > > > In order to be able to reference the zone we need to move the declaration > > > > of the functions down so that we have the zone defined before we define the > > > > list manipulation functions. Since the functions are only used in the file > > > > mm/page_alloc.c we can just move them there to reduce noise in the header. > > > > > > > > Reviewed-by: Dan Williams <dan.j.williams@intel.com> > > > > Reviewed-by: David Hildenbrand <david@redhat.com> > > > > Reviewed-by: Pankaj Gupta <pagupta@redhat.com> > > > > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > > > > --- > > > > include/linux/mmzone.h | 32 ----------------------- > > > > mm/page_alloc.c | 67 +++++++++++++++++++++++++++++++++++------------- > > > > 2 files changed, 49 insertions(+), 50 deletions(-) > > > > > > Did you see > > > > > > https://lore.kernel.org/lkml/20191001152928.27008.8178.stgit@localhost.localdomain/T/#m4d2bc2f37bd7bdc3ae35c4f197905c275d0ad2f9 > > > > > > this time? > > > > > > And the difference to the old patch is only an empty line. > > > > > > > I saw the report. However I have not had much luck reproducing it in order > > to get root cause. Here are my results for linux-next 20191021 with that > > patch running page_fault2 over an average of 3 runs: > > It would have been good if you'd reply to the report or sth. like that. > Then people (including me) are aware that you looked into it and what > your results of your investigation were. I'll try to be more careful to remember to do that in the future. > > Baseline: 3734692.00 > > This patch: 3739878.67 > > > > Also I am not so sure about these results as the same patch had passed > > previously before and instead it was patch 3 that was reported as having a > > -1.2% regression[1]. All I changed in response to that report was to add > > Well, previously there was also a regression in the successor > PageReported() patch, not sure how they bisect in this case. This is one of the things that has me thinking this is a possible code alignment type issue. In the past when chasing down network performance issues I would see these kind of things when a loop went from being cache line aligned to not being aligned. > > page_is_reported() which just wrapped the bit test for the reported flag > > in a #ifdef to avoid testing it for the blocks that were already #ifdef > > wrapped anyway. > > > > I am still trying to see if I can get access to a system that would be a > > better match for the one that reported the issue. My working theory is > > I barely see false positives (well, I also barely see reports at all) on > MM, that's why I asked. Like I said, I will dig into this. > > that maybe it requires a high core count per node to reproduce. Either > > that or it is some combination of the kernel being tested on and the patch > > is causing some loop to go out of alignment and become more expensive. > > Yes, double check that the config and the setup roughly matches what has > been reported. I've been testing with the same .config and version of gcc. The last bit I am trying now is to test with the same exact kernel source. I figure if I can reproduce the issue with that then I can figure out exact root cause, even if there isn't much we can do since the issue doesn't appear with the latest patch set.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index f1361dd79757..da289a3f8c5e 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -100,29 +100,6 @@ struct free_area { unsigned long nr_free; }; -/* Used for pages not on another list */ -static inline void add_to_free_area(struct page *page, struct free_area *area, - int migratetype) -{ - list_add(&page->lru, &area->free_list[migratetype]); - area->nr_free++; -} - -/* Used for pages not on another list */ -static inline void add_to_free_area_tail(struct page *page, struct free_area *area, - int migratetype) -{ - list_add_tail(&page->lru, &area->free_list[migratetype]); - area->nr_free++; -} - -/* Used for pages which are on another list */ -static inline void move_to_free_area(struct page *page, struct free_area *area, - int migratetype) -{ - list_move(&page->lru, &area->free_list[migratetype]); -} - static inline struct page *get_page_from_free_area(struct free_area *area, int migratetype) { @@ -130,15 +107,6 @@ static inline struct page *get_page_from_free_area(struct free_area *area, struct page, lru); } -static inline void del_page_from_free_area(struct page *page, - struct free_area *area) -{ - list_del(&page->lru); - __ClearPageBuddy(page); - set_page_private(page, 0); - area->nr_free--; -} - static inline bool free_area_empty(struct free_area *area, int migratetype) { return list_empty(&area->free_list[migratetype]); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 02ca4e130985..ea11c6f65157 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -877,6 +877,44 @@ static inline struct capture_control *task_capc(struct zone *zone) } #endif /* CONFIG_COMPACTION */ +/* Used for pages not on another list */ +static inline void add_to_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype) +{ + struct free_area *area = &zone->free_area[order]; + + list_add(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages not on another list */ +static inline void add_to_free_list_tail(struct page *page, struct zone *zone, + unsigned int order, int migratetype) +{ + struct free_area *area = &zone->free_area[order]; + + list_add_tail(&page->lru, &area->free_list[migratetype]); + area->nr_free++; +} + +/* Used for pages which are on another list */ +static inline void move_to_free_list(struct page *page, struct zone *zone, + unsigned int order, int migratetype) +{ + struct free_area *area = &zone->free_area[order]; + + list_move(&page->lru, &area->free_list[migratetype]); +} + +static inline void del_page_from_free_list(struct page *page, struct zone *zone, + unsigned int order) +{ + list_del(&page->lru); + __ClearPageBuddy(page); + set_page_private(page, 0); + zone->free_area[order].nr_free--; +} + /* * If this is not the largest possible page, check if the buddy * of the next-highest order is free. If it is, it's possible @@ -939,7 +977,6 @@ static inline void __free_one_page(struct page *page, struct capture_control *capc = task_capc(zone); unsigned long uninitialized_var(buddy_pfn); unsigned long combined_pfn; - struct free_area *area; unsigned int max_order; struct page *buddy; bool to_tail; @@ -977,7 +1014,7 @@ static inline void __free_one_page(struct page *page, if (page_is_guard(buddy)) clear_page_guard(zone, buddy, order, migratetype); else - del_page_from_free_area(buddy, &zone->free_area[order]); + del_page_from_free_list(buddy, zone, order); combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); pfn = combined_pfn; @@ -1011,16 +1048,15 @@ static inline void __free_one_page(struct page *page, done_merging: set_page_order(page, order); - area = &zone->free_area[order]; if (is_shuffle_order(order)) to_tail = shuffle_pick_tail(); else to_tail = buddy_merge_likely(pfn, buddy_pfn, page, order); if (to_tail) - add_to_free_area_tail(page, area, migratetype); + add_to_free_list_tail(page, zone, order, migratetype); else - add_to_free_area(page, area, migratetype); + add_to_free_list(page, zone, order, migratetype); } /* @@ -2038,13 +2074,11 @@ void __init init_cma_reserved_pageblock(struct page *page) * -- nyc */ static inline void expand(struct zone *zone, struct page *page, - int low, int high, struct free_area *area, - int migratetype) + int low, int high, int migratetype) { unsigned long size = 1 << high; while (high > low) { - area--; high--; size >>= 1; VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); @@ -2058,7 +2092,7 @@ static inline void expand(struct zone *zone, struct page *page, if (set_page_guard(zone, &page[size], high, migratetype)) continue; - add_to_free_area(&page[size], area, migratetype); + add_to_free_list(&page[size], zone, high, migratetype); set_page_order(&page[size], high); } } @@ -2216,8 +2250,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, page = get_page_from_free_area(area, migratetype); if (!page) continue; - del_page_from_free_area(page, area); - expand(zone, page, order, current_order, area, migratetype); + del_page_from_free_list(page, zone, current_order); + expand(zone, page, order, current_order, migratetype); set_pcppage_migratetype(page, migratetype); return page; } @@ -2291,7 +2325,7 @@ static int move_freepages(struct zone *zone, VM_BUG_ON_PAGE(page_zone(page) != zone, page); order = page_order(page); - move_to_free_area(page, &zone->free_area[order], migratetype); + move_to_free_list(page, zone, order, migratetype); page += 1 << order; pages_moved += 1 << order; } @@ -2407,7 +2441,6 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int alloc_flags, int start_type, bool whole_block) { unsigned int current_order = page_order(page); - struct free_area *area; int free_pages, movable_pages, alike_pages; int old_block_type; @@ -2478,8 +2511,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page, return; single_page: - area = &zone->free_area[current_order]; - move_to_free_area(page, area, start_type); + move_to_free_list(page, zone, current_order, start_type); } /* @@ -3150,7 +3182,6 @@ void split_page(struct page *page, unsigned int order) int __isolate_free_page(struct page *page, unsigned int order) { - struct free_area *area = &page_zone(page)->free_area[order]; unsigned long watermark; struct zone *zone; int mt; @@ -3176,7 +3207,7 @@ int __isolate_free_page(struct page *page, unsigned int order) /* Remove page from free list */ - del_page_from_free_area(page, area); + del_page_from_free_list(page, zone, order); /* * Set the pageblock if the isolated page is at least half of a @@ -8721,7 +8752,7 @@ void zone_pcp_reset(struct zone *zone) pr_info("remove from free list %lx %d %lx\n", pfn, 1 << order, end_pfn); #endif - del_page_from_free_area(page, &zone->free_area[order]); + del_page_from_free_list(page, zone, order); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order);