Message ID | 154361479877.7497.2824031260670152276.stgit@ahduyck-desk1.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Deferred page init improvements | expand |
On Fri 30-11-18 13:53:18, Alexander Duyck wrote: > Modify the set_page_links function to include the setting of the reserved > flag via a simple AND and OR operation. The motivation for this is the fact > that the existing __set_bit call still seems to have effects on performance > as replacing the call with the AND and OR can reduce initialization time. > > Looking over the assembly code before and after the change the main > difference between the two is that the reserved bit is stored in a value > that is generated outside of the main initialization loop and is then > written with the other flags field values in one write to the page->flags > value. Previously the generated value was written and then then a btsq > instruction was issued. > > On my x86_64 test system with 3TB of persistent memory per node I saw the > persistent memory initialization time on average drop from 23.49s to > 19.12s per node. I have tried to explain why the whole reserved bit doesn't make much sense in this code several times already. You keep ignoring that and that is highly annoying. Especially when you add a tricky code to optimize something that is not really needed. Based on that I am not going to waste my time on other patches in this series to review and give feedback which might be ignored again.
On Wed, 2018-12-05 at 18:22 +0100, Michal Hocko wrote: > On Fri 30-11-18 13:53:18, Alexander Duyck wrote: > > Modify the set_page_links function to include the setting of the reserved > > flag via a simple AND and OR operation. The motivation for this is the fact > > that the existing __set_bit call still seems to have effects on performance > > as replacing the call with the AND and OR can reduce initialization time. > > > > Looking over the assembly code before and after the change the main > > difference between the two is that the reserved bit is stored in a value > > that is generated outside of the main initialization loop and is then > > written with the other flags field values in one write to the page->flags > > value. Previously the generated value was written and then then a btsq > > instruction was issued. > > > > On my x86_64 test system with 3TB of persistent memory per node I saw the > > persistent memory initialization time on average drop from 23.49s to > > 19.12s per node. > > I have tried to explain why the whole reserved bit doesn't make much > sense in this code several times already. You keep ignoring that and > that is highly annoying. Especially when you add a tricky code to > optimize something that is not really needed. > > Based on that I am not going to waste my time on other patches in this > series to review and give feedback which might be ignored again. I got your explanation. However Andrew had already applied the patches and I had some outstanding issues in them that needed to be addressed. So I thought it best to send out this set of patches with those fixes before the code in mm became too stale. I am still working on what to do about the Reserved bit, and plan to submit it as a follow-up set.
On Wed 05-12-18 09:55:17, Alexander Duyck wrote: > On Wed, 2018-12-05 at 18:22 +0100, Michal Hocko wrote: > > On Fri 30-11-18 13:53:18, Alexander Duyck wrote: > > > Modify the set_page_links function to include the setting of the reserved > > > flag via a simple AND and OR operation. The motivation for this is the fact > > > that the existing __set_bit call still seems to have effects on performance > > > as replacing the call with the AND and OR can reduce initialization time. > > > > > > Looking over the assembly code before and after the change the main > > > difference between the two is that the reserved bit is stored in a value > > > that is generated outside of the main initialization loop and is then > > > written with the other flags field values in one write to the page->flags > > > value. Previously the generated value was written and then then a btsq > > > instruction was issued. > > > > > > On my x86_64 test system with 3TB of persistent memory per node I saw the > > > persistent memory initialization time on average drop from 23.49s to > > > 19.12s per node. > > > > I have tried to explain why the whole reserved bit doesn't make much > > sense in this code several times already. You keep ignoring that and > > that is highly annoying. Especially when you add a tricky code to > > optimize something that is not really needed. > > > > Based on that I am not going to waste my time on other patches in this > > series to review and give feedback which might be ignored again. > > I got your explanation. However Andrew had already applied the patches > and I had some outstanding issues in them that needed to be addressed. > So I thought it best to send out this set of patches with those fixes > before the code in mm became too stale. I am still working on what to > do about the Reserved bit, and plan to submit it as a follow-up set. From my experience Andrew can drop patches between different versions of the patchset. Things can change a lot while they are in mmotm and under the discussion.
On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko <mhocko@kernel.org> wrote: > > I got your explanation. However Andrew had already applied the patches > > and I had some outstanding issues in them that needed to be addressed. > > So I thought it best to send out this set of patches with those fixes > > before the code in mm became too stale. I am still working on what to > > do about the Reserved bit, and plan to submit it as a follow-up set. > > >From my experience Andrew can drop patches between different versions of > the patchset. Things can change a lot while they are in mmotm and under > the discussion. It's been a while and everyone has forgotten everything, so I'll drop this version of the patchset.
On Tue, 2019-03-12 at 15:07 -0700, Andrew Morton wrote: > On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko <mhocko@kernel.org> wrote: > > > > I got your explanation. However Andrew had already applied the patches > > > and I had some outstanding issues in them that needed to be addressed. > > > So I thought it best to send out this set of patches with those fixes > > > before the code in mm became too stale. I am still working on what to > > > do about the Reserved bit, and plan to submit it as a follow-up set. > > > From my experience Andrew can drop patches between different versions of > > the patchset. Things can change a lot while they are in mmotm and under > > the discussion. > > It's been a while and everyone has forgotten everything, so I'll drop > this version of the patchset. > As far as getting to the reserved bit I probably won't have the time in the near future. If I were to resubmit the first 4 patches as a standalone patch set would that be acceptable, or would they be held up as well until the reserved bit issues is addressed? - Alex
On Tue, 12 Mar 2019 15:50:36 -0700 Alexander Duyck <alexander.h.duyck@linux.intel.com> wrote: > On Tue, 2019-03-12 at 15:07 -0700, Andrew Morton wrote: > > On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko <mhocko@kernel.org> wrote: > > > > > > I got your explanation. However Andrew had already applied the patches > > > > and I had some outstanding issues in them that needed to be addressed. > > > > So I thought it best to send out this set of patches with those fixes > > > > before the code in mm became too stale. I am still working on what to > > > > do about the Reserved bit, and plan to submit it as a follow-up set. > > > > From my experience Andrew can drop patches between different versions of > > > the patchset. Things can change a lot while they are in mmotm and under > > > the discussion. > > > > It's been a while and everyone has forgotten everything, so I'll drop > > this version of the patchset. > > > > As far as getting to the reserved bit I probably won't have the time in > the near future. If I were to resubmit the first 4 patches as a > standalone patch set would that be acceptable, or would they be held up > as well until the reserved bit issues is addressed? > Yes, I think that merging the first four will be OK. As long as they don't add some bug which [5/5] corrects, which happens sometimes! Please redo, retest and resend sometime?
On Wed, 2019-03-13 at 09:33 -0700, Andrew Morton wrote: > On Tue, 12 Mar 2019 15:50:36 -0700 Alexander Duyck <alexander.h.duyck@linux.intel.com> wrote: > > > On Tue, 2019-03-12 at 15:07 -0700, Andrew Morton wrote: > > > On Wed, 5 Dec 2018 21:42:47 +0100 Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > I got your explanation. However Andrew had already applied the patches > > > > > and I had some outstanding issues in them that needed to be addressed. > > > > > So I thought it best to send out this set of patches with those fixes > > > > > before the code in mm became too stale. I am still working on what to > > > > > do about the Reserved bit, and plan to submit it as a follow-up set. > > > > > From my experience Andrew can drop patches between different versions of > > > > the patchset. Things can change a lot while they are in mmotm and under > > > > the discussion. > > > > > > It's been a while and everyone has forgotten everything, so I'll drop > > > this version of the patchset. > > > > > > > As far as getting to the reserved bit I probably won't have the time in > > the near future. If I were to resubmit the first 4 patches as a > > standalone patch set would that be acceptable, or would they be held up > > as well until the reserved bit issues is addressed? > > > > Yes, I think that merging the first four will be OK. As long as they > don't add some bug which [5/5] corrects, which happens sometimes! > > Please redo, retest and resend sometime? I had gone through and tested with each patch applied individually when I was performance testing them, and I am fairly certain there wasn't a bug introduced between any two patches. The issue that I recall Michal had was the fact that I was essentially embedding the setting of the reserved page under several layers of function calls, which would make it harder to remove. I started that work at about patch 5 which is why I figured I would resend the first 4, and hold off on 5-7 until I can get the reserved bit removal for hotplug done. I can probably have the patches ready to go in a couple days. I'll send updates once linux-next and mmotm with the patches dropped have been posted. Thanks. - Alex
diff --git a/include/linux/mm.h b/include/linux/mm.h index eb6e52b66bc2..5faf66dd4559 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1238,11 +1238,18 @@ static inline void set_page_node(struct page *page, unsigned long node) page->flags |= (node & NODES_MASK) << NODES_PGSHIFT; } +static inline void set_page_reserved(struct page *page, bool reserved) +{ + page->flags &= ~(1ul << PG_reserved); + page->flags |= (unsigned long)(!!reserved) << PG_reserved; +} + static inline void set_page_links(struct page *page, enum zone_type zone, - unsigned long node, unsigned long pfn) + unsigned long node, unsigned long pfn, bool reserved) { set_page_zone(page, zone); set_page_node(page, node); + set_page_reserved(page, reserved); #ifdef SECTION_IN_PAGE_FLAGS set_page_section(page, pfn_to_section_nr(pfn)); #endif diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 416bbb6f05ab..61eb9945d805 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1183,10 +1183,16 @@ static void free_one_page(struct zone *zone, static void __meminit __init_struct_page_nolru(struct page *page, unsigned long pfn, - unsigned long zone, int nid) + unsigned long zone, int nid, + bool is_reserved) { mm_zero_struct_page(page); - set_page_links(page, zone, nid, pfn); + + /* + * We can use a non-atomic operation for setting the + * PG_reserved flag as we are still initializing the pages. + */ + set_page_links(page, zone, nid, pfn, is_reserved); init_page_count(page); page_mapcount_reset(page); page_cpupid_reset_last(page); @@ -1202,14 +1208,15 @@ static void __meminit __init_struct_page_nolru(struct page *page, static void __meminit __init_single_page(struct page *page, unsigned long pfn, unsigned long zone, int nid) { - __init_struct_page_nolru(page, pfn, zone, nid); + __init_struct_page_nolru(page, pfn, zone, nid, false); INIT_LIST_HEAD(&page->lru); } static void __meminit __init_pageblock(unsigned long start_pfn, unsigned long nr_pages, unsigned long zone, int nid, - struct dev_pagemap *pgmap) + struct dev_pagemap *pgmap, + bool is_reserved) { unsigned long nr_pgmask = pageblock_nr_pages - 1; struct page *start_page = pfn_to_page(start_pfn); @@ -1235,15 +1242,8 @@ static void __meminit __init_pageblock(unsigned long start_pfn, * is not defined. */ for (page = start_page + nr_pages; page-- != start_page; pfn--) { - __init_struct_page_nolru(page, pfn, zone, nid); - /* - * Mark page reserved as it will need to wait for onlining - * phase for it to be fully associated with a zone. - * - * We can use the non-atomic __set_bit operation for setting - * the flag as we are still initializing the pages. - */ - __SetPageReserved(page); + __init_struct_page_nolru(page, pfn, zone, nid, is_reserved); + /* * ZONE_DEVICE pages union ->lru with a ->pgmap back * pointer and hmm_data. It is a bug if a ZONE_DEVICE @@ -5780,7 +5780,18 @@ static void __meminit __memmap_init_hotplug(unsigned long size, int nid, pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn); stride -= pfn; - __init_pageblock(pfn, stride, zone, nid, pgmap); + /* + * The last argument of __init_pageblock is a boolean + * value indicating if the page will be marked as reserved. + * + * Mark page reserved as it will need to wait for onlining + * phase for it to be fully associated with a zone. + * + * Under certain circumstances ZONE_DEVICE pages may not + * need to be marked as reserved, however there is still + * code that is depending on this being set for now. + */ + __init_pageblock(pfn, stride, zone, nid, pgmap, true); cond_resched(); }