Message ID | 20181112071404.13620-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm, page_alloc: skip zone who has no managed_pages in calculate_totalreserve_pages() | expand |
On Mon 12-11-18 15:14:04, Wei Yang wrote: > Zone with no managed_pages doesn't contribute totalreserv_pages. And the > more nodes we have, the more empty zones there are. > > This patch skip the zones to save some cycles. What is the motivation for the patch? Does it really cause any measurable difference in performance? > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > --- > mm/page_alloc.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index a919ba5cb3c8..567de15e1106 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -7246,6 +7246,9 @@ static void calculate_totalreserve_pages(void) > struct zone *zone = pgdat->node_zones + i; > long max = 0; > > + if (!managed_zone(zone)) > + continue; > + > /* Find valid and maximum lowmem_reserve in the zone */ > for (j = i; j < MAX_NR_ZONES; j++) { > if (zone->lowmem_reserve[j] > max) > -- > 2.15.1 >
On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: >On Mon 12-11-18 15:14:04, Wei Yang wrote: >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the >> more nodes we have, the more empty zones there are. >> >> This patch skip the zones to save some cycles. > >What is the motivation for the patch? Does it really cause any >measurable difference in performance? > The motivation here is to reduce some unnecessary work. Based on my understanding, almost every node has empty zones, since zones within a node are ordered in monotonic increasing memory address. The worst case is all zones has managed_pages. For example, there is only one node, or configured to have only ZONE_NORMAL and ZONE_MOVABLE. Otherwise, the more node/zone we have, the more empty zones there are. I didn't have detail tests on this patch, since I don't have machine with large numa nodes. While compared with the following ten lines of code, this check to skip them is worthwhile to me. >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >> --- >> mm/page_alloc.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index a919ba5cb3c8..567de15e1106 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -7246,6 +7246,9 @@ static void calculate_totalreserve_pages(void) >> struct zone *zone = pgdat->node_zones + i; >> long max = 0; >> >> + if (!managed_zone(zone)) >> + continue; >> + >> /* Find valid and maximum lowmem_reserve in the zone */ >> for (j = i; j < MAX_NR_ZONES; j++) { >> if (zone->lowmem_reserve[j] > max) >> -- >> 2.15.1 >> > >-- >Michal Hocko >SUSE Labs
On Mon 12-11-18 14:26:41, Wei Yang wrote: > On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: > >On Mon 12-11-18 15:14:04, Wei Yang wrote: > >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the > >> more nodes we have, the more empty zones there are. > >> > >> This patch skip the zones to save some cycles. > > > >What is the motivation for the patch? Does it really cause any > >measurable difference in performance? > > > > The motivation here is to reduce some unnecessary work. I have guessed so even though the changelog was quite modest on the motivation. > Based on my understanding, almost every node has empty zones, since > zones within a node are ordered in monotonic increasing memory address. Yes, this is likely the case. Btw. a check for populated_zone or for_each_populated_zone would suite much better. > The worst case is all zones has managed_pages. For example, there is > only one node, or configured to have only ZONE_NORMAL and > ZONE_MOVABLE. Otherwise, the more node/zone we have, the more empty > zones there are. > > I didn't have detail tests on this patch, since I don't have machine > with large numa nodes. While compared with the following ten lines of > code, this check to skip them is worthwhile to me. Well, the main question is whether the optimization is really worth it. There is not much work done for each zone. I haven't looked closer whether the patch is actually correct, it seems to be though, but optimizations without measurable effect tend to be not that attractive.
On Mon, Nov 12, 2018 at 03:40:20PM +0100, Michal Hocko wrote: >On Mon 12-11-18 14:26:41, Wei Yang wrote: >> On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: >> >On Mon 12-11-18 15:14:04, Wei Yang wrote: >> >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the >> >> more nodes we have, the more empty zones there are. >> >> >> >> This patch skip the zones to save some cycles. >> > >> >What is the motivation for the patch? Does it really cause any >> >measurable difference in performance? >> > >> >> The motivation here is to reduce some unnecessary work. > >I have guessed so even though the changelog was quite modest on the >motivation. > >> Based on my understanding, almost every node has empty zones, since >> zones within a node are ordered in monotonic increasing memory address. > >Yes, this is likely the case. Btw. a check for populated_zone or >for_each_populated_zone would suite much better. > Hmm... maybe not exact. populated_zone checks zone->present_pages managed_zone checks zone->managed_pages As the comment of managed_zone says, this one records the pages managed by buddy system. And when we look at the usage of totalreserve_pages, it is only used in page allocation. And finally, *max* is checked with managed_pages instead of present_pages. Because of this, managed_zone is more accurate at this place. Is my understanding correct? >> The worst case is all zones has managed_pages. For example, there is >> only one node, or configured to have only ZONE_NORMAL and >> ZONE_MOVABLE. Otherwise, the more node/zone we have, the more empty >> zones there are. >> >> I didn't have detail tests on this patch, since I don't have machine >> with large numa nodes. While compared with the following ten lines of >> code, this check to skip them is worthwhile to me. > >Well, the main question is whether the optimization is really worth it. >There is not much work done for each zone. > >I haven't looked closer whether the patch is actually correct, it seems >to be though, but optimizations without measurable effect tend to be not >that attractive. > I believe you are right to some extend, this tiny invisible change is far away from attractive. While I have another opinion about optimization. That would be great to have a strong optimizatioin which improve the system more than 10%. And there are another kind of optimization that improves the system a little. We may call it polish. One polish may not obvious, while cumulative polish make a system outstanding. Why German products are famous all around the world? Why people is willing to pay much more to get a ZWILLING knife than others? Because we trust German manufactures will polish their product day after day, year after year with any efforts they can. So as I am to linux kernel. BTW, I am also thinking about to reduce some unnecessary work of lowmem_reserve[] calculation. Because those empty zone's lowmem_reserve is never used. Even cumulative effect of these two optimization is trivial, I still think it is worth. >-- >Michal Hocko >SUSE Labs
On Tue 13-11-18 01:39:42, Wei Yang wrote: > On Mon, Nov 12, 2018 at 03:40:20PM +0100, Michal Hocko wrote: > >On Mon 12-11-18 14:26:41, Wei Yang wrote: > >> On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: > >> >On Mon 12-11-18 15:14:04, Wei Yang wrote: > >> >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the > >> >> more nodes we have, the more empty zones there are. > >> >> > >> >> This patch skip the zones to save some cycles. > >> > > >> >What is the motivation for the patch? Does it really cause any > >> >measurable difference in performance? > >> > > >> > >> The motivation here is to reduce some unnecessary work. > > > >I have guessed so even though the changelog was quite modest on the > >motivation. > > > >> Based on my understanding, almost every node has empty zones, since > >> zones within a node are ordered in monotonic increasing memory address. > > > >Yes, this is likely the case. Btw. a check for populated_zone or > >for_each_populated_zone would suite much better. > > > > Hmm... maybe not exact. > > populated_zone checks zone->present_pages > managed_zone checks zone->managed_pages > > As the comment of managed_zone says, this one records the pages managed > by buddy system. And when we look at the usage of totalreserve_pages, it > is only used in page allocation. And finally, *max* is checked with > managed_pages instead of present_pages. > > Because of this, managed_zone is more accurate at this place. Is my > understanding correct? OK, fair enough. There is a certain discrepancy here. You are right that we do not care about pages out of the page allocator scope (e.g. early bootmem allocations, struct pages) but this is likely what other callers of populated_zone are looking for as well. It seems that managed pages counter which only came in later was not considered in other places. That being said this asks for a cleanup of some sort. And I think such a cleanup wold be appreciated much more than an optimization of an unknown effect and wonder why this check is used here and not at other places.
On Tue, Nov 13, 2018 at 09:08:34AM +0100, Michal Hocko wrote: >On Tue 13-11-18 01:39:42, Wei Yang wrote: >> On Mon, Nov 12, 2018 at 03:40:20PM +0100, Michal Hocko wrote: >> >On Mon 12-11-18 14:26:41, Wei Yang wrote: >> >> On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: >> >> >On Mon 12-11-18 15:14:04, Wei Yang wrote: >> >> >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the >> >> >> more nodes we have, the more empty zones there are. >> >> >> >> >> >> This patch skip the zones to save some cycles. >> >> > >> >> >What is the motivation for the patch? Does it really cause any >> >> >measurable difference in performance? >> >> > >> >> >> >> The motivation here is to reduce some unnecessary work. >> > >> >I have guessed so even though the changelog was quite modest on the >> >motivation. >> > >> >> Based on my understanding, almost every node has empty zones, since >> >> zones within a node are ordered in monotonic increasing memory address. >> > >> >Yes, this is likely the case. Btw. a check for populated_zone or >> >for_each_populated_zone would suite much better. >> > >> >> Hmm... maybe not exact. >> >> populated_zone checks zone->present_pages >> managed_zone checks zone->managed_pages >> >> As the comment of managed_zone says, this one records the pages managed >> by buddy system. And when we look at the usage of totalreserve_pages, it >> is only used in page allocation. And finally, *max* is checked with >> managed_pages instead of present_pages. >> >> Because of this, managed_zone is more accurate at this place. Is my >> understanding correct? > >OK, fair enough. There is a certain discrepancy here. You are right that >we do not care about pages out of the page allocator scope (e.g. early >bootmem allocations, struct pages) but this is likely what other callers >of populated_zone are looking for as well. It seems that managed pages >counter which only came in later was not considered in other places. > >That being said this asks for a cleanup of some sort. And I think such a >cleanup wold be appreciated much more than an optimization of an unknown >effect and wonder why this check is used here and not at other places. You are right. There are three pages(spanned, managed, present) in a zone, which is a little confusing. So you are willing to get rid of present_pages, if I am right? >-- >Michal Hocko >SUSE Labs
On Tue 13-11-18 08:16:44, Wei Yang wrote: > On Tue, Nov 13, 2018 at 09:08:34AM +0100, Michal Hocko wrote: > >On Tue 13-11-18 01:39:42, Wei Yang wrote: > >> On Mon, Nov 12, 2018 at 03:40:20PM +0100, Michal Hocko wrote: > >> >On Mon 12-11-18 14:26:41, Wei Yang wrote: > >> >> On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: > >> >> >On Mon 12-11-18 15:14:04, Wei Yang wrote: > >> >> >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the > >> >> >> more nodes we have, the more empty zones there are. > >> >> >> > >> >> >> This patch skip the zones to save some cycles. > >> >> > > >> >> >What is the motivation for the patch? Does it really cause any > >> >> >measurable difference in performance? > >> >> > > >> >> > >> >> The motivation here is to reduce some unnecessary work. > >> > > >> >I have guessed so even though the changelog was quite modest on the > >> >motivation. > >> > > >> >> Based on my understanding, almost every node has empty zones, since > >> >> zones within a node are ordered in monotonic increasing memory address. > >> > > >> >Yes, this is likely the case. Btw. a check for populated_zone or > >> >for_each_populated_zone would suite much better. > >> > > >> > >> Hmm... maybe not exact. > >> > >> populated_zone checks zone->present_pages > >> managed_zone checks zone->managed_pages > >> > >> As the comment of managed_zone says, this one records the pages managed > >> by buddy system. And when we look at the usage of totalreserve_pages, it > >> is only used in page allocation. And finally, *max* is checked with > >> managed_pages instead of present_pages. > >> > >> Because of this, managed_zone is more accurate at this place. Is my > >> understanding correct? > > > >OK, fair enough. There is a certain discrepancy here. You are right that > >we do not care about pages out of the page allocator scope (e.g. early > >bootmem allocations, struct pages) but this is likely what other callers > >of populated_zone are looking for as well. It seems that managed pages > >counter which only came in later was not considered in other places. > > > >That being said this asks for a cleanup of some sort. And I think such a > >cleanup wold be appreciated much more than an optimization of an unknown > >effect and wonder why this check is used here and not at other places. > > You are right. There are three pages(spanned, managed, present) in a > zone, which is a little confusing. > > So you are willing to get rid of present_pages, if I am right? No, I believe we want all three of them. But reviewing for_each_populated_zone users and explicit checks for present/managed pages and unify them would be a step forward both a more optimal code and more maintainable code. I haven't checked but for_each_populated_zone would seem like a proper user for managed page counter. But that really requires to review all current users.
On Tue, Nov 13, 2018 at 10:07:58AM +0100, Michal Hocko wrote: >On Tue 13-11-18 08:16:44, Wei Yang wrote: >> On Tue, Nov 13, 2018 at 09:08:34AM +0100, Michal Hocko wrote: >> >On Tue 13-11-18 01:39:42, Wei Yang wrote: >> >> On Mon, Nov 12, 2018 at 03:40:20PM +0100, Michal Hocko wrote: >> >> >On Mon 12-11-18 14:26:41, Wei Yang wrote: >> >> >> On Mon, Nov 12, 2018 at 09:09:26AM +0100, Michal Hocko wrote: >> >> >> >On Mon 12-11-18 15:14:04, Wei Yang wrote: >> >> >> >> Zone with no managed_pages doesn't contribute totalreserv_pages. And the >> >> >> >> more nodes we have, the more empty zones there are. >> >> >> >> >> >> >> >> This patch skip the zones to save some cycles. >> >> >> > >> >> >> >What is the motivation for the patch? Does it really cause any >> >> >> >measurable difference in performance? >> >> >> > >> >> >> >> >> >> The motivation here is to reduce some unnecessary work. >> >> > >> >> >I have guessed so even though the changelog was quite modest on the >> >> >motivation. >> >> > >> >> >> Based on my understanding, almost every node has empty zones, since >> >> >> zones within a node are ordered in monotonic increasing memory address. >> >> > >> >> >Yes, this is likely the case. Btw. a check for populated_zone or >> >> >for_each_populated_zone would suite much better. >> >> > >> >> >> >> Hmm... maybe not exact. >> >> >> >> populated_zone checks zone->present_pages >> >> managed_zone checks zone->managed_pages >> >> >> >> As the comment of managed_zone says, this one records the pages managed >> >> by buddy system. And when we look at the usage of totalreserve_pages, it >> >> is only used in page allocation. And finally, *max* is checked with >> >> managed_pages instead of present_pages. >> >> >> >> Because of this, managed_zone is more accurate at this place. Is my >> >> understanding correct? >> > >> >OK, fair enough. There is a certain discrepancy here. You are right that >> >we do not care about pages out of the page allocator scope (e.g. early >> >bootmem allocations, struct pages) but this is likely what other callers >> >of populated_zone are looking for as well. It seems that managed pages >> >counter which only came in later was not considered in other places. >> > >> >That being said this asks for a cleanup of some sort. And I think such a >> >cleanup wold be appreciated much more than an optimization of an unknown >> >effect and wonder why this check is used here and not at other places. >> >> You are right. There are three pages(spanned, managed, present) in a >> zone, which is a little confusing. >> >> So you are willing to get rid of present_pages, if I am right? > >No, I believe we want all three of them. But reviewing >for_each_populated_zone users and explicit checks for present/managed >pages and unify them would be a step forward both a more optimal code >and more maintainable code. I haven't checked but >for_each_populated_zone would seem like a proper user for managed page >counter. But that really requires to review all current users. Got your point. Let me take a look to see if I could make a cleanup. > >-- >Michal Hocko >SUSE Labs
On Tue, Nov 13, 2018 at 10:07:58AM +0100, Michal Hocko wrote: >On Tue 13-11-18 08:16:44, Wei Yang wrote: > >No, I believe we want all three of them. But reviewing >for_each_populated_zone users and explicit checks for present/managed >pages and unify them would be a step forward both a more optimal code >and more maintainable code. I haven't checked but >for_each_populated_zone would seem like a proper user for managed page >counter. But that really requires to review all current users. > To sync with your purpose, I searched the user of for_each_populated_zone() and replace it with a new loop for_each_managed_zone(). Here is a summary of what I have done. file used changed ---------------------------------------------- arch/s390/mm/page-states.c 1 1 kernel/power/snapshot.c 7 3 mm/highmem.c 1 1 mm/huge_memory.c 1 0 mm/khugepaged.c 1 1 mm/madvise.c 1 1 mm/page_alloc.c 8 8 mm/vmstat.c 5 5 The general idea to replace for_each_populated_zone() with for_each_populated_zone() is: * access zone->freelist * access zone pcp * access zone_page_state Is my understanding comply with what you want? >-- >Michal Hocko >SUSE Labs
On Wed 14-11-18 07:43:41, Wei Yang wrote: > On Tue, Nov 13, 2018 at 10:07:58AM +0100, Michal Hocko wrote: > >On Tue 13-11-18 08:16:44, Wei Yang wrote: > > > >No, I believe we want all three of them. But reviewing > >for_each_populated_zone users and explicit checks for present/managed > >pages and unify them would be a step forward both a more optimal code > >and more maintainable code. I haven't checked but > >for_each_populated_zone would seem like a proper user for managed page > >counter. But that really requires to review all current users. > > > > To sync with your purpose, I searched the user of > for_each_populated_zone() and replace it with a new loop > for_each_managed_zone(). I do not think we really want a new iterator. Is there any users of for_each_populated_zone which would be interested in something else than managed pages?
On Wed, Nov 14, 2018 at 08:48:21AM +0100, Michal Hocko wrote: >On Wed 14-11-18 07:43:41, Wei Yang wrote: >> On Tue, Nov 13, 2018 at 10:07:58AM +0100, Michal Hocko wrote: >> >On Tue 13-11-18 08:16:44, Wei Yang wrote: >> > >> >No, I believe we want all three of them. But reviewing >> >for_each_populated_zone users and explicit checks for present/managed >> >pages and unify them would be a step forward both a more optimal code >> >and more maintainable code. I haven't checked but >> >for_each_populated_zone would seem like a proper user for managed page >> >counter. But that really requires to review all current users. >> > >> >> To sync with your purpose, I searched the user of >> for_each_populated_zone() and replace it with a new loop >> for_each_managed_zone(). > >I do not think we really want a new iterator. Is there any users of >for_each_populated_zone which would be interested in something else than >managed pages? Your purpose is replace the populated_zone() in for_each_populated_zone() with managed_zone()? If this is the case, most of them is possible. Some places I am not sure is: kernel/power/snapshot.c mm/huge_memory.c mm/khugepaged.c For other places, I thinks it is ok to replace it. >-- >Michal Hocko >SUSE Labs
On Wed 14-11-18 08:20:47, Wei Yang wrote: > On Wed, Nov 14, 2018 at 08:48:21AM +0100, Michal Hocko wrote: > >On Wed 14-11-18 07:43:41, Wei Yang wrote: > >> On Tue, Nov 13, 2018 at 10:07:58AM +0100, Michal Hocko wrote: > >> >On Tue 13-11-18 08:16:44, Wei Yang wrote: > >> > > >> >No, I believe we want all three of them. But reviewing > >> >for_each_populated_zone users and explicit checks for present/managed > >> >pages and unify them would be a step forward both a more optimal code > >> >and more maintainable code. I haven't checked but > >> >for_each_populated_zone would seem like a proper user for managed page > >> >counter. But that really requires to review all current users. > >> > > >> > >> To sync with your purpose, I searched the user of > >> for_each_populated_zone() and replace it with a new loop > >> for_each_managed_zone(). > > > >I do not think we really want a new iterator. Is there any users of > >for_each_populated_zone which would be interested in something else than > >managed pages? > > Your purpose is replace the populated_zone() in > for_each_populated_zone() with managed_zone()? Well, we might rename as well but I if we have only one or two users then an opencoded variant with populated_zone() check sounds better than a new iterator. > If this is the case, most of them is possible. Some places I am not sure > is: > > kernel/power/snapshot.c This one really looks like it wants the full pfn range whether it is managed or not. So changing this to opencoded for_each_zone + populated_zone check should be OK. > mm/huge_memory.c > mm/khugepaged.c These two are definitely page allocator related so they do care about managed.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a919ba5cb3c8..567de15e1106 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7246,6 +7246,9 @@ static void calculate_totalreserve_pages(void) struct zone *zone = pgdat->node_zones + i; long max = 0; + if (!managed_zone(zone)) + continue; + /* Find valid and maximum lowmem_reserve in the zone */ for (j = i; j < MAX_NR_ZONES; j++) { if (zone->lowmem_reserve[j] > max)
Zone with no managed_pages doesn't contribute totalreserv_pages. And the more nodes we have, the more empty zones there are. This patch skip the zones to save some cycles. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> --- mm/page_alloc.c | 3 +++ 1 file changed, 3 insertions(+)