Message ID | 20181214023912.77474-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm, page_isolation: remove drain_all_pages() in set_migratetype_isolate() | expand |
On Fri, 14 Dec 2018 10:39:12 +0800 Wei Yang <richard.weiyang@gmail.com> wrote: > Below is a brief call flow for __offline_pages() Offtopic... set_migratetype_isolate() has the comment /* * immobile means "not-on-lru" pages. If immobile is larger than * removable-by-driver pages reported by notifier, we'll fail. */ what the heck does that mean? It used to talk about unmovable pages, but this was mysteriously changed to use the unique term "immobile" by Minchan's ee6f509c32 ("mm: factor out memory isolate functions"). Could someone please take a look? > and > alloc_contig_range(): > > __offline_pages()/alloc_contig_range() > start_isolate_page_range() > set_migratetype_isolate() > drain_all_pages() > drain_all_pages() > > Since set_migratetype_isolate() is only used in > start_isolate_page_range(), which is just used in __offline_pages() and > alloc_contig_range(). And both of them call drain_all_pages() if every > check looks good. This means it is not necessary call drain_all_pages() > in each iteration of set_migratetype_isolate(). > > By doing so, the logic seems a little bit clearer. > set_migratetype_isolate() handles pages in Buddy, while > drain_all_pages() takes care of pages in pcp. Well. drain_all_pages() moves pages from pcp to buddy so I'm not sure that argument holds water. Can we step back a bit and ask ourselves what all these draining operations are actually for? What is the intent behind each callsite? Figuring that out (and perhaps even documenting it!) would help us decide the most appropriate places from which to perform the drain.
On Thu, Dec 13, 2018 at 07:57:12PM -0800, Andrew Morton wrote: >On Fri, 14 Dec 2018 10:39:12 +0800 Wei Yang <richard.weiyang@gmail.com> wrote: > >> Below is a brief call flow for __offline_pages() > >Offtopic... > >set_migratetype_isolate() has the comment > > /* > * immobile means "not-on-lru" pages. If immobile is larger than > * removable-by-driver pages reported by notifier, we'll fail. > */ > >what the heck does that mean? It used to talk about unmovable pages, >but this was mysteriously changed to use the unique term "immobile" by >Minchan's ee6f509c32 ("mm: factor out memory isolate functions"). >Could someone please take a look? > What immobile stands for? I searched the whole kernel tree and just this place use this terminology. > >> and >> alloc_contig_range(): >> >> __offline_pages()/alloc_contig_range() >> start_isolate_page_range() >> set_migratetype_isolate() >> drain_all_pages() >> drain_all_pages() >> >> Since set_migratetype_isolate() is only used in >> start_isolate_page_range(), which is just used in __offline_pages() and >> alloc_contig_range(). And both of them call drain_all_pages() if every >> check looks good. This means it is not necessary call drain_all_pages() >> in each iteration of set_migratetype_isolate(). >> >> By doing so, the logic seems a little bit clearer. >> set_migratetype_isolate() handles pages in Buddy, while >> drain_all_pages() takes care of pages in pcp. > >Well. drain_all_pages() moves pages from pcp to buddy so I'm not sure >that argument holds water. > You mean the wartermark? >Can we step back a bit and ask ourselves what all these draining >operations are actually for? What is the intent behind each callsite? >Figuring that out (and perhaps even documenting it!) would help us >decide the most appropriate places from which to perform the drain. That is great. I found myself hard to understand current implementation. Let me try to write down what I understand now. Current mm subsystem manage memory with a hierarchic way. * Buddy system * pcp pageset * slub With this background, we handle pages differently for different layer. * set_migratetype_isolate() handle pages still in Buddy system. * drain_all_pages() handle pages in pcp pageset. * I don't know who handle pages in slub. While there are still pages out there, eg. page table, file pages, I don't understand how they are handled during offline. Especially, how to catch them all in a specific range. Now go back to this patch. __offline_pages()/alloc_contig_range() start_isolate_page_range() set_migratetype_isolate() drain_all_pages() drain_all_pages() start_isolate_page_range() will iterate a range with pageblock step to isolate them. Since both __offline_pages() and alloc_contig_range() require this range to be in the same zone, drain_all_pages() will drain the pcp pageset of the same zone several times. After that, drain_all_pages() will be called again to drain pages. One thing we can notice is after set_migratetype_isolate() for a particular range, this range's page will not be available for allocation. But the pages after this range still has a chance to be put on pcp pageset. And during this process, pages of the same zone but out of the whole range could be put on the pcp pageset. This means current implementation would drain those pages several times and may increase contention for this zone. This behavior seems suboptimal. And we can do this just in once to drain all of them.
On Thu, Dec 13, 2018 at 07:57:12PM -0800, Andrew Morton wrote: >On Fri, 14 Dec 2018 10:39:12 +0800 Wei Yang <richard.weiyang@gmail.com> wrote: > >> Below is a brief call flow for __offline_pages() > >Offtopic... > >set_migratetype_isolate() has the comment > > /* > * immobile means "not-on-lru" pages. If immobile is larger than > * removable-by-driver pages reported by notifier, we'll fail. > */ > >what the heck does that mean? It used to talk about unmovable pages, >but this was mysteriously changed to use the unique term "immobile" by >Minchan's ee6f509c32 ("mm: factor out memory isolate functions"). >Could someone please take a look? > > >> and >> alloc_contig_range(): >> >> __offline_pages()/alloc_contig_range() >> start_isolate_page_range() >> set_migratetype_isolate() >> drain_all_pages() >> drain_all_pages() >> >> Since set_migratetype_isolate() is only used in >> start_isolate_page_range(), which is just used in __offline_pages() and >> alloc_contig_range(). And both of them call drain_all_pages() if every >> check looks good. This means it is not necessary call drain_all_pages() >> in each iteration of set_migratetype_isolate(). >> >> By doing so, the logic seems a little bit clearer. >> set_migratetype_isolate() handles pages in Buddy, while >> drain_all_pages() takes care of pages in pcp. > >Well. drain_all_pages() moves pages from pcp to buddy so I'm not sure >that argument holds water. > >Can we step back a bit and ask ourselves what all these draining >operations are actually for? What is the intent behind each callsite? >Figuring that out (and perhaps even documenting it!) would help us >decide the most appropriate places from which to perform the drain. With some rethinking we even could take drain_all_pages() out of the repeat loop. Because after isolation, the page in this range will not be put to pcp pageset. So we just need to drain pages once. The change may look like this. diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 6910e0eea074..120e9fdfd055 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1590,6 +1590,7 @@ static int __ref __offline_pages(unsigned long start_pfn, if (ret) goto failed_removal; + drain_all_pages(zone); pfn = start_pfn; repeat: /* start memory hot removal */ @@ -1599,7 +1600,6 @@ static int __ref __offline_pages(unsigned long start_pfn, cond_resched(); lru_add_drain_all(); - drain_all_pages(zone); pfn = scan_movable_pages(start_pfn, end_pfn); if (pfn) { /* We have movable pages */
On Fri 14-12-18 15:17:56, Wei Yang wrote: > On Thu, Dec 13, 2018 at 07:57:12PM -0800, Andrew Morton wrote: > >On Fri, 14 Dec 2018 10:39:12 +0800 Wei Yang <richard.weiyang@gmail.com> wrote: > > > >> Below is a brief call flow for __offline_pages() > > > >Offtopic... > > > >set_migratetype_isolate() has the comment > > > > /* > > * immobile means "not-on-lru" pages. If immobile is larger than > > * removable-by-driver pages reported by notifier, we'll fail. > > */ > > > >what the heck does that mean? It used to talk about unmovable pages, > >but this was mysteriously changed to use the unique term "immobile" by > >Minchan's ee6f509c32 ("mm: factor out memory isolate functions"). > >Could someone please take a look? > > > > > >> and > >> alloc_contig_range(): > >> > >> __offline_pages()/alloc_contig_range() > >> start_isolate_page_range() > >> set_migratetype_isolate() > >> drain_all_pages() > >> drain_all_pages() > >> > >> Since set_migratetype_isolate() is only used in > >> start_isolate_page_range(), which is just used in __offline_pages() and > >> alloc_contig_range(). And both of them call drain_all_pages() if every > >> check looks good. This means it is not necessary call drain_all_pages() > >> in each iteration of set_migratetype_isolate(). > >> > >> By doing so, the logic seems a little bit clearer. > >> set_migratetype_isolate() handles pages in Buddy, while > >> drain_all_pages() takes care of pages in pcp. > > > >Well. drain_all_pages() moves pages from pcp to buddy so I'm not sure > >that argument holds water. > > > >Can we step back a bit and ask ourselves what all these draining > >operations are actually for? What is the intent behind each callsite? > >Figuring that out (and perhaps even documenting it!) would help us > >decide the most appropriate places from which to perform the drain. > > With some rethinking we even could take drain_all_pages() out of the > repeat loop. Because after isolation, the page in this range will not be > put to pcp pageset. So we just need to drain pages once. > > The change may look like this. No, this is incorrect. Draining pcp lists before scan_movable_pages is most likely sub-optimal, because scan_movable_pages will simply ignore pages being on the pcp lists. But we definitely want to drain before we terminate the offlining phase because we do not want to have isolated pages on those lists before we allow the final hotremove. The way how we retry the migration loop until there is no page in use just guarantees that drain_all_pages is called. If you put it out of the loop then you just break that assumption. Moving drain_all_pages down after the migration is done should work well AFAICS but I didn't really think through all potential side effects nor have time to do so now. > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 6910e0eea074..120e9fdfd055 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1590,6 +1590,7 @@ static int __ref __offline_pages(unsigned long start_pfn, > if (ret) > goto failed_removal; > > + drain_all_pages(zone); > pfn = start_pfn; > repeat: > /* start memory hot removal */ > @@ -1599,7 +1600,6 @@ static int __ref __offline_pages(unsigned long start_pfn, > > cond_resched(); > lru_add_drain_all(); > - drain_all_pages(zone); > > pfn = scan_movable_pages(start_pfn, end_pfn); > if (pfn) { /* We have movable pages */ > > -- > Wei Yang > Help you, Help me
On Fri 14-12-18 10:39:12, Wei Yang wrote: > Below is a brief call flow for __offline_pages() and > alloc_contig_range(): > > __offline_pages()/alloc_contig_range() > start_isolate_page_range() > set_migratetype_isolate() > drain_all_pages() > drain_all_pages() > > Since set_migratetype_isolate() is only used in > start_isolate_page_range(), which is just used in __offline_pages() and > alloc_contig_range(). And both of them call drain_all_pages() if every > check looks good. This means it is not necessary call drain_all_pages() > in each iteration of set_migratetype_isolate(). > > By doing so, the logic seems a little bit clearer. > set_migratetype_isolate() handles pages in Buddy, while > drain_all_pages() takes care of pages in pcp. I have to confess I am not sure about the purpose of the draining here. I suspect it is to make sure that pages in the pcp lists really get isolated and if that is the case then it makes sense. In any case I strongly suggest not touching this code without a very good explanation on why this is not needed. Callers do XYZ is not a proper explanation because assumes that all callers will know that this has to be done. So either we really need to drain and then it is better to make it here or we don't but that requires some explanation. > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > --- > mm/page_isolation.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 43e085608846..f44c0e333bed 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -83,8 +83,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, > } > > spin_unlock_irqrestore(&zone->lock, flags); > - if (!ret) > - drain_all_pages(zone); > return ret; > } > > -- > 2.15.1 >
On Mon, Dec 17, 2018 at 01:25:23PM +0100, Michal Hocko wrote: >On Fri 14-12-18 10:39:12, Wei Yang wrote: >> Below is a brief call flow for __offline_pages() and >> alloc_contig_range(): >> >> __offline_pages()/alloc_contig_range() >> start_isolate_page_range() >> set_migratetype_isolate() >> drain_all_pages() >> drain_all_pages() >> >> Since set_migratetype_isolate() is only used in >> start_isolate_page_range(), which is just used in __offline_pages() and >> alloc_contig_range(). And both of them call drain_all_pages() if every >> check looks good. This means it is not necessary call drain_all_pages() >> in each iteration of set_migratetype_isolate(). >> >> By doing so, the logic seems a little bit clearer. >> set_migratetype_isolate() handles pages in Buddy, while >> drain_all_pages() takes care of pages in pcp. > >I have to confess I am not sure about the purpose of the draining here. >I suspect it is to make sure that pages in the pcp lists really get >isolated and if that is the case then it makes sense. > >In any case I strongly suggest not touching this code without a very >good explanation on why this is not needed. Callers do XYZ is not a >proper explanation because assumes that all callers will know that this >has to be done. So either we really need to drain and then it is better >to make it here or we don't but that requires some explanation. > Yep, let me try to explain what is trying to do. Based on my understanding, online_pages do two things * adjust zone/pgdat status * put pages into Buddy Generally, offline_pages do the reverse * take pages out of Buddy * adjust zone/pgdat status While it is not that easy to take pages out of Buddy, since pages are * pcp list * slub * other usage This means before taking a page out of Buddy, we need to return it first to Buddy. Current implementation is interesting by introducing migrate type. By setting migrate type to MIGRATE_ISOLATE, this range of pages will never be allocated from Buddy. And every page returned in this range will never be touched by Buddy. Function start_isolate_page_range() just do this. Then let's focus on the pcp list. This is a little bit different than other allocated pages. These are actually "partially" allocated pages. They are not counted in Buddy Free pages, either no real use. So we have two choice to get back those pages: * wait until it is allocated to a real user and wait for return * or drain them directly Current implementation take 2nd approach. Then we can see there are also two way to drain them: * drain them range by range * drain them in a whole range Both looks good, but not necessary to do them both. Because after we set a pageblock migrate type to MIGRATE_ISOLATE, pages in this range will never be allocated nor be put on pcp list. So after we drain one particular range, it is not necessary to drain this range again. The reason why I choose to drain them in a whole range is current drain_all_pages() just carry zone information. For example, a zone may have 1G while a pageblock is 128M. The pageblock is 1/8 of this zone. This means in case there are 8 pages on pcp list, only 1 page drained by drain_all_pages belongs to this pageblock. But we drain other 7 healthy pages. CPU1 pcp list CPU2 pcp list +---------------+ +---------------+ |A1 B3 C8 F6 | |E1 G3 D8 B6 | +---------------+ +---------------+ A B C D E F G +---------+---------+---------+---------+---------+---------+---------+ |012345678| | | | | | | +---------+---------+---------+---------+---------+---------+---------+ |<-pgblk->| |<- Zone ->| This is a chart for illustration. In case we want to isolate pgblk D, while zone pcp list has 8 pages and only one belongs to this pgblk D. This means the drain on pgblk base has much side effect. And with one drain on each pgblk, this may increase the contention on this zone. Well, another approach is to enable drain_all_pages() with exact range information. But neither approach needs to do them both.
On Mon 17-12-18 15:08:19, Wei Yang wrote: > On Mon, Dec 17, 2018 at 01:25:23PM +0100, Michal Hocko wrote: > >On Fri 14-12-18 10:39:12, Wei Yang wrote: > >> Below is a brief call flow for __offline_pages() and > >> alloc_contig_range(): > >> > >> __offline_pages()/alloc_contig_range() > >> start_isolate_page_range() > >> set_migratetype_isolate() > >> drain_all_pages() > >> drain_all_pages() > >> > >> Since set_migratetype_isolate() is only used in > >> start_isolate_page_range(), which is just used in __offline_pages() and > >> alloc_contig_range(). And both of them call drain_all_pages() if every > >> check looks good. This means it is not necessary call drain_all_pages() > >> in each iteration of set_migratetype_isolate(). > >> > >> By doing so, the logic seems a little bit clearer. > >> set_migratetype_isolate() handles pages in Buddy, while > >> drain_all_pages() takes care of pages in pcp. > > > >I have to confess I am not sure about the purpose of the draining here. > >I suspect it is to make sure that pages in the pcp lists really get > >isolated and if that is the case then it makes sense. > > > >In any case I strongly suggest not touching this code without a very > >good explanation on why this is not needed. Callers do XYZ is not a > >proper explanation because assumes that all callers will know that this > >has to be done. So either we really need to drain and then it is better > >to make it here or we don't but that requires some explanation. > > > > Yep, let me try to explain what is trying to do. > > Based on my understanding, online_pages do two things > > * adjust zone/pgdat status > * put pages into Buddy > > Generally, offline_pages do the reverse > > * take pages out of Buddy > * adjust zone/pgdat status > > While it is not that easy to take pages out of Buddy, since pages are > > * pcp list > * slub > * other usage > > This means before taking a page out of Buddy, we need to return it first > to Buddy. > > Current implementation is interesting by introducing migrate type. By > setting migrate type to MIGRATE_ISOLATE, this range of pages will never > be allocated from Buddy. And every page returned in this range will > never be touched by Buddy. > > Function start_isolate_page_range() just do this. > > Then let's focus on the pcp list. This is a little bit different > than other allocated pages. These are actually "partially" allocated > pages. They are not counted in Buddy Free pages, either no real use. So > we have two choice to get back those pages: > > * wait until it is allocated to a real user and wait for return > * or drain them directly > > Current implementation take 2nd approach. > > Then we can see there are also two way to drain them: > > * drain them range by range > * drain them in a whole range > > Both looks good, but not necessary to do them both. Because after we set > a pageblock migrate type to MIGRATE_ISOLATE, pages in this range will > never be allocated nor be put on pcp list. So after we drain one > particular range, it is not necessary to drain this range again. OK, this is an important point and actually the argument that i am wrong. I have missed that free_unref_page_commit skips pcp lists for MIGRATE_ISOLATE (resp. all migrate types above MIGRATE_PCPTYPES). Then you are right that we are OK to drain the zone only once _after_ we have isolated the full range. So please send a new patch with this clarification in the changelog and I will ack it. > The reason why I choose to drain them in a whole range is current > drain_all_pages() just carry zone information. For example, a zone may > have 1G while a pageblock is 128M. The pageblock is 1/8 of this zone. > This means in case there are 8 pages on pcp list, only 1 page drained by > drain_all_pages belongs to this pageblock. But we drain other 7 healthy > pages. > > CPU1 pcp list CPU2 pcp list > > +---------------+ +---------------+ > |A1 B3 C8 F6 | |E1 G3 D8 B6 | > +---------------+ +---------------+ > > > A B C D E F G > +---------+---------+---------+---------+---------+---------+---------+ > |012345678| | | | | | | > +---------+---------+---------+---------+---------+---------+---------+ > |<-pgblk->| > |<- Zone ->| > > > This is a chart for illustration. In case we want to isolate pgblk D, > while zone pcp list has 8 pages and only one belongs to this pgblk D. > This means the drain on pgblk base has much side effect. And with one > drain on each pgblk, this may increase the contention on this zone. > > Well, another approach is to enable drain_all_pages() with exact range > information. But neither approach needs to do them both. Is this actually worth the additional complexity? Have you seen an actual workload that would benefit from that?
On Mon, Dec 17, 2018 at 04:48:12PM +0100, Michal Hocko wrote: >On Mon 17-12-18 15:08:19, Wei Yang wrote: >> On Mon, Dec 17, 2018 at 01:25:23PM +0100, Michal Hocko wrote: >> >On Fri 14-12-18 10:39:12, Wei Yang wrote: >> >> Below is a brief call flow for __offline_pages() and >> >> alloc_contig_range(): >> >> >> >> __offline_pages()/alloc_contig_range() >> >> start_isolate_page_range() >> >> set_migratetype_isolate() >> >> drain_all_pages() >> >> drain_all_pages() >> >> >> >> Since set_migratetype_isolate() is only used in >> >> start_isolate_page_range(), which is just used in __offline_pages() and >> >> alloc_contig_range(). And both of them call drain_all_pages() if every >> >> check looks good. This means it is not necessary call drain_all_pages() >> >> in each iteration of set_migratetype_isolate(). >> >> >> >> By doing so, the logic seems a little bit clearer. >> >> set_migratetype_isolate() handles pages in Buddy, while >> >> drain_all_pages() takes care of pages in pcp. >> > >> >I have to confess I am not sure about the purpose of the draining here. >> >I suspect it is to make sure that pages in the pcp lists really get >> >isolated and if that is the case then it makes sense. >> > >> >In any case I strongly suggest not touching this code without a very >> >good explanation on why this is not needed. Callers do XYZ is not a >> >proper explanation because assumes that all callers will know that this >> >has to be done. So either we really need to drain and then it is better >> >to make it here or we don't but that requires some explanation. >> > >> >> Yep, let me try to explain what is trying to do. >> >> Based on my understanding, online_pages do two things >> >> * adjust zone/pgdat status >> * put pages into Buddy >> >> Generally, offline_pages do the reverse >> >> * take pages out of Buddy >> * adjust zone/pgdat status >> >> While it is not that easy to take pages out of Buddy, since pages are >> >> * pcp list >> * slub >> * other usage >> >> This means before taking a page out of Buddy, we need to return it first >> to Buddy. >> >> Current implementation is interesting by introducing migrate type. By >> setting migrate type to MIGRATE_ISOLATE, this range of pages will never >> be allocated from Buddy. And every page returned in this range will >> never be touched by Buddy. >> >> Function start_isolate_page_range() just do this. >> >> Then let's focus on the pcp list. This is a little bit different >> than other allocated pages. These are actually "partially" allocated >> pages. They are not counted in Buddy Free pages, either no real use. So >> we have two choice to get back those pages: >> >> * wait until it is allocated to a real user and wait for return >> * or drain them directly >> >> Current implementation take 2nd approach. >> >> Then we can see there are also two way to drain them: >> >> * drain them range by range >> * drain them in a whole range >> >> Both looks good, but not necessary to do them both. Because after we set >> a pageblock migrate type to MIGRATE_ISOLATE, pages in this range will >> never be allocated nor be put on pcp list. So after we drain one >> particular range, it is not necessary to drain this range again. > >OK, this is an important point and actually the argument that i am >wrong. I have missed that free_unref_page_commit skips pcp lists for >MIGRATE_ISOLATE (resp. all migrate types above MIGRATE_PCPTYPES). >Then you are right that we are OK to drain the zone only once _after_ we >have isolated the full range. > >So please send a new patch with this clarification in the changelog and >I will ack it. > >> The reason why I choose to drain them in a whole range is current >> drain_all_pages() just carry zone information. For example, a zone may >> have 1G while a pageblock is 128M. The pageblock is 1/8 of this zone. >> This means in case there are 8 pages on pcp list, only 1 page drained by >> drain_all_pages belongs to this pageblock. But we drain other 7 healthy >> pages. >> >> CPU1 pcp list CPU2 pcp list >> >> +---------------+ +---------------+ >> |A1 B3 C8 F6 | |E1 G3 D8 B6 | >> +---------------+ +---------------+ >> >> >> A B C D E F G >> +---------+---------+---------+---------+---------+---------+---------+ >> |012345678| | | | | | | >> +---------+---------+---------+---------+---------+---------+---------+ >> |<-pgblk->| >> |<- Zone ->| >> >> >> This is a chart for illustration. In case we want to isolate pgblk D, >> while zone pcp list has 8 pages and only one belongs to this pgblk D. >> This means the drain on pgblk base has much side effect. And with one >> drain on each pgblk, this may increase the contention on this zone. >> >> Well, another approach is to enable drain_all_pages() with exact range >> information. But neither approach needs to do them both. > >Is this actually worth the additional complexity? Have you seen an >actual workload that would benefit from that? No, I just mention the possible approach in my mind. While currently drain pcp list once is enough. I will prepare v2 with more detailed changelog with migratetype thing. >-- >Michal Hocko >SUSE Labs
On Mon, Dec 17, 2018 at 01:21:32PM +0100, Michal Hocko wrote: >On Fri 14-12-18 15:17:56, Wei Yang wrote: >> On Thu, Dec 13, 2018 at 07:57:12PM -0800, Andrew Morton wrote: >> >On Fri, 14 Dec 2018 10:39:12 +0800 Wei Yang <richard.weiyang@gmail.com> wrote: >> > >> >> Below is a brief call flow for __offline_pages() >> > >> >Offtopic... >> > >> >set_migratetype_isolate() has the comment >> > >> > /* >> > * immobile means "not-on-lru" pages. If immobile is larger than >> > * removable-by-driver pages reported by notifier, we'll fail. >> > */ >> > >> >what the heck does that mean? It used to talk about unmovable pages, >> >but this was mysteriously changed to use the unique term "immobile" by >> >Minchan's ee6f509c32 ("mm: factor out memory isolate functions"). >> >Could someone please take a look? >> > >> > >> >> and >> >> alloc_contig_range(): >> >> >> >> __offline_pages()/alloc_contig_range() >> >> start_isolate_page_range() >> >> set_migratetype_isolate() >> >> drain_all_pages() >> >> drain_all_pages() >> >> >> >> Since set_migratetype_isolate() is only used in >> >> start_isolate_page_range(), which is just used in __offline_pages() and >> >> alloc_contig_range(). And both of them call drain_all_pages() if every >> >> check looks good. This means it is not necessary call drain_all_pages() >> >> in each iteration of set_migratetype_isolate(). >> >> >> >> By doing so, the logic seems a little bit clearer. >> >> set_migratetype_isolate() handles pages in Buddy, while >> >> drain_all_pages() takes care of pages in pcp. >> > >> >Well. drain_all_pages() moves pages from pcp to buddy so I'm not sure >> >that argument holds water. >> > >> >Can we step back a bit and ask ourselves what all these draining >> >operations are actually for? What is the intent behind each callsite? >> >Figuring that out (and perhaps even documenting it!) would help us >> >decide the most appropriate places from which to perform the drain. >> >> With some rethinking we even could take drain_all_pages() out of the >> repeat loop. Because after isolation, the page in this range will not be >> put to pcp pageset. So we just need to drain pages once. >> >> The change may look like this. > >No, this is incorrect. Draining pcp lists before scan_movable_pages is >most likely sub-optimal, because scan_movable_pages will simply ignore >pages being on the pcp lists. But we definitely want to drain before we >terminate the offlining phase because we do not want to have isolated >pages on those lists before we allow the final hotremove. > >The way how we retry the migration loop until there is no page in use >just guarantees that drain_all_pages is called. If you put it out of the >loop then you just break that assumption. Moving drain_all_pages down >after the migration is done should work well AFAICS but I didn't really >think through all potential side effects nor have time to do so now. > After discussion, do you agree with this proposal? >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 6910e0eea074..120e9fdfd055 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1590,6 +1590,7 @@ static int __ref __offline_pages(unsigned long start_pfn, >> if (ret) >> goto failed_removal; >> >> + drain_all_pages(zone); >> pfn = start_pfn; >> repeat: >> /* start memory hot removal */ >> @@ -1599,7 +1600,6 @@ static int __ref __offline_pages(unsigned long start_pfn, >> >> cond_resched(); >> lru_add_drain_all(); >> - drain_all_pages(zone); >> >> pfn = scan_movable_pages(start_pfn, end_pfn); >> if (pfn) { /* We have movable pages */ >> >> -- >> Wei Yang >> Help you, Help me > >-- >Michal Hocko >SUSE Labs
diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 43e085608846..f44c0e333bed 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -83,8 +83,6 @@ static int set_migratetype_isolate(struct page *page, int migratetype, } spin_unlock_irqrestore(&zone->lock, flags); - if (!ret) - drain_all_pages(zone); return ret; }
Below is a brief call flow for __offline_pages() and alloc_contig_range(): __offline_pages()/alloc_contig_range() start_isolate_page_range() set_migratetype_isolate() drain_all_pages() drain_all_pages() Since set_migratetype_isolate() is only used in start_isolate_page_range(), which is just used in __offline_pages() and alloc_contig_range(). And both of them call drain_all_pages() if every check looks good. This means it is not necessary call drain_all_pages() in each iteration of set_migratetype_isolate(). By doing so, the logic seems a little bit clearer. set_migratetype_isolate() handles pages in Buddy, while drain_all_pages() takes care of pages in pcp. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> --- mm/page_isolation.c | 2 -- 1 file changed, 2 deletions(-)