Message ID | 20181122101241.7965-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection |
On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
>During online_pages phase, pgdat->nr_zones will be updated in case this
>zone is empty.
>
>Currently the online_pages phase is protected by the global lock
>mem_hotplug_begin(), which ensures there is no contention during the
>update of nr_zones. But this global lock introduces scalability issues.
>
>This patch is a preparation for removing the global lock during
>online_pages phase. Also this patch changes the documentation of
>node_size_lock to include the protection of nr_zones.
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Missed this, if I am correct. :-)

Acked-by: Michal Hocko <mhocko@suse.com>
On Thu 22-11-18 10:15:57, Wei Yang wrote:
> On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
> >During online_pages phase, pgdat->nr_zones will be updated in case this
> >zone is empty.
> >
> >Currently the online_pages phase is protected by the global lock
> >mem_hotplug_begin(), which ensures there is no contention during the
> >update of nr_zones. But this global lock introduces scalability issues.
> >
> >This patch is a preparation for removing the global lock during
> >online_pages phase. Also this patch changes the documentation of
> >node_size_lock to include the protection of nr_zones.

I would just add that the patch moves init_currently_empty_zone() under
both zone_span_writelock and pgdat_resize_lock, because both the pgdat
state (nr_zones) and the zone's start_pfn are changed.

> >
> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
> Missed this, if I am correct. :-)
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Yes, thank you.
On Thu, 2018-11-22 at 18:12 +0800, Wei Yang wrote:
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Thanks ;-)

Reviewed-by: Oscar Salvador <osalvador@suse.de>
On Thu, Nov 22, 2018 at 11:29:22AM +0100, Michal Hocko wrote:
>On Thu 22-11-18 10:15:57, Wei Yang wrote:
>> On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
>> >During online_pages phase, pgdat->nr_zones will be updated in case this
>> >zone is empty.
>> >
>> >Currently the online_pages phase is protected by the global lock
>> >mem_hotplug_begin(), which ensures there is no contention during the
>> >update of nr_zones. But this global lock introduces scalability issues.
>> >
>> >This patch is a preparation for removing the global lock during
>> >online_pages phase. Also this patch changes the documentation of
>> >node_size_lock to include the protection of nr_zones.
>
>I would just add that the patch moves init_currently_empty_zone() under
>both zone_span_writelock and pgdat_resize_lock, because both the pgdat
>state (nr_zones) and the zone's start_pfn are changed.
>
>> >
>> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>
>> Missed this, if I am correct. :-)
>>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>
>Yes, thank you.

My pleasure :-)

>--
>Michal Hocko
>SUSE Labs
On Thu, Nov 22, 2018 at 11:37:38AM +0100, osalvador wrote:
>On Thu, 2018-11-22 at 18:12 +0800, Wei Yang wrote:
>>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
>Thanks ;-)
>

Thanks for your suggestions :-)

>Reviewed-by: Oscar Salvador <osalvador@suse.de>
On 22.11.18 11:12, Wei Yang wrote:
> During online_pages phase, pgdat->nr_zones will be updated in case this
> zone is empty.
>
> Currently the online_pages phase is protected by the global lock
> mem_hotplug_begin(), which ensures there is no contention during the
> update of nr_zones. But this global lock introduces scalability issues.
>
> This patch is a preparation for removing the global lock during
> online_pages phase. Also this patch changes the documentation of
> node_size_lock to include the protection of nr_zones.

I looked into locking recently, and there is more to it.

Please read:

commit dee6da22efac451d361f5224a60be2796d847b51
Author: David Hildenbrand <david@redhat.com>
Date:   Tue Oct 30 15:10:44 2018 -0700

    memory-hotplug.rst: add some details about locking internals

    Let's document the magic a bit, especially why device_hotplug_lock is
    required when adding/removing memory and how it all play together with
    requests to online/offline memory from user space.

Short summary: Onlining/offlining of memory requires the device_hotplug_lock
as of now.

mem_hotplug_begin() is just an internal optimization. (we don't want
everybody to take the device lock)

>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
> v2:
>   * commit log changes
>   * modify the code in move_pfn_range_to_zone() instead of in
>     init_currently_empty_zone()
>   * documentation change
>
> ---
>  include/linux/mmzone.h | 7 ++++---
>  mm/memory_hotplug.c    | 5 ++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 68d7b558924b..1bb749bee284 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -636,9 +636,10 @@ typedef struct pglist_data {
>  #endif
>  #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>  	/*
> -	 * Must be held any time you expect node_start_pfn, node_present_pages
> -	 * or node_spanned_pages stay constant. Holding this will also
> -	 * guarantee that any pfn_valid() stays that way.
> +	 * Must be held any time you expect node_start_pfn,
> +	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
> +	 * Holding this will also guarantee that any pfn_valid() stays that
> +	 * way.
>  	 *
>  	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
>  	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 61972da38d93..f626e7e5f57b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	int nid = pgdat->node_id;
>  	unsigned long flags;
>
> -	if (zone_is_empty(zone))
> -		init_currently_empty_zone(zone, start_pfn, nr_pages);
> -
>  	clear_zone_contiguous(zone);
>
>  	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>  	pgdat_resize_lock(pgdat, &flags);
>  	zone_span_writelock(zone);
> +	if (zone_is_empty(zone))
> +		init_currently_empty_zone(zone, start_pfn, nr_pages);
>  	resize_zone_range(zone, start_pfn, nr_pages);
>  	zone_span_writeunlock(zone);
>  	resize_pgdat_range(pgdat, start_pfn, nr_pages);
On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>On 22.11.18 11:12, Wei Yang wrote:
>> During online_pages phase, pgdat->nr_zones will be updated in case this
>> zone is empty.
>>
>> Currently the online_pages phase is protected by the global lock
>> mem_hotplug_begin(), which ensures there is no contention during the
>> update of nr_zones. But this global lock introduces scalability issues.
>>
>> This patch is a preparation for removing the global lock during
>> online_pages phase. Also this patch changes the documentation of
>> node_size_lock to include the protection of nr_zones.
>
>I looked into locking recently, and there is more to it.
>
>Please read:
>
>commit dee6da22efac451d361f5224a60be2796d847b51
>Author: David Hildenbrand <david@redhat.com>
>Date:   Tue Oct 30 15:10:44 2018 -0700
>
>    memory-hotplug.rst: add some details about locking internals
>
>    Let's document the magic a bit, especially why device_hotplug_lock is
>    required when adding/removing memory and how it all play together with
>    requests to online/offline memory from user space.
>
>Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>as of now.
>
>mem_hotplug_begin() is just an internal optimization. (we don't want
>everybody to take the device lock)
>

Hi, David

Thanks for your comment.

Hmm... I didn't catch your point.

Related to memory hot-plug, there are (at least) three locks:

  * device_hotplug_lock (global)
  * device lock         (device scope)
  * mem_hotplug_lock    (global)

But they are held in two different sequences in two cases:

  * device_online()

      device_hotplug_lock
        device_lock
          mem_hotplug_lock

  * add_memory_resource()

      device_hotplug_lock
        mem_hotplug_lock
          device_lock
            ^
            |
            I don't find where this is held in add_memory_resource().
            Would you mind giving me a hint?

If my understanding is correct, what is your point? I guess your point
is: just removing mem_hotplug_lock is not enough to resolve the
scalability issue?

Please correct me, if I am not. :-)
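For reference, the two lock orderings listed above can be written out as
call paths roughly like this. This is only a simplified sketch based on the
description in the mail: the helpers named here (lock_device_hotplug(),
device_lock(), mem_hotplug_begin()/done()) are the existing kernel
interfaces, but error handling, the surrounding work, and the exact call
chains are omitted, and the questioned device_lock in the second path is
left as a comment.

	/* 1) Logical onlining via sysfs -> device_online() */
	lock_device_hotplug();		/* device_hotplug_lock           */
	device_lock(dev);		/* per-device lock               */
	mem_hotplug_begin();		/* mem_hotplug_lock, write side  */
	/* ... online_pages() ... */
	mem_hotplug_done();
	device_unlock(dev);
	unlock_device_hotplug();

	/* 2) Physical hot-add via add_memory_resource() */
	lock_device_hotplug();		/* device_hotplug_lock           */
	mem_hotplug_begin();		/* mem_hotplug_lock, write side  */
	/* ... arch_add_memory(), memory block device creation ...     */
	/* device_lock?  (see the question raised above)               */
	mem_hotplug_done();
	unlock_device_hotplug();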
On 22.11.18 22:28, Wei Yang wrote:
> On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>> On 22.11.18 11:12, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> This patch is a preparation for removing the global lock during
>>> online_pages phase. Also this patch changes the documentation of
>>> node_size_lock to include the protection of nr_zones.
>>
>> I looked into locking recently, and there is more to it.
>>
>> Please read:
>>
>> commit dee6da22efac451d361f5224a60be2796d847b51
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>
>>     memory-hotplug.rst: add some details about locking internals
>>
>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>     required when adding/removing memory and how it all play together with
>>     requests to online/offline memory from user space.
>>
>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>> as of now.
>>
>> mem_hotplug_begin() is just an internal optimization. (we don't want
>> everybody to take the device lock)
>>
>
> Hi, David
>
> Thanks for your comment.

My last sentence should have been "we don't want everybody to take the
device hotplug lock" :) That caused confusion.

>
> Hmm... I didn't catch your point.
>
> Related to memory hot-plug, there are (at least) three locks:
>
>   * device_hotplug_lock (global)
>   * device lock         (device scope)
>   * mem_hotplug_lock    (global)
>
> But they are held in two different sequences in two cases:
>
>   * device_online()
>
>       device_hotplug_lock
>         device_lock
>           mem_hotplug_lock
>
>   * add_memory_resource()
>
>       device_hotplug_lock
>         mem_hotplug_lock
>           device_lock
>             ^
>             |
>             I don't find where this is held in add_memory_resource().
>             Would you mind giving me a hint?
>
> If my understanding is correct, what is your point?

The point I was trying to make:

Right now all onlining/offlining/adding/removing is protected by the
device_hotplug_lock (and that's a good thing, things are fragile enough
already).

mem_hotplug_lock is used in addition for get_online_mems().

"This patch is a preparation for removing the global lock during
online_pages phase." - is more like "one global lock".

> I guess your point is: just removing mem_hotplug_lock is not enough to
> resolve the scalability issue?

Depends on which scalability issue :)

Getting rid of / removing the impact of mem_hotplug_lock is certainly a
very good idea. And improves scalability of all callers of
get_online_mems(). If that is the intention, very good :)

If the intention is to make onlining/offlining more scalable (e.g. in
parallel or such), then scalability is limited by device_hotplug_lock.

>
> Please correct me, if I am not. :-)
>

Guess I was just wondering which scalability issue we are trying to solve :)
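The get_online_mems() readers mentioned above all follow the usual pattern
below. This is just an illustrative sketch: the two helpers are the existing
read-side interface of mem_hotplug_lock, while the body is made up; the
point is that every such caller serializes against online/offline, which is
why shrinking the lock's scope helps them.

	get_online_mems();	/* read side of mem_hotplug_lock */
	/* ... inspect sections/zones that must not go away under us ... */
	put_online_mems();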
On Thu, Nov 22, 2018 at 10:53:31PM +0100, David Hildenbrand wrote:
>On 22.11.18 22:28, Wei Yang wrote:
>> On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>>> On 22.11.18 11:12, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> This patch is a preparation for removing the global lock during
>>>> online_pages phase. Also this patch changes the documentation of
>>>> node_size_lock to include the protection of nr_zones.
>>>
>>> I looked into locking recently, and there is more to it.
>>>
>>> Please read:
>>>
>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>> Author: David Hildenbrand <david@redhat.com>
>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>
>>>     memory-hotplug.rst: add some details about locking internals
>>>
>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>     required when adding/removing memory and how it all play together with
>>>     requests to online/offline memory from user space.
>>>
>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>> as of now.
>>>
>>> mem_hotplug_begin() is just an internal optimization. (we don't want
>>> everybody to take the device lock)
>>>
>>
>> Hi, David
>>
>> Thanks for your comment.
>
>My last sentence should have been "we don't want everybody to take the
>device hotplug lock" :) That caused confusion.
>
>>
>> Hmm... I didn't catch your point.
>>
>> Related to memory hot-plug, there are (at least) three locks:
>>
>>   * device_hotplug_lock (global)
>>   * device lock         (device scope)
>>   * mem_hotplug_lock    (global)
>>
>> But they are held in two different sequences in two cases:
>>
>>   * device_online()
>>
>>       device_hotplug_lock
>>         device_lock
>>           mem_hotplug_lock
>>
>>   * add_memory_resource()
>>
>>       device_hotplug_lock
>>         mem_hotplug_lock
>>           device_lock
>>             ^
>>             |
>>             I don't find where this is held in add_memory_resource().
>>             Would you mind giving me a hint?
>>
>> If my understanding is correct, what is your point?
>
>The point I was trying to make:
>
>Right now all onlining/offlining/adding/removing is protected by the
>device_hotplug_lock (and that's a good thing, things are fragile enough
>already).
>
>mem_hotplug_lock is used in addition for get_online_mems().
>
>"This patch is a preparation for removing the global lock during
>online_pages phase." - is more like "one global lock".
>

Thanks for reminding. You are right.

>> I guess your point is: just removing mem_hotplug_lock is not enough to
>> resolve the scalability issue?
>
>Depends on which scalability issue :)
>
>Getting rid of / removing the impact of mem_hotplug_lock is certainly a
>very good idea. And improves scalability of all callers of
>get_online_mems(). If that is the intention, very good :)
>

Maybe not exact. The intention is to get rid of mem_hotplug_begin/done,
if I am correct.

>If the intention is to make onlining/offlining more scalable (e.g. in
>parallel or such), then scalability is limited by device_hotplug_lock.
>

I didn't notice this lock. Still, this is a step-by-step improvement.

>
>>
>> Please correct me, if I am not. :-)
>>
>
>Guess I was just wondering which scalability issue we are trying to solve :)
>
>--
>
>Thanks,
>
>David / dhildenb
On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
> On 22.11.18 11:12, Wei Yang wrote:
> > During online_pages phase, pgdat->nr_zones will be updated in case this
> > zone is empty.
> >
> > Currently the online_pages phase is protected by the global lock
> > mem_hotplug_begin(), which ensures there is no contention during the
> > update of nr_zones. But this global lock introduces scalability issues.
> >
> > This patch is a preparation for removing the global lock during
> > online_pages phase. Also this patch changes the documentation of
> > node_size_lock to include the protection of nr_zones.
>
> I looked into locking recently, and there is more to it.
>
> Please read:
>
> commit dee6da22efac451d361f5224a60be2796d847b51
> Author: David Hildenbrand <david@redhat.com>
> Date:   Tue Oct 30 15:10:44 2018 -0700
>
>     memory-hotplug.rst: add some details about locking internals
>
>     Let's document the magic a bit, especially why device_hotplug_lock is
>     required when adding/removing memory and how it all play together with
>     requests to online/offline memory from user space.
>
> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
> as of now.

Well, I would tend to disagree here. You might be describing the current
state of art but the device_hotplug_lock doesn't make much sense for the
memory hotplug in principle. There is absolutely nothing in the core MM
that would require this lock. The current state just uses a BKL in some
sense and we really want to get rid of that longterm. This patch is a tiny
step in that direction and I suspect many more will need to come on the
way. We really want to end up with a clear scope of each lock being
taken. A project for a brave soul...
On 23.11.18 09:42, Michal Hocko wrote:
> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>> On 22.11.18 11:12, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> This patch is a preparation for removing the global lock during
>>> online_pages phase. Also this patch changes the documentation of
>>> node_size_lock to include the protection of nr_zones.
>>
>> I looked into locking recently, and there is more to it.
>>
>> Please read:
>>
>> commit dee6da22efac451d361f5224a60be2796d847b51
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>
>>     memory-hotplug.rst: add some details about locking internals
>>
>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>     required when adding/removing memory and how it all play together with
>>     requests to online/offline memory from user space.
>>
>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>> as of now.
>
> Well, I would tend to disagree here. You might be describing the current
> state of art but the device_hotplug_lock doesn't make much sense for the
> memory hotplug in principle. There is absolutely nothing in the core MM

There are collisions with CPU hotplug that require this lock (when nodes
come and go, as far as I remember). And there is the problematic lock
inversion that can happen when adding/removing memory. This all has to be
sorted out; we'll have to see if we really need it for
onlining/offlining, though, however ...

> that would require this lock. The current state just uses a BKL in some
> sense and we really want to get rid of that longterm. This patch is a tiny
> step in that direction and I suspect many more will need to come on the
> way. We really want to end up with a clear scope of each lock being
> taken. A project for a brave soul...

... for now I don't consider "optimize for parallel
onlining/offlining/adding/removing of memory and cpus" really necessary.
What is necessary indeed is to not slow down the whole system just
because some memory is coming/going. Therefore I agree, this patch is a
step in the right direction.
On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>On 23.11.18 09:42, Michal Hocko wrote:
>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>> On 22.11.18 11:12, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> This patch is a preparation for removing the global lock during
>>>> online_pages phase. Also this patch changes the documentation of
>>>> node_size_lock to include the protection of nr_zones.
>>>
>>> I looked into locking recently, and there is more to it.
>>>
>>> Please read:
>>>
>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>> Author: David Hildenbrand <david@redhat.com>
>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>
>>>     memory-hotplug.rst: add some details about locking internals
>>>
>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>     required when adding/removing memory and how it all play together with
>>>     requests to online/offline memory from user space.
>>>
>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>> as of now.
>>
>> Well, I would tend to disagree here. You might be describing the current
>> state of art but the device_hotplug_lock doesn't make much sense for the
>> memory hotplug in principle. There is absolutely nothing in the core MM
>
>There are collisions with CPU hotplug that require this lock (when nodes
>come and go, as far as I remember). And there is the problematic lock
>inversion that can happen when adding/removing memory. This all has to be
>sorted out; we'll have to see if we really need it for
>onlining/offlining, though, however ...
>

Seems I get a little understanding on this part.

There are two kinds of hotplug:

  * CPU hotplug
  * Memory hotplug

And there are two phases for memory hotplug:

  * physical add/remove
  * logical online/offline

All of them are protected by device_hotplug_lock now, so we need to be
careful to release this in any case. Is my understanding correct?

>> that would require this lock. The current state just uses a BKL in some
>> sense and we really want to get rid of that longterm. This patch is a tiny
>> step in that direction and I suspect many more will need to come on the
>> way. We really want to end up with a clear scope of each lock being
>> taken. A project for a brave soul...
>
>... for now I don't consider "optimize for parallel
>onlining/offlining/adding/removing of memory and cpus" really necessary.
>What is necessary indeed is to not slow down the whole system just
>because some memory is coming/going. Therefore I agree, this patch is a
>step in the right direction.
>

Agree.

The target is to accelerate hot-plug without slowing down the normal
path.

>--
>
>Thanks,
>
>David / dhildenb
On 26.11.18 02:44, Wei Yang wrote:
> On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>> On 23.11.18 09:42, Michal Hocko wrote:
>>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>>> On 22.11.18 11:12, Wei Yang wrote:
>>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>>> zone is empty.
>>>>>
>>>>> Currently the online_pages phase is protected by the global lock
>>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>>
>>>>> This patch is a preparation for removing the global lock during
>>>>> online_pages phase. Also this patch changes the documentation of
>>>>> node_size_lock to include the protection of nr_zones.
>>>>
>>>> I looked into locking recently, and there is more to it.
>>>>
>>>> Please read:
>>>>
>>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>>> Author: David Hildenbrand <david@redhat.com>
>>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>>
>>>>     memory-hotplug.rst: add some details about locking internals
>>>>
>>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>>     required when adding/removing memory and how it all play together with
>>>>     requests to online/offline memory from user space.
>>>>
>>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>>> as of now.
>>>
>>> Well, I would tend to disagree here. You might be describing the current
>>> state of art but the device_hotplug_lock doesn't make much sense for the
>>> memory hotplug in principle. There is absolutely nothing in the core MM
>>
>> There are collisions with CPU hotplug that require this lock (when nodes
>> come and go, as far as I remember). And there is the problematic lock
>> inversion that can happen when adding/removing memory. This all has to be
>> sorted out; we'll have to see if we really need it for
>> onlining/offlining, though, however ...
>>
>
> Seems I get a little understanding on this part.
>
> There are two kinds of hotplug:
>
>   * CPU hotplug
>   * Memory hotplug
>
> And there are two phases for memory hotplug:
>
>   * physical add/remove
>   * logical online/offline
>
> All of them are protected by device_hotplug_lock now, so we need to be
> careful to release this in any case. Is my understanding correct?

Yes, e.g. the acpi driver always held the device_hotplug_lock (due to
possible problems with concurrent cpu/memory hot(un)plug). Onlining/
offlining of devices (including cpu/memory) from user space always held
the device_hotplug_lock. So this part was executed sequentially for a
long time.

I recently made sure that any adding/removing/onlining/offlining
correctly grabs the device_hotplug_lock AND the mem_hotplug_lock in any
case (because it was inconsistent and broken), so it is all executed
sequentially.

So when getting rid of mem_hotplug_lock we only have to care about all
users that don't take the device_hotplug_lock.

>
>>> that would require this lock. The current state just uses a BKL in some
>>> sense and we really want to get rid of that longterm. This patch is a tiny
>>> step in that direction and I suspect many more will need to come on the
>>> way. We really want to end up with a clear scope of each lock being
>>> taken. A project for a brave soul...
>>
>> ... for now I don't consider "optimize for parallel
>> onlining/offlining/adding/removing of memory and cpus" really necessary.
>> What is necessary indeed is to not slow down the whole system just
>> because some memory is coming/going. Therefore I agree, this patch is a
>> step in the right direction.
>>
>
> Agree.
>
> The target is to accelerate hot-plug without slowing down the normal
> path.

Indeed!
On Mon, Nov 26, 2018 at 10:24:26AM +0100, David Hildenbrand wrote:
>On 26.11.18 02:44, Wei Yang wrote:
>> On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>>> On 23.11.18 09:42, Michal Hocko wrote:
>>>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>>>> On 22.11.18 11:12, Wei Yang wrote:
>>>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>>>> zone is empty.
>>>>>>
>>>>>> Currently the online_pages phase is protected by the global lock
>>>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>>>
>>>>>> This patch is a preparation for removing the global lock during
>>>>>> online_pages phase. Also this patch changes the documentation of
>>>>>> node_size_lock to include the protection of nr_zones.
>>>>>
>>>>> I looked into locking recently, and there is more to it.
>>>>>
>>>>> Please read:
>>>>>
>>>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>>>> Author: David Hildenbrand <david@redhat.com>
>>>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>>>
>>>>>     memory-hotplug.rst: add some details about locking internals
>>>>>
>>>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>>>     required when adding/removing memory and how it all play together with
>>>>>     requests to online/offline memory from user space.
>>>>>
>>>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>>>> as of now.
>>>>
>>>> Well, I would tend to disagree here. You might be describing the current
>>>> state of art but the device_hotplug_lock doesn't make much sense for the
>>>> memory hotplug in principle. There is absolutely nothing in the core MM
>>>
>>> There are collisions with CPU hotplug that require this lock (when nodes
>>> come and go, as far as I remember). And there is the problematic lock
>>> inversion that can happen when adding/removing memory. This all has to be
>>> sorted out; we'll have to see if we really need it for
>>> onlining/offlining, though, however ...
>>>
>>
>> Seems I get a little understanding on this part.
>>
>> There are two kinds of hotplug:
>>
>>   * CPU hotplug
>>   * Memory hotplug
>>
>> And there are two phases for memory hotplug:
>>
>>   * physical add/remove
>>   * logical online/offline
>>
>> All of them are protected by device_hotplug_lock now, so we need to be
>> careful to release this in any case. Is my understanding correct?
>
>Yes, e.g. the acpi driver always held the device_hotplug_lock (due to
>possible problems with concurrent cpu/memory hot(un)plug). Onlining/
>offlining of devices (including cpu/memory) from user space always held
>the device_hotplug_lock. So this part was executed sequentially for a
>long time.
>
>I recently made sure that any adding/removing/onlining/offlining
>correctly grabs the device_hotplug_lock AND the mem_hotplug_lock in any
>case (because it was inconsistent and broken), so it is all executed
>sequentially.
>
>So when getting rid of mem_hotplug_lock we only have to care about all
>users that don't take the device_hotplug_lock.
>

Thanks for your sharing.

So there are two global locks protecting the hotplug procedure:

  * device_hotplug_lock
  * mem_hotplug_lock

Seems even removing mem_hotplug_lock doesn't help much on scalability?
Maybe we need to tackle them one by one.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 68d7b558924b..1bb749bee284 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -636,9 +636,10 @@ typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant. Holding this will also
-	 * guarantee that any pfn_valid() stays that way.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
+	 * Holding this will also guarantee that any pfn_valid() stays that
+	 * way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..f626e7e5f57b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
During online_pages phase, pgdat->nr_zones will be updated in case this
zone is empty.

Currently the online_pages phase is protected by the global lock
mem_hotplug_begin(), which ensures there is no contention during the
update of nr_zones. But this global lock introduces scalability issues.

This patch is a preparation for removing the global lock during
online_pages phase. Also this patch changes the documentation of
node_size_lock to include the protection of nr_zones.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
v2:
  * commit log changes
  * modify the code in move_pfn_range_to_zone() instead of in
    init_currently_empty_zone()
  * documentation change

---
 include/linux/mmzone.h | 7 ++++---
 mm/memory_hotplug.c    | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)
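For orientation, this is roughly how the locking section of
move_pfn_range_to_zone() reads with the mm/memory_hotplug.c hunk above
applied (context trimmed; the matching pgdat_resize_unlock() and the rest
of the function are not shown here):

	clear_zone_contiguous(zone);

	pgdat_resize_lock(pgdat, &flags);	/* node_size_lock: node spans and nr_zones */
	zone_span_writelock(zone);		/* zone_start_pfn / spanned_pages          */
	if (zone_is_empty(zone))
		init_currently_empty_zone(zone, start_pfn, nr_pages);
	resize_zone_range(zone, start_pfn, nr_pages);
	zone_span_writeunlock(zone);
	resize_pgdat_range(pgdat, start_pfn, nr_pages);
	/* pgdat->nr_zones is updated inside init_currently_empty_zone() */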