[v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection

Message ID 20181130065847.13714-1-richard.weiyang@gmail.com (mailing list archive)
State New, archived

Commit Message

Wei Yang Nov. 30, 2018, 6:58 a.m. UTC
During online_pages phase, pgdat->nr_zones will be updated in case this
zone is empty.

Currently the online_pages phase is protected by the global lock
mem_hotplug_begin(), which ensures there is no contention during the
update of nr_zones. But this global lock introduces scalability issues.

The patch moves init_currently_empty_zone under both zone_span_writelock
and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
and the zone's start_pfn. Also this patch changes the documentation
of node_size_lock to include the protectioin of nr_zones.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
CC: David Hildenbrand <david@redhat.com>

---
David, I may not have caught your exact comment on the code or changelog.
If I missed something, just let me know.

---
v3:
  * slightly modify the last paragraph of changelog based on Michal's
    comment
v2:
  * commit log changes
  * modify the code in move_pfn_range_to_zone() instead of in
    init_currently_empty_zone()
  * pgdat_resize_lock documentation change
---
 include/linux/mmzone.h | 7 ++++---
 mm/memory_hotplug.c    | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

Comments

David Hildenbrand Nov. 30, 2018, 9:30 a.m. UTC | #1
On 30.11.18 07:58, Wei Yang wrote:
> During online_pages phase, pgdat->nr_zones will be updated in case this
> zone is empty.
> 
> Currently the online_pages phase is protected by the global lock
> mem_hotplug_begin(), which ensures there is no contention during the
> update of nr_zones. But this global lock introduces scalability issues.
> 
> The patch moves init_currently_empty_zone under both zone_span_writelock
> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
> and the zone's start_pfn. Also this patch changes the documentation
> of node_size_lock to include the protectioin of nr_zones.

s/protectioin/protection/

> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> CC: David Hildenbrand <david@redhat.com>
> 
> ---
> David, I may not catch you exact comment on the code or changelog. If I
> missed, just let me know.

I guess I would have rewritten it to something like the following

"
Currently the online_pages phase is protected by two global locks
(device_hotplug_lock and mem_hotplug_lock). Especially the latter
can result in scalability issues, as it will slow down code relying on
get_online_mems(). Let's prepare code for not having to rely on
get_online_mems() but instead on some more fine-grained locks.

During online_pages phase, pgdat->nr_zones will be updated in case the
zone is empty. Right now mem_hotplug_lock ensures that there is no
contention during the update of nr_zones.

The patch moves init_currently_empty_zone under both zone_span_writelock
and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
and the zone's start_pfn. Also this patch changes the documentation
of node_size_lock to include the protection of nr_zones.
"

Does that make sense?

Wei Yang Dec. 1, 2018, 12:27 a.m. UTC | #2
On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>On 30.11.18 07:58, Wei Yang wrote:
>> During online_pages phase, pgdat->nr_zones will be updated in case this
>> zone is empty.
>> 
>> Currently the online_pages phase is protected by the global lock
>> mem_hotplug_begin(), which ensures there is no contention during the
>> update of nr_zones. But this global lock introduces scalability issues.
>> 
>> The patch moves init_currently_empty_zone under both zone_span_writelock
>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>> and the zone's start_pfn. Also this patch changes the documentation
>> of node_size_lock to include the protectioin of nr_zones.
>
>s/protectioin/protection/
>
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>> CC: David Hildenbrand <david@redhat.com>
>> 
>> ---
>> David, I may not catch you exact comment on the code or changelog. If I
>> missed, just let me know.
>
>I guess I would have rewritten it to something like the following
>
>"
>Currently the online_pages phase is protected by two global locks
>(device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>can result in scalability issues, as it will slow down code relying on
>get_online_mems(). Let's prepare code for not having to rely on
>get_online_mems() but instead some more fine grained locks.

I am not sure why we single out get_online_mems() here. mem_hotplug_lock is
grabbed in many places besides this one. In my mind, each of those places
introduces a scalability issue, not only this one.

Or do you mean that mem_hotplug_lock introduces scalability issues in
two places:

  * hotplug process itself
  * slab allocation process

The second one is more critical. Is this what we are trying to address?

>
>During online_pages phase, pgdat->nr_zones will be updated in case the
>zone is empty. Right now mem_hotplug_lock ensures that there is no
>contention during the update of nr_zones.
>
>The patch moves init_currently_empty_zone under both zone_span_writelock
>and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>and the zone's start_pfn. Also this patch changes the documentation
>of node_size_lock to include the protection of nr_zones.
>"
>
>Does that make sense?
>
David Hildenbrand Dec. 3, 2018, 10:09 a.m. UTC | #3
On 01.12.18 01:27, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>> On 30.11.18 07:58, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> The patch moves init_currently_empty_zone under both zone_span_writelock
>>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>>> and the zone's start_pfn. Also this patch changes the documentation
>>> of node_size_lock to include the protectioin of nr_zones.
>>
>> s/protectioin/protection/
>>
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>>> CC: David Hildenbrand <david@redhat.com>
>>>
>>> ---
>>> David, I may not catch you exact comment on the code or changelog. If I
>>> missed, just let me know.
>>
>> I guess I would have rewritten it to something like the following
>>
>> "
>> Currently the online_pages phase is protected by two global locks
>> (device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>> can result in scalability issues, as it will slow down code relying on
>> get_online_mems(). Let's prepare code for not having to rely on
>> get_online_mems() but instead some more fine grained locks.
> 
> I am not sure why we specify get_online_mems() here. mem_hotplug_lock is
> grabed in many places besides this one. In my mind, each place introduce
> scalability issue, not only this one.

mem_hotplug_lock is taken in write mode only when
adding/removing/onlining/offlining memory and when adding/removing
device memory. The read lockers are the critical part for now.

> 
> Or you want to say, the mem_hotplug_lock will introduce scalability
> issue in two place:
> 
>   * hotplug process itself
>   * slab allocation process
> 
> The second one is more critical. And this is what we try to address?

Indeed, especially as the first usually (except device memory) also uses
the device_hotplug_lock, I only consider the second one critical.

Feel free to change this description to whatever you like.
As I already stated, the scalability of adding/removing/onlining/offlining
is not really an issue as of now (prove me wrong :) ). So I would not care
about including such information in this patch.
Wei Yang Dec. 3, 2018, 8:37 p.m. UTC | #4
On Mon, Dec 03, 2018 at 11:09:52AM +0100, David Hildenbrand wrote:
>On 01.12.18 01:27, Wei Yang wrote:
>> On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>>> On 30.11.18 07:58, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> The patch moves init_currently_empty_zone under both zone_span_writelock
>>>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>>>> and the zone's start_pfn. Also this patch changes the documentation
>>>> of node_size_lock to include the protectioin of nr_zones.
>>>
>>> s/protectioin/protection/
>>>
>>>>
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>>>> CC: David Hildenbrand <david@redhat.com>
>>>>
>>>> ---
>>>> David, I may not catch you exact comment on the code or changelog. If I
>>>> missed, just let me know.
>>>
>>> I guess I would have rewritten it to something like the following
>>>
>>> "
>>> Currently the online_pages phase is protected by two global locks
>>> (device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>>> can result in scalability issues, as it will slow down code relying on
>>> get_online_mems(). Let's prepare code for not having to rely on
>>> get_online_mems() but instead some more fine grained locks.
>> 
>> I am not sure why we specify get_online_mems() here. mem_hotplug_lock is
>> grabed in many places besides this one. In my mind, each place introduce
>> scalability issue, not only this one.
>
>mem_hotplug_lock is grabbed in write only when
>adding/removing/onlining/offlining memory and when adding/removing
>device memory. The read locker are the critical part for now.
>
>> 
>> Or you want to say, the mem_hotplug_lock will introduce scalability
>> issue in two place:
>> 
>>   * hotplug process itself
>>   * slab allocation process
>> 
>> The second one is more critical. And this is what we try to address?
>
>Indeed, especially as the first usually (except device memory) also uses
>the device_hotplug_lock, I only consider the second one critical.
>
>Feel free to change this description to whatever you like.
>As I already stated scalability of adding/removing/onlining/offlining is
>not really an issue as of now (prove me wrong :) ). So I would not care
>about including such information in this patch.
>

Thanks for the information.

Let me try to reword the changelog.

>-- 
>
>Thanks,
>
>David / dhildenb

Patch

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3d0c472438d2..37d9c5c3faa6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -635,9 +635,10 @@  typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant.  Holding this will also
-	 * guarantee that any pfn_valid() stays that way.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
+	 * Holding this will also guarantee that any pfn_valid() stays that
+	 * way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..f626e7e5f57b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -742,14 +742,13 @@  void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
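
To make the resulting lock nesting easier to see, the following is a
simplified user-space model of move_pfn_range_to_zone() after this patch.
It is only a sketch under stated assumptions: plain pthread mutexes stand in
for pgdat_resize_lock() (a spinlock with irqsave in the kernel) and
zone_span_writelock() (a seqlock write side in the kernel), and every
*_model type, field and function below is hypothetical, reduced to the
fields this thread talks about.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct pglist_data. */
struct pgdat_model {
	pthread_mutex_t node_size_lock;	/* models pgdat_resize_lock() */
	int nr_zones;
	unsigned long node_start_pfn;
	unsigned long node_spanned_pages;
};

/* Hypothetical, stripped-down stand-in for struct zone. */
struct zone_model {
	pthread_mutex_t span_lock;	/* models zone_span_writelock() */
	struct pgdat_model *pgdat;
	int idx;			/* index of this zone in its node */
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

static bool zone_model_is_empty(const struct zone_model *zone)
{
	return zone->spanned_pages == 0;
}

/* Grow a [start, start + spanned) range so it covers the new pfn range. */
static void grow_span(unsigned long *start, unsigned long *spanned,
		      unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long old_end = *start + *spanned;
	unsigned long new_end = start_pfn + nr_pages;

	if (*spanned == 0 || start_pfn < *start)
		*start = start_pfn;
	*spanned = (new_end > old_end ? new_end : old_end) - *start;
}

/* Caller must hold both locks: touches pgdat state (nr_zones) and
 * zone state (zone_start_pfn). */
static void init_empty_zone_model(struct zone_model *zone,
				  unsigned long start_pfn)
{
	if (zone->idx + 1 > zone->pgdat->nr_zones)
		zone->pgdat->nr_zones = zone->idx + 1;
	zone->zone_start_pfn = start_pfn;
}

static void move_pfn_range_model(struct zone_model *zone,
				 unsigned long start_pfn,
				 unsigned long nr_pages)
{
	struct pgdat_model *pgdat = zone->pgdat;

	pthread_mutex_lock(&pgdat->node_size_lock);	/* outer lock */
	pthread_mutex_lock(&zone->span_lock);		/* inner lock */
	/* After the patch, the empty-zone init runs under both locks. */
	if (zone_model_is_empty(zone))
		init_empty_zone_model(zone, start_pfn);
	grow_span(&zone->zone_start_pfn, &zone->spanned_pages,
		  start_pfn, nr_pages);
	pthread_mutex_unlock(&zone->span_lock);
	grow_span(&pgdat->node_start_pfn, &pgdat->node_spanned_pages,
		  start_pfn, nr_pages);
	pthread_mutex_unlock(&pgdat->node_size_lock);
}
```

In this model, anyone holding node_size_lock sees a stable nr_zones, which
is what the updated comment in mmzone.h documents.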