mm, show_mem: drop pgdat_resize_lock in show_mem()

Message ID 20181128210815.2134-1-richard.weiyang@gmail.com (mailing list archive)
State New, archived

Commit Message

Wei Yang Nov. 28, 2018, 9:08 p.m. UTC
Function show_mem() is used to print system memory status when user
requires or fail to allocate memory. Generally, this is a best effort
information and not willing to affect core mm subsystem.

The data protected by pgdat_resize_lock is mostly correct except there is:

   * page struct defer init
   * memory hotplug

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 lib/show_mem.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Andrew Morton Nov. 28, 2018, 10:07 p.m. UTC | #1
On Thu, 29 Nov 2018 05:08:15 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:

> Function show_mem() is used to print system memory status when user
> requires or fail to allocate memory. Generally, this is a best effort
> information and not willing to affect core mm subsystem.
> 
> The data protected by pgdat_resize_lock is mostly correct except there is:
> 
>    * page struct defer init
>    * memory hotplug

What is the advantage in doing this?  What problem does the taking of
that lock cause?

Wei Yang Nov. 29, 2018, 1:52 a.m. UTC | #2
On Wed, Nov 28, 2018 at 02:07:51PM -0800, Andrew Morton wrote:
>On Thu, 29 Nov 2018 05:08:15 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> Function show_mem() is used to print system memory status when user
>> requires or fail to allocate memory. Generally, this is a best effort
>> information and not willing to affect core mm subsystem.
>> 
>> The data protected by pgdat_resize_lock is mostly correct except there is:
>> 
>>    * page struct defer init
>>    * memory hotplug
>
>What is the advantage in doing this?  What problem does the taking of
>that lock cause?

Michal and I had a discussion in https://patchwork.kernel.org/patch/10689759/

The purpose of this is to see whether it is necessary to make
pgdat_resize_lock IRQ-context safe. After going through the code, I found
that most of the users are not called from IRQ context.

If my understanding is correct, Michal's suggestion is to drop the lock
here. (See the second-to-last reply from Michal.)

Michal Hocko Nov. 29, 2018, 8:17 a.m. UTC | #3
On Thu 29-11-18 05:08:15, Wei Yang wrote:
> Function show_mem() is used to print system memory status when user
> requires or fail to allocate memory. Generally, this is a best effort
> information and not willing to affect core mm subsystem.

I would drop the part after "and".

> The data protected by pgdat_resize_lock is mostly correct except there is:
> 
>    * page struct defer init
>    * memory hotplug

This is more confusing than helpful. I would just drop it.

The changelog doesn't explain what is done and why. The second one is
much more important. I would say this

"
Function show_mem() is used to print system memory status when user
requires or fail to allocate memory. Generally, this is a best effort
information so any races with memory hotplug (or very theoretically an
early initialization) should be tolerable and the worst that could
happen is to print an imprecise node state.

Drop the resize lock because this is the only place which might hold the
lock from the interrupt context and so all other callers might use a
simple spinlock. Even though this doesn't solve any real issue it makes
the code easier to follow and a tiny bit more efficient.
"

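The reasoning in the proposed changelog can be illustrated with a toy model. The patch shows that pgdat_resize_lock() takes a flags argument, which is the irqsave calling convention; that variant is only needed while some user may take the lock from interrupt context. The sketch below is userspace C, not kernel code, and every toy_* name is a hypothetical stand-in. It models a single CPU and shows that a plain lock is only risky if an interrupt-context user of the same lock exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one CPU, one flag for "local interrupts are enabled". */
static bool toy_irqs_enabled = true;

struct toy_lock {
	bool held;
};

/* Models the irqsave variant: remember the IRQ state, then mask IRQs. */
static void toy_lock_irqsave(struct toy_lock *l, unsigned long *flags)
{
	*flags = toy_irqs_enabled;
	toy_irqs_enabled = false;
	l->held = true;
}

/* Models the matching unlock: release, then restore the saved IRQ state. */
static void toy_unlock_irqrestore(struct toy_lock *l, unsigned long flags)
{
	l->held = false;
	toy_irqs_enabled = (flags != 0);
}

/* Models a plain spinlock: the IRQ state is left untouched. */
static void toy_lock_plain(struct toy_lock *l)
{
	l->held = true;
}

static void toy_unlock_plain(struct toy_lock *l)
{
	l->held = false;
}

/*
 * True if an interrupt handler that takes the same lock could fire right
 * now and spin forever on the lock its own CPU already holds.
 */
static bool toy_irq_user_could_deadlock(const struct toy_lock *l)
{
	return l->held && toy_irqs_enabled;
}
```

In this model the plain variant is unsafe only when an interrupt-context user exists. If show_mem() was the last such user, removing it from the set of lockers is what would let the remaining callers use the plain form.
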
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  lib/show_mem.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/lib/show_mem.c b/lib/show_mem.c
> index 0beaa1d899aa..1d996e5771ab 100644
> --- a/lib/show_mem.c
> +++ b/lib/show_mem.c
> @@ -21,7 +21,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
>  		unsigned long flags;

btw. you want to drop flags.
>  		int zoneid;
>  
> -		pgdat_resize_lock(pgdat, &flags);
>  		for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
>  			struct zone *zone = &pgdat->node_zones[zoneid];
>  			if (!populated_zone(zone))
> @@ -33,7 +32,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
>  			if (is_highmem_idx(zoneid))
>  				highmem += zone->present_pages;
>  		}
> -		pgdat_resize_unlock(pgdat, &flags);
>  	}
>  
>  	printk("%lu pages RAM\n", total);
> -- 
> 2.15.1
>

Wei Yang Nov. 29, 2018, 9:32 a.m. UTC | #4
On Thu, Nov 29, 2018 at 09:17:03AM +0100, Michal Hocko wrote:
>On Thu 29-11-18 05:08:15, Wei Yang wrote:
>> Function show_mem() is used to print system memory status when user
>> requires or fail to allocate memory. Generally, this is a best effort
>> information and not willing to affect core mm subsystem.
>
>I would drop the part after and
>
>> The data protected by pgdat_resize_lock is mostly correct except there is:
>> 
>>    * page struct defer init
>>    * memory hotplug
>
>This is more confusing than helpful. I would just drop it.
>
>The changelog doesn't explain what is done and why. The second one is
>much more important. I would say this
>
>"
>Function show_mem() is used to print system memory status when user
>requires or fail to allocate memory. Generally, this is a best effort
>information so any races with memory hotplug (or very theoretically an
>early initialization) should be tolerable and the worst that could
>happen is to print an imprecise node state.
>
>Drop the resize lock because this is the only place which might hold the
>lock from the interrupt context and so all other callers might use a
>simple spinlock. Even though this doesn't solve any real issue it makes
>the code easier to follow and tiny more effective.
>"

Ah, I have to admit this is much clearer and makes it easier for the
audience to understand the reason.

Thanks a lot.

>
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> ---
>>  lib/show_mem.c | 2 --
>>  1 file changed, 2 deletions(-)
>> 
>> diff --git a/lib/show_mem.c b/lib/show_mem.c
>> index 0beaa1d899aa..1d996e5771ab 100644
>> --- a/lib/show_mem.c
>> +++ b/lib/show_mem.c
>> @@ -21,7 +21,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
>>  		unsigned long flags;
>
>btw. you want to drop flags.

Oops, what a shame. :-(

>>  		int zoneid;
>>  
>> -		pgdat_resize_lock(pgdat, &flags);
>>  		for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
>>  			struct zone *zone = &pgdat->node_zones[zoneid];
>>  			if (!populated_zone(zone))
>> @@ -33,7 +32,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
>>  			if (is_highmem_idx(zoneid))
>>  				highmem += zone->present_pages;
>>  		}
>> -		pgdat_resize_unlock(pgdat, &flags);
>>  	}
>>  
>>  	printk("%lu pages RAM\n", total);
>> -- 
>> 2.15.1
>> 
>
>-- 
>Michal Hocko
>SUSE Labs

Wei Yang Nov. 29, 2018, 3:04 p.m. UTC | #5
On Thu, Nov 29, 2018 at 09:17:03AM +0100, Michal Hocko wrote:
>On Thu 29-11-18 05:08:15, Wei Yang wrote:
>> Function show_mem() is used to print system memory status when user
>> requires or fail to allocate memory. Generally, this is a best effort
>> information and not willing to affect core mm subsystem.
>
>I would drop the part after and
>
>> The data protected by pgdat_resize_lock is mostly correct except there is:
>> 
>>    * page struct defer init
>>    * memory hotplug
>
>This is more confusing than helpful. I would just drop it.
>
>The changelog doesn't explain what is done and why. The second one is
>much more important. I would say this
>
>"
>Function show_mem() is used to print system memory status when user
>requires or fail to allocate memory. Generally, this is a best effort
>information so any races with memory hotplug (or very theoretically an
>early initialization) should be tolerable and the worst that could
>happen is to print an imprecise node state.
>
>Drop the resize lock because this is the only place which might hold the

As I mentioned in https://patchwork.kernel.org/patch/10689759/, the lock
is also taken in __remove_zone(). I didn't understand your suggestion for
that place. Could __remove_zone() be called in IRQ context?

>lock from the interrupt context and so all other callers might use a
>simple spinlock. Even though this doesn't solve any real issue it makes
>the code easier to follow and tiny more effective.
>"
>

Michal Hocko Nov. 29, 2018, 3:49 p.m. UTC | #6
On Thu 29-11-18 15:04:49, Wei Yang wrote:
> On Thu, Nov 29, 2018 at 09:17:03AM +0100, Michal Hocko wrote:
> >On Thu 29-11-18 05:08:15, Wei Yang wrote:
> >> Function show_mem() is used to print system memory status when user
> >> requires or fail to allocate memory. Generally, this is a best effort
> >> information and not willing to affect core mm subsystem.
> >
> >I would drop the part after and
> >
> >> The data protected by pgdat_resize_lock is mostly correct except there is:
> >> 
> >>    * page struct defer init
> >>    * memory hotplug
> >
> >This is more confusing than helpful. I would just drop it.
> >
> >The changelog doesn't explain what is done and why. The second one is
> >much more important. I would say this
> >
> >"
> >Function show_mem() is used to print system memory status when user
> >requires or fail to allocate memory. Generally, this is a best effort
> >information so any races with memory hotplug (or very theoretically an
> >early initialization) should be tolerable and the worst that could
> >happen is to print an imprecise node state.
> >
> >Drop the resize lock because this is the only place which might hold the
> 
> As I mentioned in https://patchwork.kernel.org/patch/10689759/, there is
> one place used in __remove_zone(). I don't get your suggestion of this
> place. And is __remove_zone() could be called in IRQ context?

It is only called from __remove_pages(), and that one calls cond_resched(),
so obviously not.

Wei Yang Nov. 29, 2018, 4:05 p.m. UTC | #7
On Thu, Nov 29, 2018 at 04:49:22PM +0100, Michal Hocko wrote:
>On Thu 29-11-18 15:04:49, Wei Yang wrote:
>> On Thu, Nov 29, 2018 at 09:17:03AM +0100, Michal Hocko wrote:
>> >On Thu 29-11-18 05:08:15, Wei Yang wrote:
>> >> Function show_mem() is used to print system memory status when user
>> >> requires or fail to allocate memory. Generally, this is a best effort
>> >> information and not willing to affect core mm subsystem.
>> >
>> >I would drop the part after and
>> >
>> >> The data protected by pgdat_resize_lock is mostly correct except there is:
>> >> 
>> >>    * page struct defer init
>> >>    * memory hotplug
>> >
>> >This is more confusing than helpful. I would just drop it.
>> >
>> >The changelog doesn't explain what is done and why. The second one is
>> >much more important. I would say this
>> >
>> >"
>> >Function show_mem() is used to print system memory status when user
>> >requires or fail to allocate memory. Generally, this is a best effort
>> >information so any races with memory hotplug (or very theoretically an
>> >early initialization) should be tolerable and the worst that could
>> >happen is to print an imprecise node state.
>> >
>> >Drop the resize lock because this is the only place which might hold the
>> 
>> As I mentioned in https://patchwork.kernel.org/patch/10689759/, there is
>> one place used in __remove_zone(). I don't get your suggestion of this
>> place. And is __remove_zone() could be called in IRQ context?
>
>It is only called from __remove_pages and that one calls cond_resched so
>obviously not.
>

Forgive my poor background knowledge; I went through the code but could
not find where cond_resched() is called.

  __remove_pages()
    release_mem_region_adjustable()
    clear_zone_contiguous()
    __remove_section()
      unregister_memory_section()
      __remove_zone()
      sparse_remove_one_section()
    set_zone_contiguous()

Would you mind giving me a hint?

>-- 
>Michal Hocko
>SUSE Labs

Michal Hocko Nov. 29, 2018, 4:18 p.m. UTC | #8
On Thu 29-11-18 16:05:24, Wei Yang wrote:
> On Thu, Nov 29, 2018 at 04:49:22PM +0100, Michal Hocko wrote:
[...]
> >It is only called from __remove_pages and that one calls cond_resched so
> >obviously not.
> >
> 
> Forgive my poor background knowledge, I went through the code, but not
> found where call cond_resched.
> 
>   __remove_pages()
>     release_mem_region_adjustable()
>     clear_zone_contiguous()
>     __remove_section()
>       unregister_memory_section()
>       __remove_zone()
>       sparse_remove_one_section()
>     set_zone_contiguous()
> 
> Would you mind giving me a hint?

This is the code as of 4.20-rc2

	for (i = 0; i < sections_to_remove; i++) {
		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;

		cond_resched();
		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
				altmap);
		map_offset = 0;
		if (ret)
			break;
	}

Maybe things have changed in the meantime but in general the code is
sleepable (e.g. release_mem_region_adjustable does GFP_KERNEL
allocation) and that rules out IRQ context.
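
The argument above can be sketched as a toy model: a path that calls something that may sleep (cond_resched(), a GFP_KERNEL allocation) on every iteration cannot be a legitimate IRQ-context path. This is userspace C, not kernel code; the toy_* names and the violation counter are hypothetical and only mirror the shape of the loop quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "this CPU is currently in IRQ context". */
static bool toy_in_irq;

/* Counts how often a sleepable call was reached from IRQ context. */
static int toy_sleep_in_irq_bugs;

/* Models any call that may sleep, such as cond_resched(). */
static void toy_may_sleep(void)
{
	if (toy_in_irq)
		toy_sleep_in_irq_bugs++;
}

/* Models the shape of the section-removal loop: one sleepable call per pass. */
static void toy_remove_sections(int nr_sections)
{
	for (int i = 0; i < nr_sections; i++) {
		toy_may_sleep();
		/* per-section teardown would happen here */
	}
}
```

Called from process context the counter stays at zero. Called with toy_in_irq set, every iteration would be a bug, which is why the presence of such calls is taken as evidence that the path is never entered from an interrupt.
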

Wei Yang Nov. 29, 2018, 11:49 p.m. UTC | #9
On Thu, Nov 29, 2018 at 05:18:47PM +0100, Michal Hocko wrote:
>On Thu 29-11-18 16:05:24, Wei Yang wrote:
>> On Thu, Nov 29, 2018 at 04:49:22PM +0100, Michal Hocko wrote:
>[...]
>> >It is only called from __remove_pages and that one calls cond_resched so
>> >obviously not.
>> >
>> 
>> Forgive my poor background knowledge, I went through the code, but not
>> found where call cond_resched.
>> 
>>   __remove_pages()
>>     release_mem_region_adjustable()
>>     clear_zone_contiguous()
>>     __remove_section()
>>       unregister_memory_section()
>>       __remove_zone()
>>       sparse_remove_one_section()
>>     set_zone_contiguous()
>> 
>> Would you mind giving me a hint?
>
>This is the code as of 4.20-rc2
>
>	for (i = 0; i < sections_to_remove; i++) {
>		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
>
>		cond_resched();
>		ret = __remove_section(zone, __pfn_to_section(pfn), map_offset,
>				altmap);
>		map_offset = 0;
>		if (ret)
>			break;
>	}
>
>Maybe things have changed in the meantime but in general the code is
>sleepable (e.g. release_mem_region_adjustable does GFP_KERNEL
>allocation) and that rules out IRQ context.

Thanks, my code is not up to date.

>-- 
>Michal Hocko
>SUSE Labs

Patch

diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d899aa..1d996e5771ab 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -21,7 +21,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
 		unsigned long flags;
 		int zoneid;
 
-		pgdat_resize_lock(pgdat, &flags);
 		for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
 			struct zone *zone = &pgdat->node_zones[zoneid];
 			if (!populated_zone(zone))
@@ -33,7 +32,6 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
 			if (is_highmem_idx(zoneid))
 				highmem += zone->present_pages;
 		}
-		pgdat_resize_unlock(pgdat, &flags);
 	}
 
 	printk("%lu pages RAM\n", total);
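
With both lock calls removed and, as pointed out in comment #3, the flags local no longer needed, the per-node loop reduces to an unlocked walk over the zones. Below is a minimal userspace sketch of that shape. The toy_* types are hypothetical stand-ins for pg_data_t and struct zone, TOY_MAX_NR_ZONES is an arbitrary value, and the accumulation into total is an assumption, since that line falls outside the hunks shown.

```c
#include <assert.h>

#define TOY_MAX_NR_ZONES 3	/* arbitrary; the kernel value depends on config */

struct toy_zone {
	unsigned long present_pages;
	int is_highmem;		/* stands in for is_highmem_idx(zoneid) */
};

struct toy_pgdat {
	struct toy_zone node_zones[TOY_MAX_NR_ZONES];
};

struct toy_totals {
	unsigned long total;
	unsigned long highmem;
};

/* Best-effort count: no lock is taken, so a concurrent resize may skew it. */
static struct toy_totals toy_count_pages(const struct toy_pgdat *nodes,
					 int nr_nodes)
{
	struct toy_totals t = { 0, 0 };

	for (int n = 0; n < nr_nodes; n++) {
		for (int zoneid = 0; zoneid < TOY_MAX_NR_ZONES; zoneid++) {
			const struct toy_zone *zone = &nodes[n].node_zones[zoneid];

			/* models the !populated_zone(zone) check */
			if (zone->present_pages == 0)
				continue;

			t.total += zone->present_pages;	/* assumed, see above */
			if (zone->is_highmem)
				t.highmem += zone->present_pages;
		}
	}
	return t;
}
```

The function has no flags variable and no lock or unlock pair, and the worst outcome of a race is an imprecise count, which matches the best-effort intent described in the thread.
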