
[RFC] mm/memory_hotplug: Don't take the cpu_hotplug_lock

Message ID 20190725092206.23712-1-david@redhat.com (mailing list archive)
State New, archived

Commit Message

David Hildenbrand July 25, 2019, 9:22 a.m. UTC
Commit 9852a7212324 ("mm: drop hotplug lock from lru_add_drain_all()")
states that lru_add_drain_all() "Doesn't need any cpu hotplug locking
because we do rely on per-cpu kworkers being shut down before our
page_alloc_cpu_dead callback is executed on the offlined cpu."
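The ordering argument quoted above can be sketched as a small userspace C model. It assumes, as in the kernel's cpuhp state machine, that teardown callbacks run in descending state order and that the workqueue state sits above the page allocator's dead state. The MODEL_* names and their numeric values are hypothetical and only echo the kernel states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of the CPU hotplug state machine.
 * Assumption: bring-up walks states upwards and teardown walks them
 * downwards, invoking each state's teardown callback on the way.
 */
enum model_cpuhp_state {
	MODEL_CPUHP_OFFLINE = 0,
	MODEL_CPUHP_PAGE_ALLOC_DEAD,     /* teardown: drain the dead CPU's pagevecs */
	MODEL_CPUHP_AP_WORKQUEUE_ONLINE, /* teardown: per-cpu kworkers shut down */
	MODEL_CPUHP_ONLINE,
};

#define MODEL_LOG_MAX 8

static enum model_cpuhp_state teardown_log[MODEL_LOG_MAX];
static size_t teardown_len;

/* Record that the teardown callback of state `st` ran. */
static void model_run_teardown(enum model_cpuhp_state st)
{
	assert(teardown_len < MODEL_LOG_MAX);
	teardown_log[teardown_len++] = st;
}

/* Tear down every state above `target`, highest first. */
static void model_cpu_down(enum model_cpuhp_state cur,
			   enum model_cpuhp_state target)
{
	for (int st = (int)cur; st > (int)target; st--)
		model_run_teardown((enum model_cpuhp_state)st);
}

/* Position of `st` in the teardown log, or -1 if it never ran. */
static long model_teardown_pos(enum model_cpuhp_state st)
{
	for (size_t i = 0; i < teardown_len; i++)
		if (teardown_log[i] == st)
			return (long)i;
	return -1;
}
```

Taking a CPU from MODEL_CPUHP_ONLINE down to MODEL_CPUHP_OFFLINE logs the workqueue teardown before the page allocator's dead callback, which is the property lru_add_drain_all() relies on instead of cpu hotplug locking.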

It also states that "Calling this function with cpu hotplug locks held can
actually lead to obscure indirect dependencies via WQ context."
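Such an indirect dependency via WQ context can be pictured as a cycle in a lock-dependency graph, in the spirit of lockdep, where waiting for a work item counts as acquiring it. This is a hypothetical, simplified sketch: it ignores the reader/writer distinction (a real deadlock on a read-held rwsem additionally needs a pending writer), and the work item whose function takes cpus_read_lock() is assumed for illustration, not taken from lru_add_drain_all().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical dependency-graph nodes. Following lockdep's idea, waiting
 * for a work item (flush_work()) is treated like acquiring a lock that the
 * work function "holds" while it runs. Reader/writer modes are ignored.
 */
enum model_dep {
	DEP_CPU_HOTPLUG_LOCK,
	DEP_SOME_WORK, /* assumed work item whose function takes cpus_read_lock() */
	DEP_COUNT,
};

/* edge[a][b]: b was acquired while a was held. */
static bool edge[DEP_COUNT][DEP_COUNT];

static void model_add_dep(enum model_dep held, enum model_dep acquired)
{
	edge[held][acquired] = true;
}

/* Depth-first search: is `to` reachable from `from`? */
static bool model_reachable(int from, int to, bool seen[DEP_COUNT])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < DEP_COUNT; n++)
		if (edge[from][n] && !seen[n] && model_reachable(n, to, seen))
			return true;
	return false;
}

/* Acquiring `acquired` while holding `held` closes a cycle if `held`
 * is already reachable from `acquired`. */
static bool model_would_deadlock(enum model_dep held, enum model_dep acquired)
{
	bool seen[DEP_COUNT] = { false };

	return model_reachable(acquired, held, seen);
}
```

Not holding the cpu hotplug lock while flushing work, as the quoted commit does for lru_add_drain_all(), means the cpu_hotplug_lock -> work edge is never recorded in the first place.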

Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu
rwsem") we do a cpus_read_lock() in mem_hotplug_begin().

I don't see how that lock is still helpful: we already hold the
device_hotplug_lock to protect try_offline_node(), which, AFAIK, is one of
the problematic parts that can race with CPU hotplug. If the lock is still
necessary, we should document why.
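Why device_hotplug_lock can be enough for try_offline_node() is illustrated by a hypothetical userspace model: if the CPU removal path and the memory removal path both update the node and run the offline check under one shared mutex, the node is offlined exactly once, after its last CPU and its last memory are gone. All names here are invented for the sketch, and the real try_offline_node() checks more state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified model of a single NUMA node. */
struct model_node {
	int present_cpus;
	long present_pages;
	bool online;
	int offline_count; /* how often the node was taken offline */
};

/* Stand-in for the kernel's device_hotplug_lock. */
static pthread_mutex_t model_device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

static struct model_node node = {
	.present_cpus = 1, .present_pages = 1, .online = true,
};

/* Caller must hold model_device_hotplug_lock. Offline the node only once
 * it has neither CPUs nor memory left. */
static void model_try_offline_node(struct model_node *n)
{
	if (n->online && n->present_cpus == 0 && n->present_pages == 0) {
		n->online = false;
		n->offline_count++;
	}
}

/* CPU removal path: drop the CPU, then try to offline its node. */
static void *model_remove_cpu(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_device_hotplug_lock);
	node.present_cpus--;
	model_try_offline_node(&node);
	pthread_mutex_unlock(&model_device_hotplug_lock);
	return NULL;
}

/* Memory removal path: drop the memory, then try to offline the node. */
static void *model_remove_memory(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_device_hotplug_lock);
	node.present_pages--;
	model_try_offline_node(&node);
	pthread_mutex_unlock(&model_device_hotplug_lock);
	return NULL;
}

/* Race both removal paths against each other. */
static void model_remove_cpu_and_memory_concurrently(void)
{
	pthread_t cpu_thread, mem_thread;

	pthread_create(&cpu_thread, NULL, model_remove_cpu, NULL);
	pthread_create(&mem_thread, NULL, model_remove_memory, NULL);
	pthread_join(cpu_thread, NULL);
	pthread_join(mem_thread, NULL);
}
```

Whichever path runs second sees both counts at zero, so the check never runs on a half-updated node; no CPU hotplug lock is involved.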

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Michal Hocko July 26, 2019, 8:19 a.m. UTC | #1
On Thu 25-07-19 11:22:06, David Hildenbrand wrote:
> Commit 9852a7212324 ("mm: drop hotplug lock from lru_add_drain_all()")
> states that lru_add_drain_all() "Doesn't need any cpu hotplug locking
> because we do rely on per-cpu kworkers being shut down before our
> page_alloc_cpu_dead callback is executed on the offlined cpu."
> 
> It also states that "Calling this function with cpu hotplug locks held can
> actually lead to obscure indirect dependencies via WQ context."
> 
> Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu
> rwsem") we do a cpus_read_lock() in mem_hotplug_begin().
> 
> I don't see how that lock is still helpful: we already hold the
> device_hotplug_lock to protect try_offline_node(), which, AFAIK, is one of
> the problematic parts that can race with CPU hotplug. If the lock is still
> necessary, we should document why.

I have forgotten all the juicy details. Maybe Thomas remembers. The
previous recursive, home-grown locking was just terrible. I do not see
stop_machine() being used in memory hotplug anymore.

I do support this kind of removal, because binding the CPU and memory
hotplug locks together is fragile and wrong. But this patch really needs
more explanation of why it is safe. In other words: what does
cpus_read_lock() protect against in the memory hotplug paths?

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e7c3b219a305..43b8cd4b96f5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -86,14 +86,12 @@ __setup("memhp_default_state=", setup_memhp_default_state);
>  
>  void mem_hotplug_begin(void)
>  {
> -	cpus_read_lock();
>  	percpu_down_write(&mem_hotplug_lock);
>  }
>  
>  void mem_hotplug_done(void)
>  {
>  	percpu_up_write(&mem_hotplug_lock);
> -	cpus_read_unlock();
>  }
>  
>  u64 max_mem_size = U64_MAX;
> -- 
> 2.21.0

David Hildenbrand July 26, 2019, 8:22 a.m. UTC | #2
On 26.07.19 10:19, Michal Hocko wrote:
> On Thu 25-07-19 11:22:06, David Hildenbrand wrote:
>> Commit 9852a7212324 ("mm: drop hotplug lock from lru_add_drain_all()")
>> states that lru_add_drain_all() "Doesn't need any cpu hotplug locking
>> because we do rely on per-cpu kworkers being shut down before our
>> page_alloc_cpu_dead callback is executed on the offlined cpu."
>>
>> It also states that "Calling this function with cpu hotplug locks held can
>> actually lead to obscure indirect dependencies via WQ context."
>>
>> Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu
>> rwsem") we do a cpus_read_lock() in mem_hotplug_begin().
>>
>> I don't see how that lock is still helpful: we already hold the
>> device_hotplug_lock to protect try_offline_node(), which, AFAIK, is one of
>> the problematic parts that can race with CPU hotplug. If the lock is still
>> necessary, we should document why.
> 
> I have forgotten all the juicy details. Maybe Thomas remembers. The
> previous recursive, home-grown locking was just terrible. I do not see
> stop_machine() being used in memory hotplug anymore.
>
> I do support this kind of removal, because binding the CPU and memory
> hotplug locks together is fragile and wrong. But this patch really needs
> more explanation of why it is safe. In other words: what does
> cpus_read_lock() protect against in the memory hotplug paths?

That is exactly why I marked this as RFC: I am not aware of any :)
Hopefully Thomas can clarify whether we are missing something important
(and undocumented) here; if so, I'll document it.

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e7c3b219a305..43b8cd4b96f5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -86,14 +86,12 @@ __setup("memhp_default_state=", setup_memhp_default_state);
 
 void mem_hotplug_begin(void)
 {
-	cpus_read_lock();
 	percpu_down_write(&mem_hotplug_lock);
 }
 
 void mem_hotplug_done(void)
 {
 	percpu_up_write(&mem_hotplug_lock);
-	cpus_read_unlock();
 }
 
 u64 max_mem_size = U64_MAX;