
memory tier: fix deadlock warning while onlining pages

Message ID 20240827113614.1343049-1-yanfei.xu@intel.com (mailing list archive)
State New
Series memory tier: fix deadlock warning while onlining pages

Commit Message

Yanfei Xu Aug. 27, 2024, 11:36 a.m. UTC
memory_tier_init() acquires the memory_tier_lock unnecessarily, and its
locking scope covers the registration of the memory hotplug notifier,
whose callback also acquires the memory_tier_lock. This triggers a
lockdep warning about a possible ABBA deadlock.

Specifically, when a memory online event occurs, the executing memory
notifier takes the read lock of memory_chain.rwsem, while the
registration of the memory notifier in memory_tier_init() takes the
write lock of memory_chain.rwsem with memory_tier_lock held. The memory
online event then continues to invoke the memory hotplug callback
registered by memory_tier_init(). Since this callback tries to acquire
the memory_tier_lock, lockdep reports a possible deadlock.

In fact, this deadlock cannot happen, because memory_tier_init() always
runs before any memory online event: subsys_initcall() has a higher
priority than module_init().
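
To make the inverted ordering concrete, here is a minimal sketch of the
two paths involved (illustrative only: the *_sketch functions and
memtier_nb are placeholders, not the actual mm/memory-tiers.c code):

/* Dependency #1, init path: memory_tier_lock -> (memory_chain).rwsem (write) */
static int __init init_path_sketch(void)
{
	guard(mutex)(&memory_tier_lock);	/* lock A, held until return     */
	register_memory_notifier(&memtier_nb);	/* takes lock B (rwsem, write)   */
	return 0;
}

/* Dependency #0, online path: (memory_chain).rwsem (read) -> memory_tier_lock */
static int online_path_sketch(struct notifier_block *self,
			      unsigned long action, void *arg)
{
	/* memory_notify() already holds lock B (rwsem, read) when calling us */
	mutex_lock(&memory_tier_lock);		/* lock A, reversed order        */
	/* ... update memory tier structures ... */
	mutex_unlock(&memory_tier_lock);
	return NOTIFY_OK;
}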

[  133.491106] WARNING: possible circular locking dependency detected
[  133.493656] 6.11.0-rc2+ #146 Tainted: G           O     N
[  133.504290] ------------------------------------------------------
[  133.515194] (udev-worker)/1133 is trying to acquire lock:
[  133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0
[  133.536449]
[  133.536449] but task is already holding lock:
[  133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.556781]
[  133.556781] which lock already depends on the new lock.
[  133.556781]
[  133.569957]
[  133.569957] the existing dependency chain (in reverse order) is:
[  133.577618]
[  133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}:
[  133.584997]        down_write+0x97/0x210
[  133.588647]        blocking_notifier_chain_register+0x71/0xd0
[  133.592537]        register_memory_notifier+0x26/0x30
[  133.596314]        memory_tier_init+0x187/0x300
[  133.599864]        do_one_initcall+0x117/0x5d0
[  133.603399]        kernel_init_freeable+0xab0/0xeb0
[  133.606986]        kernel_init+0x28/0x2f0
[  133.610312]        ret_from_fork+0x59/0x90
[  133.613652]        ret_from_fork_asm+0x1a/0x30
[  133.617012]
[  133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}:
[  133.623390]        __lock_acquire+0x2efd/0x5c60
[  133.626730]        lock_acquire+0x1ce/0x580
[  133.629757]        __mutex_lock+0x15c/0x1490
[  133.632731]        mutex_lock_nested+0x1f/0x30
[  133.635717]        memtier_hotplug_callback+0x383/0x4b0
[  133.638748]        notifier_call_chain+0xbf/0x370
[  133.641647]        blocking_notifier_call_chain+0x76/0xb0
[  133.644636]        memory_notify+0x2e/0x40
[  133.647427]        online_pages+0x597/0x720
[  133.650246]        memory_subsys_online+0x4f6/0x7f0
[  133.653107]        device_online+0x141/0x1d0
[  133.655831]        online_memory_block+0x4d/0x60
[  133.658616]        walk_memory_blocks+0xc0/0x120
[  133.661419]        add_memory_resource+0x51d/0x6c0
[  133.664202]        add_memory_driver_managed+0xf5/0x180
[  133.667060]        dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.669949]        dax_bus_probe+0x147/0x230
[  133.672687]        really_probe+0x27f/0xac0
[  133.675463]        __driver_probe_device+0x1f3/0x460
[  133.678493]        driver_probe_device+0x56/0x1b0
[  133.681366]        __driver_attach+0x277/0x570
[  133.684149]        bus_for_each_dev+0x145/0x1e0
[  133.686937]        driver_attach+0x49/0x60
[  133.689673]        bus_add_driver+0x2f3/0x6b0
[  133.692421]        driver_register+0x170/0x4b0
[  133.695118]        __dax_driver_register+0x141/0x1b0
[  133.697910]        dax_kmem_init+0x54/0xff0 [kmem]
[  133.700794]        do_one_initcall+0x117/0x5d0
[  133.703455]        do_init_module+0x277/0x750
[  133.706054]        load_module+0x5d1d/0x74f0
[  133.708602]        init_module_from_file+0x12c/0x1a0
[  133.711234]        idempotent_init_module+0x3f1/0x690
[  133.713937]        __x64_sys_finit_module+0x10e/0x1a0
[  133.716492]        x64_sys_call+0x184d/0x20d0
[  133.719053]        do_syscall_64+0x6d/0x140
[  133.721537]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  133.724239]
[  133.724239] other info that might help us debug this:
[  133.724239]
[  133.730832]  Possible unsafe locking scenario:
[  133.730832]
[  133.735298]        CPU0                    CPU1
[  133.737759]        ----                    ----
[  133.740165]   rlock((memory_chain).rwsem);
[  133.742623]                                lock(memory_tier_lock);
[  133.745357]                                lock((memory_chain).rwsem);
[  133.748141]   lock(memory_tier_lock);
[  133.750489]
[  133.750489]  *** DEADLOCK ***
[  133.750489]
[  133.756742] 6 locks held by (udev-worker)/1133:
[  133.759179]  #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570
[  133.762299]  #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30
[  133.765565]  #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0
[  133.768978]  #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30
[  133.772312]  #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30
[  133.775544]  #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.779113]
[  133.779113] stack backtrace:
[  133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G           O     N 6.11.0-rc2+ #146
[  133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST
[  133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  133.793291] Call Trace:
[  133.795826]  <TASK>
[  133.798284]  dump_stack_lvl+0xea/0x150
[  133.801025]  dump_stack+0x19/0x20
[  133.803609]  print_circular_bug+0x477/0x740
[  133.806341]  check_noncircular+0x2f4/0x3e0
[  133.809056]  ? __pfx_check_noncircular+0x10/0x10
[  133.811866]  ? __pfx_lockdep_lock+0x10/0x10
[  133.814670]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.817610]  __lock_acquire+0x2efd/0x5c60
[  133.820339]  ? __pfx___lock_acquire+0x10/0x10
[  133.823128]  ? __dax_driver_register+0x141/0x1b0
[  133.825926]  ? do_one_initcall+0x117/0x5d0
[  133.828648]  lock_acquire+0x1ce/0x580
[  133.831349]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.834293]  ? __pfx_lock_acquire+0x10/0x10
[  133.837134]  __mutex_lock+0x15c/0x1490
[  133.839829]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.842753]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.845602]  ? __this_cpu_preempt_check+0x21/0x30
[  133.848438]  ? __pfx___mutex_lock+0x10/0x10
[  133.851200]  ? __pfx_lock_acquire+0x10/0x10
[  133.853935]  ? global_dirty_limits+0xc0/0x160
[  133.856699]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[  133.859564]  mutex_lock_nested+0x1f/0x30
[  133.862251]  ? mutex_lock_nested+0x1f/0x30
[  133.864964]  memtier_hotplug_callback+0x383/0x4b0
[  133.867752]  notifier_call_chain+0xbf/0x370
[  133.870550]  ? writeback_set_ratelimit+0xe8/0x160
[  133.873372]  blocking_notifier_call_chain+0x76/0xb0
[  133.876311]  memory_notify+0x2e/0x40
[  133.879013]  online_pages+0x597/0x720
[  133.881686]  ? irqentry_exit+0x3e/0xa0
[  133.884397]  ? __pfx_online_pages+0x10/0x10
[  133.887244]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.890299]  ? mhp_init_memmap_on_memory+0x7a/0x1c0
[  133.893203]  memory_subsys_online+0x4f6/0x7f0
[  133.896099]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.899039]  ? xa_load+0x16d/0x2e0
[  133.901667]  ? __pfx_xa_load+0x10/0x10
[  133.904366]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.907218]  device_online+0x141/0x1d0
[  133.909845]  online_memory_block+0x4d/0x60
[  133.912494]  walk_memory_blocks+0xc0/0x120
[  133.915104]  ? __pfx_online_memory_block+0x10/0x10
[  133.917776]  add_memory_resource+0x51d/0x6c0
[  133.920404]  ? __pfx_add_memory_resource+0x10/0x10
[  133.923104]  ? _raw_write_unlock+0x31/0x60
[  133.925781]  ? register_memory_resource+0x119/0x180
[  133.928450]  add_memory_driver_managed+0xf5/0x180
[  133.931036]  dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.933665]  ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem]
[  133.936332]  ? __pfx___up_read+0x10/0x10
[  133.938878]  dax_bus_probe+0x147/0x230
[  133.941332]  ? __pfx_dax_bus_probe+0x10/0x10
[  133.943954]  really_probe+0x27f/0xac0
[  133.946387]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[  133.949106]  __driver_probe_device+0x1f3/0x460
[  133.951704]  ? parse_option_str+0x149/0x190
[  133.954241]  driver_probe_device+0x56/0x1b0
[  133.956749]  __driver_attach+0x277/0x570
[  133.959228]  ? __pfx___driver_attach+0x10/0x10
[  133.961776]  bus_for_each_dev+0x145/0x1e0
[  133.964367]  ? __pfx_bus_for_each_dev+0x10/0x10
[  133.967019]  ? __kasan_check_read+0x15/0x20
[  133.969543]  ? _raw_spin_unlock+0x31/0x60
[  133.972132]  driver_attach+0x49/0x60
[  133.974536]  bus_add_driver+0x2f3/0x6b0
[  133.977044]  driver_register+0x170/0x4b0
[  133.979480]  __dax_driver_register+0x141/0x1b0
[  133.982126]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.984724]  dax_kmem_init+0x54/0xff0 [kmem]
[  133.987284]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.989965]  do_one_initcall+0x117/0x5d0
[  133.992506]  ? __pfx_do_one_initcall+0x10/0x10
[  133.995185]  ? __kasan_kmalloc+0x88/0xa0
[  133.997748]  ? kasan_poison+0x3e/0x60
[  134.000288]  ? kasan_unpoison+0x2c/0x60
[  134.002762]  ? kasan_poison+0x3e/0x60
[  134.005202]  ? __asan_register_globals+0x62/0x80
[  134.007753]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  134.010439]  do_init_module+0x277/0x750
[  134.012953]  load_module+0x5d1d/0x74f0
[  134.015406]  ? __pfx_load_module+0x10/0x10
[  134.017887]  ? __pfx_ima_post_read_file+0x10/0x10
[  134.020470]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  134.023127]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.025767]  ? security_kernel_post_read_file+0xa2/0xd0
[  134.028429]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.031162]  ? kernel_read_file+0x503/0x820
[  134.033645]  ? __pfx_kernel_read_file+0x10/0x10
[  134.036232]  ? __pfx___lock_acquire+0x10/0x10
[  134.038766]  init_module_from_file+0x12c/0x1a0
[  134.041291]  ? init_module_from_file+0x12c/0x1a0
[  134.043936]  ? __pfx_init_module_from_file+0x10/0x10
[  134.046516]  ? __this_cpu_preempt_check+0x21/0x30
[  134.049091]  ? __kasan_check_read+0x15/0x20
[  134.051551]  ? do_raw_spin_unlock+0x60/0x210
[  134.054077]  idempotent_init_module+0x3f1/0x690
[  134.056643]  ? __pfx_idempotent_init_module+0x10/0x10
[  134.059318]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.061995]  ? __fget_light+0x17d/0x210
[  134.064428]  __x64_sys_finit_module+0x10e/0x1a0
[  134.066976]  x64_sys_call+0x184d/0x20d0
[  134.069405]  do_syscall_64+0x6d/0x140
[  134.071926]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
---
 mm/memory-tiers.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Huang, Ying Aug. 30, 2024, 7:11 a.m. UTC | #1
Hi, Yanfei,

Yanfei Xu <yanfei.xu@intel.com> writes:

> memory_tier_init() acquires the memory_tier_lock unnecessarily, and its
> locking scope covers the registration of the memory hotplug notifier,
> whose callback also acquires the memory_tier_lock. This triggers a
> lockdep warning about a possible ABBA deadlock.
>
> Specifically, when a memory online event occurs, the executing memory
> notifier takes the read lock of memory_chain.rwsem, while the
> registration of the memory notifier in memory_tier_init() takes the
> write lock of memory_chain.rwsem with memory_tier_lock held. The memory
> online event then continues to invoke the memory hotplug callback
> registered by memory_tier_init(). Since this callback tries to acquire
> the memory_tier_lock, lockdep reports a possible deadlock.

Thank you very much for fixing this!

> In fact, this deadlock cannot happen, because memory_tier_init() always
> runs before any memory online event: subsys_initcall() has a higher
> priority than module_init().
>
> [snip the lockdep splat]
>
> Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
> ---
>  mm/memory-tiers.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 4775b3a3dabe..dddcd6b38e28 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
>  	WARN_ON(!node_demotion);
>  #endif
>  
> -	guard(mutex)(&memory_tier_lock);
>  	/*
>  	 * For now we can have 4 faster memory tiers with smaller adistance
>  	 * than default DRAM tier.

Although it's not absolutely necessary, I still think it's better to
just revert the locking change of memory_tier_init() in commit
823430c8e9d9 ("memory tier: consolidate the initialization of memory
tiers").  That is, use mutex_lock()/mutex_unlock() and exclude
hotplug_memory_notifier() from the locked region.  That way we always
hold memory_tier_lock when working on memory-tier-related data
structures.
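
For illustration, the difference is purely one of scope: guard(mutex)()
keeps memory_tier_lock held until memory_tier_init() returns, so the
notifier registration below it still runs under the lock, while explicit
mutex_lock()/mutex_unlock() lets the lock be dropped first.  A rough
sketch of the two forms (setup_tiers() is just a placeholder for the
tier setup work, not a real function):

	static int __init with_guard_sketch(void)
	{
		guard(mutex)(&memory_tier_lock);	/* held until return */
		setup_tiers();				/* placeholder */
		hotplug_memory_notifier(memtier_hotplug_callback,
					MEMTIER_HOTPLUG_PRI);
		return 0;	/* lock released here, after the registration */
	}

	static int __init with_explicit_lock_sketch(void)
	{
		mutex_lock(&memory_tier_lock);
		setup_tiers();				/* placeholder */
		mutex_unlock(&memory_tier_lock);	/* dropped before registration */
		hotplug_memory_notifier(memtier_hotplug_callback,
					MEMTIER_HOTPLUG_PRI);
		return 0;
	}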

--
Best Regards,
Huang, Ying
Yanfei Xu Aug. 30, 2024, 9:53 a.m. UTC | #2
[Snip]

On 8/30/2024 3:11 PM, Huang, Ying wrote:
>> Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
>> Signed-off-by: Yanfei Xu<yanfei.xu@intel.com>
>> ---
>>   mm/memory-tiers.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>> index 4775b3a3dabe..dddcd6b38e28 100644
>> --- a/mm/memory-tiers.c
>> +++ b/mm/memory-tiers.c
>> @@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
>>   	WARN_ON(!node_demotion);
>>   #endif
>>   
>> -	guard(mutex)(&memory_tier_lock);
>>   	/*
>>   	 * For now we can have 4 faster memory tiers with smaller adistance
>>   	 * than default DRAM tier.
> Although it's not absolutely necessary, I still think that it's better
> to just revert the locking change of memory_tier_init() in commit
> 823430c8e9d9 ("memory tier: consolidate the initialization of memory
> tiers").  That is, to use mutex_lock/unlock() and exclude
> hotplug_memory_notifier() from the locked region.  Because we will
> always hold memory_tier_lock when working on memory tier related data
> structures in this way.

Thanks for pointing it out. I will do it as below in v2.

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 4775b3a3dabe..ba3738b29cc5 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -895,13 +895,14 @@ static int __init memory_tier_init(void)
         WARN_ON(!node_demotion);
  #endif

-       guard(mutex)(&memory_tier_lock);
+       mutex_lock(&memory_tier_lock);
         /*
          * For now we can have 4 faster memory tiers with smaller adistance
          * than default DRAM tier.
          */
        default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
                                                      &default_memory_types);
+       mutex_unlock(&memory_tier_lock);
        if (IS_ERR(default_dram_type))
                panic("%s() failed to allocate default DRAM tier\n", __func__);


Best Regards,
Yanfei

> 
> --
> Best Regards,
> Huang, Ying
Huang, Ying Sept. 2, 2024, 1:37 a.m. UTC | #3
Yanfei Xu <yanfei.xu@intel.com> writes:

> [Snip]
>
> On 8/30/2024 3:11 PM, Huang, Ying wrote:
>>> Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
>>> Signed-off-by: Yanfei Xu<yanfei.xu@intel.com>
>>> ---
>>>   mm/memory-tiers.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>>> index 4775b3a3dabe..dddcd6b38e28 100644
>>> --- a/mm/memory-tiers.c
>>> +++ b/mm/memory-tiers.c
>>> @@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
>>>   	WARN_ON(!node_demotion);
>>>   #endif
>>>   -	guard(mutex)(&memory_tier_lock);
>>>   	/*
>>>   	 * For now we can have 4 faster memory tiers with smaller adistance
>>>   	 * than default DRAM tier.
>> Although it's not absolutely necessary, I still think that it's better
>> to just revert the locking change of memory_tier_init() in commit
>> 823430c8e9d9 ("memory tier: consolidate the initialization of memory
>> tiers").  That is, to use mutex_lock/unlock() and exclude
>> hotplug_memory_notifier() from the locked region.  Because we will
>> always hold memory_tier_lock when working on memory tier related data
>> structures in this way.
>
> Thanks for pointing out it. Will do it as below in v2.
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 4775b3a3dabe..ba3738b29cc5 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -895,13 +895,14 @@ static int __init memory_tier_init(void)
>         WARN_ON(!node_demotion);
>  #endif
>
> -       guard(mutex)(&memory_tier_lock);
> +       mutex_lock(&memory_tier_lock);
>         /*
>          * For now we can have 4 faster memory tiers with smaller adistance
>          * than default DRAM tier.
>          */
>         default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
>                                                       &default_memory_types);
> +       mutex_unlock(&memory_tier_lock);
>         if (IS_ERR(default_dram_type))
>                 panic("%s() failed to allocate default DRAM tier\n", __func__);
>
>

LGTM, feel free to add it in a future version.

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Yanfei Xu Sept. 2, 2024, 8:56 a.m. UTC | #4
On 9/2/2024 9:37 AM, Huang, Ying wrote:
> Yanfei Xu <yanfei.xu@intel.com> writes:
> 
>> [Snip]
>>
>> On 8/30/2024 3:11 PM, Huang, Ying wrote:
>>>> Fixes: 823430c8e9d9 ("memory tier: consolidate the initialization of memory tiers")
>>>> Signed-off-by: Yanfei Xu<yanfei.xu@intel.com>
>>>> ---
>>>>    mm/memory-tiers.c | 1 -
>>>>    1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>>>> index 4775b3a3dabe..dddcd6b38e28 100644
>>>> --- a/mm/memory-tiers.c
>>>> +++ b/mm/memory-tiers.c
>>>> @@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
>>>>    	WARN_ON(!node_demotion);
>>>>    #endif
>>>>    -	guard(mutex)(&memory_tier_lock);
>>>>    	/*
>>>>    	 * For now we can have 4 faster memory tiers with smaller adistance
>>>>    	 * than default DRAM tier.
>>> Although it's not absolutely necessary, I still think that it's better
>>> to just revert the locking change of memory_tier_init() in commit
>>> 823430c8e9d9 ("memory tier: consolidate the initialization of memory
>>> tiers").  That is, to use mutex_lock/unlock() and exclude
>>> hotplug_memory_notifier() from the locked region.  Because we will
>>> always hold memory_tier_lock when working on memory tier related data
>>> structures in this way.
>>
>> Thanks for pointing out it. Will do it as below in v2.
>>
>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>> index 4775b3a3dabe..ba3738b29cc5 100644
>> --- a/mm/memory-tiers.c
>> +++ b/mm/memory-tiers.c
>> @@ -895,13 +895,14 @@ static int __init memory_tier_init(void)
>>          WARN_ON(!node_demotion);
>>   #endif
>>
>> -       guard(mutex)(&memory_tier_lock);
>> +       mutex_lock(&memory_tier_lock);
>>          /*
>>           * For now we can have 4 faster memory tiers with smaller adistance
>>           * than default DRAM tier.
>>           */
>>          default_dram_type = mt_find_alloc_memory_type(MEMTIER_ADISTANCE_DRAM,
>>                                                        &default_memory_types);
>> +       mutex_unlock(&memory_tier_lock);
>>          if (IS_ERR(default_dram_type))
>>                  panic("%s() failed to allocate default DRAM tier\n", __func__);
>>
>>
> 
> LGTM, feel free to add it in a future version.
> 
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> 
Hi Ying,

Thanks for your "Reviewed-by"! I sent the v2 last Friday evening. Could
you please reply to that thread with the "Reviewed-by"? If not, I can
send v3 and include your tag.

Thread name:
8/30/2024, 6:24 PM  [PATCH v2] memory tier: fix deadlock warning while onlining pages

Best Regards,
Yanfei

Patch

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 4775b3a3dabe..dddcd6b38e28 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -895,7 +895,6 @@ static int __init memory_tier_init(void)
 	WARN_ON(!node_demotion);
 #endif
 
-	guard(mutex)(&memory_tier_lock);
 	/*
 	 * For now we can have 4 faster memory tiers with smaller adistance
 	 * than default DRAM tier.