Message ID | 20240822-b4-regmap-maple-nolock-v1-2-d5e6dbae3396@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | regmap: Improve lock handling with maple tree | expand |
Dear Mark, On 22.08.2024 21:13, Mark Brown wrote: > For the benefit of the maple tree's lockdep checking hold the lock while > creating and exiting the cache. > > Signed-off-by: Mark Brown <broonie@kernel.org> This patch landed recently in linux-next as commit fd4ebc07b4df ("regmap: Hold the regmap lock when allocating and freeing the cache"). In my tests I found that it triggers the following warnings on Rockchip3568 based Odroid-M1 board: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:337 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 157, name: systemd-udevd preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 2 locks held by systemd-udevd/157: #0: ffff0001f0cf30f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x90/0x1ac #1: ffff0001f1196818 (rockchip_i2s_tdm:1300:(&rockchip_i2s_tdm_regmap_config)->lock){....}-{2:2}, at: regmap_lock_spinlock+0x18/0x2c irq event stamp: 11474 hardirqs last enabled at (11473): [<ffff8000812b4cc0>] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (11474): [<ffff8000812b40d4>] _raw_spin_lock_irqsave+0x84/0x88 softirqs last enabled at (9730): [<ffff8000800b13dc>] handle_softirqs+0x4cc/0x4e4 softirqs last disabled at (9721): [<ffff8000800105b0>] __do_softirq+0x14/0x20 CPU: 3 UID: 0 PID: 157 Comm: systemd-udevd Not tainted 6.11.0-rc3+ #15305 Hardware name: Hardkernel ODROID-M1 (DT) Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __might_resched+0x144/0x248 __might_sleep+0x48/0x98 __kmalloc_noprof+0x208/0x328 regcache_flat_init+0x40/0xb0 regcache_init+0x1ec/0x490 __regmap_init+0x8e0/0xd50 __devm_regmap_init+0x78/0xc8 __devm_regmap_init_mmio_clk+0x9c/0xc4 rockchip_i2s_tdm_probe+0x318/0x754 [snd_soc_rockchip_i2s_tdm] platform_probe+0x68/0xdc really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd4 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 __platform_driver_register+0x28/0x34 rockchip_i2s_tdm_driver_init+0x20/0x1000 [snd_soc_rockchip_i2s_tdm] do_one_initcall+0x68/0x300 do_init_module+0x60/0x224 load_module+0x1b0c/0x1cb0 init_module_from_file+0x84/0xc4 idempotent_init_module+0x18c/0x284 __arm64_sys_finit_module+0x64/0xa0 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0xc8/0xe8 do_el0_svc+0x20/0x2c el0_svc+0x4c/0x11c el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 and ======================================================== WARNING: possible irq lock inversion dependency detected 6.11.0-rc3+ #15305 Tainted: G W -------------------------------------------------------- swapper/0/0 just changed the state of lock: ffff0001f2305018 (rockchip_drm_vop2:3114:(&vop2_regmap_config)->lock){-...}-{2:2}, at: regmap_lock_spinlock+0x18/0x2c but this lock took another, HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.}-{0:0} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(rockchip_drm_vop2:3114:(&vop2_regmap_config)->lock); lock(fs_reclaim); <Interrupt> lock(rockchip_drm_vop2:3114:(&vop2_regmap_config)->lock); *** DEADLOCK *** no locks held by swapper/0/0. the shortest dependencies between 2nd lock and 1st lock: -> (fs_reclaim){+.+.}-{0:0} { HARDIRQ-ON-W at: lock_acquire+0x200/0x340 fs_reclaim_acquire+0xdc/0xfc __kmalloc_cache_noprof+0x54/0x288 __kthread_create_worker+0x3c/0x150 kthread_create_worker+0x64/0x8c workqueue_init+0x30/0x3c4 kernel_init_freeable+0x11c/0x50c kernel_init+0x20/0x1d8 ret_from_fork+0x10/0x20 SOFTIRQ-ON-W at: lock_acquire+0x200/0x340 fs_reclaim_acquire+0xdc/0xfc __kmalloc_cache_noprof+0x54/0x288 __kthread_create_worker+0x3c/0x150 kthread_create_worker+0x64/0x8c workqueue_init+0x30/0x3c4 kernel_init_freeable+0x11c/0x50c kernel_init+0x20/0x1d8 ret_from_fork+0x10/0x20 INITIAL USE at: lock_acquire+0x200/0x340 fs_reclaim_acquire+0xdc/0xfc __kmalloc_cache_noprof+0x54/0x288 __kthread_create_worker+0x3c/0x150 kthread_create_worker+0x64/0x8c workqueue_init+0x30/0x3c4 kernel_init_freeable+0x11c/0x50c kernel_init+0x20/0x1d8 ret_from_fork+0x10/0x20 } ... key at: [<ffff800083097970>] __fs_reclaim_map+0x0/0x28 ... acquired at: fs_reclaim_acquire+0xdc/0xfc __kmalloc_cache_noprof+0x54/0x288 regcache_maple_init+0x2c/0x110 regcache_init+0x1ec/0x490 __regmap_init+0x8e0/0xd50 __devm_regmap_init+0x78/0xc8 __devm_regmap_init_mmio_clk+0x9c/0xc4 vop2_bind+0xf4/0xb10 [rockchipdrm] component_bind_all+0x118/0x24c rockchip_drm_bind+0xb0/0x1fc [rockchipdrm] try_to_bring_up_aggregate_device+0x168/0x1d4 component_master_add_with_match+0xb4/0xfc rockchip_drm_platform_probe+0x154/0x284 [rockchipdrm] platform_probe+0x68/0xdc really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd4 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 __platform_driver_register+0x28/0x34 dw_hdmi_cec_transmit+0x44/0xc4 [dw_hdmi_cec] do_one_initcall+0x68/0x300 do_init_module+0x60/0x224 load_module+0x1b0c/0x1cb0 init_module_from_file+0x84/0xc4 idempotent_init_module+0x18c/0x284 __arm64_sys_finit_module+0x64/0xa0 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0xc8/0xe8 do_el0_svc+0x20/0x2c el0_svc+0x4c/0x11c el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 -> (rockchip_drm_vop2:3114:(&vop2_regmap_config)->lock){-...}-{2:2} { IN-HARDIRQ-W at: lock_acquire+0x200/0x340 _raw_spin_lock_irqsave+0x60/0x88 regmap_lock_spinlock+0x18/0x2c regmap_read+0x3c/0x78 vop2_isr+0x84/0x2a8 [rockchipdrm] __handle_irq_event_percpu+0x9c/0x304 handle_irq_event+0x4c/0xa8 handle_fasteoi_irq+0xa4/0x1cc generic_handle_domain_irq+0x2c/0x44 gic_handle_irq+0x4c/0x110 call_on_irq_stack+0x24/0x4c do_interrupt_handler+0x80/0x84 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 default_idle_call+0xa8/0x1e8 do_idle+0x220/0x284 cpu_startup_entry+0x34/0x3c rest_init+0xf4/0x184 start_kernel+0x680/0x78c __primary_switched+0x80/0x88 INITIAL USE at: lock_acquire+0x200/0x340 _raw_spin_lock_irqsave+0x60/0x88 regmap_lock_spinlock+0x18/0x2c regcache_init+0x1dc/0x490 __regmap_init+0x8e0/0xd50 __devm_regmap_init+0x78/0xc8 __devm_regmap_init_mmio_clk+0x9c/0xc4 vop2_bind+0xf4/0xb10 [rockchipdrm] component_bind_all+0x118/0x24c rockchip_drm_bind+0xb0/0x1fc [rockchipdrm] try_to_bring_up_aggregate_device+0x168/0x1d4 component_master_add_with_match+0xb4/0xfc rockchip_drm_platform_probe+0x154/0x284 [rockchipdrm] platform_probe+0x68/0xdc really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd4 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 __platform_driver_register+0x28/0x34 dw_hdmi_cec_transmit+0x44/0xc4 [dw_hdmi_cec] do_one_initcall+0x68/0x300 do_init_module+0x60/0x224 load_module+0x1b0c/0x1cb0 init_module_from_file+0x84/0xc4 idempotent_init_module+0x18c/0x284 __arm64_sys_finit_module+0x64/0xa0 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0xc8/0xe8 do_el0_svc+0x20/0x2c el0_svc+0x4c/0x11c el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x194 } ... key at: [<ffff80007c3e3a58>] _key.6+0x0/0xffffffffffffd5a8 [rockchipdrm] ... acquired at: __lock_acquire+0xb10/0x21a0 lock_acquire+0x200/0x340 _raw_spin_lock_irqsave+0x60/0x88 regmap_lock_spinlock+0x18/0x2c regmap_read+0x3c/0x78 vop2_isr+0x84/0x2a8 [rockchipdrm] __handle_irq_event_percpu+0x9c/0x304 handle_irq_event+0x4c/0xa8 handle_fasteoi_irq+0xa4/0x1cc generic_handle_domain_irq+0x2c/0x44 gic_handle_irq+0x4c/0x110 call_on_irq_stack+0x24/0x4c do_interrupt_handler+0x80/0x84 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 default_idle_call+0xa8/0x1e8 do_idle+0x220/0x284 cpu_startup_entry+0x34/0x3c rest_init+0xf4/0x184 start_kernel+0x680/0x78c __primary_switched+0x80/0x88 stack backtrace: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.11.0-rc3+ #15305 Tainted: [W]=WARN Hardware name: Hardkernel ODROID-M1 (DT) Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 print_irq_inversion_bug.part.0+0x1ec/0x27c mark_lock+0x63c/0x954 __lock_acquire+0xb10/0x21a0 lock_acquire+0x200/0x340 _raw_spin_lock_irqsave+0x60/0x88 regmap_lock_spinlock+0x18/0x2c regmap_read+0x3c/0x78 vop2_isr+0x84/0x2a8 [rockchipdrm] __handle_irq_event_percpu+0x9c/0x304 handle_irq_event+0x4c/0xa8 handle_fasteoi_irq+0xa4/0x1cc generic_handle_domain_irq+0x2c/0x44 gic_handle_irq+0x4c/0x110 call_on_irq_stack+0x24/0x4c do_interrupt_handler+0x80/0x84 el1_interrupt+0x34/0x64 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x64/0x68 default_idle_call+0xa8/0x1e8 do_idle+0x220/0x284 cpu_startup_entry+0x34/0x3c rest_init+0xf4/0x184 start_kernel+0x680/0x78c __primary_switched+0x80/0x88 Console: switching to colour frame buffer device 240x67 rockchip-drm display-subsystem: [drm] fb0: rockchipdrmfb frame buffer device Both issues can be fixed by the following patch, but I'm not sure if this is not an overuse of the GFP_ATOMIC just for the initialization phase: diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c index 9b17c77dec9d..8e8cf328bffb 100644 --- a/drivers/base/regmap/regcache-flat.c +++ b/drivers/base/regmap/regcache-flat.c @@ -27,7 +27,8 @@static int regcache_flat_init(struct regmap *map) return -EINVAL; map->cache = kcalloc(regcache_flat_get_index(map, map->max_register) - + 1, sizeof(unsigned int), GFP_KERNEL); + + 1, sizeof(unsigned int), + map->can_sleep ? GFP_KERNEL : GFP_ATOMIC); if (!map->cache) return -ENOMEM; diff --git a/drivers/base/regmap/regcache-maple.c b/drivers/base/regmap/regcache-maple.c index 2dea9d259c49..b95130127bdc 100644 --- a/drivers/base/regmap/regcache-maple.c +++ b/drivers/base/regmap/regcache-maple.c @@ -348,7 +348,7 @@static int regcache_maple_init(struct regmap *map) int ret; int range_start; - mt = kmalloc(sizeof(*mt), GFP_KERNEL); +mt = kmalloc(sizeof(*mt), map->can_sleep ? GFP_KERNEL : GFP_ATOMIC); if (!mt) return -ENOMEM; map->cache = mt; diff --git a/drivers/base/regmap/regcache-rbtree.c b/drivers/base/regmap/regcache-rbtree.c index 3db88bbcae0f..c53c38a965d5 100644 --- a/drivers/base/regmap/regcache-rbtree.c +++ b/drivers/base/regmap/regcache-rbtree.c @@ -187,7 +187,8 @@static int regcache_rbtree_init(struct regmap *map) int i; int ret; - map->cache = kmalloc(sizeof *rbtree_ctx, GFP_KERNEL); +map->cache = kmalloc(sizeof *rbtree_ctx, + map->can_sleep ? GFP_KERNEL : GFP_ATOMIC); if (!map->cache) return -ENOMEM; Let me know if such approach is fine, then I will submit a proper patch in a few minutes. > --- > drivers/base/regmap/regcache.c | 4 ++++ > drivers/base/regmap/regmap.c | 1 + > 2 files changed, 5 insertions(+) > > diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c > index 7ec1ec605335..d3659ba3cc11 100644 > --- a/drivers/base/regmap/regcache.c > +++ b/drivers/base/regmap/regcache.c > @@ -195,7 +195,9 @@ int regcache_init(struct regmap *map, const struct regmap_config *config) > if (map->cache_ops->init) { > dev_dbg(map->dev, "Initializing %s cache\n", > map->cache_ops->name); > + map->lock(map->lock_arg); > ret = map->cache_ops->init(map); > + map->unlock(map->lock_arg); > if (ret) > goto err_free; > } > @@ -223,7 +225,9 @@ void regcache_exit(struct regmap *map) > if (map->cache_ops->exit) { > dev_dbg(map->dev, "Destroying %s cache\n", > map->cache_ops->name); > + map->lock(map->lock_arg); > map->cache_ops->exit(map); > + map->unlock(map->lock_arg); > } > } > > diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c > index bfc6bc1eb3a4..9ed842d17642 100644 > --- a/drivers/base/regmap/regmap.c > +++ b/drivers/base/regmap/regmap.c > @@ -1445,6 +1445,7 @@ void regmap_exit(struct regmap *map) > struct regmap_async *async; > > regcache_exit(map); > + > regmap_debugfs_exit(map); > regmap_range_exit(map); > if (map->bus && map->bus->free_context) > Best regards
On Wed, Aug 28, 2024 at 12:02:38PM +0200, Marek Szyprowski wrote: > On 22.08.2024 21:13, Mark Brown wrote: > BUG: sleeping function called from invalid context at > include/linux/sched/mm.h:337 > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 157, name: > systemd-udevd Please, put any actual content in your mail *before* huge log spams, especially absurdly big things like lockdep. It makes it very hard to find anything you've written, I very nearly didn't here and it'd be a lot of effort for me to wade through all that text to find it again to reply to the mail. Your patch looks basically fine but should use alloc_flags.
diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c index 7ec1ec605335..d3659ba3cc11 100644 --- a/drivers/base/regmap/regcache.c +++ b/drivers/base/regmap/regcache.c @@ -195,7 +195,9 @@ int regcache_init(struct regmap *map, const struct regmap_config *config) if (map->cache_ops->init) { dev_dbg(map->dev, "Initializing %s cache\n", map->cache_ops->name); + map->lock(map->lock_arg); ret = map->cache_ops->init(map); + map->unlock(map->lock_arg); if (ret) goto err_free; } @@ -223,7 +225,9 @@ void regcache_exit(struct regmap *map) if (map->cache_ops->exit) { dev_dbg(map->dev, "Destroying %s cache\n", map->cache_ops->name); + map->lock(map->lock_arg); map->cache_ops->exit(map); + map->unlock(map->lock_arg); } } diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c index bfc6bc1eb3a4..9ed842d17642 100644 --- a/drivers/base/regmap/regmap.c +++ b/drivers/base/regmap/regmap.c @@ -1445,6 +1445,7 @@ void regmap_exit(struct regmap *map) struct regmap_async *async; regcache_exit(map); + regmap_debugfs_exit(map); regmap_range_exit(map); if (map->bus && map->bus->free_context)
For the benefit of the maple tree's lockdep checking hold the lock while creating and exiting the cache. Signed-off-by: Mark Brown <broonie@kernel.org> --- drivers/base/regmap/regcache.c | 4 ++++ drivers/base/regmap/regmap.c | 1 + 2 files changed, 5 insertions(+)