Message ID | 20240102034335.34842-1-lishifeng@sangfor.com.cn (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | RDMA/device: Fix a race between mad_client and cm_client init | expand |
On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote: > The mad_client will be initialized in enable_device_and_get(), while the > devices_rwsem will be downgraded to a read semaphore. There is a window > that leads to the failed initialization for cm_client, since it can not > get matched mad port from ib_mad_port_list, and the matched mad port will > be added to the list after that. > > mad_client | cm_client > ------------------|-------------------------------------------------------- > ib_register_device| > enable_device_and_get > down_write(&devices_rwsem) > xa_set_mark(&devices, DEVICE_REGISTERED) > downgrade_write(&devices_rwsem) > | > |ib_cm_init > |ib_register_client(&cm_client) > |down_read(&devices_rwsem) > |xa_for_each_marked (&devices, DEVICE_REGISTERED) > |add_client_context > |cm_add_one > |ib_register_mad_agent > |ib_get_mad_port > |__ib_get_mad_port > |list_for_each_entry(entry, &ib_mad_port_list, port_list) > |return NULL > |up_read(&devices_rwsem) > | > add_client_context| > ib_mad_init_device| > ib_mad_port_open | > list_add_tail(&port_priv->port_list, &ib_mad_port_list) > up_read(&devices_rwsem) > | How is this stack possible? ib_register_device() is called by drivers and happens much later than ib_cm_init(). Thanks > > Fix it by using the devices_rwsem write semaphore to protect the mad_client > init flow in enable_device_and_get(). > > Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") > Cc: Shifeng Li <lishifeng1992@126.com> > Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> > --- > drivers/infiniband/core/device.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c > index 67bcea7a153c..85782786993d 100644 > --- a/drivers/infiniband/core/device.c > +++ b/drivers/infiniband/core/device.c > @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) > down_write(&devices_rwsem); > xa_set_mark(&devices, device->index, DEVICE_REGISTERED); > > - /* > - * By using downgrade_write() we ensure that no other thread can clear > - * DEVICE_REGISTERED while we are completing the client setup. > - */ > - downgrade_write(&devices_rwsem); > - > if (device->ops.enable_driver) { > ret = device->ops.enable_driver(device); > if (ret) > @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) > if (!ret) > ret = add_compat_devs(device); > out: > - up_read(&devices_rwsem); > + up_write(&devices_rwsem); > return ret; > } > > -- > 2.25.1 > >
On 2024/1/2 16:58, Leon Romanovsky wrote: > On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote: >> The mad_client will be initialized in enable_device_and_get(), while the >> devices_rwsem will be downgraded to a read semaphore. There is a window >> that leads to the failed initialization for cm_client, since it can not >> get matched mad port from ib_mad_port_list, and the matched mad port will >> be added to the list after that. >> >> mad_client | cm_client >> ------------------|-------------------------------------------------------- >> ib_register_device| >> enable_device_and_get >> down_write(&devices_rwsem) >> xa_set_mark(&devices, DEVICE_REGISTERED) >> downgrade_write(&devices_rwsem) >> | >> |ib_cm_init >> |ib_register_client(&cm_client) >> |down_read(&devices_rwsem) >> |xa_for_each_marked (&devices, DEVICE_REGISTERED) >> |add_client_context >> |cm_add_one >> |ib_register_mad_agent >> |ib_get_mad_port >> |__ib_get_mad_port >> |list_for_each_entry(entry, &ib_mad_port_list, port_list) >> |return NULL >> |up_read(&devices_rwsem) >> | >> add_client_context| >> ib_mad_init_device| >> ib_mad_port_open | >> list_add_tail(&port_priv->port_list, &ib_mad_port_list) >> up_read(&devices_rwsem) >> | > > How is this stack possible? > > ib_register_device() is called by drivers and happens much later than ib_cm_init(). > I've caught the stack and err log as follows, ib_mad_init_device() called after cm_add_one(). [ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded [ 98.281787] Call Trace: [ 98.281790] dump_stack+0x71/0xab [ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] [ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm] [ 98.281865] add_client_context+0x2b9/0x380 [ib_core] [ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core] [ 98.281908] __init_backport+0x12f/0x1000 [ib_cm] [ 98.281912] do_one_initcall+0x87/0x2e2 [ 98.281926] do_init_module+0x1c3/0x5f7 [ 98.281928] load_module+0x4fe0/0x68d0 [ 98.281945] __do_sys_finit_module+0x14d/0x180 [ 98.281952] do_syscall_64+0xa0/0x370 [ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 98.281958] RIP: 0033:0x7f51d3a4373d [ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139 [ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d [ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003 [ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70 [ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410 [ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000 [ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1 [ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded [ 98.349043] Call Trace: [ 98.349053] dump_stack+0x71/0xab [ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] [ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core] [ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core] [ 98.349197] add_client_context+0x2b9/0x380 [ib_core] [ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core] [ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core] [ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib] [ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core] [ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core] [ 98.349442] do_one_initcall+0x87/0x2e2 [ 98.349460] do_init_module+0x1c3/0x5f7 [ 98.349462] load_module+0x4fe0/0x68d0 [ 98.349481] __do_sys_init_module+0x1f9/0x220 [ 98.349486] do_syscall_64+0xa0/0x370 [ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 98.349494] RIP: 0033:0x7fbc327faa7a [ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af [ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a [ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010 [ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000 [ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410 [ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000 Thanks > Thanks > >> >> Fix it by using the devices_rwsem write semaphore to protect the mad_client >> init flow in enable_device_and_get(). >> >> Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") >> Cc: Shifeng Li <lishifeng1992@126.com> >> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> >> --- >> drivers/infiniband/core/device.c | 8 +------- >> 1 file changed, 1 insertion(+), 7 deletions(-) >> >> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c >> index 67bcea7a153c..85782786993d 100644 >> --- a/drivers/infiniband/core/device.c >> +++ b/drivers/infiniband/core/device.c >> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) >> down_write(&devices_rwsem); >> xa_set_mark(&devices, device->index, DEVICE_REGISTERED); >> >> - /* >> - * By using downgrade_write() we ensure that no other thread can clear >> - * DEVICE_REGISTERED while we are completing the client setup. >> - */ >> - downgrade_write(&devices_rwsem); >> - >> if (device->ops.enable_driver) { >> ret = device->ops.enable_driver(device); >> if (ret) >> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) >> if (!ret) >> ret = add_compat_devs(device); >> out: >> - up_read(&devices_rwsem); >> + up_write(&devices_rwsem); >> return ret; >> } >> >> -- >> 2.25.1 >> >> >
On Tue, Jan 02, 2024 at 07:33:41PM +0800, Shifeng Li wrote: > On 2024/1/2 16:58, Leon Romanovsky wrote: > > On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote: > > > The mad_client will be initialized in enable_device_and_get(), while the > > > devices_rwsem will be downgraded to a read semaphore. There is a window > > > that leads to the failed initialization for cm_client, since it can not > > > get matched mad port from ib_mad_port_list, and the matched mad port will > > > be added to the list after that. > > > > > > mad_client | cm_client > > > ------------------|-------------------------------------------------------- > > > ib_register_device| > > > enable_device_and_get > > > down_write(&devices_rwsem) > > > xa_set_mark(&devices, DEVICE_REGISTERED) > > > downgrade_write(&devices_rwsem) > > > | > > > |ib_cm_init > > > |ib_register_client(&cm_client) > > > |down_read(&devices_rwsem) > > > |xa_for_each_marked (&devices, DEVICE_REGISTERED) > > > |add_client_context > > > |cm_add_one > > > |ib_register_mad_agent > > > |ib_get_mad_port > > > |__ib_get_mad_port > > > |list_for_each_entry(entry, &ib_mad_port_list, port_list) > > > |return NULL > > > |up_read(&devices_rwsem) > > > | > > > add_client_context| > > > ib_mad_init_device| > > > ib_mad_port_open | > > > list_add_tail(&port_priv->port_list, &ib_mad_port_list) > > > up_read(&devices_rwsem) > > > | > > > > How is this stack possible? > > > > ib_register_device() is called by drivers and happens much later than ib_cm_init(). > > > > I've caught the stack and err log as follows, ib_mad_init_device() called after ib_mad_init_device != ib_register_device You mentioned ib_register_device() in your callstack, while you caught ib_mad_init_device(). Thanks > cm_add_one(). > > [ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded > [ 98.281787] Call Trace: > [ 98.281790] dump_stack+0x71/0xab > [ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] > [ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm] > [ 98.281865] add_client_context+0x2b9/0x380 [ib_core] > [ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core] > [ 98.281908] __init_backport+0x12f/0x1000 [ib_cm] > [ 98.281912] do_one_initcall+0x87/0x2e2 > [ 98.281926] do_init_module+0x1c3/0x5f7 > [ 98.281928] load_module+0x4fe0/0x68d0 > [ 98.281945] __do_sys_finit_module+0x14d/0x180 > [ 98.281952] do_syscall_64+0xa0/0x370 > [ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca > [ 98.281958] RIP: 0033:0x7f51d3a4373d > [ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139 > [ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d > [ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003 > [ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70 > [ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410 > [ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000 > > [ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1 > > [ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded > [ 98.349043] Call Trace: > [ 98.349053] dump_stack+0x71/0xab > [ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] > [ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core] > [ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core] > [ 98.349197] add_client_context+0x2b9/0x380 [ib_core] > [ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core] > [ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core] > [ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib] > [ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core] > [ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core] > [ 98.349442] do_one_initcall+0x87/0x2e2 > [ 98.349460] do_init_module+0x1c3/0x5f7 > [ 98.349462] load_module+0x4fe0/0x68d0 > [ 98.349481] __do_sys_init_module+0x1f9/0x220 > [ 98.349486] do_syscall_64+0xa0/0x370 > [ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca > [ 98.349494] RIP: 0033:0x7fbc327faa7a > [ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af > [ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a > [ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010 > [ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000 > [ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410 > [ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000 > > Thanks > > > Thanks > > > > > > > > Fix it by using the devices_rwsem write semaphore to protect the mad_client > > > init flow in enable_device_and_get(). > > > > > > Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") > > > Cc: Shifeng Li <lishifeng1992@126.com> > > > Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> > > > --- > > > drivers/infiniband/core/device.c | 8 +------- > > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c > > > index 67bcea7a153c..85782786993d 100644 > > > --- a/drivers/infiniband/core/device.c > > > +++ b/drivers/infiniband/core/device.c > > > @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) > > > down_write(&devices_rwsem); > > > xa_set_mark(&devices, device->index, DEVICE_REGISTERED); > > > - /* > > > - * By using downgrade_write() we ensure that no other thread can clear > > > - * DEVICE_REGISTERED while we are completing the client setup. > > > - */ > > > - downgrade_write(&devices_rwsem); > > > - > > > if (device->ops.enable_driver) { > > > ret = device->ops.enable_driver(device); > > > if (ret) > > > @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) > > > if (!ret) > > > ret = add_compat_devs(device); > > > out: > > > - up_read(&devices_rwsem); > > > + up_write(&devices_rwsem); > > > return ret; > > > } > > > -- > > > 2.25.1 > > > > > > > > > >
On 2024/1/2 19:58, Leon Romanovsky wrote: > On Tue, Jan 02, 2024 at 07:33:41PM +0800, Shifeng Li wrote: >> On 2024/1/2 16:58, Leon Romanovsky wrote: >>> On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote: >>>> The mad_client will be initialized in enable_device_and_get(), while the >>>> devices_rwsem will be downgraded to a read semaphore. There is a window >>>> that leads to the failed initialization for cm_client, since it can not >>>> get matched mad port from ib_mad_port_list, and the matched mad port will >>>> be added to the list after that. >>>> >>>> mad_client | cm_client >>>> ------------------|-------------------------------------------------------- >>>> ib_register_device| >>>> enable_device_and_get >>>> down_write(&devices_rwsem) >>>> xa_set_mark(&devices, DEVICE_REGISTERED) >>>> downgrade_write(&devices_rwsem) >>>> | >>>> |ib_cm_init >>>> |ib_register_client(&cm_client) >>>> |down_read(&devices_rwsem) >>>> |xa_for_each_marked (&devices, DEVICE_REGISTERED) >>>> |add_client_context >>>> |cm_add_one >>>> |ib_register_mad_agent >>>> |ib_get_mad_port >>>> |__ib_get_mad_port >>>> |list_for_each_entry(entry, &ib_mad_port_list, port_list) >>>> |return NULL >>>> |up_read(&devices_rwsem) >>>> | >>>> add_client_context| >>>> ib_mad_init_device| >>>> ib_mad_port_open | >>>> list_add_tail(&port_priv->port_list, &ib_mad_port_list) >>>> up_read(&devices_rwsem) >>>> | >>> >>> How is this stack possible? >>> >>> ib_register_device() is called by drivers and happens much later than ib_cm_init(). >>> >> >> I've caught the stack and err log as follows, ib_mad_init_device() called after > > > ib_mad_init_device != ib_register_device > > You mentioned ib_register_device() in your callstack, while you caught ib_mad_init_device(). > ib_register_device() is called by mlx5_ib driver, and then ib_register_device() calls ib_mad_init_device(). Thanks > Thanks > >> cm_add_one(). >> >> [ 98.281786] CPU: 18 PID: 30079 Comm: modprobe Kdump: loaded >> [ 98.281787] Call Trace: >> [ 98.281790] dump_stack+0x71/0xab >> [ 98.281805] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] >> [ 98.281840] cm_add_one+0x4b0/0x9d0 [ib_cm] >> [ 98.281865] add_client_context+0x2b9/0x380 [ib_core] >> [ 98.281890] ib_register_client+0x22a/0x2a0 [ib_core] >> [ 98.281908] __init_backport+0x12f/0x1000 [ib_cm] >> [ 98.281912] do_one_initcall+0x87/0x2e2 >> [ 98.281926] do_init_module+0x1c3/0x5f7 >> [ 98.281928] load_module+0x4fe0/0x68d0 >> [ 98.281945] __do_sys_finit_module+0x14d/0x180 >> [ 98.281952] do_syscall_64+0xa0/0x370 >> [ 98.281957] entry_SYSCALL_64_after_hwframe+0x65/0xca >> [ 98.281958] RIP: 0033:0x7f51d3a4373d >> [ 98.281961] RSP: 002b:00007ffceb925938 EFLAGS: 00000202 ORIG_RAX: 0000000000000139 >> [ 98.281964] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f51d3a4373d >> [ 98.281965] RDX: 0000000000000000 RSI: 0000000001636b70 RDI: 0000000000000003 >> [ 98.281966] RBP: 00007ffceb925950 R08: 0000000000000000 R09: 0000000001636b70 >> [ 98.281967] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000402410 >> [ 98.281968] R13: 00007ffceb926e60 R14: 0000000000000000 R15: 0000000000000000 >> >> [ 98.281977] infiniband mlx5_1: ib_register_mad_agent: Invalid port 1 >> >> [ 98.349040] CPU: 38 PID: 29896 Comm: modprobe Kdump: loaded >> [ 98.349043] Call Trace: >> [ 98.349053] dump_stack+0x71/0xab >> [ 98.349076] ib_register_mad_agent+0x1c6/0x27f0 [ib_core] >> [ 98.349149] ib_agent_port_open+0xe2/0x2d0 [ib_core] >> [ 98.349164] ib_mad_init_device+0x818/0x1d70 [ib_core] >> [ 98.349197] add_client_context+0x2b9/0x380 [ib_core] >> [ 98.349221] enable_device_and_get+0x1ab/0x340 [ib_core] >> [ 98.349292] ib_register_device+0xcbf/0xfd0 [ib_core] >> [ 98.349355] __mlx5_ib_add+0x44/0xf0 [mlx5_ib] >> [ 98.349404] mlx5_add_device+0xc3/0x280 [mlx5_core] >> [ 98.349434] mlx5_register_interface+0x109/0x190 [mlx5_core] >> [ 98.349442] do_one_initcall+0x87/0x2e2 >> [ 98.349460] do_init_module+0x1c3/0x5f7 >> [ 98.349462] load_module+0x4fe0/0x68d0 >> [ 98.349481] __do_sys_init_module+0x1f9/0x220 >> [ 98.349486] do_syscall_64+0xa0/0x370 >> [ 98.349492] entry_SYSCALL_64_after_hwframe+0x65/0xca >> [ 98.349494] RIP: 0033:0x7fbc327faa7a >> [ 98.349500] RSP: 002b:00007fff890df7f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000af >> [ 98.349502] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbc327faa7a >> [ 98.349503] RDX: 00000000004250c0 RSI: 00000000001b0230 RDI: 00007fbc31919010 >> [ 98.349505] RBP: 00007fff890df860 R08: 00000000025045b0 R09: 0000000000000000 >> [ 98.349506] R10: 00000000001b0230 R11: 0000000000000202 R12: 0000000000402410 >> [ 98.349507] R13: 00007fff890e0cf0 R14: 0000000000000000 R15: 0000000000000000 >> >> Thanks >> >>> Thanks >>> >>>> >>>> Fix it by using the devices_rwsem write semaphore to protect the mad_client >>>> init flow in enable_device_and_get(). >>>> >>>> Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") >>>> Cc: Shifeng Li <lishifeng1992@126.com> >>>> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> >>>> --- >>>> drivers/infiniband/core/device.c | 8 +------- >>>> 1 file changed, 1 insertion(+), 7 deletions(-) >>>> >>>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c >>>> index 67bcea7a153c..85782786993d 100644 >>>> --- a/drivers/infiniband/core/device.c >>>> +++ b/drivers/infiniband/core/device.c >>>> @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) >>>> down_write(&devices_rwsem); >>>> xa_set_mark(&devices, device->index, DEVICE_REGISTERED); >>>> - /* >>>> - * By using downgrade_write() we ensure that no other thread can clear >>>> - * DEVICE_REGISTERED while we are completing the client setup. >>>> - */ >>>> - downgrade_write(&devices_rwsem); >>>> - >>>> if (device->ops.enable_driver) { >>>> ret = device->ops.enable_driver(device); >>>> if (ret) >>>> @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) >>>> if (!ret) >>>> ret = add_compat_devs(device); >>>> out: >>>> - up_read(&devices_rwsem); >>>> + up_write(&devices_rwsem); >>>> return ret; >>>> } >>>> -- >>>> 2.25.1 >>>> >>>> >>> >> >> >
On Mon, Jan 01, 2024 at 07:43:35PM -0800, Shifeng Li wrote: > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c > index 67bcea7a153c..85782786993d 100644 > --- a/drivers/infiniband/core/device.c > +++ b/drivers/infiniband/core/device.c > @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) > down_write(&devices_rwsem); > xa_set_mark(&devices, device->index, DEVICE_REGISTERED); > > - /* > - * By using downgrade_write() we ensure that no other thread can clear > - * DEVICE_REGISTERED while we are completing the client setup. > - */ > - downgrade_write(&devices_rwsem); > - > if (device->ops.enable_driver) { > ret = device->ops.enable_driver(device); > if (ret) > @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) > if (!ret) > ret = add_compat_devs(device); > out: > - up_read(&devices_rwsem); > + up_write(&devices_rwsem); > return ret; > } I don't think messing with the devices_rwsem here is a great idea, it would be better to address this on the clients_rwsem side like: diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 67bcea7a153c6a..b956c9f8e62d34 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1730,7 +1730,7 @@ static int assign_client_id(struct ib_client *client) { int ret; - down_write(&clients_rwsem); + lockdep_assert_held(&clients_rwsem); /* * The add/remove callbacks must be called in FIFO/LIFO order. To * achieve this we assign client_ids so they are sorted in @@ -1739,14 +1739,11 @@ static int assign_client_id(struct ib_client *client) client->client_id = highest_client_id; ret = xa_insert(&clients, client->client_id, client, GFP_KERNEL); if (ret) - goto out; + return ret; highest_client_id++; xa_set_mark(&clients, client->client_id, CLIENT_REGISTERED); - -out: - up_write(&clients_rwsem); - return ret; + return 0; } static void remove_client_id(struct ib_client *client) @@ -1776,25 +1773,31 @@ int ib_register_client(struct ib_client *client) { struct ib_device *device; unsigned long index; + bool need_unreg = false; int ret; refcount_set(&client->uses, 1); init_completion(&client->uses_zero); - ret = assign_client_id(client); - if (ret) - return ret; down_read(&devices_rwsem); + down_write(&clients_rwsem); + ret = assign_client_id(client); + if (ret) + goto out; + + need_unreg = true; xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) { ret = add_client_context(device, client); - if (ret) { - up_read(&devices_rwsem); - ib_unregister_client(client); - return ret; - } + if (ret) + goto out; } + ret = 0; +out: + up_write(&clients_rwsem); up_read(&devices_rwsem); - return 0; + if (need_unreg && ret) + ib_unregister_client(client); + return ret; } EXPORT_SYMBOL(ib_register_client);
On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: > The root cause is that mad_client and cm_client may init concurrently > when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: That can't be true, the module loader infrastructue ensures those two things are sequential. You are trying to say that the post-client fixup stuff will still see the DEVICE_REGISTERED before it reaches the clients_rwsem lock? That probably just says the clients_rwsem should be obtained before changing the DEVICE_STATE too :\ Jason
On 2024/1/4 20:37, Jason Gunthorpe wrote: > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: > >> The root cause is that mad_client and cm_client may init concurrently >> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: > > That can't be true, the module loader infrastructue ensures those two > things are sequential. > I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and cm_client.add() are sequential. Could you explain in more detail please? We know that the ib_cm driver and mlx5_ib driver can load concurrently. Thanks. > You are trying to say that the post-client fixup stuff will still see > the DEVICE_REGISTERED before it reaches the clients_rwsem lock? > > That probably just says the clients_rwsem should be obtained before > changing the DEVICE_STATE too :\ > > Jason >
On Fri, Jan 05, 2024 at 04:15:18PM +0800, Shifeng Li wrote: > On 2024/1/4 20:37, Jason Gunthorpe wrote: > > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: > > > > > The root cause is that mad_client and cm_client may init concurrently > > > when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: > > > > That can't be true, the module loader infrastructue ensures those two > > things are sequential. > > > > I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and > cm_client.add() are sequential. Could you explain in more detail > please? ib_cm has a symbol dependency on ib_mad, so the module loader will not allow ib_cm to start running until all its symbol dependencies have completed loading. > We know that the ib_cm driver and mlx5_ib driver can load concurrently. Yes, this is possible Jason
On 2024/1/4 20:37, Jason Gunthorpe wrote: > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: > >> The root cause is that mad_client and cm_client may init concurrently >> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: > > That can't be true, the module loader infrastructue ensures those two > things are sequential. > Please consider the sequence again and notice that: 1. We agree that dependencies ensure mad_client be registered before cm_client. 2. But the mad_client.add() is not invoked in ib_register_client(), since there is no DEVICE_REGISTERED device at that time. Instead, it will be delayed until the device driver init (e.g. mlx5_core) in enable_device_and_get(). 3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance that cm_client.add() can be invoked before mad_client.add(). T1(ib_core init) | T2(device driver init) | T3(ib_cm init) --------------------------------------------------------------------------------------------------- ib_register_client mad_client assign_client_id add clients CLIENT_REGISTERED (with clients_rwsem write) down_read(&devices_rwsem); xa_for_each_marked (&devices, DEVICE_REGISTERED) nop # no devices up_read(&devices_rwsem); ib_register_device enable_device_and_get down_write(&devices_rwsem); set DEVICE_REGISTERED downgrade_write(&devices_rwsem); ib_register_client cm_client assign_client_id add clients CLIENT_REGISTERED (with clients_rwsem write) down_read(&devices_rwsem); xa_for_each_marked (&devices, DEVICE_REGISTERED) add_client_context down_write(&device->client_data_rwsem); get CLIENT_DATA_REGISTERED downgrade_write(&device->client_data_rwsem); cm_client.add cm_add_one ib_register_mad_agent ib_get_mad_port __ib_get_mad_port return NULL! set CLIENT_DATA_REGISTERED up_read(&device->client_data_rwsem); up_read(&devices_rwsem); down_read(&clients_rwsem); xa_for_each_marked (&clients, CLIENT_REGISTERED) add_client_context [mad] mad_client.add add_client_context [cm] nop # already CLIENT_DATA_REGISTERED up_read(&clients_rwsem); up_read(&devices_rwsem); > You are trying to say that the post-client fixup stuff will still see > the DEVICE_REGISTERED before it reaches the clients_rwsem lock? > > That probably just says the clients_rwsem should be obtained before > changing the DEVICE_STATE too :\ >
On 2024/1/5 22:19, Jason Gunthorpe wrote: > On Fri, Jan 05, 2024 at 04:15:18PM +0800, Shifeng Li wrote: >> On 2024/1/4 20:37, Jason Gunthorpe wrote: >>> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: >>> >>>> The root cause is that mad_client and cm_client may init concurrently >>>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: >>> >>> That can't be true, the module loader infrastructue ensures those two >>> things are sequential. >>> >> >> I'm a bit confused how the module loader infrastructue ensures that mad_client.add() and >> cm_client.add() are sequential. Could you explain in more detail >> please? > > ib_cm has a symbol dependency on ib_mad, so the module loader will not > allow ib_cm to start running until all its symbol dependencies have > completed loading. I have found a method to reproduce the issue as follow: 1. modprobe -r ib_cm; modprobe -r ib_core; modprobe -r mlx5_ib; 2. Compile and replace ib_core with following patch; 3. modprobe ib_cm; 4. modprobe mlx5_ib; diff -Narup ./drivers/infiniband/core/device.c ./drivers/infiniband/core/device.c.reproduce --- ./drivers/infiniband/core/device.c 2024-01-15 11:14:08.063094430 +0800 +++ ./drivers/infiniband/core/device.c.reproduce 2024-01-15 11:16:22.577096953 +0800 @@ -64,6 +64,8 @@ struct workqueue_struct *ib_wq; EXPORT_SYMBOL_GPL(ib_wq); static struct workqueue_struct *ib_unreg_wq; +int ib_cm_run_flag = 0; + /* * Each of the three rwsem locks (devices, clients, client_data) protects the * xarray of the same name. Specifically it allows the caller to assert that @@ -1371,6 +1373,9 @@ static int enable_device_and_get(struct */ downgrade_write(&devices_rwsem); + ib_cm_run_flag = 1; + msleep(30000); + if (device->ops.enable_driver) { ret = device->ops.enable_driver(device); if (ret) @@ -1843,6 +1848,12 @@ int ib_register_client(struct ib_client if (ret) return ret; + if (strncmp(client->name, "cm", strlen("cm")) == 0) { + while (!ib_cm_run_flag) { + cond_resched(); + } + } + down_read(&devices_rwsem); xa_for_each_marked (&devices, index, device, DEVICE_REGISTERED) { ret = add_client_context(device, client); > >> We know that the ib_cm driver and mlx5_ib driver can load concurrently. > > Yes, this is possible > > Jason >
On Sat, Jan 06, 2024 at 10:12:17AM +0800, Ding Hui wrote: > On 2024/1/4 20:37, Jason Gunthorpe wrote: > > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: > > > > > The root cause is that mad_client and cm_client may init concurrently > > > when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: > > > > That can't be true, the module loader infrastructue ensures those two > > things are sequential. > > > > Please consider the sequence again and notice that: > > 1. We agree that dependencies ensure mad_client be registered before cm_client. > 2. But the mad_client.add() is not invoked in ib_register_client(), since > there is no DEVICE_REGISTERED device at that time. > Instead, it will be delayed until the device driver init (e.g. mlx5_core) > in enable_device_and_get(). > 3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED > and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance > that cm_client.add() can be invoked before mad_client.add(). > > > T1(ib_core init) | T2(device driver init) | T3(ib_cm init) > --------------------------------------------------------------------------------------------------- > ib_register_client mad_client > assign_client_id > add clients CLIENT_REGISTERED > (with clients_rwsem write) > down_read(&devices_rwsem); > xa_for_each_marked (&devices, DEVICE_REGISTERED) > nop # no devices > up_read(&devices_rwsem); > > ib_register_device > enable_device_and_get > down_write(&devices_rwsem); > set DEVICE_REGISTERED > downgrade_write(&devices_rwsem); > ib_register_client cm_client > assign_client_id > add clients CLIENT_REGISTERED > (with clients_rwsem write) > down_read(&devices_rwsem); > xa_for_each_marked (&devices, DEVICE_REGISTERED) > add_client_context > down_write(&device->client_data_rwsem); > get CLIENT_DATA_REGISTERED > downgrade_write(&device->client_data_rwsem); > cm_client.add > cm_add_one > ib_register_mad_agent > ib_get_mad_port > __ib_get_mad_port return NULL! > set CLIENT_DATA_REGISTERED > up_read(&device->client_data_rwsem); > up_read(&devices_rwsem); > down_read(&clients_rwsem); > xa_for_each_marked (&clients, CLIENT_REGISTERED) > add_client_context [mad] > mad_client.add > add_client_context [cm] > nop # already CLIENT_DATA_REGISTERED > up_read(&clients_rwsem); > up_read(&devices_rwsem); Take the draft I sent previously and use down_write(&devices_rwsem) in ib_register_client() Jason
On 2024/1/16 10:12, Ding Hui wrote: > On 2024/1/15 21:47, Jason Gunthorpe wrote: >> On Sat, Jan 06, 2024 at 10:12:17AM +0800, Ding Hui wrote: >>> On 2024/1/4 20:37, Jason Gunthorpe wrote: >>>> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote: >>>> >>>>> The root cause is that mad_client and cm_client may init concurrently >>>>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like: >>>> >>>> That can't be true, the module loader infrastructue ensures those two >>>> things are sequential. >>>> >>> >>> Please consider the sequence again and notice that: >>> >>> 1. We agree that dependencies ensure mad_client be registered before cm_client. >>> 2. But the mad_client.add() is not invoked in ib_register_client(), since >>> there is no DEVICE_REGISTERED device at that time. >>> Instead, it will be delayed until the device driver init (e.g. mlx5_core) >>> in enable_device_and_get(). >>> 3. The ib_cm and mlx5_core can be loaded concurrently, after setting DEVICE_REGISTERED >>> and downgrade_write(&devices_rwsem) in enable_device_and_get(), there is a chance >>> that cm_client.add() can be invoked before mad_client.add(). >>> >>> >>> T1(ib_core init) | T2(device driver init) | T3(ib_cm init) >>> --------------------------------------------------------------------------------------------------- >>> ib_register_client mad_client >>> assign_client_id >>> add clients CLIENT_REGISTERED >>> (with clients_rwsem write) >>> down_read(&devices_rwsem); >>> xa_for_each_marked (&devices, DEVICE_REGISTERED) >>> nop # no devices >>> up_read(&devices_rwsem); >>> >>> ib_register_device >>> enable_device_and_get >>> down_write(&devices_rwsem); >>> set DEVICE_REGISTERED >>> downgrade_write(&devices_rwsem); >>> ib_register_client cm_client >>> assign_client_id >>> add clients CLIENT_REGISTERED >>> (with clients_rwsem write) >>> down_read(&devices_rwsem); >>> xa_for_each_marked (&devices, DEVICE_REGISTERED) >>> add_client_context >>> down_write(&device->client_data_rwsem); >>> get CLIENT_DATA_REGISTERED >>> downgrade_write(&device->client_data_rwsem); >>> cm_client.add >>> cm_add_one >>> ib_register_mad_agent >>> ib_get_mad_port >>> __ib_get_mad_port return NULL! >>> set CLIENT_DATA_REGISTERED >>> up_read(&device->client_data_rwsem); >>> up_read(&devices_rwsem); >>> down_read(&clients_rwsem); >>> xa_for_each_marked (&clients, CLIENT_REGISTERED) >>> add_client_context [mad] >>> mad_client.add >>> add_client_context [cm] >>> nop # already CLIENT_DATA_REGISTERED >>> up_read(&clients_rwsem); >>> up_read(&devices_rwsem); >> >> Take the draft I sent previously and use down_write(&devices_rwsem) in >> ib_register_client() >> > > I believe this modification is effective, rather than expanding the clients_rwsem range, > the key point is down_write(&devices_rwsem), which prevents ib_register_client() from > being executed in the gap of ib_register_device(). > > However, this may cause a little confusion, as ib_register_client() does not modify > anything related to devices, but it is protected by a write lock. > Hi Jason, Do you have any differing opinions about above?
On Fri, Jan 26, 2024 at 10:25:01AM +0800, Shifeng Li wrote: > > I believe this modification is effective, rather than expanding the clients_rwsem range, > > the key point is down_write(&devices_rwsem), which prevents ib_register_client() from > > being executed in the gap of ib_register_device(). > > > > However, this may cause a little confusion, as ib_register_client() does not modify > > anything related to devices, but it is protected by a write lock. > > > Do you have any differing opinions about above? I'd rather see the client side have a comment explaining the confusing lock then disturb the locking on the devices side.. I think the write/read lock scheme there was done for a reason I just haven't had time to recall exactly why.. Please send a patch with this arrangement Jason
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 67bcea7a153c..85782786993d 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -1315,12 +1315,6 @@ static int enable_device_and_get(struct ib_device *device) down_write(&devices_rwsem); xa_set_mark(&devices, device->index, DEVICE_REGISTERED); - /* - * By using downgrade_write() we ensure that no other thread can clear - * DEVICE_REGISTERED while we are completing the client setup. - */ - downgrade_write(&devices_rwsem); - if (device->ops.enable_driver) { ret = device->ops.enable_driver(device); if (ret) @@ -1337,7 +1331,7 @@ static int enable_device_and_get(struct ib_device *device) if (!ret) ret = add_compat_devs(device); out: - up_read(&devices_rwsem); + up_write(&devices_rwsem); return ret; }
The mad_client will be initialized in enable_device_and_get(), while the devices_rwsem will be downgraded to a read semaphore. There is a window that leads to the failed initialization for cm_client, since it can not get matched mad port from ib_mad_port_list, and the matched mad port will be added to the list after that. mad_client | cm_client ------------------|-------------------------------------------------------- ib_register_device| enable_device_and_get down_write(&devices_rwsem) xa_set_mark(&devices, DEVICE_REGISTERED) downgrade_write(&devices_rwsem) | |ib_cm_init |ib_register_client(&cm_client) |down_read(&devices_rwsem) |xa_for_each_marked (&devices, DEVICE_REGISTERED) |add_client_context |cm_add_one |ib_register_mad_agent |ib_get_mad_port |__ib_get_mad_port |list_for_each_entry(entry, &ib_mad_port_list, port_list) |return NULL |up_read(&devices_rwsem) | add_client_context| ib_mad_init_device| ib_mad_port_open | list_add_tail(&port_priv->port_list, &ib_mad_port_list) up_read(&devices_rwsem) | Fix it by using the devices_rwsem write semaphore to protect the mad_client init flow in enable_device_and_get(). Fixes: d0899892edd0 ("RDMA/device: Provide APIs from the core code to help unregistration") Cc: Shifeng Li <lishifeng1992@126.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> --- drivers/infiniband/core/device.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)