Message ID | 20200812111447.256822-1-kamalheib1@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | [for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create() |
On 8/12/2020 7:14 PM, Kamal Heib wrote:
> To avoid the following kernel panic when calling kmem_cache_create()
> with a NULL pointer from pool_cache(),

What is the root cause of this kernel panic?

Zhu Yanjun

> move the rxe_cache_init() to the
> context of device creation.
>
> BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP NOPTI
> CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
> RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
> Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
> RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
> RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
> RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
> R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
> R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
> FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
> Call Trace:
>  rxe_alloc+0xc8/0x160 [rdma_rxe]
>  rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
>  __ib_alloc_pd+0xcb/0x160 [ib_core]
>  ib_mad_init_device+0x296/0x8b0 [ib_core]
>  add_client_context+0x11a/0x160 [ib_core]
>  enable_device_and_get+0xdc/0x1d0 [ib_core]
>  ib_register_device+0x572/0x6b0 [ib_core]
>  ? crypto_create_tfm+0x32/0xe0
>  ? crypto_create_tfm+0x7a/0xe0
>  ? crypto_alloc_tfm+0x58/0xf0
>  rxe_register_device+0x19d/0x1c0 [rdma_rxe]
>  rxe_net_add+0x3d/0x70 [rdma_rxe]
>  ? dev_get_by_name_rcu+0x73/0x90
>  rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
>  parse_args+0x179/0x370
>  ? ref_module+0x1b0/0x1b0
>  load_module+0x135e/0x17e0
>  ? ref_module+0x1b0/0x1b0
>  ? __do_sys_init_module+0x13b/0x180
>  __do_sys_init_module+0x13b/0x180
>  do_syscall_64+0x5b/0x1a0
>  entry_SYSCALL_64_after_hwframe+0x65/0xca
> RIP: 0033:0x7f9137ed296e
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe.c       | 14 +++++++-------
>  drivers/infiniband/sw/rxe/rxe_pool.c  |  3 +++
>  drivers/infiniband/sw/rxe/rxe_sysfs.c |  7 +++++++
>  3 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 5642eefb4ba1..60d5086dd34d 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
>  		goto err;
>  	}
>  
> +	/* initialize slab caches for managed objects */
> +	err = rxe_cache_init();
> +	if (err) {
> +		pr_err("unable to init object pools\n");
> +		goto err;
> +	}
> +
>  	err = rxe_net_add(ibdev_name, ndev);
>  	if (err) {
>  		pr_err("failed to add %s\n", ndev->name);
> @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
>  {
>  	int err;
>  
> -	/* initialize slab caches for managed objects */
> -	err = rxe_cache_init();
> -	if (err) {
> -		pr_err("unable to init object pools\n");
> -		return err;
> -	}
> -
>  	err = rxe_net_init();
>  	if (err)
>  		return err;
> diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
> index fbcbac52290b..06c6d1f835b7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_pool.c
> +++ b/drivers/infiniband/sw/rxe/rxe_pool.c
> @@ -139,6 +139,9 @@ int rxe_cache_init(void)
>  	for (i = 0; i < RXE_NUM_TYPES; i++) {
>  		type = &rxe_type_info[i];
>  		size = ALIGN(type->size, RXE_POOL_ALIGN);
> +		if (type->cache)
> +			continue;
> +
>  		if (!(type->flags & RXE_POOL_NO_ALLOC)) {
>  			type->cache =
>  				kmem_cache_create(type->name, size,
> diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
> index ccda5f5a3bc0..d0af48ba0110 100644
> --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
> @@ -81,6 +81,13 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
>  		goto err;
>  	}
>  
> +	/* initialize slab caches for managed objects */
> +	err = rxe_cache_init();
> +	if (err) {
> +		pr_err("unable to init object pools\n");
> +		goto err;
> +	}
> +
>  	err = rxe_net_add("rxe%d", ndev);
>  	if (err) {
>  		pr_err("failed to add %s\n", intf);
On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > To avoid the following kernel panic when calling kmem_cache_create()
> > with a NULL pointer from pool_cache(),
>
> What is the root cause of this kernel panic?
>

The kernel panic is triggered by the following command, and it happens
because the cache is not initialized:

modprobe rdma_rxe add=eno1

Thanks,
Kamal

> Zhu Yanjun
>
> > move the rxe_cache_init() to the
> > context of device creation.
> [...]
On 8/17/2020 6:12 AM, Kamal Heib wrote:
> On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
>> On 8/12/2020 7:14 PM, Kamal Heib wrote:
>>> To avoid the following kernel panic when calling kmem_cache_create()
>>> with a NULL pointer from pool_cache(),
>> What is the root cause of this kernel panic?
>>
> The kernel panic is triggered by the following command, and it happens
> because the cache is not initialized:
>
> modprobe rdma_rxe add=eno1
>
> Thanks,
> Kamal
>
>> [...]
>>> @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
>>>  {
>>>  	int err;
>>>  
>>> -	/* initialize slab caches for managed objects */
>>> -	err = rxe_cache_init();

When rdma_rxe is loaded with modprobe, rxe_module_init() should be
called, and rxe_cache_init() should then be called as well.

Why does the above call trace occur?

Zhu Yanjun

>>> [...]
On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > [...]
> > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > >  {
> > > >  	int err;
> > > >  
> > > > -	/* initialize slab caches for managed objects */
> > > > -	err = rxe_cache_init();
>
> When rdma_rxe is loaded with modprobe, rxe_module_init() should be
> called, and rxe_cache_init() should then be called as well.
>
> Why does the above call trace occur?
>
> Zhu Yanjun
>

As you can see in the call trace attached to the commit message, when
running the "modprobe rdma_rxe add=eno1" command, rxe_param_set_add()
is called before rxe_module_init(), that is, before the caches have
been initialized. The call trace occurs when rxe_param_set_add() tries
to register the newly allocated rxe device while the caches are still
uninitialized.

Thanks,
Kamal

> > > > [...]
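The ordering described in the message above can be modelled in userspace. This is only a minimal sketch under stated assumptions, not driver code: every name in it (`fake_cache`, `cache_init`, `param_set_add`, `module_init_fn`, `simulate_modprobe`) is a hypothetical stand-in, and the NULL cache is reported as an error rather than dereferenced. It reflects the order visible in the call trace, where `rxe_param_set_add()` runs under `parse_args()` from `load_module()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for a slab cache; NULL means "not created yet". */
struct fake_cache {
    const char *name;
};

static struct fake_cache pool_cache_storage = { "rxe-pool" };
static struct fake_cache *pool_cache; /* stays NULL until cache_init() runs */

/* Stand-in for rxe_cache_init(): creates the cache used by allocations. */
static int cache_init(void)
{
    pool_cache = &pool_cache_storage;
    return 0;
}

/*
 * Stand-in for an allocation from the pool. The kernel code would
 * dereference the cache pointer; this sketch returns -1 instead.
 */
static int alloc_from_pool(void)
{
    if (pool_cache == NULL) {
        fprintf(stderr, "allocation attempted with a NULL cache\n");
        return -1;
    }
    return 0;
}

/* Stand-in for the "add=<netdev>" setter: registering a device allocates. */
static int param_set_add(const char *netdev)
{
    (void)netdev;
    return alloc_from_pool();
}

/* Stand-in for the module init function, where the caches were set up. */
static int module_init_fn(void)
{
    return cache_init();
}

/*
 * Stand-in for the loader: parameter setters run first (argument
 * parsing), and the init function only runs afterwards.
 */
int simulate_modprobe(const char *add_param)
{
    int err = 0;

    pool_cache = NULL; /* fresh load: nothing is initialized yet */
    if (add_param != NULL)
        err = param_set_add(add_param); /* step 1: argument parsing */
    if (err)
        return err;
    return module_init_fn();            /* step 2: module init */
}
```

In this model, `simulate_modprobe("eno1")` returns -1 because the setter runs before `cache_init()`, while `simulate_modprobe(NULL)` returns 0 because no allocation happens before init.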
On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > [...]
> > Why does the above call trace occur?
> >
> > Zhu Yanjun
> >
>
> As you can see in the call trace attached to the commit message, when
> running the "modprobe rdma_rxe add=eno1" command, rxe_param_set_add()
> is called before rxe_module_init(), that is, before the caches have
> been initialized. The call trace occurs when rxe_param_set_add() tries
> to register the newly allocated rxe device while the caches are still
> uninitialized.

I would expect the fix to be in rxe_init() rather than adding calls to
rxe_cache_init() in every place.

Thanks
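One possible reading of the suggestion above is to initialize the caches at a single point that every device-creation path passes through, instead of repeating the call in each entry point. The following userspace sketch illustrates that idea only; it is not the v2 patch. The flag `caches_ready`, the counter, and the functions `cache_init_once`, `device_add`, `newlink_entry`, `param_add_entry` and `cache_init_count` are all hypothetical names, and the sketch has no locking, which real kernel code would presumably need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static bool caches_ready;    /* hypothetical "already initialized" flag */
static int cache_init_calls; /* counts how often real init work happens */

/* Idempotent initialization: only the first call does any work. */
static int cache_init_once(void)
{
    if (caches_ready)
        return 0;
    cache_init_calls++;
    caches_ready = true;
    return 0;
}

/* Common device-add path shared by every entry point. */
static int device_add(const char *name)
{
    int err = cache_init_once();

    if (err)
        return err;
    (void)name; /* device registration itself is omitted in this sketch */
    return 0;
}

/* Entry point standing in for the netlink "newlink" path. */
int newlink_entry(const char *name)
{
    return device_add(name);
}

/* Entry point standing in for the module parameter "add" path. */
int param_add_entry(const char *name)
{
    return device_add(name);
}

/* Accessor so callers can check how many times init actually ran. */
int cache_init_count(void)
{
    return cache_init_calls;
}
```

Calling `param_add_entry("eno1")` and then `newlink_entry("eno2")` both return 0, and `cache_init_count()` reports 1: the initialization lives in one place and runs once, regardless of which entry point is used first.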
On Tue, Aug 18, 2020 at 10:49:56AM +0300, Leon Romanovsky wrote:
> On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > [...]
>
> I would expect the fix to be in rxe_init() rather than adding calls to
> rxe_cache_init() in every place.
>
> Thanks

OK, I agree. I'll post v2.

Thanks,
Kamal
On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
>> [...]
> I would expect the fix to be in rxe_init() rather than adding calls to
> rxe_cache_init() in every place.

I agree with you.

Is it possible to make rxe_module_init() be called before
rxe_param_set_add()?

Thanks

>
> Thanks
On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > with a NULL pointer from pool_cache(),
> > > > > > What is the root cause of this kernel panic?
> > > > > >
> > > > > The kernel panic is triggered using the following command and it happen
> > > > > because the cache is not getting initialized.
> > > > >
> > > > > modprobe rdma_rxe add=eno1
> > > > >
> > > > > Thanks,
> > > > > Kamal
> > > > >
> > > > > > Zhu Yanjun
> > > > > >
> > > > > > > move the rxe_cache_init() to the
> > > > > > > context of device creation.
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > > > > {
> > > > > > > int err;
> > > > > > > - /* initialize slab caches for managed objects */
> > > > > > > - err = rxe_cache_init();
> > > > When modprobe rdma_rxe, rxe_module_init should be called. Then
> > > > rxe_cache_init should be also called.
> > > >
> > > > Why does the above call trace occur?
> > > >
> > > > Zhu Yanjun
> > > >
> > > As you can see in the call trace attached to the commit message, When
> > > running the "modprobe rdma_rxe add=eno1" command the rxe_param_set_add()
> > > is called before rxe_module_init() (without init the caches), so the
> > > call trace occurs when trying to register the allocated rxe device from
> > > the context of rxe_param_set_add() without initialize the caches.
> > I would expect the fix being in rxe_init() instead of putting calls to
> > rxe_cache_init() in all places.
>
> I agree with you.
>
> Is it possible to make rxe_module_init be called before rxe_param_set_add?

The best solution will be to delete module_parameters() from RXE.

Thanks

>
> Thanks
>
> >
> > Thanks
> >
On Wed, Aug 19, 2020 at 12:58 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> > On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > > with a NULL pointer from pool_cache(),
> > > > > > > What is the root cause of this kernel panic?
> > > > > > >
> > > > > > The kernel panic is triggered using the following command and it happen
> > > > > > because the cache is not getting initialized.
> > > > > >
> > > > > > modprobe rdma_rxe add=eno1
> > > > > >
> > > > > > Thanks,
> > > > > > Kamal
> > > > > >
> > > > > > > Zhu Yanjun
> > > > > > >
> > > > > > > > move the rxe_cache_init() to the
> > > > > > > > context of device creation.
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > > > > > {
> > > > > > > > int err;
> > > > > > > > - /* initialize slab caches for managed objects */
> > > > > > > > - err = rxe_cache_init();
> > > > > When modprobe rdma_rxe, rxe_module_init should be called. Then
> > > > > rxe_cache_init should be also called.
> > > > >
> > > > > Why does the above call trace occur?
> > > > >
> > > > > Zhu Yanjun
> > > > >
> > > > As you can see in the call trace attached to the commit message, When
> > > > running the "modprobe rdma_rxe add=eno1" command the rxe_param_set_add()
> > > > is called before rxe_module_init() (without init the caches), so the
> > > > call trace occurs when trying to register the allocated rxe device from
> > > > the context of rxe_param_set_add() without initialize the caches.
> > > I would expect the fix being in rxe_init() instead of putting calls to
> > > rxe_cache_init() in all places.
> >
> > I agree with you.
> >
> > Is it possible to make rxe_module_init be called before rxe_param_set_add?
>
> The best solution will be to delete module_parameters() from RXE.

Sure. I am curious why the parameters are set before rxe_module_init.
Is this a bug?

Zhu Yanjun

>
> Thanks
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
On Wed, Aug 19, 2020 at 02:19:20PM +0800, Zhu Yanjun wrote:
> On Wed, Aug 19, 2020 at 12:58 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Aug 19, 2020 at 11:07:56AM +0800, Zhu Yanjun wrote:
> > > On 8/18/2020 3:49 PM, Leon Romanovsky wrote:
> > > > On Tue, Aug 18, 2020 at 08:50:57AM +0300, Kamal Heib wrote:
> > > > > On Tue, Aug 18, 2020 at 09:48:43AM +0800, Zhu Yanjun wrote:
> > > > > > On 8/17/2020 6:12 AM, Kamal Heib wrote:
> > > > > > > On Sat, Aug 15, 2020 at 02:58:45PM +0800, Zhu Yanjun wrote:
> > > > > > > > On 8/12/2020 7:14 PM, Kamal Heib wrote:
> > > > > > > > > To avoid the following kernel panic when calling kmem_cache_create()
> > > > > > > > > with a NULL pointer from pool_cache(),
> > > > > > > > What is the root cause of this kernel panic?
> > > > > > > >
> > > > > > > The kernel panic is triggered using the following command and it happen
> > > > > > > because the cache is not getting initialized.
> > > > > > >
> > > > > > > modprobe rdma_rxe add=eno1
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kamal
> > > > > > >
> > > > > > > > Zhu Yanjun
> > > > > > > >
> > > > > > > > > move the rxe_cache_init() to the
> > > > > > > > > context of device creation.
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > @@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
> > > > > > > > > {
> > > > > > > > > int err;
> > > > > > > > > - /* initialize slab caches for managed objects */
> > > > > > > > > - err = rxe_cache_init();
> > > > > > When modprobe rdma_rxe, rxe_module_init should be called. Then
> > > > > > rxe_cache_init should be also called.
> > > > > >
> > > > > > Why does the above call trace occur?
> > > > > >
> > > > > > Zhu Yanjun
> > > > > >
> > > > > As you can see in the call trace attached to the commit message, When
> > > > > running the "modprobe rdma_rxe add=eno1" command the rxe_param_set_add()
> > > > > is called before rxe_module_init() (without init the caches), so the
> > > > > call trace occurs when trying to register the allocated rxe device from
> > > > > the context of rxe_param_set_add() without initialize the caches.
> > > > I would expect the fix being in rxe_init() instead of putting calls to
> > > > rxe_cache_init() in all places.
> > >
> > > I agree with you.
> > >
> > > Is it possible to make rxe_module_init be called before rxe_param_set_add?
> >
> > The best solution will be to delete module_parameters() from RXE.
>
> Sure. I am curious why the parameters are set before rxe_module_init.
> Is this a bug?

Yes and no. The part of receiving user input is correct and it should be
done before rxe_module_init(), so RXE can initialize properly based on
the input.

The call to rxe_net_add() later inside of rxe_param_set_add() is wrong.
It should be done after rxe_module_init() finishes.

Thanks

>
> Zhu Yanjun
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks
> > > >
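The ordering described in the reply above (the parameter setter runs before module init, so it should only record the input, and the device should be added once init has finished) can be modelled in plain userspace C. This is a minimal sketch, not driver code: every `model_*` name, the `pending_ifname` buffer and the `cache_ready` flag are hypothetical stand-ins, and the real driver would need error handling and locking that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of the ordering problem; not kernel code. */
static bool cache_ready;         /* stands in for "slab caches are initialized" */
static char pending_ifname[16];  /* interface name recorded by the setter */
static int devices_added;        /* number of successful device additions */

/* Stand-in for rxe_net_add(): fails while the caches are not ready. */
static int model_net_add(const char *ifname)
{
	if (!cache_ready || ifname[0] == '\0')
		return -1;
	devices_added++;
	return 0;
}

/* Stand-in for the "add" parameter setter: it only records the input. */
static int model_param_set_add(const char *val)
{
	if (strlen(val) >= sizeof(pending_ifname))
		return -1;
	strcpy(pending_ifname, val);
	return 0;
}

/* Stand-in for module init: set up the caches, then act on saved input. */
static int model_module_init(void)
{
	cache_ready = true;
	if (pending_ifname[0] != '\0')
		return model_net_add(pending_ifname);
	return 0;
}
```

Calling `model_param_set_add("eno1")` followed by `model_module_init()` mirrors what happens for `modprobe rdma_rxe add=eno1`: the device is added only after the caches exist, whereas a direct `model_net_add()` call before init fails instead of touching an uninitialized cache.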
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 5642eefb4ba1..60d5086dd34d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -318,6 +318,13 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
 		goto err;
 	}
 
+	/* initialize slab caches for managed objects */
+	err = rxe_cache_init();
+	if (err) {
+		pr_err("unable to init object pools\n");
+		goto err;
+	}
+
 	err = rxe_net_add(ibdev_name, ndev);
 	if (err) {
 		pr_err("failed to add %s\n", ndev->name);
@@ -336,13 +343,6 @@ static int __init rxe_module_init(void)
 {
 	int err;
 
-	/* initialize slab caches for managed objects */
-	err = rxe_cache_init();
-	if (err) {
-		pr_err("unable to init object pools\n");
-		return err;
-	}
-
 	err = rxe_net_init();
 	if (err)
 		return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index fbcbac52290b..06c6d1f835b7 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -139,6 +139,9 @@ int rxe_cache_init(void)
 	for (i = 0; i < RXE_NUM_TYPES; i++) {
 		type = &rxe_type_info[i];
 		size = ALIGN(type->size, RXE_POOL_ALIGN);
+		if (type->cache)
+			continue;
+
 		if (!(type->flags & RXE_POOL_NO_ALLOC)) {
 			type->cache =
 				kmem_cache_create(type->name, size,
diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
index ccda5f5a3bc0..d0af48ba0110 100644
--- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
+++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
@@ -81,6 +81,13 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
 		goto err;
 	}
 
+	/* initialize slab caches for managed objects */
+	err = rxe_cache_init();
+	if (err) {
+		pr_err("unable to init object pools\n");
+		goto err;
+	}
+
 	err = rxe_net_add("rxe%d", ndev);
 	if (err) {
 		pr_err("failed to add %s\n", intf);
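Because the diff calls rxe_cache_init() from two device-creation paths, the `if (type->cache) continue;` guard in rxe_pool.c is what keeps a second call from creating the caches again. The effect of such a guard can be illustrated in userspace C. This is a rough sketch under stated assumptions: `model_type_info`, `model_types`, `model_creations` and `MODEL_NUM_TYPES` are hypothetical names, `malloc()` stands in for `kmem_cache_create()`, and the sketch does no locking and no cleanup on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NUM_TYPES 3  /* hypothetical; stands in for RXE_NUM_TYPES */

/* Hypothetical per-type record; cache stays NULL until it is created. */
struct model_type_info {
	size_t size;
	void *cache;
};

static struct model_type_info model_types[MODEL_NUM_TYPES] = {
	{ .size = 32 }, { .size = 64 }, { .size = 128 },
};
static int model_creations;  /* counts how many caches were actually created */

/* Idempotent init: types that already have a cache are skipped. */
static int model_cache_init(void)
{
	for (int i = 0; i < MODEL_NUM_TYPES; i++) {
		struct model_type_info *type = &model_types[i];

		if (type->cache)
			continue;  /* already created by an earlier call */

		type->cache = malloc(type->size);  /* stand-in for cache creation */
		if (!type->cache)
			return -1;
		model_creations++;
	}
	return 0;
}
```

Calling `model_cache_init()` twice leaves `model_creations` at 3, which is the property the guard provides when both rxe_newlink() and rxe_param_set_add() invoke the init function.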
To avoid the following kernel panic when calling kmem_cache_create()
with a NULL pointer from pool_cache(), move the rxe_cache_init() to the
context of device creation.

 BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
 RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
 Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
 RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
 RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
 RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
 R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
 R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
 FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
 Call Trace:
  rxe_alloc+0xc8/0x160 [rdma_rxe]
  rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
  __ib_alloc_pd+0xcb/0x160 [ib_core]
  ib_mad_init_device+0x296/0x8b0 [ib_core]
  add_client_context+0x11a/0x160 [ib_core]
  enable_device_and_get+0xdc/0x1d0 [ib_core]
  ib_register_device+0x572/0x6b0 [ib_core]
  ? crypto_create_tfm+0x32/0xe0
  ? crypto_create_tfm+0x7a/0xe0
  ? crypto_alloc_tfm+0x58/0xf0
  rxe_register_device+0x19d/0x1c0 [rdma_rxe]
  rxe_net_add+0x3d/0x70 [rdma_rxe]
  ? dev_get_by_name_rcu+0x73/0x90
  rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
  parse_args+0x179/0x370
  ? ref_module+0x1b0/0x1b0
  load_module+0x135e/0x17e0
  ? ref_module+0x1b0/0x1b0
  ? __do_sys_init_module+0x13b/0x180
  __do_sys_init_module+0x13b/0x180
  do_syscall_64+0x5b/0x1a0
  entry_SYSCALL_64_after_hwframe+0x65/0xca
 RIP: 0033:0x7f9137ed296e

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       | 14 +++++++-------
 drivers/infiniband/sw/rxe/rxe_pool.c  |  3 +++
 drivers/infiniband/sw/rxe/rxe_sysfs.c |  7 +++++++
 3 files changed, 17 insertions(+), 7 deletions(-)