Message ID | 20200825151725.254046-1-kamalheib1@gmail.com |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | [v4,for-rc] RDMA/rxe: Fix panic when calling kmem_cache_create() |
On Tue, Aug 25, 2020 at 06:17:25PM +0300, Kamal Heib wrote:
> [...]

Can you send a PR to rdma-core to delete rxe_cfg as well? In
preparation to remove the module parameters

Applied to for-rc

Thanks,
Jason
On Thu, Aug 27, 2020 at 09:18:22AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 25, 2020 at 06:17:25PM +0300, Kamal Heib wrote:
> > [...]
>
> Can you send a PR to rdma-core to delete rxe_cfg as well? In
> preparation to remove the module parameters
>

Someone already did that :-)

commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Tue Jan 28 15:53:07 2020 -0400

    rxe: Remove rxe_cfg

    This is obsoleted by iproute2's 'rdma link add' command.

    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Thanks,
Kamal

> Applied to for-rc
>
> Thanks,
> Jason
On Thu, Aug 27, 2020 at 10:31 PM Kamal Heib <kamalheib1@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 09:18:22AM -0300, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2020 at 06:17:25PM +0300, Kamal Heib wrote:
> > > [...]
> >
> > Can you send a PR to rdma-core to delete rxe_cfg as well? In
> > preparation to remove the module parameters
> >
>
> Someone already did that :-)
>
> commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
> Author: Jason Gunthorpe <jgg@ziepe.ca>
> Date:   Tue Jan 28 15:53:07 2020 -0400
>
>     rxe: Remove rxe_cfg

Now rdma link add is the only choice.

>
>     This is obsoleted by iproute2's 'rdma link add' command.
>
>     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>
> Thanks,
> Kamal
>
> > Applied to for-rc
> >
> > Thanks,
> > Jason
On Thu, Aug 27, 2020 at 05:29:55PM +0300, Kamal Heib wrote:
> > Can you send a PR to rdma-core to delete rxe_cfg as well? In
> > preparation to remove the module parameters
> >
>
> Someone already did that :-)
>
> commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
> Author: Jason Gunthorpe <jgg@ziepe.ca>
> Date:   Tue Jan 28 15:53:07 2020 -0400
>
>     rxe: Remove rxe_cfg
>
>     This is obsoleted by iproute2's 'rdma link add' command.
>
>     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Oh! Let's drop the kernel side of this in Jan 2021 then?

Jason
On 2020-08-27 07:54, Jason Gunthorpe wrote:
> On Thu, Aug 27, 2020 at 05:29:55PM +0300, Kamal Heib wrote:
>>> Can you send a PR to rdma-core to delete rxe_cfg as well? In
>>> preparation to remove the module parameters
>>>
>>
>> Someone already did that :-)
>>
>> commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
>> Author: Jason Gunthorpe <jgg@ziepe.ca>
>> Date:   Tue Jan 28 15:53:07 2020 -0400
>>
>>     rxe: Remove rxe_cfg
>>
>>     This is obsoleted by iproute2's 'rdma link add' command.
>>
>>     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>
> Oh! Let's drop the kernel side of this in Jan 2021 then?

I think the person who wants to remove the kernel side of this is responsible
for modifying blktests such that blktests does not break. From the blktests
source code:

	modprobe rdma_rxe || return $?
	(
	cd /sys/class/net &&
		for i in *; do
			if [ -e "$i" ] && ! has_rdma_rxe "$i"; then
				echo "$i" > /sys/module/rdma_rxe/parameters/add ||
					echo "Failed to bind the rdma_rxe driver to $i"
			fi
		done
	)

Thanks,

Bart.
On Thu, Aug 27, 2020 at 09:22:56AM -0700, Bart Van Assche wrote:
> On 2020-08-27 07:54, Jason Gunthorpe wrote:
> > On Thu, Aug 27, 2020 at 05:29:55PM +0300, Kamal Heib wrote:
> >>> Can you send a PR to rdma-core to delete rxe_cfg as well? In
> >>> preparation to remove the module parameters
> >>>
> >>
> >> Someone already did that :-)
> >>
> >> commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
> >> Author: Jason Gunthorpe <jgg@ziepe.ca>
> >> Date:   Tue Jan 28 15:53:07 2020 -0400
> >>
> >>     rxe: Remove rxe_cfg
> >>
> >>     This is obsoleted by iproute2's 'rdma link add' command.
> >>
> >>     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> >
> > Oh! Let's drop the kernel side of this in Jan 2021 then?
>
> I think the person who wants to remove the kernel side of this is responsible
> for modifying blktests such that blktests does not break. From the blktests
> source code:

Just replace the whole thing with rdma link add - it does module
autoloading and everything.

Jason
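Jason's suggestion amounts to issuing one `rdma link add` per interface instead of writing the interface name to `/sys/module/rdma_rxe/parameters/add`. The C helper below is a minimal sketch of that, not the actual blktests change: the function name `format_rxe_link_add()` and the `rxe_<ifname>` device name are hypothetical, and only the `rdma link add NAME type rxe netdev NETDEV` syntax comes from iproute2's rdma tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the iproute2 command that creates an rxe
 * device on top of a netdev, replacing
 *   echo <ifname> > /sys/module/rdma_rxe/parameters/add
 * The "rxe_<ifname>" device name is an assumed naming choice.
 * Returns 0 on success, -1 for an empty name or if buf is too small.
 */
static int format_rxe_link_add(char *buf, size_t len, const char *ifname)
{
	int n;

	assert(buf != NULL && ifname != NULL);
	if (strlen(ifname) == 0)
		return -1;
	n = snprintf(buf, len, "rdma link add rxe_%s type rxe netdev %s",
		     ifname, ifname);
	/* snprintf() reports the untruncated length; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A test script would run the resulting command once per interface in place of the `echo` into the parameter file; since, per Jason, `rdma link add` loads rdma_rxe on demand, the explicit `modprobe` also becomes unnecessary.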
On Thu, Aug 27, 2020 at 11:54:50AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 27, 2020 at 05:29:55PM +0300, Kamal Heib wrote:
> > > Can you send a PR to rdma-core to delete rxe_cfg as well? In
> > > preparation to remove the module parameters
> > >
> >
> > Someone already did that :-)
> >
> > commit 0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9
> > Author: Jason Gunthorpe <jgg@ziepe.ca>
> > Date:   Tue Jan 28 15:53:07 2020 -0400
> >
> >     rxe: Remove rxe_cfg
> >
> >     This is obsoleted by iproute2's 'rdma link add' command.
> >
> >     Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>
> Oh! Let's drop the kernel side of this in Jan 2021 then?
>
> Jason

Works for me.

Thanks,
Kamal
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 907203afbd99..77f2c7cd1216 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -40,6 +40,8 @@ MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
 MODULE_DESCRIPTION("Soft RDMA transport");
 MODULE_LICENSE("Dual BSD/GPL");
 
+bool rxe_initialized;
+
 /* free resources for a rxe device all objects created for this device must
  * have been destroyed
  */
@@ -315,6 +317,7 @@ static int __init rxe_module_init(void)
 		return err;
 
 	rdma_link_register(&rxe_link_ops);
+	rxe_initialized = true;
 	pr_info("loaded\n");
 	return 0;
 }
@@ -326,6 +329,7 @@ static void __exit rxe_module_exit(void)
 	rxe_net_exit();
 	rxe_cache_exit();
 
+	rxe_initialized = false;
 	pr_info("unloaded\n");
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index fb07eed9e402..cae1b0a24c85 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -67,6 +67,8 @@
 
 #define RXE_ROCE_V2_SPORT		(0xc000)
 
+extern bool rxe_initialized;
+
 static inline u32 rxe_crc32(struct rxe_dev *rxe,
 			    u32 crc, void *next, size_t len)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c
index ccda5f5a3bc0..2af31d421bfc 100644
--- a/drivers/infiniband/sw/rxe/rxe_sysfs.c
+++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c
@@ -61,6 +61,11 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp)
 	struct net_device *ndev;
 	struct rxe_dev *exists;
 
+	if (!rxe_initialized) {
+		pr_err("Module parameters are not supported, use rdma link add or rxe_cfg\n");
+		return -EAGAIN;
+	}
+
 	len = sanitize_arg(val, intf, sizeof(intf));
 	if (!len) {
 		pr_err("add: invalid interface name\n");
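The guard in this diff can be modeled in ordinary userspace C. The sketch below is a hypothetical model, not kernel code: `model_module_init()`, `model_module_exit()` and `model_param_set_add()` stand in for `rxe_module_init()`, `rxe_module_exit()` and `rxe_param_set_add()`, and a `malloc()`ed buffer stands in for the object caches. As in the patch, the flag is raised only at the end of init and cleared in exit after teardown, so the setter refuses to run whenever the caches might not exist.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object caches created at module init. */
static void *model_cache;

/* Plays the role of rxe_initialized: set last in init, cleared in exit. */
static bool model_initialized;

/* Models rxe_module_init(): raise the flag only after every step succeeds. */
static int model_module_init(void)
{
	model_cache = malloc(64);	/* stands in for rxe_cache_init() */
	if (!model_cache)
		return -ENOMEM;
	/* rxe_net_init() and rdma_link_register() would follow here. */
	model_initialized = true;
	return 0;
}

/* Models rxe_module_exit(): tear down, then clear the flag. */
static void model_module_exit(void)
{
	assert(model_initialized);
	free(model_cache);		/* stands in for rxe_cache_exit() */
	model_cache = NULL;
	model_initialized = false;
}

/* Models the check added at the top of rxe_param_set_add(). */
static int model_param_set_add(const char *ifname)
{
	if (!model_initialized)
		return -EAGAIN;		/* caches may not exist yet */
	if (!ifname || !*ifname)
		return -EINVAL;		/* empty interface name */
	/* From here on it is safe to allocate from model_cache. */
	return 0;
}
```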
To avoid the following kernel panic, block rxe_param_set_add() from running if the rdma_rxe module is not initialized. When an interface is passed through the add= module parameter at load time, parse_args() calls rxe_param_set_add() before rxe_module_init() has run. The object pool caches have therefore not yet been created by kmem_cache_create(), so pool_cache() returns NULL and kmem_cache_alloc() dereferences it:

BUG: unable to handle kernel NULL pointer dereference at 000000000000000b
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
RIP: 0010:kmem_cache_alloc+0xd1/0x1b0
Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6
RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005
RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000
RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0
R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0
R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0
Call Trace:
 rxe_alloc+0xc8/0x160 [rdma_rxe]
 rxe_get_dma_mr+0x25/0xb0 [rdma_rxe]
 __ib_alloc_pd+0xcb/0x160 [ib_core]
 ib_mad_init_device+0x296/0x8b0 [ib_core]
 add_client_context+0x11a/0x160 [ib_core]
 enable_device_and_get+0xdc/0x1d0 [ib_core]
 ib_register_device+0x572/0x6b0 [ib_core]
 ? crypto_create_tfm+0x32/0xe0
 ? crypto_create_tfm+0x7a/0xe0
 ? crypto_alloc_tfm+0x58/0xf0
 rxe_register_device+0x19d/0x1c0 [rdma_rxe]
 rxe_net_add+0x3d/0x70 [rdma_rxe]
 ? dev_get_by_name_rcu+0x73/0x90
 rxe_param_set_add+0xaf/0xc0 [rdma_rxe]
 parse_args+0x179/0x370
 ? ref_module+0x1b0/0x1b0
 load_module+0x135e/0x17e0
 ? ref_module+0x1b0/0x1b0
 ? __do_sys_init_module+0x13b/0x180
 __do_sys_init_module+0x13b/0x180
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f9137ed296e

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c       | 4 ++++
 drivers/infiniband/sw/rxe/rxe.h       | 2 ++
 drivers/infiniband/sw/rxe/rxe_sysfs.c | 5 +++++
 3 files changed, 11 insertions(+)
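The ordering problem in the trace (parse_args() handling add= before the init function) can be reproduced in a small userspace model. This is a hypothetical sketch, not kernel code: `struct model_module`, `model_load_module()` and the other names are invented for illustration. The handler is the unpatched one, so instead of allocating it records whether the cache existed when it ran. The model also simply aborts the load if a handler fails, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a module with a single "add=" parameter. */
struct model_module {
	void *cache;			/* created by init, like rxe_cache_init() */
	bool param_saw_cache;		/* what the "add" handler observed */
	int (*param_add)(struct model_module *m, const char *val);
	int (*init)(struct model_module *m);
};

static char model_cache_storage[64];

/* Unpatched handler: the real one would go on to allocate from m->cache. */
static int model_add(struct model_module *m, const char *val)
{
	(void)val;
	m->param_saw_cache = (m->cache != NULL);
	return 0;
}

static int model_init(struct model_module *m)
{
	m->cache = model_cache_storage;
	return 0;
}

/*
 * Same order as the call trace: command-line parameters are handled
 * (parse_args()) before the module's init function runs.
 */
static int model_load_module(struct model_module *m, const char *args)
{
	static const char prefix[] = "add=";
	int err;

	assert(m->param_add && m->init);
	if (args && strncmp(args, prefix, sizeof(prefix) - 1) == 0) {
		err = m->param_add(m, args + sizeof(prefix) - 1);
		if (err)
			return err;	/* model choice: abort the load */
	}
	return m->init(m);
}
```

Loading with `"add=eth0"` leaves `param_saw_cache` false even though the cache exists once loading finishes, which is exactly the window the `rxe_initialized` check closes.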