Message ID | 20200212072635.682689-9-leon@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Series | Fixes for v5.6 |
On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> From: Yonatan Cohen <yonatanc@mellanox.com>
>
> When unloading ib_umad, remove ibdev sys file 1st before
> port removal to prevent kernel oops.
>
> ib_mad's method ibdev_show() might access a umad port
> whose ibdev field has already been NULLed when rmmod ib_umad
> was issued from another shell.
>
> Consider this scenario
>       shell-1                         shell-2
>       rmmod ib_mod                    cat /sys/devices/../ibdev
>       |                               |
>       ib_umad_kill_port()             ibdev_show()
>       port->ib_dev = NULL             dev_name(port->ib_dev)
>
> kernel stack
> PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> Call Trace:
>  dev_attr_show+0x15/0x50
>  sysfs_kf_seq_show+0xb8/0x1a0
>  seq_read+0x12d/0x350
>  vfs_read+0x89/0x140
>  ksys_read+0x55/0xd0
>  do_syscall_64+0x55/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
>
> Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")

This is the wrong fixes line, this ordering change was actually
deliberately done:

commit cf7ad3030271c55a7119a8c2162563e3f6e93879
Author: Parav Pandit <parav@mellanox.com>
Date:   Fri Dec 21 16:19:24 2018 +0200

    IB/umad: Avoid destroying device while it is accessed

    ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
    dev_notice(), while concurrently, ib_umad_kill_port() can destroy the
    device using device_destroy().

            cpu-0                               cpu-1
            -----                               -----
        ib_umad_ioctl()
            [...]                            ib_umad_kill_port()
                                                  device_destroy(dev)

            ib_umad_reg_agent()
                dev_notice(dev)

The mistake in the above was to move the device_destroy() down, not
split it into device_del() above and put_device() below. Now that is
already split we are OK.

Jason
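
To make the device_del()/put_device() split concrete, here is a minimal user-space sketch of the lifetime rule being described. It is not kernel code: every `model_*` name is a hypothetical stand-in, and the reference count is a plain integer rather than a kobject. It only illustrates that unpublishing a device and dropping its last reference are separate steps, and that a combined "destroy" performs both at once.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct device; not the real kernel structure. */
struct model_device {
	int refcount;     /* models the kobject reference count */
	bool registered;  /* models visibility to user space (sysfs, char dev) */
	bool released;    /* set once the last reference has been dropped */
};

/* Models device_initialize(): the creator holds the initial reference. */
static void model_device_initialize(struct model_device *d)
{
	d->refcount = 1;
	d->registered = false;
	d->released = false;
}

/* Models device_add(): the device becomes visible. */
static void model_device_add(struct model_device *d)
{
	d->registered = true;
}

/* Models device_del(): unpublish only, the memory stays valid. */
static void model_device_del(struct model_device *d)
{
	d->registered = false;
}

/* Models put_device(): dropping the last reference releases the device. */
static void model_put_device(struct model_device *d)
{
	if (--d->refcount == 0)
		d->released = true;
}

/* Models a combined destroy: unpublish and drop the reference in one step. */
static void model_device_destroy(struct model_device *d)
{
	model_device_del(d);
	model_put_device(d);
}

/* Models whether dev_notice(dev, ...) could still safely use the device. */
static bool model_dev_notice_is_safe(const struct model_device *d)
{
	return !d->released;
}
```

In this model, calling `model_device_del()` early and `model_put_device()` late leaves a window in which the device is already invisible to new users while existing users can still log against it. `model_device_destroy()` has no such window.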
On Thu, Feb 13, 2020 at 10:28:18AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> > From: Yonatan Cohen <yonatanc@mellanox.com>
> >
> > When unloading ib_umad, remove ibdev sys file 1st before
> > port removal to prevent kernel oops.
> >
> > ib_mad's method ibdev_show() might access a umad port
> > whose ibdev field has already been NULLed when rmmod ib_umad
> > was issued from another shell.
> >
> > Consider this scenario
> >       shell-1                         shell-2
> >       rmmod ib_mod                    cat /sys/devices/../ibdev
> >       |                               |
> >       ib_umad_kill_port()             ibdev_show()
> >       port->ib_dev = NULL             dev_name(port->ib_dev)
> >
> > kernel stack
> > PF: error_code(0x0000) - not-present page
> > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> > RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> > RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> > RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> > R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> > R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> > FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> > Call Trace:
> >  dev_attr_show+0x15/0x50
> >  sysfs_kf_seq_show+0xb8/0x1a0
> >  seq_read+0x12d/0x350
> >  vfs_read+0x89/0x140
> >  ksys_read+0x55/0xd0
> >  do_syscall_64+0x55/0x1b0
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
> >
> > Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")
>
> This is the wrong fixes line, this ordering change was actually
> deliberately done:
>

Can you please fix the fixes line, so I will resend unaccepted patches only?

Thanks
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d1407fa378e8..1235ffb2389b 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1312,6 +1312,9 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 	struct ib_umad_file *file;
 	int id;
 
+	cdev_device_del(&port->sm_cdev, &port->sm_dev);
+	cdev_device_del(&port->cdev, &port->dev);
+
 	mutex_lock(&port->file_mutex);
 
 	/* Mark ib_dev NULL and block ioctl or other file ops to progress
@@ -1331,8 +1334,6 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 
 	mutex_unlock(&port->file_mutex);
 
-	cdev_device_del(&port->sm_cdev, &port->sm_dev);
-	cdev_device_del(&port->cdev, &port->dev);
 	ida_free(&umad_ida, port->dev_num);
 
 	/* balances device_initialize() */
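
The effect of the reordering can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver's code: the `model_*` and `kill_port_*` names are hypothetical, a flag stands in for the sysfs attribute being present, and a reader arriving concurrently is simulated by calling it between the two teardown steps. `-EFAULT` stands in for the NULL-pointer oops from the trace.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct model_ib_device {
	const char *name;
};

struct model_umad_port {
	struct model_ib_device *ib_dev; /* NULLed during teardown */
	int attr_registered;            /* models the sysfs ibdev file existing */
};

/* Models ibdev_show(): reachable only while the attribute is registered. */
static int model_ibdev_show(struct model_umad_port *port, char *buf, size_t len)
{
	if (!port->attr_registered)
		return -ENODEV; /* file already removed, driver is never called */
	if (!port->ib_dev)
		return -EFAULT; /* stands in for the NULL dereference oops */
	return snprintf(buf, len, "%s\n", port->ib_dev->name);
}

/* Models the attribute removal performed as part of cdev_device_del(). */
static void model_unpublish_attr(struct model_umad_port *port)
{
	port->attr_registered = 0;
}

/* Simulates "cat .../ibdev" arriving from another shell. */
static int model_reader(struct model_umad_port *port)
{
	char buf[64];

	return model_ibdev_show(port, buf, sizeof(buf));
}

/* Pre-patch order: ib_dev is cleared while the attribute is still visible. */
static int kill_port_old_order(struct model_umad_port *port)
{
	int seen;

	port->ib_dev = NULL;
	seen = model_reader(port); /* reader lands inside the race window */
	model_unpublish_attr(port);
	return seen;
}

/* Patched order: the attribute is removed before ib_dev is cleared. */
static int kill_port_new_order(struct model_umad_port *port)
{
	int seen;

	model_unpublish_attr(port);
	seen = model_reader(port); /* same timing, but the file is gone */
	port->ib_dev = NULL;
	return seen;
}
```

In this model the old order lets the simulated reader observe a NULL `ib_dev`, while the new order turns the same read into a plain failure because the attribute has already been removed.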