
[rdma-rc] RDMA/rdma_cm: Fix use after free race with process_one_req

Message ID VI1PR0502MB3008BCA1BEF667DD55450CF2D1A90@VI1PR0502MB3008.eurprd05.prod.outlook.com (mailing list archive)
State Not Applicable

Commit Message

Parav Pandit March 22, 2018, 8:59 p.m. UTC
Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Thursday, March 22, 2018 3:04 PM
> To: linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>;
> Parav Pandit <parav@mellanox.com>; Mark Bloch <markb@mellanox.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>; syzbot
> <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel Jurgens
> <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> Subject: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> process_one_req
> 
> process_one_req() can race with rdma_addr_cancel():
> 
>            CPU0                                 CPU1
>            ====                                 ====
>  process_one_work()
>   debug_work_deactivate(work);
>   process_one_req()
>                                         rdma_addr_cancel()
>                                           mutex_lock(&lock);
>                                             set_timeout(&req->work,..);
>                                               __queue_work()
>                                                 debug_work_activate(work);
>                                           mutex_unlock(&lock);
> 
>    mutex_lock(&lock);
> [..]
>    list_del(&req->list);
>    mutex_unlock(&lock);
> [..]
> 
>    // ODEBUG explodes since the work is still queued.
>    kfree(req);
> 
> Causing ODEBUG to detect the use after free:
> 
> ODEBUG: free active (active state 0) object type: work_struct hint:
> process_one_req+0x0/0x6c0 include/net/dst.h:165
> WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291
> debug_print_object+0x166/0x220 lib/debugobjects.c:288
> kvm: emulating exchange as write
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361 Hardware
> name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> Workqueue: ib_addr process_one_req
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]  dump_stack+0x194/0x24d
> lib/dump_stack.c:53  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x1f4/0x2b0 lib/bug.c:186
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178  fixup_bug
> arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
> RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
> RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
> RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
> RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
> R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
> R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
> __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
> debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
>  kfree+0xc7/0x260 mm/slab.c:3799
>  process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
>  process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x33c/0x400 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> 
> Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC")
> Reported-by: syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  drivers/infiniband/core/addr.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> Leon, I took a look at this last bug you noted so we can get cleaned up for the
> next kernel release.
> 
> I didn't repro it, but I did confirm the C repro is calling rdma_addr_cancel(), so I
> think this is very likely to be the bug..
> 
> Parav/Mark: Does this make sense?
>
Thanks for looking into it.
Though I didn't ack on the ML, I had debugged a similar report a few days back, and fixed and verified it differently: rdma_addr_cancel() does cancel_delayed_work_sync() and completes the request after that.
If the work was already cancelled, process_one_req() skips processing it.
I wasn't sure whether a work item can reliably cancel itself while it is running. Do you know?
cancel_delayed_work_sync() appears more robust to me, following similar other users.
It also avoids canceling the work on every execution of process_one_req().

Mark has also reviewed it, and it is already in Leon's queue.

Assuming cancel_delayed_work can cancel itself, per the kdoc comment that it can be called from any context, the patch you posted should work too.
A minor modified hunk to reuse code between process_one_req() and process_req(), on top of your patch, is below.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jason Gunthorpe March 22, 2018, 9:21 p.m. UTC | #1
On Thu, Mar 22, 2018 at 02:59:24PM -0600, Parav Pandit wrote:

> Thanks for looking into it.
> Though I didn't ack on the ML, I had debugged it similar report few
> days back, fixed and verified it differently by rdma_addr_cancel()
> doing cancel_delayed_work_sync() and completing the request after
> that.

> And if already cancelled than process_one_req to skip processing it.
> I wasn't sure if work item itself can cancel itself reliably while
> it is running. Do you know?

Yes, the work pointer can be freed within the worker; the workqueue
code does not touch the pointer after the function runs, to allow for
this.

> cancel_delayed_work_sync() appears more robust to me following similar other users.
> This also avoid canceling it on every execution in process_one_req().

I like yours better because it ensures the callback will not be called
after rdma_addr_cancel(), which is overall much saner behavior, and I
think the current lack of fencing is possibly another bug?

But let's just do that with flush_workqueue()?

> Assuming cancel_delayed_work can cancel itself based on kdoc comment
> that it can be called from any context, patch you posted should work
> too.  Minor modified hunk to reuse code in process_one_req() and
> process_req() on top of your patch below.

Yes.

Jason
Parav Pandit March 22, 2018, 9:30 p.m. UTC | #2
> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Thursday, March 22, 2018 4:22 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>;
> Mark Bloch <markb@mellanox.com>; Dmitry Vyukov <dvyukov@google.com>;
> syzbot <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel
> Jurgens <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> Subject: Re: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> process_one_req
> 
> On Thu, Mar 22, 2018 at 02:59:24PM -0600, Parav Pandit wrote:
> 
> > Thanks for looking into it.
> > Though I didn't ack on the ML, I had debugged it similar report few
> > days back, fixed and verified it differently by rdma_addr_cancel()
> > doing cancel_delayed_work_sync() and completing the request after
> > that.
> 
> > And if already cancelled than process_one_req to skip processing it.
> > I wasn't sure if work item itself can cancel itself reliably while it
> > is running. Do you know?
> 
> Yes, the work pointer can be free'd within the worker, the workqueue code does
> not touch the pointer after the function runs to allow for this.
> 
> > cancel_delayed_work_sync() appears more robust to me following similar
> other users.
> > This also avoid canceling it on every execution in process_one_req().
> 
> I like yours better because it ensures the callback will not be called after
> rdma_addr_cancel() which is overall much saner behavior, and I think possibly
> another bug?
> 
> But let just do that with flush_workqueue() ?
> 
flush_workqueue() forces execution of all pending work items; all of them must complete.
Those pending delayed entries are unrelated to the work item/request in progress, and if there are a large number of entries with a 1 sec timeout, flush_workqueue() might take a long time.
So one rdma_destroy_id() would wait for other requests to complete, which I think we should avoid.


> > Assuming cancel_delayed_work can cancel itself based on kdoc comment
> > that it can be called from any context, patch you posted should work
> > too.  Minor modified hunk to reuse code in process_one_req() and
> > process_req() on top of your patch below.
> 
> Yes.
> 
> Jason
Jason Gunthorpe March 22, 2018, 10:30 p.m. UTC | #3
On Thu, Mar 22, 2018 at 09:30:37PM +0000, Parav Pandit wrote:

> flush_workqueue() will force to execute all the work items for all
> pending entries, all must have to completed.  Those pending delayed
> entries are unrelated to this work item/request in progress, and if
> they are large number of entries having 1 sec timeout,
> flush_workqueue() might take long.

OK then

> So one rdma_destroy_id will wait for other requests to be completed,
> which I think we should avoid.

I looked at this for a bit.. I really don't like how this code works.

The idea that rdma_destroy_id() doesn't fence the callback is a bad
design, but doesn't apparently cause any bug I can see.

I also can't understand why the rdma_addr_client nonsense should
exist, it seems to be rolled into the idea that cancel doesn't
actually cancel. :(

So let's just use the one-line patch and save the rest for some other
day..

Jason
Parav Pandit March 23, 2018, 3:16 a.m. UTC | #4
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> Sent: Thursday, March 22, 2018 5:31 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>;
> Mark Bloch <markb@mellanox.com>; Dmitry Vyukov <dvyukov@google.com>;
> syzbot <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel
> Jurgens <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> Subject: Re: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> process_one_req
> 
> On Thu, Mar 22, 2018 at 09:30:37PM +0000, Parav Pandit wrote:
> 
> > flush_workqueue() will force to execute all the work items for all
> > pending entries, all must have to completed.  Those pending delayed
> > entries are unrelated to this work item/request in progress, and if
> > they are large number of entries having 1 sec timeout,
> > flush_workqueue() might take long.
> 
> OK then
> 
> > So one rdma_destroy_id will wait for other requests to be completed,
> > which I think we should avoid.
> 
> I looked at this for a bit.. I really don't like how this code works.
> 
> The idea that rdma_destroy_id() doesn't fence the callback is a bad design, but
> doesn't apparently cause any bug I can see.
> 
> I also can't understand why the rdma_addr_client nonsense should exist, it
> seems to be rolled into the idea that cancel doesn't actually cancel. :(
> 
> So lets just use the one line patch and save the rest for some other day..
> 
Ok. A helper function, as in the hunk, is preferred since the code is the same in both functions.
I will test it once on Friday.

> Jason
Jason Gunthorpe March 23, 2018, 4:02 a.m. UTC | #5
On Fri, Mar 23, 2018 at 03:16:20AM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> > Sent: Thursday, March 22, 2018 5:31 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>;
> > Mark Bloch <markb@mellanox.com>; Dmitry Vyukov <dvyukov@google.com>;
> > syzbot <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel
> > Jurgens <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> > <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> > Subject: Re: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> > process_one_req
> > 
> > On Thu, Mar 22, 2018 at 09:30:37PM +0000, Parav Pandit wrote:
> > 
> > > flush_workqueue() will force to execute all the work items for all
> > > pending entries, all must have to completed.  Those pending delayed
> > > entries are unrelated to this work item/request in progress, and if
> > > they are large number of entries having 1 sec timeout,
> > > flush_workqueue() might take long.
> > 
> > OK then
> > 
> > > So one rdma_destroy_id will wait for other requests to be completed,
> > > which I think we should avoid.
> > 
> > I looked at this for a bit.. I really don't like how this code works.
> > 
> > The idea that rdma_destroy_id() doesn't fence the callback is a bad design, but
> > doesn't apparently cause any bug I can see.
> > 
> > I also can't understand why the rdma_addr_client nonsense should exist, it
> > seems to be rolled into the idea that cancel doesn't actually cancel. :(
> > 
> > So lets just use the one line patch and save the rest for some other day..
>
> Ok. A helper function in the hunk is preferred as the code is same
> in both the functions.  I will test it on Friday once.

Now that I've looked at this closely enough to find the syzkaller bug,
my preference is to tidy up the whole thing:

https://github.com/jgunthorpe/linux/commits/cma-fix

- Remove duplication, only one function processes work. Get rid
  of holdover list sorting, use the workqueue infrastructure itself
- Make rdma_addr_cancel() a sane and safe API by having it fence
- Get rid of the totally useless struct rdma_addr_client and related

Which is a pretty sweet little cleanup:

 drivers/infiniband/core/addr.c | 132 +++++++++++++++++++++++++++++++------------------------------------------------------------------
 drivers/infiniband/core/cma.c  |   6 +----
 include/rdma/ib_addr.h         |  20 +--------------
 3 files changed, 44 insertions(+), 114 deletions(-)

What do you think Parav?

Jason
Parav Pandit March 23, 2018, 4:56 a.m. UTC | #6
Hi Jason,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Thursday, March 22, 2018 11:03 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: linux-rdma@vger.kernel.org; Leon Romanovsky <leonro@mellanox.com>;
> Mark Bloch <markb@mellanox.com>; Dmitry Vyukov <dvyukov@google.com>;
> syzbot <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel
> Jurgens <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> Subject: Re: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race with
> process_one_req
> 
> On Fri, Mar 23, 2018 at 03:16:20AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
> > > Sent: Thursday, March 22, 2018 5:31 PM
> > > To: Parav Pandit <parav@mellanox.com>
> > > Cc: linux-rdma@vger.kernel.org; Leon Romanovsky
> > > <leonro@mellanox.com>; Mark Bloch <markb@mellanox.com>; Dmitry
> > > Vyukov <dvyukov@google.com>; syzbot
> > > <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>; Daniel
> > > Jurgens <danielj@mellanox.com>; dledford@redhat.com; Johannes Berg
> > > <johannes.berg@intel.com>; syzkaller-bugs@googlegroups.com
> > > Subject: Re: [PATCH rdma-rc] RDMA/rdma_cm: Fix use after free race
> > > with process_one_req
> > >
> > > On Thu, Mar 22, 2018 at 09:30:37PM +0000, Parav Pandit wrote:
> > >
> > > > flush_workqueue() will force to execute all the work items for all
> > > > pending entries, all must have to completed.  Those pending
> > > > delayed entries are unrelated to this work item/request in
> > > > progress, and if they are large number of entries having 1 sec
> > > > timeout,
> > > > flush_workqueue() might take long.
> > >
> > > OK then
> > >
> > > > So one rdma_destroy_id will wait for other requests to be
> > > > completed, which I think we should avoid.
> > >
> > > I looked at this for a bit.. I really don't like how this code works.
> > >
> > > The idea that rdma_destroy_id() doesn't fence the callback is a bad
> > > design, but doesn't apparently cause any bug I can see.
> > >
> > > I also can't understand why the rdma_addr_client nonsense should
> > > exist, it seems to be rolled into the idea that cancel doesn't
> > > actually cancel. :(
> > >
> > > So lets just use the one line patch and save the rest for some other day..
> >
> > Ok. A helper function in the hunk is preferred as the code is same in
> > both the functions.  I will test it on Friday once.
> 
> Now that I've looked at this closely enough to find the syzkaller bug, my
> preference is to tidy up the whole thing:
> 
> https://github.com/jgunthorpe/linux/commits/cma-fix
> 
> - Remove duplication, only one function processes work. Get rid
>   of hold over list sorting, use the workqueue infrastructure itself
> - Make rdma_addr_cancel() a sane and safe API by having it fence
> - Get rid of the totally useless struct rdma_add_client and related
> 
> Which is a pretty sweet little cleanup:
> 
>  drivers/infiniband/core/addr.c | 132 +++++++++++++++++++++++++++++++------
> ------------------------------------------------------------
>  drivers/infiniband/core/cma.c  |   6 +----
>  include/rdma/ib_addr.h         |  20 +--------------
>  3 files changed, 44 insertions(+), 114 deletions(-)
> 
> What do you think Parav?
> 
I added comments inline in the github.
Jason Gunthorpe March 23, 2018, 5:01 p.m. UTC | #7
On Fri, Mar 23, 2018 at 04:56:57AM +0000, Parav Pandit wrote:

> > What do you think Parav?
> > 
> I added comments inline in the github.

Yep, thanks, that spinlock thing is good.

I updated it again, if you like it, then I can send it through
Mellanox QA next week, no rush.

https://github.com/jgunthorpe/linux/commits/cma-fix

But I will apply the one-liner for syzkaller, right?

https://patchwork.kernel.org/patch/10302205/


Jason

Patch

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index b0a52c9..0679f4f 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -561,6 +561,23 @@  static int addr_resolve(struct sockaddr *src_in,
 	return ret;
 }
 
+static void complete_addr_req(struct addr_req *req)
+{
+	/*
+	 * Although the work will normally have been canceled by the
+	 * workqueue, it can still be requeued as long as it is on the
+	 * req_list by rdma_addr_cancel(), so it could have been requeued
+	 * before we grabbed &lock.
+	 * We need to cancel it after it is removed from req_list to really be
+	 * sure it is safe to free.
+	 */
+	cancel_delayed_work(&req->work);
+	req->callback(req->status, (struct sockaddr *) &req->src_addr,
+		req->addr, req->context);
+	put_client(req->client);
+	kfree(req);
+}
+
 static void process_one_req(struct work_struct *_work)
 {
 	struct addr_req *req;
@@ -586,10 +603,7 @@  static void process_one_req(struct work_struct *_work)
 	list_del(&req->list);
 	mutex_unlock(&lock);
 
-	req->callback(req->status, (struct sockaddr *)&req->src_addr,
-		req->addr, req->context);
-	put_client(req->client);
-	kfree(req);
+	complete_addr_req(req);
 }
 
 static void process_req(struct work_struct *work)
@@ -621,15 +635,7 @@  static void process_req(struct work_struct *work)
 
 	list_for_each_entry_safe(req, temp_req, &done_list, list) {
 		list_del(&req->list);
-		/* It is safe to cancel other work items from this work item
-		 * because at a time there can be only one work item running
-		 * with this single threaded work queue.
-		 */
-		cancel_delayed_work(&req->work);
-		req->callback(req->status, (struct sockaddr *) &req->src_addr,
-			req->addr, req->context);
-		put_client(req->client);
-		kfree(req);
+		complete_addr_req(req);
 	}
 }