RDMA/cma: Avoid using invalid state during rdma_bind_addr()

Message ID: 20180515213352.29848-1-roland@kernel.org (mailing list archive)
State: Changes Requested
Delegated to: Jason Gunthorpe

Commit Message

Roland Dreier May 15, 2018, 9:33 p.m. UTC
From: Roland Dreier <roland@purestorage.com>

There is a race (which userspace can trigger through ucma) that leads to
a use-after-free in cma:

  rdma_bind_addr(id, bogus address)

    cma_comp_exch(RDMA_CM_IDLE,
                  RDMA_CM_ADDR_BOUND) [succeed]
    copy address into id_priv struct
    fail to find device for address

                                          rdma_listen(id, any address);
                                            id_priv state isn't RDMA_CM_IDLE
					    cma_comp_exch(RDMA_CM_ADDR_BOUND,
					                  RDMA_CM_LISTEN) [succeed]
				            id->device not set, call
                                            cma_listen_on_all()
					      add id_priv->list to listen_any_list
					    return success

    cma_comp_exch(RDMA_CM_ADDR_BOUND,
                  RDMA_CM_IDLE) [fail]
    return failure

Now, when the id is destroyed, cma_release_dev() won't be called because
id_priv->cma_dev isn't set.  And cma_cancel_operation() won't call
cma_cancel_listens() because cma_src_addr(id_priv) is the bogus address
passed into rdma_bind_addr().  So neither of the paths that do

    list_del(&id_priv->list);

will be followed, but the code will go ahead and kfree(id_priv) even
though it is still linked into the listen_any_list.  So we end up with
a use-after-free when listen_any_list is traversed.
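
For reference, the "lock" here is cma_comp_exch(), which looks roughly
like this (paraphrased from cma.c - note that only the state word
itself is protected by id_priv->lock, not the work done while in a
state):

static int cma_comp_exch(struct rdma_id_private *id_priv,
			 enum rdma_cm_state comp, enum rdma_cm_state exch)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(&id_priv->lock, flags);
	/* advance to 'exch' only if the current state is 'comp' */
	if ((ret = (id_priv->state == comp)))
		id_priv->state = exch;
	spin_unlock_irqrestore(&id_priv->lock, flags);
	return ret;
}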

We can close this race by having rdma_bind_addr() put the CM ID into an
intermediate "binding" state during the time that we are modifying the
state but don't know whether the bind operation will succeed yet.

Reported-and-tested-by: syzbot+db1c219466daac1083df@syzkaller.appspotmail.com
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
---
 drivers/infiniband/core/cma.c      | 6 ++++--
 drivers/infiniband/core/cma_priv.h | 1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe May 15, 2018, 9:53 p.m. UTC | #1
On Tue, May 15, 2018 at 02:33:52PM -0700, Roland Dreier wrote:
> There is a race (which userspace can trigger through ucma) that leads to
> a use-after-free in cma:
> [...]
> We can close this race by having rdma_bind_addr() put the CM ID into an
> intermediate "binding" state during the time that we are modifying the
> state but don't know whether the bind operation will succeed yet.

This is basically what I was thinking as well..

My fear is that we have lots and lots of places where the
cma_comp_exch "lock" isn't right in this same sort of way, and
everything has to be audited..

Jason
Roland Dreier May 15, 2018, 11:41 p.m. UTC | #2
> This is basically what I was thinking as well..
>
> My fear is that we have lots and lots of places where the
> cma_comp_exch "lock" isn't right in this same sort of way, and
> everything has to be audited..

OK, glad you like the approach.

Yeah, I think at least any function that has a cma_comp_exch() to
unwind things in the error return path is suspect.  For example, can
rdma_connect() race against rdma_set_ib_path()?  I can take a more
comprehensive look tomorrow.

What do you think about applying this patch to close this syzbot bug?
Would you rather wait for a bigger patch?

 - R.
Jason Gunthorpe May 16, 2018, 2:51 a.m. UTC | #3
On Tue, May 15, 2018 at 04:41:00PM -0700, Roland Dreier wrote:
> > This is basically what I was thinking as well..
> >
> > My fear is that we have lots and lots of places where the
> > cma_comp_exch "lock" isn't right in this same sort of way, and
> > everything has to be audited..
> 
> OK, glad you like the approach.
> 
> Yeah, I think at least any function that has a cma_comp_exch() to
> unwind things in the error return path is suspect.  For example, can
> rdma_connect() race against rdma_set_ib_path()?  I can take a more
> comprehensive look tomorrow.

The thing that gives me pause is that I don't have a clear picture in
my head of how this entire comp_exch based 'locking' scheme is even
supposed to work. I just don't get it. It seems completely broken.

Am I missing something? Did it start out OK and just get broken over
the last decade?

The answer matters, because if adding "ING" states is the right
design overall then, OK, but if the answer is that this entire thing
needs to go away and be replaced with a lock, then let's just do that
instead.

For instance, look at this..

int rdma_set_ib_path(struct rdma_cm_id *id,
		     struct sa_path_rec *path_rec)
{
	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_RESOLVED,
			   RDMA_CM_ROUTE_RESOLVED))
		return -EINVAL;

	id->route.path_rec = kmemdup(path_rec, sizeof(*path_rec),
				     GFP_KERNEL);
[..]
err_free:
	kfree(id->route.path_rec);
	id->route.path_rec = NULL;
err:
	cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_ADDR_RESOLVED);

So this has the same error problem, but even on success it is a racy
nightmare.

e.g. I race the above between comp_exch and kmemdup with this:

int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
{
	if (!cma_comp_exch(id_priv, RDMA_CM_ROUTE_RESOLVED, RDMA_CM_CONNECT))
		return -EINVAL;

And then link with this:

static int cma_sidr_rep_handler(struct ib_cm_id *cm_id,
				struct ib_cm_event *ib_event)
{
	mutex_lock(&id_priv->handler_mutex);
	if (id_priv->state != RDMA_CM_CONNECT)
		goto out;
[..]
		ib_init_ah_attr_from_path(id_priv->id.device,
					  id_priv->id.port_num,
					  id_priv->id.route.path_rec,
					  &event.param.ud.ah_attr);

And go boom with a null pointer deref without requiring the error
path, or I use the error path and get a more reliable NULL.

Is there some subtle reason why these three calls can't happen
concurrently? It was easy to find this combination, so I bet there are
lots more even if this particular avenue is blocked off.

Basically, from what I can tell, the entire thing is totally crazy - I
don't think we can stomp syzkaller bugs one at a time and ever get
anywhere when the entire FSM idea seems deeply flawed.

This is why I didn't make a patch yet :)

So, what is the solution here? How should this actually look? Is there
some sensible design pattern with a multi-threaded comp_exch based FSM
that I am not aware of?

The most familiar answer is to replace comp_exch with a proper mutex
held across the entire state entry, action and exit, and forget about
adding *INGs.

Perhaps we could do that without too much invasion by doing something
like this:

   if (!cma_state_check_enter(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
[..]
   cma_state_success(id_priv);
err:
   cma_state_rollback(id_priv);

At least there is some kind of sane and auditable pattern for building
this FSM.. I think I prefer this to adding a bunch of "ING"s..

The semantic would be that no cma_state_check_enter can succeed during
the period between enter and success/rollback. Then every single
function that does cma_exch gets audited and gains a success or
rollback on its exit path, forming a good try_lock around the entire
state action block.
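
A minimal sketch of what those helpers might look like, assuming a new
in_transition flag and saved state on rdma_id_private (all names here
are hypothetical - none of this exists in cma.c today):

static bool cma_state_check_enter(struct rdma_id_private *id_priv,
				  enum rdma_cm_state old,
				  enum rdma_cm_state new)
{
	unsigned long flags;
	bool ret = false;

	spin_lock_irqsave(&id_priv->lock, flags);
	if (id_priv->state == old && !id_priv->in_transition) {
		id_priv->old_state = old;	/* kept for rollback */
		id_priv->state = new;
		id_priv->in_transition = true;	/* fail concurrent arcs */
		ret = true;
	}
	spin_unlock_irqrestore(&id_priv->lock, flags);
	return ret;
}

static void cma_state_success(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	id_priv->in_transition = false;		/* commit the new state */
	spin_unlock_irqrestore(&id_priv->lock, flags);
}

static void cma_state_rollback(struct rdma_id_private *id_priv)
{
	unsigned long flags;

	spin_lock_irqsave(&id_priv->lock, flags);
	id_priv->state = id_priv->old_state;	/* undo the arc */
	id_priv->in_transition = false;
	spin_unlock_irqrestore(&id_priv->lock, flags);
}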

At the very least this guarantees that the FSM actions and arcs are
sane for concurrency - that just leaves auditing the non-arc methods..

But, again, I don't know why it is like this, and I fear it will just
cause other runtime bugs, as we now cause comp_exch to fail while any
action is running, which opens a whole other avenue of auditing :(

Sean? Parav? Any light on this?

Jason
Leon Romanovsky May 16, 2018, 6:29 a.m. UTC | #4
On Tue, May 15, 2018 at 08:51:15PM -0600, Jason Gunthorpe wrote:
> The thing that gives me pause is that I don't have a clear picture in
> my head of how this entire comp_exch based 'locking' scheme is even
> supposed to work. I just don't get it. It seems completely broken.
> [...]
> But, again, I don't know why it is like this, and I fear it will just
> cause other runtime bugs, as we now cause comp_exch to fail while any
> action is running, which opens a whole other avenue of auditing :(

If you want more food for thought, the idea of mixing struct work,
handler_mutex and regular paths is really exciting too.

void rdma_destroy_id(struct rdma_cm_id *id)
{
        struct rdma_id_private *id_priv;
        enum rdma_cm_state state;

        id_priv = container_of(id, struct rdma_id_private, id);
        state = cma_exch(id_priv, RDMA_CM_DESTROYING);
        cma_cancel_operation(id_priv, state);

        /*
         * Wait for any active callback to finish.  New callbacks will find
         * the id_priv state set to destroying and abort.
         */
        mutex_lock(&id_priv->handler_mutex);
        mutex_unlock(&id_priv->handler_mutex);


and

static void cma_work_handler(struct work_struct *_work)
{
        struct cma_work *work = container_of(_work, struct cma_work, work);
        struct rdma_id_private *id_priv = work->id;
        int destroy = 0;

        mutex_lock(&id_priv->handler_mutex);
        if (!cma_comp_exch(id_priv, work->old_state, work->new_state))
                goto out;

        if (id_priv->id.event_handler(&id_priv->id, &work->event)) {
                cma_exch(id_priv, RDMA_CM_DESTROYING);
                destroy = 1;
        }
out:
        mutex_unlock(&id_priv->handler_mutex);

This is most probably racy too in changing the state to
RDMA_CM_DESTROYING, but I have no clue about the design decision for
that part of the code.

Thanks
Roland Dreier May 16, 2018, 3:35 p.m. UTC | #5
> Am I missing something? Did it start out OK and just get broken over
> the last decade?

As far as I can see, the bug is already there in the very first CMA commit
e51060f08a61.

I'm not sure we can make id_priv state changes transactional - looks like
some of the state may be hard to roll back.  And anyway this is just
obfuscated locking, except that simultaneous operations fail rather than
wait for each other.  Maybe the right fix is just to make access to id_priv
exclusive - have every function take a per-id lock while state is changing,
so everything happens atomically if multiple threads access the same id.
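
Roughly this shape - a sketch only, with state_mutex and
do_bind_work() as hypothetical stand-ins:

int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
{
	struct rdma_id_private *id_priv =
		container_of(id, struct rdma_id_private, id);
	int ret;

	/* one per-id mutex held across the whole transition,
	 * instead of the paired cma_comp_exch() calls */
	mutex_lock(&id_priv->state_mutex);
	if (id_priv->state != RDMA_CM_IDLE) {
		ret = -EINVAL;
		goto out;
	}
	ret = do_bind_work(id_priv, addr);	/* the real bind logic */
	/* no window where another thread sees a half-done bind */
	id_priv->state = ret ? RDMA_CM_IDLE : RDMA_CM_ADDR_BOUND;
out:
	mutex_unlock(&id_priv->state_mutex);
	return ret;
}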

  - R.
Jason Gunthorpe May 16, 2018, 4:01 p.m. UTC | #6
On Wed, May 16, 2018 at 08:35:56AM -0700, Roland Dreier wrote:
> > Am I missing something? Did it start out OK and just get broken over
> > the last decade?
> 
> As far as I can see, the bug is already there in the very first CMA commit
> e51060f08a61.
> 
> I'm not sure we can make id_priv state changes transactional - looks like
> some of the state may be hard to roll back.

Well, if rollback is impossible then it can just move forward instead -
it's just that the few I looked at appeared to use that pattern..

> And anyway this is just obfuscated locking, except that simultaneous
> operations fail rather than wait for each other.

Yes, it would be implemented internally with try_lock I think.

> fix is just to make access to id_priv exclusive - have every
> function take a per-id lock while state is changing, so everything
> happens atomically if multiple threads access the same id.

Yes, that is where I was driving to as well, but I don't know why
this is try-and-fail with no lock today. Sean? Do you recall?

Jason
Hefty, Sean May 16, 2018, 4:43 p.m. UTC | #7
> Yes, that is where I was driving to as well, but I don't know why
> this is try-and-fail with no lock today. Sean? Do you recall?

I don't recall off the top of my head why the lock couldn't be held
throughout the call.  Guessing, I don't think all calling threads
could block, and part of the scheme resulted from trying to handle
device removal.

- Sean
Hefty, Sean May 17, 2018, 4:33 p.m. UTC | #8
> > Yes, that is where I was driving to as well, but I don't know why
> > this is try-and-fail with no lock today. Sean? Do you recall?
> 
> I don't recall off the top of my head why the lock couldn't be held
> throughout the call.  Guessing, I don't think all calling threads
> could block, and part of the scheme resulted from trying to handle
> device removal.

Overall, I think the locks were trying to protect the state against
CM callbacks, not a kernel caller invoking functions willy-nilly.  It
might be easier to focus the fix in the ucma and serialize calls into
the cma from there.
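
E.g. something like this in the ucma entry points - a sketch, with
ctx->mutex as a hypothetical per-context lock that doesn't exist in
ucma.c today:

mutex_lock(&ctx->mutex);	/* serialize all verbs on this CM ID */
ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *)&cmd.addr);
mutex_unlock(&ctx->mutex);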

- Sean
Jason Gunthorpe May 17, 2018, 4:40 p.m. UTC | #9
On Thu, May 17, 2018 at 04:33:24PM +0000, Hefty, Sean wrote:
> > > Yes, that is where I was driving to as well, but I don't know why
> > > this is try-and-fail with no lock today. Sean? Do you recall?
> > 
> > I don't recall off the top of my head why the lock couldn't be held
> > throughout the call.  Guessing, I don't think all calling threads
> > could block, and part of the scheme resulted from trying to handle
> > device removal.
> 
> Overall, I think the locks were trying to protect the state against
> CM callbacks, not a kernel caller invoking functions willy-nilly.
> It might be easier to focus the fix in the ucma and serialize calls
> into the cma from there.

Interesting, yes.. That would be simpler.

So the rule for cma is that the caller must serialize all calls?

I wonder if ULPs did that right, but I can live with that.

Jason
Roland Dreier May 17, 2018, 6:33 p.m. UTC | #10
> > Overall, I think the locks were trying to protect the state against
> > CM callbacks, not a kernel caller invoking functions willy-nilly.
> > It might be easier to focus the fix in the ucma and serialize calls
> > into the cma from there.

We would still need to serialize between CM callbacks and UCMA calling into
CMA, or at least audit that there are no races there.

  - R.
Jason Gunthorpe May 17, 2018, 7:29 p.m. UTC | #11
On Thu, May 17, 2018 at 11:33:51AM -0700, Roland Dreier wrote:
> > > Overall, I think the locks were trying to protect the state against
> > > CM callbacks, not a kernel caller invoking functions willy-nilly.
> > > It might be easier to focus the fix in the ucma and serialize calls
> > > into the cma from there.
> 
> We would still need to serialize between CM callbacks and UCMA calling into
> CMA, or at least audit that there are no races there.

If the design of this was that the caller is responsible for all
serialization then I think the try_lock scheme I outlined before is
the right approach. Putting the locking internal to the functions also
protects against busted kernel ULPs.

I guess anything that isn't already serializing is busted, and
hopefully that is a smaller list...

Jason
Roland Dreier May 18, 2018, 6:01 p.m. UTC | #12
>> We would still need to serialize between CM callbacks and UCMA calling into
>> CMA, or at least audit that there are no races there.
>
> If the design of this was that the caller is responsible for all
> serialization then I think the try_lock scheme I outlined before is
> the right approach. Putting the locking internal to the functions also
> protects against busted kernel ULPs.
>
> I guess anything that isn't already serializing is busted, and
> hopefully that is a smaller list...

I don't quite understand your reply.  I was pointing out a problem
with the "call is responsible for all serialization" idea.  The
problem is that the CM will call into the CMA when incoming CM
messages are received, and the consumer of the CMA API has no way to
control when this happens.  I don't see a way to avoid a messy mix of
internal and external locking.

What is the advantage of try_lock vs just locking?  Can we just say
that each id_priv can only be accessed by a single execution context
at any given time?

 - R.
Jason Gunthorpe May 18, 2018, 9:42 p.m. UTC | #13
On Fri, May 18, 2018 at 11:01:32AM -0700, Roland Dreier wrote:
> >> We would still need to serialize between CM callbacks and UCMA calling into
> >> CMA, or at least audit that there are no races there.
> >
> > If the design of this was that the caller is responsible for all
> > serialization then I think the try_lock scheme I outlined before is
> > the right approach. Putting the locking internal to the functions also
> > protects against busted kernel ULPs.
> >
> > I guess anything that isn't already serializing is busted, and
> > hopefully that is a smaller list...
> 
> I don't quite understand your reply.  I was pointing out a problem
> with the "call is responsible for all serialization" idea.  The
> problem is that the CM will call into the CMA when incoming CM
> messages are received, and the consumer of the CMA API has no way to
> control when this happens.  I don't see a way to avoid a messy mix of
> internal and external locking.

I'm assuming those are somehow handled today, as Sean says that was on
his mind when he designed it...

But I haven't had time to really look. I'm assuming somehow the CM
callbacks are synchronized with the CMA internal state so the
concurrency is ok..

Otherwise this thing is already hopelessly busted, right?

> What is the advantage of try_lock vs just locking?  Can we just say
> that each id_priv can only be accessed by a single execution context
> at any given time?

try_lock is just closer to what we have today so it is less likely to
cause unexpected problems. It is already the case that calling the
routines out of sequence and unsynchronized most often fails, not
waits, so continuing that path and closing the race seems 'safest'
only because I don't want to audit everything :)
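
i.e. something like this at the top of each verb - a sketch only,
with state_mutex as a hypothetical field:

/* trylock keeps today's fail-instead-of-wait semantic */
if (!mutex_trylock(&id_priv->state_mutex))
	return -EINVAL;	/* lost the race: same outcome as a failed
			 * cma_comp_exch() today */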

Jason
Leon Romanovsky May 27, 2018, 4:33 p.m. UTC | #14
On Tue, May 15, 2018 at 02:33:52PM -0700, Roland Dreier wrote:
> There is a race (which userspace can trigger through ucma) that leads to
> a use-after-free in cma:
> [...]

What is the decision regarding this race?

Thanks
Jason Gunthorpe May 28, 2018, 5:31 p.m. UTC | #15
On Sun, May 27, 2018 at 07:33:39PM +0300, Leon Romanovsky wrote:
> On Tue, May 15, 2018 at 02:33:52PM -0700, Roland Dreier wrote:
> > There is a race (which userspace can trigger through ucma) that leads to
> > a use-after-free in cma:
> > [...]
>
> What is the decision regarding this race?

The fix is good, but it only solves this one race.

I am thinking we should fix all the issues here at once though.. That
needs a respin to do one of the ideas around locking from the 'test'
cmpxchg to the 'set' in all places that follow that pattern.

Jason

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a693fcd4c513..826b4ffbf259 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3345,7 +3345,7 @@  int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		return -EAFNOSUPPORT;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
+	if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BINDING))
 		return -EINVAL;
 
 	ret = cma_check_linklocal(&id->route.addr.dev_addr, addr);
@@ -3381,6 +3381,8 @@  int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err2;
 
+	cma_comp_exch(id_priv, RDMA_CM_ADDR_BINDING, RDMA_CM_ADDR_BOUND);
+
 	return 0;
 err2:
 	if (id_priv->cma_dev) {
@@ -3388,7 +3390,7 @@  int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		cma_release_dev(id_priv);
 	}
 err1:
-	cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
+	cma_comp_exch(id_priv, RDMA_CM_ADDR_BINDING, RDMA_CM_IDLE);
 	return ret;
 }
 EXPORT_SYMBOL(rdma_bind_addr);
diff --git a/drivers/infiniband/core/cma_priv.h b/drivers/infiniband/core/cma_priv.h
index 194cfe78c447..8d0f8715dd51 100644
--- a/drivers/infiniband/core/cma_priv.h
+++ b/drivers/infiniband/core/cma_priv.h
@@ -44,6 +44,7 @@  enum rdma_cm_state {
 	RDMA_CM_ROUTE_RESOLVED,
 	RDMA_CM_CONNECT,
 	RDMA_CM_DISCONNECT,
+	RDMA_CM_ADDR_BINDING,
 	RDMA_CM_ADDR_BOUND,
 	RDMA_CM_LISTEN,
 	RDMA_CM_DEVICE_REMOVAL,