[for-next,0/6] RDMA/rxe: Fix potential races

Message ID 20211010235931.24042-1-rpearsonhpe@gmail.com (mailing list archive)

Message

Bob Pearson Oct. 10, 2021, 11:59 p.m. UTC
There are possible race conditions related to attempting to access
rxe pool objects at the same time as the pools or elements are being
freed. This series of patches addresses these races.

Bob Pearson (6):
  RDMA/rxe: Make rxe_alloc() take pool lock
  RDMA/rxe: Copy setup parameters into rxe_pool
  RDMA/rxe: Save object pointer in pool element
  RDMA/rxe: Combine rxe_add_index with rxe_alloc
  RDMA/rxe: Combine rxe_add_key with rxe_alloc
  RDMA/rxe: Fix potential race condition in rxe_pool

 drivers/infiniband/sw/rxe/rxe_mcast.c |   5 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |   1 -
 drivers/infiniband/sw/rxe/rxe_mw.c    |   5 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 235 +++++++++++++-------------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  67 +++-----
 drivers/infiniband/sw/rxe/rxe_verbs.c |  10 --
 6 files changed, 140 insertions(+), 183 deletions(-)

Comments

Leon Romanovsky Oct. 12, 2021, 6:34 a.m. UTC | #1
On Sun, Oct 10, 2021 at 06:59:25PM -0500, Bob Pearson wrote:
> There are possible race conditions related to attempting to access
> rxe pool objects at the same time as the pools or elements are being
> freed. This series of patches addresses these races.

Can we get rid of this pool?

Thanks

Bob Pearson Oct. 12, 2021, 8:19 p.m. UTC | #2
On 10/12/21 1:34 AM, Leon Romanovsky wrote:
> On Sun, Oct 10, 2021 at 06:59:25PM -0500, Bob Pearson wrote:
>> There are possible race conditions related to attempting to access
>> rxe pool objects at the same time as the pools or elements are being
>> freed. This series of patches addresses these races.
> 
> Can we get rid of this pool?
> 
> Thanks
> 

Not sure which 'this' you mean. This set of patches is motivated by someone at HPE
running into very infrequent seg faults in rxe when rdma packets try to copy data
to or from an MR. This can only happen (other than through a plain bug, which doesn't
seem to be the case) when a late packet arrives after the MR has been de-registered.
The root cause is the way rxe currently defers cleaning up objects with krefs, and
the potential races between cleanup and new packets looking up rkeys. I found a lot
of potential race conditions and tried to close them off. Another couple of patches
are coming as well.
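
To illustrate the kind of race described above, here is a minimal userspace
sketch, not the rxe code. It assumes a hypothetical `mr_obj` with an atomic
reference count and a small table protected by a mutex; all names are invented
for illustration. The point is that a lookup takes its reference under the same
lock that guards removal, and refuses an object whose count has already dropped
to zero, so a late lookup cannot hand out an object that is being torn down.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16

/* Hypothetical stand-in for a memory region object; not the rxe struct. */
struct mr_obj {
	atomic_int refcnt;	/* 0 means teardown has started */
	uint32_t rkey;
};

static struct mr_obj *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference only if the count is still non-zero. */
static bool get_unless_zero(struct mr_obj *mr)
{
	int old = atomic_load(&mr->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mr->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Publish the object with one initial reference held by the caller. */
static void mr_insert(struct mr_obj *mr, uint32_t rkey)
{
	mr->rkey = rkey;
	atomic_init(&mr->refcnt, 1);
	pthread_mutex_lock(&table_lock);
	table[rkey % TABLE_SIZE] = mr;
	pthread_mutex_unlock(&table_lock);
}

/* Lookup and reference acquisition happen under the same lock. */
static struct mr_obj *mr_lookup(uint32_t rkey)
{
	struct mr_obj *mr;

	pthread_mutex_lock(&table_lock);
	mr = table[rkey % TABLE_SIZE];
	if (mr && (mr->rkey != rkey || !get_unless_zero(mr)))
		mr = NULL;
	pthread_mutex_unlock(&table_lock);
	return mr;
}

/*
 * Drop a reference. The last put unpublishes the object under the lock
 * and returns true; only then may the caller free it.
 */
static bool mr_put(struct mr_obj *mr)
{
	if (atomic_fetch_sub(&mr->refcnt, 1) != 1)
		return false;
	pthread_mutex_lock(&table_lock);
	if (table[mr->rkey % TABLE_SIZE] == mr)
		table[mr->rkey % TABLE_SIZE] = NULL;
	pthread_mutex_unlock(&table_lock);
	return true;
}
```

In this sketch a lookup that races with the final put either wins (and gets a
counted reference) or sees a zero count and returns NULL; it never returns a
pointer to an object whose cleanup has begun.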

This is an attempt to fix up the code the way it is now. Later I would like to use
xarrays to handle rkey indices, qpns, etc., which looks cleaner.

'Pools' is mostly a misnomer since you moved all the allocations into rdma-core except
for a couple. Really they are a way to add indices or keys to objects that are already
there.

Bob
Leon Romanovsky Oct. 19, 2021, 1:07 p.m. UTC | #3
On Tue, Oct 12, 2021 at 03:19:46PM -0500, Bob Pearson wrote:
> On 10/12/21 1:34 AM, Leon Romanovsky wrote:
> > On Sun, Oct 10, 2021 at 06:59:25PM -0500, Bob Pearson wrote:
> >> There are possible race conditions related to attempting to access
> >> rxe pool objects at the same time as the pools or elements are being
> >> freed. This series of patches addresses these races.
> > 
> > Can we get rid of this pool?
> > 
> > Thanks
> > 
> 
> Not sure which 'this' you mean. This set of patches is motivated by someone at HPE
> running into very infrequent seg faults in rxe when rdma packets try to copy data
> to or from an MR. This can only happen (other than through a plain bug, which doesn't
> seem to be the case) when a late packet arrives after the MR has been de-registered.
> The root cause is the way rxe currently defers cleaning up objects with krefs, and
> the potential races between cleanup and new packets looking up rkeys. I found a lot
> of potential race conditions and tried to close them off. Another couple of patches
> are coming as well.

I have no doubts that this series fixes RXE, but my request was more general.
Is there a way/path to remove everything declared in rxe_pool.c|h?

Thanks
Bob Pearson Oct. 19, 2021, 4:35 p.m. UTC | #4
On 10/19/21 8:07 AM, Leon Romanovsky wrote:
> On Tue, Oct 12, 2021 at 03:19:46PM -0500, Bob Pearson wrote:
>> On 10/12/21 1:34 AM, Leon Romanovsky wrote:
>>> On Sun, Oct 10, 2021 at 06:59:25PM -0500, Bob Pearson wrote:
>>>> There are possible race conditions related to attempting to access
>>>> rxe pool objects at the same time as the pools or elements are being
>>>> freed. This series of patches addresses these races.
>>>
>>> Can we get rid of this pool?
>>>
>>> Thanks
>>>
>>
>> Not sure which 'this' you mean. This set of patches is motivated by someone at HPE
>> running into very infrequent seg faults in rxe when rdma packets try to copy data
>> to or from an MR. This can only happen (other than through a plain bug, which doesn't
>> seem to be the case) when a late packet arrives after the MR has been de-registered.
>> The root cause is the way rxe currently defers cleaning up objects with krefs, and
>> the potential races between cleanup and new packets looking up rkeys. I found a lot
>> of potential race conditions and tried to close them off. Another couple of patches
>> are coming as well.
> 
> I have no doubts that this series fixes RXE, but my request was more general.
> Is there a way/path to remove everything declared in rxe_pool.c|h?
> 
> Thanks
> 

Take a look at the note I copied you on more recently. There is some progress but not
complete elimination of rxe_pool. There is another project suggested by Jason which is
replacing red-black trees with xarrays as an alternative approach to indexing rdma objects.
This would still duplicate the indexing done by rdma-core. A while back I looked at
trying to reuse the rdma-core indexing but no effort was made to make that easy. All
of the APIs are private to rdma-core. These indices are managed by the rxe driver for
use as lkeys/rkeys, qpns, srqns, and more recently address handles. xarrays seem to be
more efficient when the indices are fairly compact. There is a suggestion that IB and RoCE
should attempt to make indices that are visible on the network more sparse. Nothing
will make them secure but they could be a lot more secure than they are currently. I
believe mlx5 is now using random keys for this reason.

Bob
Jason Gunthorpe Oct. 19, 2021, 6:43 p.m. UTC | #5
On Tue, Oct 19, 2021 at 11:35:30AM -0500, Bob Pearson wrote:

> Take a look at the note I copied you on more recently. There is some
> progress but not complete elimination of rxe_pool. There is another
> project suggested by Jason which is replacing red-black trees with
> xarrays as an alternative approach to indexing rdma objects.  This
> would still duplicate the indexing done by rdma-core. A while back I
> looked at trying to reuse the rdma-core indexing but no effort was
> made to make that easy.

I have no expectation that a driver can re-use the various rdma-core
indexes. That is not what they are for, and they have a different
lifetime semantic from what the driver needs.

> All of the APIs are private to rdma-core. These indices are managed by
> the rxe driver for use as lkeys/rkeys, qpns, srqns, and more
> recently address handles. xarrays seem to be more efficient when the
> indices are fairly compact. There is a suggestion that IB and RoCE
> should attempt to make indices that are visible on the network more
> sparse. Nothing will make them secure but they could be a lot more
> secure than they are currently. I believe mlx5 is now using random
> keys for this reason.

Only qpn really benefits from something like this, and it is more
about maximum lifetime before qpn re-use, which is a cyclic allocating
xarray.
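
As a rough illustration of why cyclic allocation lengthens the time before a
number is re-used, here is a small userspace sketch. It is not the xarray
implementation (the kernel offers `xa_alloc_cyclic()` for this); the
`cyclic_ids` type, the ID range and the function names below are assumptions
made up for the example. Each search starts just after the most recently
allocated ID, so a freed ID is handed out again only after the allocator has
walked the rest of the range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny ID range for the example. */
#define ID_MIN   16u
#define ID_MAX   23u
#define ID_COUNT (ID_MAX - ID_MIN + 1u)

struct cyclic_ids {
	bool used[ID_COUNT];
	uint32_t next;		/* ID at which the next search starts */
};

static void cyclic_init(struct cyclic_ids *c)
{
	for (uint32_t i = 0; i < ID_COUNT; i++)
		c->used[i] = false;
	c->next = ID_MIN;
}

/* Returns 0 and stores the new ID in *id, or -1 if the range is full. */
static int cyclic_alloc(struct cyclic_ids *c, uint32_t *id)
{
	for (uint32_t n = 0; n < ID_COUNT; n++) {
		/* Walk forward from 'next', wrapping at the end of the range. */
		uint32_t slot = (c->next - ID_MIN + n) % ID_COUNT;

		if (!c->used[slot]) {
			c->used[slot] = true;
			*id = ID_MIN + slot;
			c->next = ID_MIN + (slot + 1u) % ID_COUNT;
			return 0;
		}
	}
	return -1;
}

static void cyclic_free(struct cyclic_ids *c, uint32_t id)
{
	if (id >= ID_MIN && id <= ID_MAX)
		c->used[id - ID_MIN] = false;
}
```

With a lowest-free-first allocator, freeing ID 16 and allocating again would
return 16 immediately; here the next allocation continues from where the last
one stopped and only comes back to 16 after wrapping around.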

Jason
Bob Pearson Oct. 19, 2021, 10:51 p.m. UTC | #6
On 10/19/21 1:43 PM, Jason Gunthorpe wrote:
> On Tue, Oct 19, 2021 at 11:35:30AM -0500, Bob Pearson wrote:
> 
>> Take a look at the note I copied you on more recently. There is some
>> progress but not complete elimination of rxe_pool. There is another
>> project suggested by Jason which is replacing red-black trees with
>> xarrays as an alternative approach to indexing rdma objects.  This
>> would still duplicate the indexing done by rdma-core. A while back I
>> looked at trying to reuse the rdma-core indexing but no effort was
>> made to make that easy.
> 
> I have no expectation that a driver can re-use the various rdma-core
> indexes. That is not what they are for, and they have a different
> lifetime semantic from what the driver needs.
> 
>> All of the APIs are private to rdma-core. These indices are managed by
>> the rxe driver for use as lkeys/rkeys, qpns, srqns, and more
>> recently address handles. xarrays seem to be more efficient when the
>> indices are fairly compact. There is a suggestion that IB and RoCE
>> should attempt to make indices that are visible on the network more
>> sparse. Nothing will make them secure but they could be a lot more
>> secure than they are currently. I believe mlx5 is now using random
>> keys for this reason.
> 
> Only qpn really benefits from something like this, and it is more
> about maximum lifetime before qpn re-use, which is a cyclic allocating
> xarray.
> 
> Jason
> 

Thanks. I had given up long ago on using the rdma-core indices. Leon would like
to get rid of rxe pools. Actually (I just checked) there is only one remaining
object type that is not already allocated in rdma-core and that is MR. (The multicast
groups are a special case and not really in the same league as PD, QP, CQ, etc. They
probably should just be replaced with open coded kzalloc/kfree. They are not shared
or visible to rdma-core.) So with this exception the pools are really just a way to add
indices to objects which can be done cleanly with xarrays (I have a version that
already does this and it works fine. No performance difference with red-black trees
though. Still looking to get rid of the spinlocks and use the RCU locking in xarrays.)
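
To make "just a way to add indices to objects" concrete, here is a minimal
userspace sketch, not the rxe_pool or xarray code. It assumes a hypothetical
`elem` embedded in an object that some other layer already allocated, and a
small fixed-size `index_table`; all names are invented for the example, and
locking is left out on purpose. The table never allocates or frees the
objects; it only hands out an index and maps it back to the object.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INDEX 8u

/* Hypothetical element embedded in each indexed object. */
struct elem {
	uint32_t index;
	void *obj;		/* back-pointer to the containing object */
};

/* Hypothetical object that was allocated by another layer. */
struct fake_qp {
	int state;
	struct elem elem;
};

struct index_table {
	struct elem *slots[MAX_INDEX];
};

/* Assign the lowest free index to 'e'. Returns 0, or -1 if the table is full. */
static int index_add(struct index_table *t, struct elem *e, void *obj)
{
	for (uint32_t i = 0; i < MAX_INDEX; i++) {
		if (!t->slots[i]) {
			e->index = i;
			e->obj = obj;
			t->slots[i] = e;
			return 0;
		}
	}
	return -1;
}

/* Map an index back to its object, or NULL if nothing is registered there. */
static void *index_lookup(const struct index_table *t, uint32_t index)
{
	if (index >= MAX_INDEX || !t->slots[index])
		return NULL;
	return t->slots[index]->obj;
}

/* Remove the index; the object itself is left for its owner to free. */
static void index_del(struct index_table *t, struct elem *e)
{
	if (e->index < MAX_INDEX && t->slots[e->index] == e)
		t->slots[e->index] = NULL;
}
```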

Bob