[for-next,v16,0/2] Fix race conditions in rxe_pool

Message ID 20220612223434.31462-1-rpearsonhpe@gmail.com (mailing list archive)

Message

Bob Pearson June 12, 2022, 10:34 p.m. UTC
There are several race conditions discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series includes the remaining two
patches of the original series.

Applies cleanly to current for-next after the two oneline patches
submitted by Dongliang Mu that fixed an error in the error checking
code from xa_alloc_cyclic().

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v16
  Addressed some issues raised by Jason Gunthorpe.
  - Added if(sleepable) might_sleep() calls.
  - Dropped the stray fence patch that got in there somehow.
  - Left the timeout for AH cleanup. We can drop it later if
    it isn't needed.
v15
  Rebased to the current for-next branch of 5.19.0-rc1+.
  Adds support for RDMA_AH_CREATE/DESTROY_SLEEPABLE.
v14
  Rebased to current wip/jgg-for-next
  Removed patch 3 as unnecessary
  Waited for resolution of bugs in rxe_resp and some locking bugs.

  Note: With rcu read lock in rxe_pool_get_index there are no bottom
  half spinlocks from looking up AH or non AH objects to conflict
  with the default xa_lock so no lockdep warnings occur. The rxe_pool
  alloc functions can hold locks simultaneously with the rcu read
  lock so it does not have to prevent soft or hard IRQs.
v13
  Rebased to current for-next
  Addressed Jason's comments on patch 1, 8 and 9. 8 and 9 are
  combined into one patch.
v12
  Rebased to current wip/jgg-for-next
  Dropped patches already accepted from v11.
  Moved all object cleanup code to type specific cleanup routines.
  Renamed to match Jason's requests.
  Addressed some other issues raised.
  Kept the contentious rxe_hide() function but renamed it to
  rxe_disable_lookup(), which says what it does. I am still convinced
  this is cleaner than other alternatives like moving xa_erase to the
  top of destroy routines but just for indexed objects.
v11
  Rebased to current for-next.
  Reordered patches and made other changes to respond to issues
  reported by Jason Gunthorpe.
v10
  Rebased to current wip/jgg-for-next.
  Split some patches into smaller ones.
v9
  Corrected issues reported by Jason Gunthorpe
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorpe
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches
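
For readers following the v14 note on rxe_pool_get_index, here is a
minimal userspace sketch of the lookup pattern it describes: a reader
takes a reference on an indexed object only while the reference count is
still nonzero, so an object being destroyed cannot be handed out again.
This is an illustration, not the driver code. struct elem, table[],
elem_get_unless_zero(), elem_put() and pool_get_index() are hypothetical
names. C11 atomics stand in for the kernel's kref_get_unless_zero(), a
plain array stands in for the xarray, and the RCU read-side critical
section is only marked by a comment. That the driver's lookup has this
exact shape is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool element; not the driver's struct. */
struct elem {
	atomic_int ref_cnt;
};

#define POOL_SIZE 4

/* Hypothetical index table standing in for the pool's xarray. */
static struct elem *table[POOL_SIZE];

/* Take a reference only if the count has not already dropped to zero. */
static bool elem_get_unless_zero(struct elem *e)
{
	int old = atomic_load(&e->ref_cnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&e->ref_cnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last reference is gone. */
static bool elem_put(struct elem *e)
{
	int old = atomic_fetch_sub(&e->ref_cnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Look up an element by index, refusing objects that are being destroyed. */
static struct elem *pool_get_index(size_t index)
{
	struct elem *e;

	if (index >= POOL_SIZE)
		return NULL;

	/* In the kernel, rcu_read_lock() would be taken here. */
	e = table[index];
	if (e && !elem_get_unless_zero(e))
		e = NULL;
	/* ...and rcu_read_unlock() here. */

	return e;
}
```

In this sketch, once the last elem_put() has returned true, a later
pool_get_index() on the same index returns NULL even though the table
slot has not been cleared yet. That is the property a destroy path needs
from the lookup side.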

Bob Pearson (2):
  RDMA/rxe: Stop lookup of partially built objects
  RDMA/rxe: Convert read side locking to rcu

 drivers/infiniband/sw/rxe/rxe_mr.c    |   2 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |   4 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 102 +++++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_pool.h  |  18 +++--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  39 ++++++----
 5 files changed, 133 insertions(+), 32 deletions(-)


base-commit: 61414011df6607415c14805dabf0687663090e0a

Comments

Jason Gunthorpe June 30, 2022, 1:58 p.m. UTC | #1
On Sun, Jun 12, 2022 at 05:34:33PM -0500, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> driver.  They mostly relate to races between normal operations and
> destroying objects.  This patch series includes the remaining two
> patches of the original series.
> 
> Applies cleanly to current for-next after the two oneline patches
> submitted by Dongliang Mu that fixed an error in the error checking
> code from xa_alloc_cyclic().

Applied to for-next, thanks

I moved the might_sleep hunk from the last patch to the first though

Jason

Bob Pearson June 30, 2022, 4:58 p.m. UTC | #2
On 6/30/22 08:58, Jason Gunthorpe wrote:
> On Sun, Jun 12, 2022 at 05:34:33PM -0500, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>> driver.  They mostly relate to races between normal operations and
>> destroying objects.  This patch series includes the remaining two
>> patches of the original series.
>>
>> Applies cleanly to current for-next after the two oneline patches
>> submitted by Dongliang Mu that fixed an error in the error checking
>> code from xa_alloc_cyclic().
> 
> Applied to for-next, thanks
> 
> I moved the might_sleep hunk from the last patch to the first though
> 
> Jason

Thanks. Been a long time coming. Glad we're there.

Bob