mbox series

[RFC,v9,00/26]

Message ID 20220127213755.31697-1-rpearsonhpe@gmail.com (mailing list archive)
Headers show
Series [RFC,v9,01/26] RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c | expand

Message

Bob Pearson Jan. 27, 2022, 9:37 p.m. UTC
There are several race conditions discovered in the current rdma_rxe
driver.  They mostly relate to races between normal operations and
destroying objects.  This patch series
 - Makes several minor cleanups in rxe_pool.[ch]
 - Replaces the red-black trees currently used by xarrays for indices
 - Simplifies the API for keyed objects
 - Corrects several reference counting errors
 - Adds wait for completions to the paths in verbs APIs which destroy
   objects.

The patch series has been changed to RFC PATCH instead of PATCH for-next
because I have little experience with rcu locking and would like
someone else to review this code (in 18/26 and 24/26). RCU locking
should improve performance at large scale but this has not been tested
yet.

This patch series applies cleanly to current for-next.
commit e783362eb54cd99b2cac8b3a9aeac942e6f6ac07 (tag: v5.17-rc1,
		origin/wip/jgg-for-rc, origin/wip/jgg-for-next,
		origin/wip/for-testing, origin/for-rc,
		origin/for-next, origin/HEAD, for-next)

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v9
  Corrected issues reported by Jason Gunthorpe,
  Converted locking in rxe_mcast.c and rxe_pool.c to use RCU
  Split up the patches into smaller changes
v8
  Fixed an additional race in 3/8 which was not handled correctly.
v7
  Corrected issues reported by Jason Gunthorpe
Link: https://lore.kernel.org/linux-rdma/20211207190947.GH6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207191857.GI6385@nvidia.com/
Link: https://lore.kernel.org/linux-rdma/20211207192824.GJ6385@nvidia.com/
v6
  Fixed a kzalloc flags bug.
  Fixed comment bug reported by 'Kernel Test Robot'.
  Changed type of rxe_pool.c in __rxe_fini().
v5
  Removed patches already accepted into for-next and addressed comments
  from Jason Gunthorpe.
v4
  Restructured patch series to change to xarray earlier which
  greatly simplified the changes.
  Rebased to current for-next
v3
  Changed rxe_alloc to use GFP_KERNEL
  Addressed other comments by Jason Gunthorp
  Merged the previous 06/10 and 07/10 patches into one since they overlapped
  Added some minor cleanups as 10/10
v2
  Rebased to current for-next.
  Added 4 additional patches

Bob Pearson (26):
  RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
  RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
  RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
  RDMA/rxe: Enforce IBA o10-2.2.3
  RDMA/rxe: Remove rxe_drop_all_macst_groups
  RDMA/rxe: Remove qp->grp_lock and qp->grp_list
  RDMA/rxe: Use kzmalloc/kfree for mca
  RDMA/rxe: Rename grp to mcg and mce to mca
  RDMA/rxe: Introduce RXECB(skb)
  RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
  RDMA/rxe: Replace locks by rxe->mcg_lock
  RDMA/rxe: Replace pool key by rxe->mcg_tree
  RDMA/rxe: Remove key'ed object support
  RDMA/rxe: Remove mcg from rxe pools
  RDMA/rxe: Add code to cleanup mcast memory
  RDMA/rxe: Add comments to rxe_mcast.c
  RDMA/rxe: Separate code into subroutines
  RDMA/rxe: Convert mca read locking to RCU
  RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
  RDMA/rxe: Delete _locked() APIs for pool objects
  RDMA/rxe: Replace obj by elem in declaration
  RDMA/rxe: Replace red-black trees by xarrays
  RDMA/rxe: Change pool locking to RCU
  RDMA/rxe: Add wait_for_completion to pool objects
  RDMA/rxe: Fix ref error in rxe_av.c
  RDMA/rxe: Replace mr by rkey in responder resources

 drivers/infiniband/sw/rxe/rxe.c       | 107 +---
 drivers/infiniband/sw/rxe/rxe_av.c    |  19 +-
 drivers/infiniband/sw/rxe/rxe_hdr.h   |   3 +
 drivers/infiniband/sw/rxe/rxe_loc.h   |  33 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 678 ++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_mr.c    |   2 +-
 drivers/infiniband/sw/rxe/rxe_mw.c    |  11 +-
 drivers/infiniband/sw/rxe/rxe_net.c   |  35 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 798 ++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_pool.h  | 233 +++-----
 drivers/infiniband/sw/rxe/rxe_qp.c    |  29 +-
 drivers/infiniband/sw/rxe/rxe_recv.c  |  98 ++--
 drivers/infiniband/sw/rxe/rxe_req.c   |  55 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  | 125 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  54 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |  26 +-
 16 files changed, 1159 insertions(+), 1147 deletions(-)
 rewrite drivers/infiniband/sw/rxe/rxe_mcast.c (86%)
 rewrite drivers/infiniband/sw/rxe/rxe_pool.c (67%)
 rewrite drivers/infiniband/sw/rxe/rxe_pool.h (73%)

Comments

Jason Gunthorpe Jan. 28, 2022, 6:42 p.m. UTC | #1
On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
> There are several race conditions discovered in the current rdma_rxe
> 
> Bob Pearson (26):
>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
>   RDMA/rxe: Enforce IBA o10-2.2.3
>   RDMA/rxe: Remove rxe_drop_all_macst_groups
>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list

I took these patches to for-next

>   RDMA/rxe: Use kzmalloc/kfree for mca
>   RDMA/rxe: Rename grp to mcg and mce to mca
>   RDMA/rxe: Introduce RXECB(skb)
>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
>   RDMA/rxe: Replace locks by rxe->mcg_lock
>   RDMA/rxe: Replace pool key by rxe->mcg_tree
>   RDMA/rxe: Remove key'ed object support
>   RDMA/rxe: Remove mcg from rxe pools
>   RDMA/rxe: Add code to cleanup mcast memory
>   RDMA/rxe: Add comments to rxe_mcast.c
>   RDMA/rxe: Separate code into subroutines

I think you should try to get up to here done in one series and
merged, it looked OK

>   RDMA/rxe: Convert mca read locking to RCU

Not sure this can ever work..

>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>   RDMA/rxe: Delete _locked() APIs for pool objects
>   RDMA/rxe: Replace obj by elem in declaration
>   RDMA/rxe: Replace red-black trees by xarrays
>   RDMA/rxe: Change pool locking to RCU
>   RDMA/rxe: Add wait_for_completion to pool objects
>   RDMA/rxe: Fix ref error in rxe_av.c
>   RDMA/rxe: Replace mr by rkey in responder resources

These also seem reasonable, didn't follow why we needed the RCU patch?

Jason
Bob Pearson Feb. 7, 2022, 7:20 p.m. UTC | #2
On 1/28/22 12:42, Jason Gunthorpe wrote:
> On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
>> There are several race conditions discovered in the current rdma_rxe
>>
>> Bob Pearson (26):
>>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
>>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
>>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
>>   RDMA/rxe: Enforce IBA o10-2.2.3
>>   RDMA/rxe: Remove rxe_drop_all_macst_groups
>>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list
> 
> I took these patches to for-next
> 
>>   RDMA/rxe: Use kzmalloc/kfree for mca
>>   RDMA/rxe: Rename grp to mcg and mce to mca
>>   RDMA/rxe: Introduce RXECB(skb)
>>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
>>   RDMA/rxe: Replace locks by rxe->mcg_lock
>>   RDMA/rxe: Replace pool key by rxe->mcg_tree
>>   RDMA/rxe: Remove key'ed object support
>>   RDMA/rxe: Remove mcg from rxe pools
>>   RDMA/rxe: Add code to cleanup mcast memory
>>   RDMA/rxe: Add comments to rxe_mcast.c
>>   RDMA/rxe: Separate code into subroutines
> 
> I think you should try to get up to here done in one series and
> merged, it looked OK

Jason,

I have these ready again. It is a little restructured but gets to the same place.
Last time I sent things in you had a complaint but it got mangled somehow so I
couldn't read it. Is there anything else I should be looking at before posting these
again?

Bob
> 
>>   RDMA/rxe: Convert mca read locking to RCU
> 
> Not sure this can ever work..
> 
>>   RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
>>   RDMA/rxe: Delete _locked() APIs for pool objects
>>   RDMA/rxe: Replace obj by elem in declaration
>>   RDMA/rxe: Replace red-black trees by xarrays
>>   RDMA/rxe: Change pool locking to RCU
>>   RDMA/rxe: Add wait_for_completion to pool objects
>>   RDMA/rxe: Fix ref error in rxe_av.c
>>   RDMA/rxe: Replace mr by rkey in responder resources
> 
> These also seem reasonable, didn't follow why we needed the RCU patch?
> 
> Jason
Jason Gunthorpe Feb. 7, 2022, 7:38 p.m. UTC | #3
On Mon, Feb 07, 2022 at 01:20:32PM -0600, Bob Pearson wrote:
> On 1/28/22 12:42, Jason Gunthorpe wrote:
> > On Thu, Jan 27, 2022 at 03:37:29PM -0600, Bob Pearson wrote:
> >> There are several race conditions discovered in the current rdma_rxe
> >>
> >> Bob Pearson (26):
> >>   RDMA/rxe: Move rxe_mcast_add/delete to rxe_mcast.c
> >>   RDMA/rxe: Move rxe_mcast_attach/detach to rxe_mcast.c
> >>   RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
> >>   RDMA/rxe: Enforce IBA o10-2.2.3
> >>   RDMA/rxe: Remove rxe_drop_all_macst_groups
> >>   RDMA/rxe: Remove qp->grp_lock and qp->grp_list
> > 
> > I took these patches to for-next
> > 
> >>   RDMA/rxe: Use kzmalloc/kfree for mca
> >>   RDMA/rxe: Rename grp to mcg and mce to mca
> >>   RDMA/rxe: Introduce RXECB(skb)
> >>   RDMA/rxe: Split rxe_rcv_mcast_pkt into two phases
> >>   RDMA/rxe: Replace locks by rxe->mcg_lock
> >>   RDMA/rxe: Replace pool key by rxe->mcg_tree
> >>   RDMA/rxe: Remove key'ed object support
> >>   RDMA/rxe: Remove mcg from rxe pools
> >>   RDMA/rxe: Add code to cleanup mcast memory
> >>   RDMA/rxe: Add comments to rxe_mcast.c
> >>   RDMA/rxe: Separate code into subroutines
> > 
> > I think you should try to get up to here done in one series and
> > merged, it looked OK
> 
> Jason,
> 
> I have these ready again. It is a little restructured but gets to the same place.
> Last time I sent things in you had a complaint but it got mangled somehow so I
> couldn't read it. Is there anything else I should be looking at before posting these
> again?

I think I said you shouldn't re-send patches I've already applied?

Jason