mbox series

[for-next,v5,0/6] Replace AV by AH in UD sends

Message ID 20211006015815.28350-1-rpearsonhpe@gmail.com (mailing list archive)
Headers show
Series Replace AV by AH in UD sends | expand

Message

Bob Pearson Oct. 6, 2021, 1:58 a.m. UTC
Currently the rdma_rxe driver and its user space provider exchange
addressing information for UD sends by having the provider compute an
address vector (AV) and send it with each WQE. This is not the way
that the RDMA verbs API was intended to operate.

This series of patches modifies the way UD send WQEs work by exchanging
an index identifying the AH replacing the 88 byte AV by a 4 byte AH
index. In order to not break compatibility with the existing API the
rdma_rxe driver will recognise when an older version of the provider
is not sending an index (i.e. it is 0) and will use the AV instead.

This series of patches is identical to the previous version
but rebased to 5.15.0-rc2+. It applies cleanly to

    commit: d30ef6d5c013c19e907f2a3a3d6eee04fcd3de0d (for-next)

---
v5:
  Rebase to 5.15.0-rc2+

v4:
  Rebase to 5.15.0-rc1+

v3:
  Split up commits into smaller steps.

v2:
  Rearranged AV in rxe_send_wqe to be in the ud struct but padded to the
  same offset as the original preserving ABI compatibility.

Bob Pearson (6):
  RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
  RDMA/rxe: Change AH objects to indexed
  RDMA/rxe: Create AH index and return to user space
  RDMA/rxe: Replace ah->pd by ah->ibah.pd
  RDMA/rxe: Lookup kernel AH from ah index in UD WQEs
  RDMA/rxe: Convert kernel UD post send to use ah_num

 drivers/infiniband/sw/rxe/rxe_av.c    | 20 +++++++++++++-
 drivers/infiniband/sw/rxe/rxe_param.h |  4 ++-
 drivers/infiniband/sw/rxe/rxe_pool.c  |  4 ++-
 drivers/infiniband/sw/rxe/rxe_req.c   |  8 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.c | 39 ++++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.h |  8 +++++-
 include/uapi/rdma/rdma_user_rxe.h     | 10 ++++++-
 7 files changed, 79 insertions(+), 14 deletions(-)

Comments

Jason Gunthorpe Oct. 6, 2021, 7:37 p.m. UTC | #1
On Tue, Oct 05, 2021 at 08:58:09PM -0500, Bob Pearson wrote:
> Currently the rdma_rxe driver and its user space provider exchange
> addressing information for UD sends by having the provider compute an
> address vector (AV) and send it with each WQE. This is not the way
> that the RDMA verbs API was intended to operate.
> 
> This series of patches modifies the way UD send WQEs work by exchanging
> an index identifying the AH replacing the 88 byte AV by a 4 byte AH
> index. In order to not break compatibility with the existing API the
> rdma_rxe driver will recognise when an older version of the provider
> is not sending an index (i.e. it is 0) and will use the AV instead.
> 
> This series of patches is identical to the previous version
> but rebased to 5.15.0-rc2+. It applies cleanly to
> 
>     commit: d30ef6d5c013c19e907f2a3a3d6eee04fcd3de0d (for-next)
> 
> v5:
>   Rebase to 5.15.0-rc2+

This is not the right base, I said you needed something path Rao's
patch like current rdma for-next since it gets conflicts:

Applying: RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr
Applying: RDMA/rxe: Change AH objects to indexed
Using index info to reconstruct a base tree...
M	drivers/infiniband/sw/rxe/rxe_param.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/infiniband/sw/rxe/rxe_param.h
CONFLICT (content): Merge conflict in drivers/infiniband/sw/rxe/rxe_param.h
error: Failed to merge in the changes.
Patch failed at 0002 RDMA/rxe: Change AH objects to indexed
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Try again

Jason
Bob Pearson Oct. 7, 2021, 7:51 p.m. UTC | #2
On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
> 
>> On looking, Rao's patch is not in for-next. Last one was
>> January. Which branch are you looking at?
> 
> Oh, it is still in the wip branch, try now
> 
> Jason
> 

I see the issue. Rao is asking for 2^20 objects max by default which will
require 128KiB of memory in the index reservation bit mask for each of them.
There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
That's 512KiB of memory which seems excessive to me for many use cases where the
number of objects is fairly small.

The bit mask is used to allocate and free the indices and there is also a red black
tree that is used to look up objects by their index (or key if they use keys instead.)

If there is a usual way to address these kinds of issues in Linux maybe we should
consider that. If not there are a couple of approaches we could take. First would
be to get rid of the index bit mask and just hand out randomly selected indices in
(a bigger range) and detect collisions when we insert the object into the red black
tree and retry. This is basically what happens with 'keys' for example mgids for
multicast group elements. Alternatively we could leave the max big but limit the
allocated indices to a smaller amount until the total number of allocated indices
reached some threshold and then extend the bit mask table. Then only the use cases
that really needed the big index range would pay the price for the memory.

Random indices would slightly reduce some of the security issues that have been
pointed out about the InfiniBand transport.

I am looking for suggestions on how to go forward here.

Bob
Jason Gunthorpe Oct. 7, 2021, 7:57 p.m. UTC | #3
On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
> > On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
> > 
> >> On looking, Rao's patch is not in for-next. Last one was
> >> January. Which branch are you looking at?
> > 
> > Oh, it is still in the wip branch, try now
> > 
> > Jason
> > 
> 
> I see the issue. Rao is asking for 2^20 objects max by default which will
> require 128KiB of memory in the index reservation bit mask for each of them.
> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
> That's 512KiB of memory which seems excessive to me for many use cases where the
> number of objects is fairly small.
> 
> The bit mask is used to allocate and free the indices and there is also a red black
> tree that is used to look up objects by their index (or key if they use keys instead.)
> 
> If there is a usual way to address these kinds of issues in Linux maybe we should
> consider that.

Use an allocating xarray

But for these AV patches just fix the merge conflict to something sane
and go ahead

Jason
Shoaib Rao Oct. 7, 2021, 8:40 p.m. UTC | #4
On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>
>>>> On looking, Rao's patch is not in for-next. Last one was
>>>> January. Which branch are you looking at?
>>> Oh, it is still in the wip branch, try now
>>>
>>> Jason
>>>
>> I see the issue. Rao is asking for 2^20 objects max by default which will
>> require 128KiB of memory in the index reservation bit mask for each of them.
>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>> That's 512KiB of memory which seems excessive to me for many use cases where the
>> number of objects is fairly small.
>>
>> The bit mask is used to allocate and free the indices and there is also a red black
>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>
>> If there is a usual way to address these kinds of issues in Linux maybe we should
>> consider that.
> Use an allocating xarray
>
> But for these AV patches just fix the merge conflict to something sane
> and go ahead
>
> Jason

I did not want to increase the values too high but we discussed it so I 
did. Let me know if I need to modify the patch and reduce the values.

Shoaib
Bob Pearson Oct. 7, 2021, 10 p.m. UTC | #5
On 10/7/21 3:40 PM, Shoaib Rao wrote:
> 
> On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
>> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>>
>>>>> On looking, Rao's patch is not in for-next. Last one was
>>>>> January. Which branch are you looking at?
>>>> Oh, it is still in the wip branch, try now
>>>>
>>>> Jason
>>>>
>>> I see the issue. Rao is asking for 2^20 objects max by default which will
>>> require 128KiB of memory in the index reservation bit mask for each of them.
>>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>>> That's 512KiB of memory which seems excessive to me for many use cases where the
>>> number of objects is fairly small.
>>>
>>> The bit mask is used to allocate and free the indices and there is also a red black
>>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>>
>>> If there is a usual way to address these kinds of issues in Linux maybe we should
>>> consider that.
>> Use an allocating xarray
>>
>> But for these AV patches just fix the merge conflict to something sane
>> and go ahead
>>
>> Jason
> 
> I did not want to increase the values too high but we discussed it so I did. Let me know if I need to modify the patch and reduce the values.
> 
> Shoaib
> 

If we convert the rxe_pools to use xarrays as Jason suggests it looks like this issue
goes away. I'm looking at that.

Bob
Shoaib Rao Oct. 7, 2021, 10:53 p.m. UTC | #6
On 10/7/21 3:00 PM, Bob Pearson wrote:
> On 10/7/21 3:40 PM, Shoaib Rao wrote:
>> On 10/7/21 12:57 PM, Jason Gunthorpe wrote:
>>> On Thu, Oct 07, 2021 at 02:51:11PM -0500, Bob Pearson wrote:
>>>> On 10/7/21 2:05 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Oct 07, 2021 at 01:53:27PM -0500, Bob Pearson wrote:
>>>>>
>>>>>> On looking, Rao's patch is not in for-next. Last one was
>>>>>> January. Which branch are you looking at?
>>>>> Oh, it is still in the wip branch, try now
>>>>>
>>>>> Jason
>>>>>
>>>> I see the issue. Rao is asking for 2^20 objects max by default which will
>>>> require 128KiB of memory in the index reservation bit mask for each of them.
>>>> There are 4 indexed objects QP by qpn, SRQ by srqn, MR by rkey and MW by rkey.
>>>> That's 512KiB of memory which seems excessive to me for many use cases where the
>>>> number of objects is fairly small.
>>>>
>>>> The bit mask is used to allocate and free the indices and there is also a red black
>>>> tree that is used to look up objects by their index (or key if they use keys instead.)
>>>>
>>>> If there is a usual way to address these kinds of issues in Linux maybe we should
>>>> consider that.
>>> Use an allocating xarray
>>>
>>> But for these AV patches just fix the merge conflict to something sane
>>> and go ahead
>>>
>>> Jason
>> I did not want to increase the values too high but we discussed it so I did. Let me know if I need to modify the patch and reduce the values.
>>
>> Shoaib
>>
> If we convert the rxe_pools to use xarrays as Jason suggests it looks like this issue
> goes away. I'm looking at that.
>
> Bob

Thanks Bob. Let me know if there is anything that I can help out with.

Shoaib