diff mbox

[v1,00/24] New fast registration API

Message ID 560109B7.8070005@mellanox.com (mailing list archive)
State New, archived
Headers show

Commit Message

Sagi Grimberg Sept. 22, 2015, 7:56 a.m. UTC
On 9/22/2015 10:19 AM, Sagi Grimberg wrote:
>>
>> As mentioned earlier, I have a WIP RDS fastreg branch [3]
>> which is functional (at least I can RDMA messages across
>> nodes ;-)).
>
> Nice!
>
>> So merging [2] and [3], I created [4] and applied
>> a delta change based on your other patches. I saw ib_post_send
>> failure with my HCA driver returning '-EINVAL'. I didn't
>> debug it further but at least opcode and num_sge were set
>> correctly so I shouldn't have seen it. So I did memset()
>> on reg_wr which seems to have helped to fix the ib_post_send()
>> failure.
>
> Yep - that was my fault. When converting the ULPs I optimized by
> removing the memset but I forgot to set reg_wr.wr.next = NULL when
> the ULP needed. This caused the driver to read a second bogus work
> request. Steve just reported this as well so I'll fix that in v2.
>
>>
>> But I got into remote access errors which tells me that I
>> have messed up setup(rkey, sge setup or access flags)
>
> One thing that pops is that in the old API the MR was registered
> with iova_start = 0 (which is probably what was sent to the peer),
> but the new API the iova is implicitly sg_dma_address(&sg[0]).
>
> The registered MR holds these attributes in:
> mr->rkey
> mr->iova
> mr->length
>
> These should be passed to a peer to perform rdma.
>
> Hope this helps,
> Sagi.

ohh, I just read the RDS 3.1 specification (for the first time..) and I
noticed that RDS 3.1 header extension contains only a 32bit offset
parameter. Why is that anyway? why not 64bit so it can be a valid mapped
address? Also the code doesn't use it at all and always passes 0 (which
is buggy if sg[0] has an offset from a page).

This won't work with the proposed API as the iova is 64bit (as all other
existing RDMA protocols use 64bit addresses).

In any event, I'd much rather to add ib_map_mr_sg_zbva() just for RDS
to use instead of polluting the API with an iova argument, but I think
that the RDS spec can be updated to use 64bit offsets and align to all
other RDMA protocols (it has enough space in h_exthdr which is 128bit).

I was thinking of:
--

Thoughts?

Santosh, can you use that one instead and let us know if
it resolves your issue?

I think you should make sure to correctly construct the
h_exthdr with: rds_rdma_make_cookie(mr->rkey, (32)mr->iova)
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Santosh Shilimkar Sept. 22, 2015, 6:23 p.m. UTC | #1
On 9/22/2015 12:56 AM, Sagi Grimberg wrote:
> On 9/22/2015 10:19 AM, Sagi Grimberg wrote:
>>>
>>> As mentioned earlier, I have a WIP RDS fastreg branch [3]
>>> which is functional (at least I can RDMA messages across
>>> nodes ;-)).
>>
>> Nice!
>>
>>> So merging [2] and [3], I created [4] and applied
>>> a delta change based on your other patches. I saw ib_post_send
>>> failure with my HCA driver returning '-EINVAL'. I didn't
>>> debug it further but at least opcode and num_sge were set
>>> correctly so I shouldn't have seen it. So I did memset()
>>> on reg_wr which seems to have helped to fix the ib_post_send()
>>> failure.
>>
>> Yep - that was my fault. When converting the ULPs I optimized by
>> removing the memset but I forgot to set reg_wr.wr.next = NULL when
>> the ULP needed. This caused the driver to read a second bogus work
>> request. Steve just reported this as well so I'll fix that in v2.
>>
Ahh, right. There can be chain of wr.

>>>
>>> But I got into remote access errors which tells me that I
>>> have messed up setup(rkey, sge setup or access flags)
>>
>> One thing that pops is that in the old API the MR was registered
>> with iova_start = 0 (which is probably what was sent to the peer),
>> but the new API the iova is implicitly sg_dma_address(&sg[0]).
>>
>> The registered MR holds these attributes in:
>> mr->rkey
>> mr->iova
>> mr->length
>>
>> These should be passed to a peer to perform rdma.
>>
right.

> ohh, I just read the RDS 3.1 specification (for the first time..) and I
> noticed that RDS 3.1 header extension contains only a 32bit offset
> parameter. Why is that anyway? why not 64bit so it can be a valid mapped
> address? Also the code doesn't use it at all and always passes 0 (which
> is buggy if sg[0] has an offset from a page).
>
> This won't work with the proposed API as the iova is 64bit (as all other
> existing RDMA protocols use 64bit addresses).
>
> In any event, I'd much rather to add ib_map_mr_sg_zbva() just for RDS
> to use instead of polluting the API with an iova argument, but I think
> that the RDS spec can be updated to use 64bit offsets and align to all
> other RDMA protocols (it has enough space in h_exthdr which is 128bit).
>
RDS assumes it's an offset and hence it has been used as 32 bit. I need
to look through this carefully though because all the existing
application use this header format. There is also RDMA read/write
byte information sent as part of the header(Not in upstream code yet)
so the space might be less. But point taken. Will look into it.

> I was thinking of:
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index e7e0251..61fcab4 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -3033,6 +3033,21 @@ int ib_map_mr_sg(struct ib_mr *mr,
>                   unsigned int sg_nents,
>                   unsigned int page_size);
>
> +static inline int
> +ib_map_mr_sg_zbva(struct ib_mr *mr,
> +                 struct scatterlist *sg,
> +                 unsigned int sg_nents,
> +                 unsigned int page_size)
> +{
> +       int rc;
> +
> +       rc = ib_map_mr_sg(mr, sg, sg_nents, page_size);
> +       if (likely(!rc))
> +               mr->iova &= ((u64)page_size - 1);
> +
> +       return rc;
> +}
> +
>   int ib_sg_to_pages(struct ib_mr *mr,
>                     struct scatterlist *sgl,
>                     unsigned int sg_nents,
> --
>
> Thoughts?
>
> Santosh, can you use that one instead and let us know if
> it resolves your issue?
>
Unfortunately this change still doesn't fix the issue.

> I think you should make sure to correctly construct the
> h_exthdr with: rds_rdma_make_cookie(mr->rkey, (32)mr->iova)

Will look into it. Thanks for suggestion.

Regards,
Santosh
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e7e0251..61fcab4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3033,6 +3033,21 @@  int ib_map_mr_sg(struct ib_mr *mr,
                  unsigned int sg_nents,
                  unsigned int page_size);

+static inline int
+ib_map_mr_sg_zbva(struct ib_mr *mr,
+                 struct scatterlist *sg,
+                 unsigned int sg_nents,
+                 unsigned int page_size)
+{
+       int rc;
+
+       rc = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+       if (likely(!rc))
+               mr->iova &= ((u64)page_size - 1);
+
+       return rc;
+}
+
  int ib_sg_to_pages(struct ib_mr *mr,
                    struct scatterlist *sgl,
                    unsigned int sg_nents,