
[04/20] RDMA/core: Introduce ib_map_mr_sg_pi to map data/protection sgl's

Message ID 1559222731-16715-5-git-send-email-maxg@mellanox.com (mailing list archive)
State Superseded
Series: Introduce new API for T10-PI offload

Commit Message

Max Gurtovoy May 30, 2019, 1:25 p.m. UTC
This function will map the previously dma mapped SG lists for PI
(protection information) and data to an appropriate memory region for
future registration.
The given MR must be allocated as IB_MR_TYPE_INTEGRITY.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/core/device.c |  1 +
 drivers/infiniband/core/verbs.c  | 40 +++++++++++++++++++++++++++++++++++++++-
 include/rdma/ib_verbs.h          |  9 +++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig June 4, 2019, 7:35 a.m. UTC | #1
On Thu, May 30, 2019 at 04:25:15PM +0300, Max Gurtovoy wrote:
> This function will map the previously dma mapped SG lists for PI
> (protection information) and data to an appropriate memory region for
> future registration.
> The given MR must be allocated as IB_MR_TYPE_INTEGRITY.
> 
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Except for a similar nit as for the last patch:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Sagi Grimberg June 5, 2019, 5:38 p.m. UTC | #2
> +/**
> + * ib_map_mr_sg_pi() - Map the dma mapped SG lists for PI (protection
> + *     information) and set an appropriate memory region for registration.
> + * @mr:             memory region
> + * @data_sg:        dma mapped scatterlist for data
> + * @data_sg_nents:  number of entries in data_sg
> + * @data_sg_offset: offset in bytes into data_sg
> + * @meta_sg:        dma mapped scatterlist for metadata
> + * @meta_sg_nents:  number of entries in meta_sg
> + * @meta_sg_offset: offset in bytes into meta_sg
> + * @page_size:      page vector desired page size
> + *
> + * Constraints:
> + * - The MR must be allocated with type IB_MR_TYPE_INTEGRITY.
> + *
> + * Returns the number of sg elements that were mapped to the memory region.

Question, is it possible that all data sges were mapped but not all
meta sges? Given that there is a non-trivial accounting on the relations
between data and meta sges maybe the return value should be
success/failure?

Or, if this cannot happen we need to describe why here.
Max Gurtovoy June 5, 2019, 8:52 p.m. UTC | #3
On 6/5/2019 8:38 PM, Sagi Grimberg wrote:
>
>> +/**
>> + * ib_map_mr_sg_pi() - Map the dma mapped SG lists for PI (protection
>> + *     information) and set an appropriate memory region for 
>> registration.
>> + * @mr:             memory region
>> + * @data_sg:        dma mapped scatterlist for data
>> + * @data_sg_nents:  number of entries in data_sg
>> + * @data_sg_offset: offset in bytes into data_sg
>> + * @meta_sg:        dma mapped scatterlist for metadata
>> + * @meta_sg_nents:  number of entries in meta_sg
>> + * @meta_sg_offset: offset in bytes into meta_sg
>> + * @page_size:      page vector desired page size
>> + *
>> + * Constraints:
>> + * - The MR must be allocated with type IB_MR_TYPE_INTEGRITY.
>> + *
>> + * Returns the number of sg elements that were mapped to the memory 
>> region.
>
> Question, is it possible that all data sges were mapped but not all
> meta sges? Given that there is a non-trivial accounting on the relations
> between data and meta sges maybe the return value should be
> success/failure?

If the data_sges are mapped but not all of the meta_sges, then the check that
the return value n == data_nents + meta_nents will fail.

I don't understand the concern here.

Can you give an example ?

>
>
> Or, if this cannot happen we need to describe why here.

failures can always happen :)
Sagi Grimberg June 5, 2019, 10:31 p.m. UTC | #4
>>> +/**
>>> + * ib_map_mr_sg_pi() - Map the dma mapped SG lists for PI (protection
>>> + *     information) and set an appropriate memory region for 
>>> registration.
>>> + * @mr:             memory region
>>> + * @data_sg:        dma mapped scatterlist for data
>>> + * @data_sg_nents:  number of entries in data_sg
>>> + * @data_sg_offset: offset in bytes into data_sg
>>> + * @meta_sg:        dma mapped scatterlist for metadata
>>> + * @meta_sg_nents:  number of entries in meta_sg
>>> + * @meta_sg_offset: offset in bytes into meta_sg
>>> + * @page_size:      page vector desired page size
>>> + *
>>> + * Constraints:
>>> + * - The MR must be allocated with type IB_MR_TYPE_INTEGRITY.
>>> + *
>>> + * Returns the number of sg elements that were mapped to the memory 
>>> region.
>>
>> Question, is it possible that all data sges were mapped but not all
>> meta sges? Given that there is a non-trivial accounting on the relations
>> between data and meta sges maybe the return value should be
>> success/failure?
> 
> if data_sges will be mapped but not all meta_sges then the check of 
> return value n == data_nents + meta_nents will fail.

That check is in the ULP; the API preferably should not assume that the
ULP will do that and will not try to map the remaining sges with a
different MR, as that is much less trivial than the normal MR mapping.

That's why I suggest to return success/fail and not the number of
SG elems that were mapped.

> I don't understand the concern here.
> 
> Can you give an example ?

In case your max nents for data+meta is 16 but you get data_sg=15
and meta_sg=2, it's not the case that if you map 15+1 you can trivially
continue from that point.

>> Or, if this cannot happen we need to describe why here.
> 
> failures can always happen :)

Yes, but what I was referring to is the difference between the normal
mr_map_sg and the mr_map_sg_pi. srp, for example, actually uses the
returned number of mapped sg elements to continue mapping; it cannot do
the same with the pi version.
Max Gurtovoy June 5, 2019, 11:23 p.m. UTC | #5
On 6/6/2019 1:31 AM, Sagi Grimberg wrote:
>
>>>> +/**
>>>> + * ib_map_mr_sg_pi() - Map the dma mapped SG lists for PI (protection
>>>> + *     information) and set an appropriate memory region for 
>>>> registration.
>>>> + * @mr:             memory region
>>>> + * @data_sg:        dma mapped scatterlist for data
>>>> + * @data_sg_nents:  number of entries in data_sg
>>>> + * @data_sg_offset: offset in bytes into data_sg
>>>> + * @meta_sg:        dma mapped scatterlist for metadata
>>>> + * @meta_sg_nents:  number of entries in meta_sg
>>>> + * @meta_sg_offset: offset in bytes into meta_sg
>>>> + * @page_size:      page vector desired page size
>>>> + *
>>>> + * Constraints:
>>>> + * - The MR must be allocated with type IB_MR_TYPE_INTEGRITY.
>>>> + *
>>>> + * Returns the number of sg elements that were mapped to the 
>>>> memory region.
>>>
>>> Question, is it possible that all data sges were mapped but not all
>>> meta sges? Given that there is a non-trivial accounting on the 
>>> relations
>>> between data and meta sges maybe the return value should be
>>> success/failure?
>>
>> if data_sges will be mapped but not all meta_sges then the check of 
>> return value n == data_nents + meta_nents will fail.
>
> That check is in the ulp, the API preferably should not assume that the
> ulp will do that and not try to map the remaining sges with a different
> mr as it is much less trivial to do than the normal mr mapping.
>
>
> That's why I suggest to return success/fail and not the number of
> SG elems that were mapped.

It's not a problem to do that, but I wanted the return code to be
similar to that of the regular data mapping function.

I guess it's safer to return success/fail..

>
>> I don't understand the concern here.
>>
>> Can you give an example ?
>
> In case your max nents for data+meta is 16 but you get data_sg=15
> and meta_sg=2, its not that if you map 15+1 you can trivially continue
> from that point.

This can't happen, since if max_nents is 16 I limit max_data_nents
to 8 and max_meta_nents to 8.

>
>>> Or, if this cannot happen we need to describe why here.
>>
>> failures can always happen :)
>
> Yes, but what I was referring to is the difference between the normal
> mr_map_sg and the mr_map_sg_pi. srp for example actually uses the
> continuation of the number of mapped sg returned, it cannot do the
> same with the pi version.

Well I guess we can still use this API in SRP and set the pointers and 
offsets correctly.
Sagi Grimberg June 5, 2019, 11:46 p.m. UTC | #6
>> That's why I suggest to return success/fail and not the number of
>> SG elems that were mapped.
> 
> It's not a problem doing it, but I wanted it to be similar return code 
> and the regular data mapping function.
> 
> I guess it safer to return success/fail..
> 
>>
>>> I don't understand the concern here.
>>>
>>> Can you give an example ?
>>
>> In case your max nents for data+meta is 16 but you get data_sg=15
>> and meta_sg=2, its not that if you map 15+1 you can trivially continue
>> from that point.
> 
> This can't happen since if max_nents is 16, I limit the max_data_nents 
> to 8 and max_meta_nents to 8.

Still, there is a possibility that we cannot map all the nents, exactly
like in the normal MR mapping, where srp sends all the nents and
allocates more MRs until it finishes.

>>>> Or, if this cannot happen we need to describe why here.
>>>
>>> failures can always happen :)
>>
>> Yes, but what I was referring to is the difference between the normal
>> mr_map_sg and the mr_map_sg_pi. srp for example actually uses the
>> continuation of the number of mapped sg returned, it cannot do the
>> same with the pi version.
> 
> Well I guess we can still use this API in SRP and set the pointers and 
> offsets correctly.

That is another option, but then the mapping routine needs to perform a
non-trivial klm/mtt trimming in the case where we did not cover all the
entries; it's not trivial because the data pages and the meta pages have
completely different alignments.

I'd say that it's simpler to make this success/fail until srp or the
like actually uses it, and then we can make it continuation-friendly.

Patch

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 14a6d351ad1e..90c71542e76e 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2389,6 +2389,7 @@  void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, iw_reject);
 	SET_DEVICE_OP(dev_ops, iw_rem_ref);
 	SET_DEVICE_OP(dev_ops, map_mr_sg);
+	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, map_phys_fmr);
 	SET_DEVICE_OP(dev_ops, mmap);
 	SET_DEVICE_OP(dev_ops, modify_ah);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 80b9dc0dbd6a..96ddae029095 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2040,7 +2040,8 @@  struct ib_mr *ib_alloc_mr_integrity(struct ib_pd *pd,
 {
 	struct ib_mr *mr;
 
-	if (!pd->device->ops.alloc_mr_integrity)
+	if (!pd->device->ops.alloc_mr_integrity ||
+	    !pd->device->ops.map_mr_sg_pi)
 		return ERR_PTR(-EOPNOTSUPP);
 
 	if (!max_num_meta_sg)
@@ -2423,6 +2424,43 @@  int ib_set_vf_guid(struct ib_device *device, int vf, u8 port, u64 guid,
 }
 EXPORT_SYMBOL(ib_set_vf_guid);
 
+/**
+ * ib_map_mr_sg_pi() - Map the dma mapped SG lists for PI (protection
+ *     information) and set an appropriate memory region for registration.
+ * @mr:             memory region
+ * @data_sg:        dma mapped scatterlist for data
+ * @data_sg_nents:  number of entries in data_sg
+ * @data_sg_offset: offset in bytes into data_sg
+ * @meta_sg:        dma mapped scatterlist for metadata
+ * @meta_sg_nents:  number of entries in meta_sg
+ * @meta_sg_offset: offset in bytes into meta_sg
+ * @page_size:      page vector desired page size
+ *
+ * Constraints:
+ * - The MR must be allocated with type IB_MR_TYPE_INTEGRITY.
+ *
+ * Returns the number of sg elements that were mapped to the memory region.
+ *
+ * After this completes successfully, the  memory region
+ * is ready for registration.
+ */
+int ib_map_mr_sg_pi(struct ib_mr *mr, struct scatterlist *data_sg,
+		    int data_sg_nents, unsigned int *data_sg_offset,
+		    struct scatterlist *meta_sg, int meta_sg_nents,
+		    unsigned int *meta_sg_offset, unsigned int page_size)
+{
+	if (unlikely(!mr->device->ops.map_mr_sg_pi ||
+		     WARN_ON_ONCE(mr->type != IB_MR_TYPE_INTEGRITY)))
+		return -EOPNOTSUPP;
+
+	mr->page_size = page_size;
+
+	return mr->device->ops.map_mr_sg_pi(mr, data_sg, data_sg_nents,
+					    data_sg_offset, meta_sg,
+					    meta_sg_nents, meta_sg_offset);
+}
+EXPORT_SYMBOL(ib_map_mr_sg_pi);
+
 /**
  * ib_map_mr_sg() - Map the largest prefix of a dma mapped SG list
  *     and set it the memory region.
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1669a3886abf..96b868a223c5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2444,6 +2444,11 @@  struct ib_device_ops {
 	int (*read_counters)(struct ib_counters *counters,
 			     struct ib_counters_read_attr *counters_read_attr,
 			     struct uverbs_attr_bundle *attrs);
+	int (*map_mr_sg_pi)(struct ib_mr *mr, struct scatterlist *data_sg,
+			    int data_sg_nents, unsigned int *data_sg_offset,
+			    struct scatterlist *meta_sg, int meta_sg_nents,
+			    unsigned int *meta_sg_offset);
+
 	/**
 	 * alloc_hw_stats - Allocate a struct rdma_hw_stats and fill in the
 	 *   driver initialized data.  The struct is kfree()'ed by the sysfs
@@ -4241,6 +4246,10 @@  int ib_destroy_rwq_ind_table(struct ib_rwq_ind_table *wq_ind_table);
 
 int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
 		 unsigned int *sg_offset, unsigned int page_size);
+int ib_map_mr_sg_pi(struct ib_mr *mr, struct scatterlist *data_sg,
+		    int data_sg_nents, unsigned int *data_sg_offset,
+		    struct scatterlist *meta_sg, int meta_sg_nents,
+		    unsigned int *meta_sg_offset, unsigned int page_size);
 
 static inline int
 ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,