diff mbox

[5/6] IB core: Fix ib_sg_to_pages()

Message ID 565DE49D.4020102@sandisk.com (mailing list archive)
State Superseded
Headers show

Commit Message

Bart Van Assche Dec. 1, 2015, 6:19 p.m. UTC
Fix the code for detecting gaps and disable the code for chunking.
A gap occurs not only if the second or later scatterlist element
is not aligned but also if any scatterlist element other than the
last does not end at a page boundary. Disable the code for chunking.
Ensure that this function returns a negative error code instead of
zero if the first set_page() call fails.

Fixes: commit 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.4+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/core/verbs.c | 31 +++++++------------------------
 1 file changed, 7 insertions(+), 24 deletions(-)

Comments

Sagi Grimberg Dec. 1, 2015, 6:32 p.m. UTC | #1
Hi Bart,

> Fix the code for detecting gaps and disable the code for chunking.
> A gap occurs not only if the second or later scatterlist element
> is not aligned but also if any scatterlist element other than the
> last does not end at a page boundary. Disable the code for chunking.

So can you explain what went wrong with the code? If ib_sg_to_pages()
supports chunking, then sg elements are allowed not to end at a page
boundary if it is physically contig to the next sg and then the next
is chunked. Care to explain how did ib_sg_to_pages mess up?

> Ensure that this function returns a negative error code instead of
> zero if the first set_page() call fails.

Umm, my recollection was to make ib_map_mr_sg return the largest prefix
mapped. I don't mind a negative error in this case, but isn't zero an
implicit error (given you didn't want to map 0 sg elements)?

If we do agree on this we need to change ib_map_mr_sg documentation
accordingly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Dec. 1, 2015, 7:10 p.m. UTC | #2
On 12/01/2015 10:32 AM, Sagi Grimberg wrote:
>> Fix the code for detecting gaps and disable the code for chunking.
>> A gap occurs not only if the second or later scatterlist element
>> is not aligned but also if any scatterlist element other than the
>> last does not end at a page boundary. Disable the code for chunking.
>
> So can you explain what went wrong with the code? If ib_sg_to_pages()
> supports chunking, then sg elements are allowed not to end at a page
> boundary if it is physically contig to the next sg and then the next
> is chunked. Care to explain how did ib_sg_to_pages mess up?

With the upstream implementation, if the previous element ends at a page 
boundary (last_end_dma_addr == dma_addr) but the current element does 
not start at a page boundary (page_addr != dma_addr) then the current 
element is mapped. I think this is wrong because this condition 
indicates a gap and hence that ib_sg_to_pages() should stop iterating in 
this case.

>> Ensure that this function returns a negative error code instead of
>> zero if the first set_page() call fails.
>
> Umm, my recollection was to make ib_map_mr_sg return the largest prefix
> mapped. I don't mind a negative error in this case, but isn't zero an
> implicit error (given you didn't want to map 0 sg elements)?
>
> If we do agree on this we need to change ib_map_mr_sg documentation
> accordingly.

How ib_sg_to_pages() reports to its caller that mapping the first 
scatterlist element failed is not important to me. I included that 
change in this patch because I noticed that in the upstream SRP 
initiator does not consider ib_map_mr_sg() returning zero as an error. I 
think either ib_sg_to_pages() or ib_map_mr_sg() should be modified such 
that "no pages have been mapped" is handled as a mapping failure.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 043a60e..95836c6 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1530,10 +1530,9 @@  int ib_sg_to_pages(struct ib_mr *mr,
 		   int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
-	u64 last_end_dma_addr = 0, last_page_addr = 0;
 	unsigned int last_page_off = 0;
 	u64 page_mask = ~((u64)mr->page_size - 1);
-	int i;
+	int i, ret;
 
 	mr->iova = sg_dma_address(&sgl[0]);
 	mr->length = 0;
@@ -1544,37 +1543,21 @@  int ib_sg_to_pages(struct ib_mr *mr,
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
-		if (i && page_addr != dma_addr) {
-			if (last_end_dma_addr != dma_addr) {
-				/* gap */
-				goto done;
-
-			} else if (last_page_off + dma_len <= mr->page_size) {
-				/* chunk this fragment with the last */
-				mr->length += dma_len;
-				last_end_dma_addr += dma_len;
-				last_page_off += dma_len;
-				continue;
-			} else {
-				/* map starting from the next page */
-				page_addr = last_page_addr + mr->page_size;
-				dma_len -= mr->page_size - last_page_off;
-			}
-		}
+		/* gap */
+		if (i && (last_page_off || page_addr != dma_addr))
+			break;
 
 		do {
-			if (unlikely(set_page(mr, page_addr)))
-				goto done;
+			ret = set_page(mr, page_addr);
+			if (unlikely(ret < 0))
+				return i ? i : ret;
 			page_addr += mr->page_size;
 		} while (page_addr < end_dma_addr);
 
 		mr->length += dma_len;
-		last_end_dma_addr = end_dma_addr;
-		last_page_addr = end_dma_addr & page_mask;
 		last_page_off = end_dma_addr & ~page_mask;
 	}
 
-done:
 	return i;
 }
 EXPORT_SYMBOL(ib_sg_to_pages);