[[PATCH v1] 15/37] [CIFS] SMBD: Post a SMBD data transfer message with data payload

Message ID 1501704648-20159-16-git-send-email-longli@exchange.microsoft.com
State New

Commit Message

Long Li Aug. 2, 2017, 8:10 p.m. UTC
From: Long Li <longli@microsoft.com>

Similar to sending a transfer message with a page payload, this function creates an SMBD data packet and sends it over RDMA, from the iov passed in by the upper layer.

Signed-off-by: Long Li <longli@microsoft.com>
---
 fs/cifs/cifsrdma.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)

Comments

Christoph Hellwig Aug. 13, 2017, 10:23 a.m. UTC | #1
You can always get the struct page for kernel allocations using
virt_to_page (or vmalloc_to_page, but this code would not handle the
vmalloc case either), so I don't think you need this helper and can
always use the one added in the previous patch.
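
To illustrate what Christoph describes, here is a minimal sketch (the helper name is hypothetical, not part of this series): a kmalloc'ed, non-vmalloc buffer is physically contiguous, so it can be walked page by page with virt_to_page() and fed through the page-payload send added in 14/37.

/* Hypothetical sketch: push a physically contiguous kernel buffer
 * through the page-based path from 14/37. "remaining" is assumed to
 * be the data still to send after this buffer.
 */
static int cifs_rdma_send_buffer(struct cifs_rdma_info *info,
				 void *buf, size_t len, int remaining)
{
	while (len) {
		struct page *page = virt_to_page(buf);
		unsigned long off = offset_in_page(buf);
		size_t chunk = min_t(size_t, len, PAGE_SIZE - off);
		/* data left on the wire after this chunk */
		int rc = cifs_rdma_post_send_page(info, page, off, chunk,
						  remaining + len - chunk);

		if (rc)
			return rc;
		buf += chunk;
		len -= chunk;
	}
	return 0;
}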
Tom Talpey Aug. 14, 2017, 8:26 p.m. UTC | #2
> -----Original Message-----
> From: linux-cifs-owner@vger.kernel.org [mailto:linux-cifs-
> owner@vger.kernel.org] On Behalf Of Long Li
> Sent: Wednesday, August 2, 2017 4:10 PM
> To: Steve French <sfrench@samba.org>; linux-cifs@vger.kernel.org; samba-
> technical@lists.samba.org; linux-kernel@vger.kernel.org
> Cc: Long Li <longli@microsoft.com>
> Subject: [[PATCH v1] 15/37] [CIFS] SMBD: Post a SMBD data transfer message
> with data payload
> 
 
> Similar to sending a transfer message with a page payload, this function
> creates an SMBD data packet and sends it over RDMA, from the iov passed
> in by the upper layer.

The following routine is heavily redundant with cifs_rdma_post_send_page() from 14/37.
Because they share quite a bit of protocol and DMA mapping logic, I strongly suggest
they be merged; a sketch of one possible shape follows the quoted declarations below.

Tom.

> +static int cifs_rdma_post_send_data(
> +               struct cifs_rdma_info *info,
> +               struct kvec *iov, int n_vec, int remaining_data_length);
>  static int cifs_rdma_post_send_page(struct cifs_rdma_info *info,
>                 struct page *page, unsigned long offset,
>                 size_t size, int remaining_data_length);
> @@ -671,6 +674,122 @@ static int cifs_rdma_post_send_page(struct
> cifs_rdma_info *info, struct page *pa
>  }
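
To make the suggested merge concrete, a hedged sketch of one possible shape (cifs_rdma_post_send_sgl() and CIFS_RDMA_MAX_SGE are hypothetical, not from this series): both entry points reduce to building an SGE list, and a single worker owns the SMBD header, credit accounting, posting, and cleanup.

/* Hypothetical sketch of the merge: the worker below would copy the
 * SGEs into the request so send_done() can unmap them on completion.
 */
static int cifs_rdma_post_send_sgl(struct cifs_rdma_info *info,
		struct ib_sge *sge, int num_sge, u32 data_length,
		int remaining_data_length);

static int cifs_rdma_post_send_data(struct cifs_rdma_info *info,
		struct kvec *iov, int n_vec, int remaining_data_length)
{
	struct ib_sge sge[CIFS_RDMA_MAX_SGE];	/* hypothetical fixed bound */
	u32 data_length = 0;
	int i;

	if (n_vec > CIFS_RDMA_MAX_SGE)
		return -EINVAL;

	/* the entry point only builds the scatter list */
	for (i = 0; i < n_vec; i++) {
		sge[i].addr = ib_dma_map_single(info->id->device,
				iov[i].iov_base, iov[i].iov_len,
				DMA_TO_DEVICE);
		if (ib_dma_mapping_error(info->id->device, sge[i].addr))
			goto unmap;
		sge[i].length = iov[i].iov_len;
		sge[i].lkey = info->pd->local_dma_lkey;
		data_length += iov[i].iov_len;
	}

	/* header construction, credits and posting live in one place */
	return cifs_rdma_post_send_sgl(info, sge, n_vec, data_length,
				       remaining_data_length);

unmap:
	while (i--)
		ib_dma_unmap_single(info->id->device, sge[i].addr,
				    sge[i].length, DMA_TO_DEVICE);
	return -EIO;
}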

Long Li Aug. 30, 2017, 2:17 a.m. UTC | #3
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Sunday, August 13, 2017 3:24 AM
> To: Long Li <longli@microsoft.com>
> Cc: Steve French <sfrench@samba.org>; linux-cifs@vger.kernel.org; samba-
> technical@lists.samba.org; linux-kernel@vger.kernel.org; Long Li
> <longli@microsoft.com>
> Subject: Re: [[PATCH v1] 15/37] [CIFS] SMBD: Post a SMBD data transfer
> message with data payload
> 
> You can always get the struct page for kernel allocations using virt_to_page
> (or vmalloc_to_page, but this code would not handle the vmalloc case either),
> so I don't think you need this helper and can always use the one added in the
> previous patch.

I partially addressed this issue in the V3 patch. Most of the duplicate code on the sending path is merged.

The difficulty with translating the buffer to pages is that I don't know in advance how many pages it will translate to, and therefore how many struct page pointers I need to allocate to hold them. I try to avoid memory allocation in the I/O path as much as possible, so I keep two send functions: one for buffers and one for pages.
Christoph Hellwig Aug. 30, 2017, 8:51 a.m. UTC | #4
On Wed, Aug 30, 2017 at 02:17:56AM +0000, Long Li wrote:
> I partially addressed this issue in the V3 patch. Most of the duplicate
> code on the sending path is merged.
> 
> The difficulty with translating the buffer to pages is that I don't
> know in advance how many pages it will translate to, and therefore how
> many struct page pointers I need to allocate to hold them. I try to
> avoid memory allocation in the I/O path as much as possible, so I keep
> two send functions: one for buffers and one for pages.

You do: you'll always need space for (len + PAGE_SIZE - 1) >> PAGE_SHIFT
pages.

That being said: what callers even send you buffers?  In general we
should aim to work with pages for all allocations that aren't tiny.
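
A hedged sketch of that bound (the helper name is illustrative, not from the patch): the worst-case page count of a kernel buffer is known up front, so a caller-provided fixed-size array suffices and nothing is allocated on the I/O path.

/* Illustrative only: fill a pre-sized page array for a kernel buffer. */
static int buffer_to_pages(void *buf, size_t len,
			   struct page **pages, unsigned int max_pages)
{
	/* offset_in_page(): an unaligned buffer can straddle one extra page */
	unsigned int npages =
		(offset_in_page(buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
	unsigned int i;

	if (npages > max_pages)
		return -EINVAL;

	/* virt_to_page() rounds down to the containing page */
	for (i = 0; i < npages; i++)
		pages[i] = virt_to_page(buf + i * PAGE_SIZE);

	return npages;
}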
Long Li Aug. 30, 2017, 6:17 p.m. UTC | #5
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Wednesday, August 30, 2017 1:52 AM
> To: Long Li <longli@microsoft.com>
> Cc: Christoph Hellwig <hch@infradead.org>; Steve French
> <sfrench@samba.org>; linux-cifs@vger.kernel.org; samba-
> technical@lists.samba.org; linux-kernel@vger.kernel.org
> Subject: Re: [[PATCH v1] 15/37] [CIFS] SMBD: Post a SMBD data transfer
> message with data payload
> 
> On Wed, Aug 30, 2017 at 02:17:56AM +0000, Long Li wrote:
> > I partially addressed this issue in the V3 patch. Most of the
> > duplicate code on the sending path is merged.
> >
> > The difficulty with translating the buffer to pages is that I don't
> > know in advance how many pages it will translate to, and therefore
> > how many struct page pointers I need to allocate to hold them. I try
> > to avoid memory allocation in the I/O path as much as possible, so I
> > keep two send functions: one for buffers and one for pages.
> 
> You do: you'll always need space for (len + PAGE_SIZE - 1) >> PAGE_SHIFT
> pages.
> 
> That being said: what callers even send you buffers?  In general we should
> aim to work with pages for all allocations that aren't tiny.

I'll look through the code allocating these buffers; they can probably fit into one page. Will fix this.

Patch

diff --git a/fs/cifs/cifsrdma.c b/fs/cifs/cifsrdma.c
index b3ec109..989cad8 100644
--- a/fs/cifs/cifsrdma.c
+++ b/fs/cifs/cifsrdma.c
@@ -66,6 +66,9 @@  static int cifs_rdma_post_recv(
 		struct cifs_rdma_info *info,
 		struct cifs_rdma_response *response);
 
+static int cifs_rdma_post_send_data(
+		struct cifs_rdma_info *info,
+		struct kvec *iov, int n_vec, int remaining_data_length);
 static int cifs_rdma_post_send_page(struct cifs_rdma_info *info,
 		struct page *page, unsigned long offset,
 		size_t size, int remaining_data_length);
@@ -671,6 +674,122 @@  static int cifs_rdma_post_send_page(struct cifs_rdma_info *info, struct page *pa
 }
 
 /*
+ * Send a data buffer
+ * iov: the iov array describing the data buffers
+ * n_vec: number of entries in the iov array
+ * remaining_data_length: remaining data to send after this packet
+ */
+static int cifs_rdma_post_send_data(
+	struct cifs_rdma_info *info, struct kvec *iov, int n_vec,
+	int remaining_data_length)
+{
+	struct cifs_rdma_request *request;
+	struct smbd_data_transfer *packet;
+	struct ib_send_wr send_wr, *send_wr_fail;
+	int rc = -ENOMEM, i;
+	u32 data_length;
+
+	request = mempool_alloc(info->request_mempool, GFP_KERNEL);
+	if (!request)
+		return rc;
+
+	request->info = info;
+
+	wait_event(info->wait_send_queue, atomic_read(&info->send_credits) > 0);
+	atomic_dec(&info->send_credits);
+
+	packet = (struct smbd_data_transfer *) request->packet;
+	packet->credits_requested = cpu_to_le16(info->send_credit_target);
+	packet->flags = cpu_to_le16(0);
+	packet->reserved = cpu_to_le16(0);
+
+	packet->data_offset = cpu_to_le32(24);
+
+	data_length = 0;
+	for (i = 0; i < n_vec; i++)
+		data_length += iov[i].iov_len;
+	packet->data_length = cpu_to_le32(data_length);
+
+	packet->remaining_data_length = cpu_to_le32(remaining_data_length);
+	packet->padding = cpu_to_le32(0);
+
+	log_rdma_send("credits_requested=%d credits_granted=%d data_offset=%d "
+		      "data_length=%d remaining_data_length=%d\n",
+		le16_to_cpu(packet->credits_requested),
+		le16_to_cpu(packet->credits_granted),
+		le32_to_cpu(packet->data_offset),
+		le32_to_cpu(packet->data_length),
+		le32_to_cpu(packet->remaining_data_length));
+
+	request->sge = kzalloc(sizeof(struct ib_sge) * (n_vec + 1), GFP_KERNEL);
+	if (!request->sge)
+		goto allocate_sge_failed;
+
+	request->num_sge = n_vec + 1;
+
+	request->sge[0].addr = ib_dma_map_single(
+				info->id->device, (void *)packet,
+				sizeof(*packet), DMA_TO_DEVICE);
+	if (ib_dma_mapping_error(info->id->device, request->sge[0].addr)) {
+		rc = -EIO;
+		goto dma_mapping_failure;
+	}
+	request->sge[0].length = sizeof(*packet);
+	request->sge[0].lkey = info->pd->local_dma_lkey;
+	ib_dma_sync_single_for_device(info->id->device, request->sge[0].addr,
+		request->sge[0].length, DMA_TO_DEVICE);
+
+	for (i = 0; i < n_vec; i++) {
+		request->sge[i+1].addr = ib_dma_map_single(info->id->device, iov[i].iov_base,
+						iov[i].iov_len, DMA_TO_DEVICE);
+		if (ib_dma_mapping_error(info->id->device, request->sge[i+1].addr)) {
+			rc = -EIO;
+			goto dma_mapping_failure;
+		}
+		request->sge[i+1].length = iov[i].iov_len;
+		request->sge[i+1].lkey = info->pd->local_dma_lkey;
+		ib_dma_sync_single_for_device(info->id->device, request->sge[i+1].addr,
+			request->sge[i+1].length, DMA_TO_DEVICE);
+	}
+
+	log_rdma_send("rdma_request sge[0] addr=%llu length=%u lkey=%u\n",
+		request->sge[0].addr, request->sge[0].length, request->sge[0].lkey);
+	for (i = 0; i < n_vec; i++)
+		log_rdma_send("rdma_request sge[%d] addr=%llu length=%u lkey=%u\n",
+			i+1, request->sge[i+1].addr,
+			request->sge[i+1].length, request->sge[i+1].lkey);
+
+	request->cqe.done = send_done;
+
+	send_wr.next = NULL;
+	send_wr.wr_cqe = &request->cqe;
+	send_wr.sg_list = request->sge;
+	send_wr.num_sge = request->num_sge;
+	send_wr.opcode = IB_WR_SEND;
+	send_wr.send_flags = IB_SEND_SIGNALED;
+
+	rc = ib_post_send(info->id->qp, &send_wr, &send_wr_fail);
+	if (!rc)
+		return 0;
+
+	/* post send failed */
+	log_rdma_send("ib_post_send failed rc=%d\n", rc);
+
+dma_mapping_failure:
+	for (i = 0; i < n_vec + 1; i++)
+		if (request->sge[i].addr)
+			ib_dma_unmap_single(info->id->device,
+					    request->sge[i].addr,
+					    request->sge[i].length,
+					    DMA_TO_DEVICE);
+	kfree(request->sge);
+
+allocate_sge_failed:
+	mempool_free(request, info->request_mempool);
+	return rc;
+}
+
+/*
  * Post a receive request to the transport
  * The remote peer can only send data when a receive is posted
 * The interaction is controlled by a send/receive credit system