
[v2,3/4] nvme: enable SG gaps support

Message ID: 1492098977-5231-4-git-send-email-maxg@mellanox.com (mailing list archive)
State: Changes Requested

Commit Message

Max Gurtovoy April 13, 2017, 3:56 p.m. UTC
For controllers that can handle arbitrarily sized bios (e.g. advanced
RDMA controllers) we can allow the block layer to pass us gaps by not
setting the queue virt_boundary.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
---
 drivers/nvme/host/core.c |    5 ++++-
 drivers/nvme/host/nvme.h |    1 +
 2 files changed, 5 insertions(+), 1 deletions(-)

Comments

Sagi Grimberg April 20, 2017, 11:27 a.m. UTC | #1
>  	if (ctrl->quirks & NVME_QUIRK_STRIPE_SIZE)
>  		blk_queue_chunk_sectors(q, ctrl->max_hw_sectors);
> -	blk_queue_virt_boundary(q, ctrl->page_size - 1);
> +
> +	if (!ctrl->sg_gaps_support)
> +		blk_queue_virt_boundary(q, ctrl->page_size - 1);
> +

I think it would be better if the ctrl held a virt_boundary
provided by the fabric driver. Then we would always call
blk_queue_virt_boundary(), but with capable devices the fabric
driver could set it to 0. This would also allow dword-aligned
SGLs to fit in pretty easily.
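
A rough sketch of that suggestion (not code from this series; the
ctrl->virt_boundary field is the one proposed above):

	/*
	 * Hypothetical: the fabric driver fills in ctrl->virt_boundary at
	 * setup time; a gap-capable transport sets it to 0 (no boundary),
	 * others to ctrl->page_size - 1.
	 */
	blk_queue_virt_boundary(q, ctrl->virt_boundary);
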
Christoph Hellwig April 21, 2017, 6:44 a.m. UTC | #2
On Thu, Apr 20, 2017 at 02:27:59PM +0300, Sagi Grimberg wrote:
> I think it would be better if the ctrl held a virt_boundary
> provided by the fabric driver. Then we would always call
> blk_queue_virt_boundary(), but with capable devices the fabric
> driver could set it to 0. This would also allow dword-aligned
> SGLs to fit in pretty easily.

All our block I/O must be cache line aligned, so dword aligned SGLs
should not be an issue.  And for PRPs or MRs we'll always use the
host page size.

But I don't think I really care in the end..
Sagi Grimberg April 23, 2017, 7:20 a.m. UTC | #3
Christoph,

> All our block I/O must be cache line aligned, so dword aligned SGLs
> should not be an issue.  And for PRPs or MRs we'll always use the
> host page size.

Where is that constraint coming from? Is that new? Back when I wrote
gap support in RDMA I tested single byte alignment via vectored
direct I/O and it seemed to go through...

I also just tested Bart's unaligned test [1] on scsi_debug and it seems
to go through as well (alignment and length are 4)...

[1]:
https://github.com/bvanassche/srp-test/blob/master/discontiguous-io/discontiguous-io.cpp
Christoph Hellwig April 23, 2017, 8:45 a.m. UTC | #4
On Sun, Apr 23, 2017 at 10:20:59AM +0300, Sagi Grimberg wrote:
> Christoph,
>
>> All our block I/O must be cache line aligned, so dword aligned SGLs
>> should not be an issue.  And for PRPs or MRs we'll always use the
>> host page size.
>
> Where is that constraint coming from? Is that new? Back when I wrote
> gap support in RDMA I tested single byte alignment via vectored
> direct I/O and it seemed to go through...

Each block queue has a ->dma_alignment field, which is set to 511
by default unless changed.  Everything not aligned to that will be
bounced in __blk_rq_map_user_iov and friends before we send it to
the driver.
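
A minimal sketch of that check, modeled on blk_rq_aligned() in
include/linux/blkdev.h (the helper name here is illustrative; exact
details vary by kernel version):

	/*
	 * Buffers failing this test get bounced through an aligned
	 * kernel copy by __blk_rq_map_user_iov() before the driver
	 * ever sees them.
	 */
	static inline bool user_buf_dma_aligned(struct request_queue *q,
						unsigned long addr,
						unsigned long len)
	{
		unsigned long mask = queue_dma_alignment(q); /* 511 by default */

		return !(addr & mask) && !(len & mask);
	}
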

> I also just tested Bart's unaligned test [1] on scsi_debug and it seems
> to go through as well (alignment and length are 4)...

SCSI reduces the alignment in __scsi_init_queue:

	/*
	 * set a reasonable default alignment on word boundaries: the
	 * host and device may alter it using
	 * blk_queue_update_dma_alignment() later.
	 */
	blk_queue_dma_alignment(q, 0x03);

still would fit NVMe dword-aligned SGLs, though :)
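
For reference, "dword aligned" here just means address and length are
both multiples of 4, which the 0x03 mask captures (illustrative
snippet, not kernel code):

	/* NVMe SGL dword alignment: low two bits clear in both fields. */
	static bool sgl_dword_aligned(unsigned long addr, unsigned long len)
	{
		return !((addr | len) & 0x03);
	}
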
Sagi Grimberg April 23, 2017, 9:45 a.m. UTC | #5
> Each block queue has a ->dma_alignment field, which is set to 511
> by default unless changed.  Everything not aligned to that will be
> bounced in __blk_rq_map_user_iov and friends before we send it to
> the driver.

Thanks, I missed the default setting...

>> I also just tested Bart's unaligned test [1] on scsi_debug and it seems
>> to go through as well (alignment and length are 4)...
>
> SCSI reduces the alignment in __scsi_init_queue:
>
> 	/*
> 	 * set a reasonable default alignment on word boundaries: the
> 	 * host and device may alter it using
> 	 * blk_queue_update_dma_alignment() later.
> 	 */
> 	blk_queue_dma_alignment(q, 0x03);

That explains why I was able to test that with iSER :)

I guess I didn't test single byte alignment after all, but
I definitely remember using less than a sector...

> still would fit NVMe dword-aligned SGLs, though :)

I would say it's useful to have, but it's only relevant to ioctls, as
for normal I/O we explicitly fail on sub-logical-block-size alignment.

Anyway, I still think having ctrl->virt_boundary instead of
ctrl->sg_gaps_support is cleaner...
Max Gurtovoy April 23, 2017, 1:23 p.m. UTC | #6
On 4/23/2017 12:45 PM, Sagi Grimberg wrote:
>
>> Each block queue has a ->dma_alignment field, which is set to 511
>> by default unless changed.  Everything not aligned to that will be
>> bounced in __blk_rq_map_user_iov and friends before we send it to
>> the driver.
>
> Thanks, I missed the default setting...
>
>>> I also just tested Bart's unaligned test [1] on scsi_debug and it seems
>>> to go through as well (alignment and length are 4)...
>>
>> SCSI reduces the alignment in __scsi_init_queue:
>>
>>     /*
>>      * set a reasonable default alignment on word boundaries: the
>>      * host and device may alter it using
>>      * blk_queue_update_dma_alignment() later.
>>      */
>>     blk_queue_dma_alignment(q, 0x03);
>
> That explains why I was able to test that with iSER :)
>
> I guess I didn't test single byte alignment after all, but
> I definitely remember using less than a sector...
>
>> still would fit NVMe dword-aligned SGLs, though :)
>
> I would say it's useful to have, but it's only relevant to ioctls, as
> for normal I/O we explicitly fail on sub-logical-block-size alignment.
>
> Anyway, I still think having ctrl->virt_boundary instead of
> ctrl->sg_gaps_support is cleaner...

Sure, I thought about it and actually started my implementation like
that. It wasn't so clean, though.

Christoph Hellwig April 24, 2017, 7:02 a.m. UTC | #7
On Sun, Apr 23, 2017 at 12:45:39PM +0300, Sagi Grimberg wrote:
>> still would fit NVMe dword-aligned SGLs, though :)
>
> I would say it's useful to have, but it's only relevant to ioctls, as
> for normal I/O we explicitly fail on sub-logical-block-size alignment.
>
> Anyway, I still think having ctrl->virt_boundary instead of
> ctrl->sg_gaps_support is cleaner...

Ok, fine with me.

Patch

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9583a5f..72bc70e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1252,7 +1252,10 @@  static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 	}
 	if (ctrl->quirks & NVME_QUIRK_STRIPE_SIZE)
 		blk_queue_chunk_sectors(q, ctrl->max_hw_sectors);
-	blk_queue_virt_boundary(q, ctrl->page_size - 1);
+
+	if (!ctrl->sg_gaps_support)
+		blk_queue_virt_boundary(q, ctrl->page_size - 1);
+
 	if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
 		vwc = true;
 	blk_queue_write_cache(q, vwc, vwc);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2aa20e3..ccb895a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -162,6 +162,7 @@  struct nvme_ctrl {
 	struct work_struct scan_work;
 	struct work_struct async_event_work;
 	struct delayed_work ka_work;
+	bool sg_gaps_support;
 
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
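
For context, a transport that can register SG lists containing gaps
would opt in roughly like this (hypothetical sketch, not part of this
patch; IB_DEVICE_SG_GAPS_REG is the RDMA core capability bit for
gap-tolerant memory registration):

	/*
	 * Hypothetical transport-side hookup: advertise gap support only
	 * when the RDMA device can register SG lists with gaps.
	 */
	ctrl->sg_gaps_support = !!(ibdev->attrs.device_cap_flags &
				   IB_DEVICE_SG_GAPS_REG);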