[rfc,01/30] nvme: Add admin connect request queue

Message ID 1497799324-19598-2-git-send-email-sagi@grimberg.me (mailing list archive)
State New, archived

Commit Message

Sagi Grimberg June 18, 2017, 3:21 p.m. UTC
In case we reconnect with inflight admin IO we
need to make sure that the connect comes before
the admin command. This can only be achieved by
using a separate request queue for admin connects.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/fabrics.c |  2 +-
 drivers/nvme/host/fc.c      | 11 ++++++++++-
 drivers/nvme/host/nvme.h    |  1 +
 drivers/nvme/host/rdma.c    | 19 ++++++++++++++-----
 drivers/nvme/target/loop.c  | 17 +++++++++++++----
 5 files changed, 39 insertions(+), 11 deletions(-)

Comments

Christoph Hellwig June 19, 2017, 7:13 a.m. UTC | #1
On Sun, Jun 18, 2017 at 06:21:35PM +0300, Sagi Grimberg wrote:
> In case we reconnect with inflight admin IO we
> need to make sure that the connect comes before
> the admin command. This can only be achieved by
> using a separate request queue for admin connects.

Use up a few more lines of the available space for your lines? :)

Wouldn't a head insertion also solve the problem?
Sagi Grimberg June 19, 2017, 7:49 a.m. UTC | #2
>> In case we reconnect with inflight admin IO we
>> need to make sure that the connect comes before
>> the admin command. This can only be achieved by
>> using a separate request queue for admin connects.
> 
> Use up a few more lines of the available space for your lines? :)

I warned in the cover-letter that the change logs are pure
negligence at the moment :)

> Wouldn't a head insertion also solve the problem?

The head insertion will not protect against it because
we must invoke blk_mq_start_stopped_hw_queues on the admin_q
request queue so the admin connect can make progress; at this
point (and before we actually queue up the connect) pending admin
commands can sneak in...
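
A simplified sketch of that window (illustrative only, not the literal
recovery code; error handling omitted):

	/*
	 * Restarting the stopped hw queues releases *all* pending admin
	 * requests, so they can be dispatched before the connect command
	 * is even queued up behind them.
	 */
	blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
	error = nvmf_connect_admin_queue(&ctrl->ctrl);	/* races with them */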

However, you raise a valid point. I think I added this before we
had the queue_is_ready protection, which will reject the command
if the queue is not LIVE (unless it's a connect). I think the reason
it's still in is that I tested this with loop, which doesn't have
a per-queue state machine.
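
For reference, the rdma-side check is roughly along these lines
(paraphrased from the driver, not copied verbatim):

	static inline int nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue,
			struct request *rq)
	{
		if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &queue->flags))) {
			struct nvme_command *cmd = nvme_req(rq)->cmd;

			/* on a non-LIVE queue, only a fabrics connect may pass */
			if (!blk_rq_is_passthrough(rq) ||
			    cmd->common.opcode != nvme_fabrics_command ||
			    cmd->fabrics.fctype != nvme_fabrics_type_connect)
				return -EIO;
		}
		return 0;
	}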

I'm still wondering if it's a good idea to rely on the transport
queue state to reject non-connect requests on non-LIVE queues.
If/when we introduce a queue representation to the core and we
drive the state machine there, then we could actually rely on it
(I do have some code for it, but it's a pretty massive change which
cannot be added in an incremental fashion).

Thoughts?
Christoph Hellwig June 19, 2017, 12:30 p.m. UTC | #3
On Mon, Jun 19, 2017 at 10:49:15AM +0300, Sagi Grimberg wrote:
> However, you raise a valid point. I think I added this before we
> had the queue_is_ready protection, which will reject the command
> if the queue is not LIVE (unless it's a connect). I think the reason
> it's still in is that I tested this with loop, which doesn't have
> a per-queue state machine.

Yeah.

> I'm still wondering if it's a good idea to rely on the transport
> queue state to reject non-connect requests on non-LIVE queues.
> If/when we introduce a queue representation to the core and we
> drive the state machine there, then we could actually rely on it
> (I do have some code for it, but it's a pretty massive change which
> cannot be added in an incremental fashion).

I suspect moving the state machine to the core is a good idea.  Note that
the current nvme_rdma_queue_is_ready hack actually seems a bit too simple -
even after the connect we should only allow get/set Property.  Never mind
the additional complications if/when authentication is implemented.
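
A hypothetical sketch of that stricter gate (invented helper name, not
existing kernel code): after the connect completes but before the
controller is fully brought up, admit fabrics property get/set only:

	static int nvme_connected_queue_admit(struct nvme_command *cmd)
	{
		/* post-connect, pre-LIVE: only property get/set may pass */
		if (cmd->common.opcode != nvme_fabrics_command)
			return -EIO;

		switch (cmd->fabrics.fctype) {
		case nvme_fabrics_type_property_get:
		case nvme_fabrics_type_property_set:
			return 0;
		default:
			return -EIO;
		}
	}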
Hannes Reinecke June 19, 2017, 3:56 p.m. UTC | #4
On 06/19/2017 09:49 AM, Sagi Grimberg wrote:
> 
>>> In case we reconnect with inflight admin IO we
>>> need to make sure that the connect comes before
>>> the admin command. This can only be achieved by
>>> using a separate request queue for admin connects.
>>
>> Use up a few more lines of the available space for your lines? :)
> 
> I warned in the cover-letter that the change logs are pure
> negligence at the moment :)
> 
>> Wouldn't a head insertion also solve the problem?
> 
> The head insertion will not protect against it because
> we must invoke blk_mq_start_stopped_hw_queues on the admin_q
> request queue so the admin connect can make progress; at this
> point (and before we actually queue up the connect) pending admin
> commands can sneak in...
> 
> However, you raise a valid point. I think I added this before we
> had the queue_is_ready protection, which will reject the command
> if the queue is not LIVE (unless it's a connect). I think the reason
> it's still in is that I tested this with loop, which doesn't have
> a per-queue state machine.
> 
> I'm still wondering if it's a good idea to rely on the transport
> queue state to reject non-connect requests on non-LIVE queues.
> If/when we introduce a queue representation to the core and we
> drive the state machine there, then we could actually rely on it
> (I do have some code for it, but it's a pretty massive change which
> cannot be added in an incremental fashion).
> 
> Thoughts?

I very much prefer this solution, i.e. rejecting all commands if the queue
state is not 'LIVE', and making the 'connect' command the only one being
accepted in that state.
(Actually, we would need a state 'READY_TO_CONNECT' or some such to make
this work properly.)
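
As a rough illustration (state and helper names invented, nothing like
this exists in the tree yet), the gate could then become a simple check
against the per-queue state:

	enum nvme_queue_state {
		NVME_QUEUE_READY_TO_CONNECT,	/* only connect accepted */
		NVME_QUEUE_LIVE,		/* everything accepted */
	};

	static bool nvme_queue_accepts(enum nvme_queue_state state,
			struct nvme_command *cmd)
	{
		if (state == NVME_QUEUE_LIVE)
			return true;
		return cmd->common.opcode == nvme_fabrics_command &&
		       cmd->fabrics.fctype == nvme_fabrics_type_connect;
	}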

Cheers,

Hannes

Patch

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 64db2c46c5ea..bd99bbb1faa3 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -412,7 +412,7 @@  int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 	strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
 	strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
 
-	ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res,
+	ret = __nvme_submit_sync_cmd(ctrl->admin_connect_q, &cmd, &res,
 			data, sizeof(*data), 0, NVME_QID_ANY, 1,
 			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
 	if (ret) {
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5d5ecefd8dbe..25ee49037edb 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1703,6 +1703,7 @@  nvme_fc_ctrl_free(struct kref *ref)
 	list_del(&ctrl->ctrl_list);
 	spin_unlock_irqrestore(&ctrl->rport->lock, flags);
 
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
 
@@ -2745,6 +2746,12 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 		goto out_free_admin_tag_set;
 	}
 
+	ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+		ret = PTR_ERR(ctrl->ctrl.admin_connect_q);
+		goto out_cleanup_admin_q;
+	}
+
 	/*
 	 * Would have been nice to init io queues tag set as well.
 	 * However, we require interaction from the controller
@@ -2754,7 +2761,7 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
 	if (ret)
-		goto out_cleanup_admin_q;
+		goto out_cleanup_admin_connect_q;
 
 	/* at this point, teardown path changes to ref counting on nvme ctrl */
 
@@ -2791,6 +2798,8 @@  nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	return &ctrl->ctrl;
 
+out_cleanup_admin_connect_q:
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 out_cleanup_admin_q:
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 out_free_admin_tag_set:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f27c58b860f4..67147b49d992 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -121,6 +121,7 @@  struct nvme_ctrl {
 	const struct nvme_ctrl_ops *ops;
 	struct request_queue *admin_q;
 	struct request_queue *connect_q;
+	struct request_queue *admin_connect_q;
 	struct device *dev;
 	struct kref kref;
 	int instance;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 7533138d2244..cb7f81d9098f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -661,6 +661,7 @@  static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl)
 	nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 	nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
 	nvme_rdma_dev_put(ctrl->device);
@@ -1583,9 +1584,15 @@  static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
 		goto out_free_tagset;
 	}
 
+	ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+		error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+		goto out_cleanup_queue;
+	}
+
 	error = nvmf_connect_admin_queue(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
 
@@ -1593,7 +1600,7 @@  static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
 	if (error) {
 		dev_err(ctrl->ctrl.device,
 			"prop_get NVME_REG_CAP failed\n");
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 	}
 
 	ctrl->ctrl.sqsize =
@@ -1601,25 +1608,27 @@  static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
 
 	error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	ctrl->ctrl.max_hw_sectors =
 		(ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
 
 	error = nvme_init_identify(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
 			&ctrl->async_event_sqe, sizeof(struct nvme_command),
 			DMA_TO_DEVICE);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	nvme_start_keep_alive(&ctrl->ctrl);
 
 	return 0;
 
+out_cleanup_connect_queue:
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 out_cleanup_queue:
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 out_free_tagset:
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 86c09e2a1490..edd9ee04de02 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -278,6 +278,7 @@  static const struct blk_mq_ops nvme_loop_admin_mq_ops = {
 static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
 {
 	nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 	blk_mq_free_tag_set(&ctrl->admin_tag_set);
 }
@@ -384,15 +385,21 @@  static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 		goto out_free_tagset;
 	}
 
+	ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+	if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+		error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+		goto out_cleanup_queue;
+	}
+
 	error = nvmf_connect_admin_queue(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
 	if (error) {
 		dev_err(ctrl->ctrl.device,
 			"prop_get NVME_REG_CAP failed\n");
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 	}
 
 	ctrl->ctrl.sqsize =
@@ -400,19 +407,21 @@  static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
 
 	error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	ctrl->ctrl.max_hw_sectors =
 		(NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
 
 	error = nvme_init_identify(&ctrl->ctrl);
 	if (error)
-		goto out_cleanup_queue;
+		goto out_cleanup_connect_queue;
 
 	nvme_start_keep_alive(&ctrl->ctrl);
 
 	return 0;
 
+out_cleanup_connect_queue:
+	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
 out_cleanup_queue:
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
 out_free_tagset: