| Message ID | 1477951099-3127-6-git-send-email-scott.bauer@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
> +struct sed_cb_data {
> +	sec_cb		*cb;
> +	void		*cb_data;
> +	struct nvme_command cmd;
> +};
> +
> +static void sec_submit_endio(struct request *req, int error)
> +{
> +	struct sed_cb_data *sed_data = req->end_io_data;
> +
> +	if (sed_data->cb)
> +		sed_data->cb(error, sed_data->cb_data);
> +
> +	kfree(sed_data);
> +	blk_mq_free_request(req);
> +}
> +
> +static int nvme_insert_rq(struct request_queue *q, struct request *rq,
> +			  int at_head, rq_end_io_fn *done)
> +{
> +	WARN_ON(rq->cmd_type == REQ_TYPE_FS);
> +
> +	rq->end_io = done;
> +
> +	if (!q->mq_ops)
> +		return -EINVAL;
> +
> +	blk_mq_insert_request(rq, at_head, true, true);
> +
> +	return 0;
> +}

No need for this function... you control the call site...

> +
> +static int nvme_sec_submit(void *data, u8 opcode, u16 SPSP,
> +			   u8 SECP, void *buffer, size_t len,
> +			   sec_cb *cb, void *cb_data)
> +{
> +	struct request_queue *q;
> +	struct request *req;
> +	struct sed_cb_data *sed_data;
> +	struct nvme_ns *ns;
> +	struct nvme_command *cmd;
> +	int ret;
> +
> +	ns = data;//bdev->bd_disk->private_data;

?? you don't even have data anywhere in here...

> +
> +	sed_data = kzalloc(sizeof(*sed_data), GFP_NOWAIT);
> +	if (!sed_data)
> +		return -ENOMEM;
> +	sed_data->cb = cb;
> +	sed_data->cb_data = cb_data;
> +	cmd = &sed_data->cmd;
> +
> +	cmd->common.opcode = opcode;
> +	cmd->common.nsid = ns->ns_id;
> +	cmd->common.cdw10[0] = SECP << 24 | SPSP << 8;
> +	cmd->common.cdw10[1] = len;
> +
> +	q = ns->ctrl->admin_q;
> +
> +	req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
> +	if (IS_ERR(req)) {
> +		ret = PTR_ERR(req);
> +		goto err_free;
> +	}
> +
> +	req->timeout = ADMIN_TIMEOUT;
> +	req->special = NULL;
> +
> +	if (buffer && len) {
> +		ret = blk_rq_map_kern(q, req, buffer, len, GFP_NOWAIT);
> +		if (ret) {
> +			blk_mq_free_request(req);
> +			goto err_free;
> +		}
> +	}
> +
> +	req->end_io_data = sed_data;
> +	//req->rq_disk = bdev->bd_disk;

??

> +
> +	return nvme_insert_rq(q, req, 1, sec_submit_endio);

No need to introduce nvme_insert_rq at all, just call
blk_mq_insert_request (other examples call blk_execute_rq_nowait
but its pretty much the same...)

> @@ -582,6 +583,7 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	struct nvme_command cmnd;
>  	unsigned map_len;
>  	int ret = BLK_MQ_RQ_QUEUE_OK;
> +	unsigned long flags;
>
>  	/*
>  	 * If formated with metadata, require the block layer provide a buffer
> @@ -614,18 +616,18 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	cmnd.common.command_id = req->tag;
>  	blk_mq_start_request(req);
>
> -	spin_lock_irq(&nvmeq->q_lock);
> +	spin_lock_irqsave(&nvmeq->q_lock, flags);
>  	if (unlikely(nvmeq->cq_vector < 0)) {
>  		if (ns && !test_bit(NVME_NS_DEAD, &ns->flags))
>  			ret = BLK_MQ_RQ_QUEUE_BUSY;
>  		else
>  			ret = BLK_MQ_RQ_QUEUE_ERROR;
> -		spin_unlock_irq(&nvmeq->q_lock);
> +		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
>  		goto out;
>  	}
>  	__nvme_submit_cmd(nvmeq, &cmnd);
>  	nvme_process_cq(nvmeq);
> -	spin_unlock_irq(&nvmeq->q_lock);
> +	spin_unlock_irqrestore(&nvmeq->q_lock, flags);

No documentation why this is needed...

>  	return BLK_MQ_RQ_QUEUE_OK;
>  out:
>  	nvme_free_iod(dev, req);
> @@ -635,11 +637,11 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  static void nvme_complete_rq(struct request *req)
>  {
>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -	struct nvme_dev *dev = iod->nvmeq->dev;
> +	struct nvme_queue *nvmeq = iod->nvmeq;
> +	struct nvme_dev *dev = nvmeq->dev;

This is a cleanup that should go in a different patch...

>  	int error = 0;
>
>  	nvme_unmap_data(dev, req);
> -

Same here...
>  	if (unlikely(req->errors)) {
>  		if (nvme_req_needs_retry(req, req->errors)) {
>  			req->retries++;
> @@ -658,7 +660,6 @@ static void nvme_complete_rq(struct request *req)
>  			"completing aborted command with status: %04x\n",
>  			req->errors);
>  	}
> -

Here...

>  	blk_mq_end_request(req, error);
>  }
>
> @@ -1758,10 +1759,11 @@ static void nvme_reset_work(struct work_struct *work)
>  {
>  	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
>  	int result = -ENODEV;
> -
> +	bool was_suspend = false;
>  	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
>  		goto out;
>
> +	was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
>  	/*
>  	 * If we're called to reset a live controller first shut it down before
>  	 * moving on.
> @@ -1789,6 +1791,9 @@ static void nvme_reset_work(struct work_struct *work)
>  	if (result)
>  		goto out;
>
> +	if (was_suspend)
> +		nvme_unlock_from_suspend(&dev->ctrl);
> +
>  	result = nvme_setup_io_queues(dev);
>  	if (result)
>  		goto out;
On Tue, Nov 01, 2016 at 10:18:13AM +0200, Sagi Grimberg wrote:
> > +
> > +	return nvme_insert_rq(q, req, 1, sec_submit_endio);
>
> No need to introduce nvme_insert_rq at all, just call
> blk_mq_insert_request (other examples call blk_execute_rq_nowait
> but its pretty much the same...)

blk_execute_rq_nowait is the API to use - blk_mq_insert_request isn't
even exported.
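For reference, the suggested call might look roughly like this (a sketch
against the 4.9-era blk_execute_rq_nowait() signature, not the posted v2;
the wrapper name nvme_sec_submit_rq() is illustrative, while q, req,
sed_data, and sec_submit_endio() are names from the patch under review):

/*
 * Sketch only: drop nvme_insert_rq() and let the stock block-layer
 * helper set rq->end_io and queue the request at the head of the
 * admin queue; sec_submit_endio() runs on completion.
 */
static void nvme_sec_submit_rq(struct request_queue *q, struct request *req,
			       struct sed_cb_data *sed_data)
{
	req->end_io_data = sed_data;
	/* NULL gendisk, at_head = 1, async completion callback */
	blk_execute_rq_nowait(q, NULL, req, 1, sec_submit_endio);
}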
On Tue, Nov 01, 2016 at 06:57:05AM -0700, Christoph Hellwig wrote:
> On Tue, Nov 01, 2016 at 10:18:13AM +0200, Sagi Grimberg wrote:
> > > +
> > > +	return nvme_insert_rq(q, req, 1, sec_submit_endio);
> >
> > No need to introduce nvme_insert_rq at all, just call
> > blk_mq_insert_request (other examples call blk_execute_rq_nowait
> > but its pretty much the same...)
>
> blk_execute_rq_nowait is the API to use - blk_mq_insert_request isn't
> even exported.

Thanks for the reviews. This patch needs to be separated into two
patches: one adding the nvme-suspend handling and one adding the
sec_ops. Most of the clutter and weirdness comes from the latter. I'll
separate the patches, use the correct API, and clean up the clutter.

Thanks
On Mon, Nov 07, 2016 at 01:45:42PM -0500, Keith Busch wrote:
> On Tue, Nov 01, 2016 at 10:18:13AM +0200, Sagi Grimberg wrote:
> > > -	spin_lock_irq(&nvmeq->q_lock);
> > > +	spin_lock_irqsave(&nvmeq->q_lock, flags);
> > > 	if (unlikely(nvmeq->cq_vector < 0)) {
> > > 		if (ns && !test_bit(NVME_NS_DEAD, &ns->flags))
> > > 			ret = BLK_MQ_RQ_QUEUE_BUSY;
> > > 		else
> > > 			ret = BLK_MQ_RQ_QUEUE_ERROR;
> > > -		spin_unlock_irq(&nvmeq->q_lock);
> > > +		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
> > > 		goto out;
> > > 	}
> > > 	__nvme_submit_cmd(nvmeq, &cmnd);
> > > 	nvme_process_cq(nvmeq);
> > > -	spin_unlock_irq(&nvmeq->q_lock);
> > > +	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
> >
> > No documentation why this is needed...
>
> Let's forget documenting why it's needed; this solution should instead
> figure out a way to make it so it's not needed.

This is some code that was used in previous iterations while we were
testing. We missed this and some other things in this patch set when we
were cleaning up. It's gone from v2, along with the other weirdness in
the patch.
On Tue, Nov 01, 2016 at 10:18:13AM +0200, Sagi Grimberg wrote:
> > -	spin_lock_irq(&nvmeq->q_lock);
> > +	spin_lock_irqsave(&nvmeq->q_lock, flags);
> > 	if (unlikely(nvmeq->cq_vector < 0)) {
> > 		if (ns && !test_bit(NVME_NS_DEAD, &ns->flags))
> > 			ret = BLK_MQ_RQ_QUEUE_BUSY;
> > 		else
> > 			ret = BLK_MQ_RQ_QUEUE_ERROR;
> > -		spin_unlock_irq(&nvmeq->q_lock);
> > +		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
> > 		goto out;
> > 	}
> > 	__nvme_submit_cmd(nvmeq, &cmnd);
> > 	nvme_process_cq(nvmeq);
> > -	spin_unlock_irq(&nvmeq->q_lock);
> > +	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
>
> No documentation why this is needed...

Let's forget documenting why it's needed; this solution should instead
figure out a way to make it so it's not needed.
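The objection is easier to follow with the lock-variant semantics spelled
out; a minimal generic sketch (demo_lock and both functions are made-up
names, nothing here is code from the patch):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

/*
 * Process context only: spin_unlock_irq() unconditionally re-enables
 * local interrupts, which is wrong if the caller already had them
 * disabled (e.g. inside an IRQ handler).
 */
static void demo_irq_variant(void)
{
	spin_lock_irq(&demo_lock);
	/* critical section */
	spin_unlock_irq(&demo_lock);
}

/*
 * Safe from any context: the caller's interrupt state is saved in
 * 'flags' and restored on unlock.  This is why a submission path that
 * can be entered with IRQs disabled forces the irqsave conversion.
 */
static void demo_irqsave_variant(void)
{
	unsigned long flags;

	spin_lock_irqsave(&demo_lock, flags);
	/* critical section */
	spin_unlock_irqrestore(&demo_lock, flags);
}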
On Tue, Nov 01, 2016 at 06:57:05AM -0700, Christoph Hellwig wrote:
> On Tue, Nov 01, 2016 at 10:18:13AM +0200, Sagi Grimberg wrote:
> > > +
> > > +	return nvme_insert_rq(q, req, 1, sec_submit_endio);
> >
> > No need to introduce nvme_insert_rq at all, just call
> > blk_mq_insert_request (other examples call blk_execute_rq_nowait
> > but its pretty much the same...)
>
> blk_execute_rq_nowait is the API to use - blk_mq_insert_request isn't
> even exported.

I remember now, after I changed it to use rq_nowait, why we added this
wrapper function and used blk_mq_insert_request.

When we dispatch opal commands down to the controller we're doing so in
an IRQ, so if we use rq_nowait, we lock up. Will there be pushback if we
continue with the original patch idea, where we export
blk_mq_insert_request (I forgot to send that) and use it? I looked
through the block API and didn't see an execute_rq variant that is
IRQ-safe. Any suggestions?
On Thu, Nov 10, 2016 at 06:23:12PM -0500, Keith Busch wrote:
> On Thu, Nov 10, 2016 at 04:01:31PM -0700, Scott Bauer wrote:
> > On Tue, Nov 01, 2016 at 06:57:05AM -0700, Christoph Hellwig wrote:
> > > blk_execute_rq_nowait is the API to use - blk_mq_insert_request isn't
> > > even exported.
> >
> > I remember now, after I changed it to use rq_nowait, why we added this
> > wrapper function and used blk_mq_insert_request.
> >
> > When we dispatch opal commands down to the controller we're doing so in
> > an IRQ, so if we use rq_nowait, we lock up.
>
> In this case can we push the submission off to a work queue and use
> blk_mq_insert_rq_nowait?

That's what we have to do anyway to avoid taking irqsafe locks.

(and the function is blk_execute_rq_nowait)
On Thu, Nov 10, 2016 at 04:01:31PM -0700, Scott Bauer wrote:
> On Tue, Nov 01, 2016 at 06:57:05AM -0700, Christoph Hellwig wrote:
> > blk_execute_rq_nowait is the API to use - blk_mq_insert_request isn't
> > even exported.
>
> I remember now, after I changed it to use rq_nowait, why we added this
> wrapper function and used blk_mq_insert_request.
>
> When we dispatch opal commands down to the controller we're doing so in
> an IRQ, so if we use rq_nowait, we lock up.

In this case can we push the submission off to a work queue and use
blk_mq_insert_rq_nowait?
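A rough sketch of that suggestion (hypothetical code, not from any posted
revision: sec_submit_endio() is the completion handler from the patch,
everything else here is a made-up name). The IRQ-context caller only
queues a work item; the actual submission happens from the worker in
process context, where blk_execute_rq_nowait() and non-irqsave locking
are safe:

#include <linux/blkdev.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static void sec_submit_endio(struct request *req, int error);	/* from the patch */

/* Hypothetical deferral context. */
struct sed_submit_work {
	struct work_struct work;
	struct request_queue *q;
	struct request *req;
};

static void sed_submit_workfn(struct work_struct *work)
{
	struct sed_submit_work *sw =
		container_of(work, struct sed_submit_work, work);

	/* Process context now: submit through the stock helper. */
	blk_execute_rq_nowait(sw->q, NULL, sw->req, 1, sec_submit_endio);
	kfree(sw);
}

/* May be called from hard-IRQ context: only queue work, never submit. */
static int sed_submit_deferred(struct request_queue *q, struct request *req)
{
	struct sed_submit_work *sw = kzalloc(sizeof(*sw), GFP_ATOMIC);

	if (!sw)
		return -ENOMEM;

	sw->q = q;
	sw->req = req;
	INIT_WORK(&sw->work, sed_submit_workfn);
	schedule_work(&sw->work);
	return 0;
}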
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79e679d..1321331 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -28,6 +28,8 @@
 #include <linux/t10-pi.h>
 #include <scsi/sg.h>
 #include <asm/unaligned.h>
+#include <linux/sed.h>
+#include <linux/sed-opal.h>
 
 #include "nvme.h"
 #include "fabrics.h"
@@ -1067,6 +1069,137 @@ static const struct pr_ops nvme_pr_ops = {
 	.pr_clear	= nvme_pr_clear,
 };
 
+struct sed_cb_data {
+	sec_cb		*cb;
+	void		*cb_data;
+	struct nvme_command cmd;
+};
+
+static void sec_submit_endio(struct request *req, int error)
+{
+	struct sed_cb_data *sed_data = req->end_io_data;
+
+	if (sed_data->cb)
+		sed_data->cb(error, sed_data->cb_data);
+
+	kfree(sed_data);
+	blk_mq_free_request(req);
+}
+
+static int nvme_insert_rq(struct request_queue *q, struct request *rq,
+			  int at_head, rq_end_io_fn *done)
+{
+	WARN_ON(rq->cmd_type == REQ_TYPE_FS);
+
+	rq->end_io = done;
+
+	if (!q->mq_ops)
+		return -EINVAL;
+
+	blk_mq_insert_request(rq, at_head, true, true);
+
+	return 0;
+}
+
+static int nvme_sec_submit(void *data, u8 opcode, u16 SPSP,
+			   u8 SECP, void *buffer, size_t len,
+			   sec_cb *cb, void *cb_data)
+{
+	struct request_queue *q;
+	struct request *req;
+	struct sed_cb_data *sed_data;
+	struct nvme_ns *ns;
+	struct nvme_command *cmd;
+	int ret;
+
+	ns = data;//bdev->bd_disk->private_data;
+
+	sed_data = kzalloc(sizeof(*sed_data), GFP_NOWAIT);
+	if (!sed_data)
+		return -ENOMEM;
+	sed_data->cb = cb;
+	sed_data->cb_data = cb_data;
+	cmd = &sed_data->cmd;
+
+	cmd->common.opcode = opcode;
+	cmd->common.nsid = ns->ns_id;
+	cmd->common.cdw10[0] = SECP << 24 | SPSP << 8;
+	cmd->common.cdw10[1] = len;
+
+	q = ns->ctrl->admin_q;
+
+	req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
+	if (IS_ERR(req)) {
+		ret = PTR_ERR(req);
+		goto err_free;
+	}
+
+	req->timeout = ADMIN_TIMEOUT;
+	req->special = NULL;
+
+	if (buffer && len) {
+		ret = blk_rq_map_kern(q, req, buffer, len, GFP_NOWAIT);
+		if (ret) {
+			blk_mq_free_request(req);
+			goto err_free;
+		}
+	}
+
+	req->end_io_data = sed_data;
+	//req->rq_disk = bdev->bd_disk;
+
+	return nvme_insert_rq(q, req, 1, sec_submit_endio);
+
+err_free:
+	kfree(sed_data);
+	return ret;
+}
+
+static int nvme_sec_recv(void *data, u16 SPSP, u8 SECP,
+			 void *buffer, size_t len,
+			 sec_cb *cb, void *cb_data)
+{
+	return nvme_sec_submit(data, nvme_admin_security_recv, SPSP, SECP,
+			       buffer, len, cb, cb_data);
+}
+
+static int nvme_sec_send(void *data, u16 SPSP, u8 SECP,
+			 void *buffer, size_t len,
+			 sec_cb *cb, void *cb_data)
+{
+	return nvme_sec_submit(data, nvme_admin_security_send, SPSP, SECP,
+			       buffer, len, cb, cb_data);
+}
+
+void nvme_unlock_from_suspend(struct nvme_ctrl *ctrl)
+{
+	struct opal_suspend_unlk ulk = { 0 };
+	struct nvme_ns *ns;
+	char diskname[DISK_NAME_LEN];
+	mutex_lock(&ctrl->namespaces_mutex);
+	if (list_empty(&ctrl->namespaces))
+		goto out_no_namespace;
+	ulk.data = ns = list_first_entry(&ctrl->namespaces, struct nvme_ns, list);
+	mutex_unlock(&ctrl->namespaces_mutex);
+	snprintf(diskname, sizeof(diskname), "%sn%d",
+		 dev_name(ctrl->device), ns->instance);
+	ulk.name = diskname;
+
+	ulk.ops.send = nvme_sec_send;
+	ulk.ops.recv = nvme_sec_recv;
+	opal_unlock_from_suspend(&ulk);
+
+	return;
+out_no_namespace:
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_unlock_from_suspend);
+
+static struct sec_ops nvme_sec_ops = {
+	.send = nvme_sec_send,
+	.recv = nvme_sec_recv,
+};
+
 static const struct block_device_operations nvme_fops = {
 	.owner		= THIS_MODULE,
 	.ioctl		= nvme_ioctl,
@@ -1076,6 +1209,7 @@ static const struct block_device_operations nvme_fops = {
 	.getgeo		= nvme_getgeo,
 	.revalidate_disk= nvme_revalidate_disk,
 	.pr_ops		= &nvme_pr_ops,
+	.sec_ops	= &nvme_sec_ops,
 };
 
 static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d47f5a5..ac7e5b1 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -240,7 +240,8 @@ static inline int nvme_error_status(u16 status)
 
 static inline bool nvme_req_needs_retry(struct request *req, u16 status)
 {
-	return !(status & NVME_SC_DNR || blk_noretry_request(req)) &&
+	return !(status & NVME_SC_DNR || status & NVME_SC_ACCESS_DENIED ||
+		blk_noretry_request(req)) &&
 		(jiffies - req->start_time) < req->timeout &&
 		req->retries < nvme_max_retries;
 }
@@ -259,6 +260,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl);
 
 void nvme_queue_scan(struct nvme_ctrl *ctrl);
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
+void nvme_unlock_from_suspend(struct nvme_ctrl *ctrl);
 
 #define NVME_NR_AERS	1
 void nvme_complete_async_event(struct nvme_ctrl *ctrl,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0248d0e..18fd878 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -43,6 +43,7 @@
 #include <linux/types.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <asm/unaligned.h>
+#include <linux/sed-opal.h>
 
 #include "nvme.h"
 
@@ -582,6 +583,7 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	struct nvme_command cmnd;
 	unsigned map_len;
 	int ret = BLK_MQ_RQ_QUEUE_OK;
+	unsigned long flags;
 
 	/*
 	 * If formated with metadata, require the block layer provide a buffer
@@ -614,18 +616,18 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	cmnd.common.command_id = req->tag;
 	blk_mq_start_request(req);
 
-	spin_lock_irq(&nvmeq->q_lock);
+	spin_lock_irqsave(&nvmeq->q_lock, flags);
 	if (unlikely(nvmeq->cq_vector < 0)) {
 		if (ns && !test_bit(NVME_NS_DEAD, &ns->flags))
 			ret = BLK_MQ_RQ_QUEUE_BUSY;
 		else
 			ret = BLK_MQ_RQ_QUEUE_ERROR;
-		spin_unlock_irq(&nvmeq->q_lock);
+		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
 		goto out;
 	}
 	__nvme_submit_cmd(nvmeq, &cmnd);
 	nvme_process_cq(nvmeq);
-	spin_unlock_irq(&nvmeq->q_lock);
+	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
 	return BLK_MQ_RQ_QUEUE_OK;
 out:
 	nvme_free_iod(dev, req);
@@ -635,11 +637,11 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 static void nvme_complete_rq(struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-	struct nvme_dev *dev = iod->nvmeq->dev;
+	struct nvme_queue *nvmeq = iod->nvmeq;
+	struct nvme_dev *dev = nvmeq->dev;
 	int error = 0;
 
 	nvme_unmap_data(dev, req);
-
 	if (unlikely(req->errors)) {
 		if (nvme_req_needs_retry(req, req->errors)) {
 			req->retries++;
@@ -658,7 +660,6 @@ static void nvme_complete_rq(struct request *req)
 			"completing aborted command with status: %04x\n",
 			req->errors);
 	}
-
 	blk_mq_end_request(req, error);
 }
 
@@ -1758,10 +1759,11 @@ static void nvme_reset_work(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
 	int result = -ENODEV;
-
+	bool was_suspend = false;
 	if (WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING))
 		goto out;
 
+	was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	/*
 	 * If we're called to reset a live controller first shut it down before
 	 * moving on.
@@ -1789,6 +1791,9 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
+	if (was_suspend)
+		nvme_unlock_from_suspend(&dev->ctrl);
+
 	result = nvme_setup_io_queues(dev);
 	if (result)
 		goto out;