[1/5] blk-mq: Export reading mq request state

Message ID	20190308174006.5032-1-keith.busch@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> From: Keith Busch <keith.busch@intel.com> To: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>, Keith Busch <keith.busch@intel.com> Subject: [PATCH 1/5] blk-mq: Export reading mq request state Date: Fri, 8 Mar 2019 10:40:02 -0700 Message-Id: <20190308174006.5032-1-keith.busch@intel.com> Sender: linux-block-owner@vger.kernel.org Precedence: bulk
Series	[1/5] blk-mq: Export reading mq request state \| expand [1/5] blk-mq: Export reading mq request state [2/5] blk-mq: Export iterating queue requests [3/5] blk-mq: Iterate tagset over all requests [4/5] nvme: Fail dead namespace's entered requests [5/5] nvme/pci: Remove queue IO flushing hack

Keith Busch March 8, 2019, 5:40 p.m. UTC

Drivers may need to know the state of their requets.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.h         | 9 ---------
 include/linux/blkdev.h | 9 +++++++++
 2 files changed, 9 insertions(+), 9 deletions(-)

Bart Van Assche March 8, 2019, 6:07 p.m. UTC | #1

On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> Drivers may need to know the state of their requets.

Hi Keith,

What makes you think that drivers should be able to check the state of their
requests? Please elaborate.

> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index faed9d9eb84c..db113aee48bb 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -241,6 +241,15 @@ struct request {
>  	struct request *next_rq;
>  };
>  
> +/**
> + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> + * @rq: target request.
> + */
> +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> +{
> +	return READ_ONCE(rq->state);
> +}

Please also explain how drivers can use this function without triggering a
race condition with the code that modifies rq->state.

Thanks,

Bart.

Keith Busch March 8, 2019, 6:15 p.m. UTC | #2

On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > Drivers may need to know the state of their requets.
> 
> Hi Keith,
> 
> What makes you think that drivers should be able to check the state of their
> requests? Please elaborate.

Patches 4 and 5 in this series.
 
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index faed9d9eb84c..db113aee48bb 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -241,6 +241,15 @@ struct request {
> >  	struct request *next_rq;
> >  };
> >  
> > +/**
> > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > + * @rq: target request.
> > + */
> > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > +{
> > +	return READ_ONCE(rq->state);
> > +}
> 
> Please also explain how drivers can use this function without triggering a
> race condition with the code that modifies rq->state.

Either queisced or within a timeout handler that already locks the
request lifetime.

Bart Van Assche March 8, 2019, 6:42 p.m. UTC | #3

On Fri, 2019-03-08 at 11:15 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > > Drivers may need to know the state of their requets.
> > 
> > Hi Keith,
> > 
> > What makes you think that drivers should be able to check the state of their
> > requests? Please elaborate.
> 
> Patches 4 and 5 in this series.
>  
> > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > index faed9d9eb84c..db113aee48bb 100644
> > > --- a/include/linux/blkdev.h
> > > +++ b/include/linux/blkdev.h
> > > @@ -241,6 +241,15 @@ struct request {
> > >  	struct request *next_rq;
> > >  };
> > >  
> > > +/**
> > > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > > + * @rq: target request.
> > > + */
> > > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > > +{
> > > +	return READ_ONCE(rq->state);
> > > +}
> > 
> > Please also explain how drivers can use this function without triggering a
> > race condition with the code that modifies rq->state.
> 
> Either queisced or within a timeout handler that already locks the
> request lifetime.

Hi Keith,

For future patch series submissions please include a cover letter. The two patch
series that you posted today don't have a cover letter so I can only guess what
the purpose of these two patch series is. Is the purpose of this patch series
perhaps to speed up error handling? If so, why did you choose the approach of
iterating over outstanding requests and telling the block layer to terminate
these requests? I think that the NVMe spec provides a more elegant mechanism,
namely deleting the I/O submission queues. According to what I read in the
1.3c spec deleting an I/O submission queue forces an NVMe controller to post a 
completion for every outstanding request. See also section 5.6 in the NVMe
1.3c spec.

Bart.

Keith Busch March 8, 2019, 7:19 p.m. UTC | #4

On Fri, Mar 08, 2019 at 10:42:17AM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 11:15 -0700, Keith Busch wrote:
> > On Fri, Mar 08, 2019 at 10:07:23AM -0800, Bart Van Assche wrote:
> > > On Fri, 2019-03-08 at 10:40 -0700, Keith Busch wrote:
> > > > Drivers may need to know the state of their requets.
> > > 
> > > Hi Keith,
> > > 
> > > What makes you think that drivers should be able to check the state of their
> > > requests? Please elaborate.
> > 
> > Patches 4 and 5 in this series.
> >  
> > > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > > index faed9d9eb84c..db113aee48bb 100644
> > > > --- a/include/linux/blkdev.h
> > > > +++ b/include/linux/blkdev.h
> > > > @@ -241,6 +241,15 @@ struct request {
> > > >  	struct request *next_rq;
> > > >  };
> > > >  
> > > > +/**
> > > > + * blk_mq_rq_state() - read the current MQ_RQ_* state of a request
> > > > + * @rq: target request.
> > > > + */
> > > > +static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
> > > > +{
> > > > +	return READ_ONCE(rq->state);
> > > > +}
> > > 
> > > Please also explain how drivers can use this function without triggering a
> > > race condition with the code that modifies rq->state.
> > 
> > Either queisced or within a timeout handler that already locks the
> > request lifetime.
> 
> Hi Keith,
> 
> For future patch series submissions please include a cover letter. The two patch
> series that you posted today don't have a cover letter so I can only guess what
> the purpose of these two patch series is. Is the purpose of this patch series
> perhaps to speed up error handling? If so, why did you choose the approach of
> iterating over outstanding requests and telling the block layer to terminate
> these requests? 

Okay, good point. Will do.

> I think that the NVMe spec provides a more elegant mechanism,
> namely deleting the I/O submission queues. According to what I read in the
> 1.3c spec deleting an I/O submission queue forces an NVMe controller to post a 
> completion for every outstanding request. See also section 5.6 in the NVMe
> 1.3c spec.

That's actually not what it says. The controller may or may not post a
completion entry with a delete SQ command. The first behavior is defined
in the spec as "explicit" and the second as "implicit". For implicit,
we have to iterate inflight tags.

Sagi Grimberg March 8, 2019, 8:21 p.m. UTC | #5

For some reason I didn't get patches 2/5 and 3/5...

Keith Busch March 8, 2019, 8:29 p.m. UTC | #6

On Fri, Mar 08, 2019 at 12:21:16PM -0800, Sagi Grimberg wrote:
> For some reason I didn't get patches 2/5 and 3/5...

Unreliable 'git send-email'?! :)

They're copied to patchwork too:

  https://patchwork.kernel.org/patch/10845225/
  https://patchwork.kernel.org/patch/10845229/

Bart Van Assche March 8, 2019, 8:47 p.m. UTC | #7

On Fri, 2019-03-08 at 12:19 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 10:42:17AM -0800, Bart Van Assche wrote:
> > I think that the NVMe spec provides a more elegant mechanism,
> > namely deleting the I/O submission queues. According to what I read in the
> > 1.3c spec deleting an I/O submission queue forces an NVMe controller to post a 
> > completion for every outstanding request. See also section 5.6 in the NVMe
> > 1.3c spec.
> 
> That's actually not what it says. The controller may or may not post a
> completion entry with a delete SQ command. The first behavior is defined
> in the spec as "explicit" and the second as "implicit". For implicit,
> we have to iterate inflight tags.

Hi Keith,

Thanks for the clarification. Are you aware of any mechanism in the NVMe spec
that causes all outstanding requests to fail? With RDMA this is easy - all
one has to do is to change the queue pair state into IB_QPS_ERR. See also
ib_drain_qp() in the RDMA core.

If no such mechanism has been defined in the NVMe spec: have you considered
to cancel all outstanding requests instead of calling blk_mq_end_request() for
all outstanding requests?

Thanks,

Bart.

Keith Busch March 8, 2019, 9:14 p.m. UTC | #8

On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> Thanks for the clarification. Are you aware of any mechanism in the NVMe spec
> that causes all outstanding requests to fail? With RDMA this is easy - all
> one has to do is to change the queue pair state into IB_QPS_ERR. See also
> ib_drain_qp() in the RDMA core.

Well, we can't always rely on hardware to provide completions. The
driver must be able to make forward progress if the device decides to
stop responding, or maybe it was disconnected, asserts, or experiences
an unsafe shutdown/powerloss.

> If no such mechanism has been defined in the NVMe spec: have you considered
> to cancel all outstanding requests instead of calling blk_mq_end_request() for
> all outstanding requests?

Isn't this cancelling requests? Is there an existing block interface
that accomplishes this?

Bart Van Assche March 8, 2019, 9:25 p.m. UTC | #9

On Fri, 2019-03-08 at 14:14 -0700, Keith Busch wrote:
> On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> > If no such mechanism has been defined in the NVMe spec: have you considered
> > to cancel all outstanding requests instead of calling blk_mq_end_request() for
> > all outstanding requests?
> 
> Isn't this cancelling requests? Is there an existing block interface
> that accomplishes this?

Hi Keith,

Sorry if I was unclear. With "canceling outstanding requests" I was referring to
the NVMe abort command. I think aborting outstanding requests has the advantage
that no new explicit call to blk_mq_end_request() call has to be added.

Bart.

Keith Busch March 8, 2019, 9:31 p.m. UTC | #10

On Fri, Mar 08, 2019 at 01:25:16PM -0800, Bart Van Assche wrote:
> On Fri, 2019-03-08 at 14:14 -0700, Keith Busch wrote:
> > On Fri, Mar 08, 2019 at 12:47:10PM -0800, Bart Van Assche wrote:
> > > If no such mechanism has been defined in the NVMe spec: have you considered
> > > to cancel all outstanding requests instead of calling blk_mq_end_request() for
> > > all outstanding requests?
> > 
> > Isn't this cancelling requests? Is there an existing block interface
> > that accomplishes this?
> 
> Hi Keith,
> 
> Sorry if I was unclear. With "canceling outstanding requests" I was referring to
> the NVMe abort command. I think aborting outstanding requests has the advantage
> that no new explicit call to blk_mq_end_request() call has to be added.

Ah, gotchya. NVMe abort usage is different than what this series wants to
do: we've determined the device is no longer usable, we need to terminate
all requests that are queued in software, but haven't been dispatched
to hardware.

[1/5] blk-mq: Export reading mq request state

Commit Message

Comments

Patch