[1/7] block: use legacy path for flush requests for MQ with a scheduler

Message ID 1480734921-23701-2-git-send-email-axboe@fb.com (mailing list archive)
State New, archived

Commit Message

Jens Axboe Dec. 3, 2016, 3:15 a.m. UTC
No functional changes with this patch, it's just in preparation for
supporting legacy schedulers on blk-mq.

Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-core.c  |  2 +-
 block/blk-exec.c  |  2 +-
 block/blk-flush.c | 26 ++++++++++++++------------
 block/blk.h       | 12 +++++++++++-
 4 files changed, 27 insertions(+), 15 deletions(-)

Comments

Christoph Hellwig Dec. 5, 2016, 1:05 p.m. UTC | #1
On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> No functional changes with this patch, it's just in preparation for
> supporting legacy schedulers on blk-mq.

Ewww.  I think without refactoring to make clear what 'use_mq_path'
means here, and better naming, this is a total non-starter.  Even with
that we'll now have yet another code path to worry about.  Is there
any chance to instead consolidate into a single path?

>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>  {
> -	if (q->mq_ops)
> +	if (blk_use_mq_path(q))
>  		return blk_mq_alloc_request(q, rw,
>  			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>  				0 : BLK_MQ_REQ_NOWAIT);

So now with blk-mq and an elevator set we go into blk_old_get_request,
which will simply allocate new requests.  How does this not break
every existing driver?
Jens Axboe Dec. 5, 2016, 3:07 p.m. UTC | #2
On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>> No functional changes with this patch, it's just in preparation for
>> supporting legacy schedulers on blk-mq.
> 
> Ewww.  I think without refactoring to make clear what 'use_mq_path'
> means here, and better naming, this is a total non-starter.  Even with
> that we'll now have yet another code path to worry about.  Is there
> any chance to instead consolidate into a single path?

It's not pretty at all. I should have prefaced this patchset with saying
that it's an experiment in seeing what it would take to simply use the
old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
it up a bit after posting:

http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched

but I'm not going to claim this is anywhere near merge ready, nor clean.

>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>  {
>> -	if (q->mq_ops)
>> +	if (blk_use_mq_path(q))
>>  		return blk_mq_alloc_request(q, rw,
>>  			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>  				0 : BLK_MQ_REQ_NOWAIT);
> 
> So now with blk-mq and an elevator set we go into blk_old_get_request,
> which will simply allocate new requests.  How does this not break
> every existing driver?

Since Johannes found that confusion, maybe I should explain how it all
works.

Basically, if we are blk-mq (q->mq_ops) and now have an IO scheduler
attached (q->elevator), then the path from bio to request is essentially
the old one. We allocate a request through get_request() and friends,
and insert it with the elevator interface. When the queue is later run,
our blk_mq_sched_dispatch() asks the IO scheduler for a request, and if
successful, we allocate a real MQ request and dispatch that. So all the
driver ever sees is MQ requests, and it uses the MQ interface. Only the
internal bits now look at blk_use_mq_path() to tell whether they should
alloc+insert with that.
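
Roughly sketched, the dispatch side looks like this (illustrative only,
not the branch's literal code; sched_copy_request() here is a stand-in
for the 'rq' field copy done at dispatch time):

	static void blk_mq_sched_dispatch(struct blk_mq_hw_ctx *hctx)
	{
		struct request_queue *q = hctx->queue;
		struct request *old_rq, *rq;

		/* queue_lock handling around the legacy queue elided */
		while ((old_rq = blk_fetch_request(q)) != NULL) {
			/* allocate a real MQ request, grabbing a driver tag */
			rq = blk_mq_alloc_request(q, rq_data_dir(old_rq),
						  BLK_MQ_REQ_NOWAIT);
			if (IS_ERR(rq)) {
				/* out of tags, put it back and retry later */
				blk_requeue_request(q, old_rq);
				break;
			}
			/* shadow the legacy request with the MQ one */
			sched_copy_request(rq, old_rq);
			blk_mq_insert_request(rq, false, true, false);
		}
	}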

The above will break if we have drivers manually allocating a request
through blk_get_request(), and then using MQ functions to insert/run it.
I didn't audit all of that, and I think most (all?) of them just use the
MQ interfaces. That would also currently be broken, since the queue-run
logic is either/or: we either run the software queues or go through the
elevator.
Johannes Thumshirn Dec. 5, 2016, 3:49 p.m. UTC | #3
On Mon, Dec 05, 2016 at 08:07:10AM -0700, Jens Axboe wrote:
> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> > On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >> No functional changes with this patch, it's just in preparation for
> >> supporting legacy schedulers on blk-mq.
> > 
> > Ewww.  I think without refactoring to make clear what 'use_mq_path'
> > means here, and better naming, this is a total non-starter.  Even with
> > that we'll now have yet another code path to worry about.  Is there
> > any chance to instead consolidate into a single path?
> 
> It's not pretty at all. I should have prefaced this patchset with saying
> that it's an experiment in seeing what it would take to simply use the
> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> it up a bit after posting:
> 
> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
> 
> but I'm not going to claim this is anywhere near merge ready, nor clean.
> 
> >>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
> >>  {
> >> -	if (q->mq_ops)
> >> +	if (blk_use_mq_path(q))
> >>  		return blk_mq_alloc_request(q, rw,
> >>  			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
> >>  				0 : BLK_MQ_REQ_NOWAIT);
> > 
> > So now with blk-mq and an elevator set we go into blk_old_get_request,
> > which will simply allocate new requests.  How does this not break
> > every existing driver?
> 
> Since Johannes found that confusion, maybe I should explain how it all
> works.

To clarify the naming, how about something like blk_mq_use_sched() (to align
with blk_mq_sched_dispatch())?
Jens Axboe Dec. 5, 2016, 3:49 p.m. UTC | #4
On 12/05/2016 08:49 AM, Johannes Thumshirn wrote:
> On Mon, Dec 05, 2016 at 08:07:10AM -0700, Jens Axboe wrote:
>> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
>>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>
>>> Ewww.  I think without refactoring to make clear what 'use_mq_path'
>>> means here, and better naming, this is a total non-starter.  Even with
>>> that we'll now have yet another code path to worry about.  Is there
>>> any chance to instead consolidate into a single path?
>>
>> It's not pretty at all. I should have prefaced this patchset with saying
>> that it's an experiment in seeing what it would take to simply use the
>> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
>> it up a bit after posting:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>>
>> but I'm not going to claim this is anywhere near merge ready, nor clean.
>>
>>>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>>  {
>>>> -	if (q->mq_ops)
>>>> +	if (blk_use_mq_path(q))
>>>>  		return blk_mq_alloc_request(q, rw,
>>>>  			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>>>  				0 : BLK_MQ_REQ_NOWAIT);
>>>
>>> So now with blk-mq and an elevator set we go into blk_old_get_request,
>>> which will simply allocate new requests.  How does this not break
>>> every existing driver?
>>
>> Since Johannes found that confusion, maybe I should explain how it all
>> works.
> 
> To clarify the naming, how about something like blk_mq_use_sched() (to align
> with blk_mq_sched_dispatch())?

Yeah, that is a much better name indeed. I'll make that change.
Ming Lei Dec. 5, 2016, 5 p.m. UTC | #5
On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
> No functional changes with this patch, it's just in preparation for
> supporting legacy schedulers on blk-mq.
>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-core.c  |  2 +-
>  block/blk-exec.c  |  2 +-
>  block/blk-flush.c | 26 ++++++++++++++------------
>  block/blk.h       | 12 +++++++++++-
>  4 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 3f2eb8d80189..0e23589ab3bf 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>
>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>  {
> -       if (q->mq_ops)
> +       if (blk_use_mq_path(q))
>                 return blk_mq_alloc_request(q, rw,
>                         (gfp_mask & __GFP_DIRECT_RECLAIM) ?
>                                 0 : BLK_MQ_REQ_NOWAIT);

Another way might be to use the MQ allocator to allocate the rq in the
mq_sched case: just replace mempool_alloc() in __get_request() with
blk_mq_alloc_request(). That way it should be possible to avoid the
extra rq allocation in blk_mq_sched_dispatch(), and to keep MQ's
benefit of rq preallocation, which also avoids holding the queue_lock
during the allocation.
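
Something like this inside __get_request() (just a sketch of the idea;
rl is the request_list already at hand there, error handling elided):

	/* take the rq from the MQ allocator when mq_ops is set,
	 * otherwise fall back to the legacy mempool
	 */
	if (q->mq_ops)
		rq = blk_mq_alloc_request(q, rw,
				(gfp_mask & __GFP_DIRECT_RECLAIM) ?
					0 : BLK_MQ_REQ_NOWAIT);
	else
		rq = mempool_alloc(rl->rq_pool, gfp_mask);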

> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index 3ecb00a6cf45..73b8a701ae6d 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -64,7 +64,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
>          * don't check dying flag for MQ because the request won't
>          * be reused after dying flag is set
>          */
> -       if (q->mq_ops) {
> +       if (blk_use_mq_path(q)) {
>                 blk_mq_insert_request(rq, at_head, true, false);
>                 return;
>         }
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 1bdbb3d3e5f5..0b68a1258bdd 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -133,14 +133,16 @@ static void blk_flush_restore_request(struct request *rq)
>
>  static bool blk_flush_queue_rq(struct request *rq, bool add_front)
>  {
> -       if (rq->q->mq_ops) {
> +       struct request_queue *q = rq->q;
> +
> +       if (blk_use_mq_path(q)) {
>                 blk_mq_add_to_requeue_list(rq, add_front, true);
>                 return false;
>         } else {
>                 if (add_front)
> -                       list_add(&rq->queuelist, &rq->q->queue_head);
> +                       list_add(&rq->queuelist, &q->queue_head);
>                 else
> -                       list_add_tail(&rq->queuelist, &rq->q->queue_head);
> +                       list_add_tail(&rq->queuelist, &q->queue_head);
>                 return true;
>         }
>  }
> @@ -201,7 +203,7 @@ static bool blk_flush_complete_seq(struct request *rq,
>                 BUG_ON(!list_empty(&rq->queuelist));
>                 list_del_init(&rq->flush.list);
>                 blk_flush_restore_request(rq);
> -               if (q->mq_ops)
> +               if (blk_use_mq_path(q))
>                         blk_mq_end_request(rq, error);
>                 else
>                         __blk_end_request_all(rq, error);
> @@ -224,7 +226,7 @@ static void flush_end_io(struct request *flush_rq, int error)
>         unsigned long flags = 0;
>         struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
>
> -       if (q->mq_ops) {
> +       if (blk_use_mq_path(q)) {
>                 struct blk_mq_hw_ctx *hctx;
>
>                 /* release the tag's ownership to the req cloned from */
> @@ -240,7 +242,7 @@ static void flush_end_io(struct request *flush_rq, int error)
>         /* account completion of the flush request */
>         fq->flush_running_idx ^= 1;
>
> -       if (!q->mq_ops)
> +       if (!blk_use_mq_path(q))
>                 elv_completed_request(q, flush_rq);
>
>         /* and push the waiting requests to the next stage */
> @@ -267,7 +269,7 @@ static void flush_end_io(struct request *flush_rq, int error)
>                 blk_run_queue_async(q);
>         }
>         fq->flush_queue_delayed = 0;
> -       if (q->mq_ops)
> +       if (blk_use_mq_path(q))
>                 spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
>  }
>
> @@ -315,7 +317,7 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
>          * be in flight at the same time. And acquire the tag's
>          * ownership for flush req.
>          */
> -       if (q->mq_ops) {
> +       if (blk_use_mq_path(q)) {
>                 struct blk_mq_hw_ctx *hctx;
>
>                 flush_rq->mq_ctx = first_rq->mq_ctx;
> @@ -409,7 +411,7 @@ void blk_insert_flush(struct request *rq)
>          * complete the request.
>          */
>         if (!policy) {
> -               if (q->mq_ops)
> +               if (blk_use_mq_path(q))
>                         blk_mq_end_request(rq, 0);
>                 else
>                         __blk_end_bidi_request(rq, 0, 0, 0);
> @@ -425,9 +427,9 @@ void blk_insert_flush(struct request *rq)
>          */
>         if ((policy & REQ_FSEQ_DATA) &&
>             !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
> -               if (q->mq_ops) {
> +               if (blk_use_mq_path(q))
>                         blk_mq_insert_request(rq, false, false, true);
> -               } else
> +               else
>                         list_add_tail(&rq->queuelist, &q->queue_head);
>                 return;
>         }
> @@ -440,7 +442,7 @@ void blk_insert_flush(struct request *rq)
>         INIT_LIST_HEAD(&rq->flush.list);
>         rq->rq_flags |= RQF_FLUSH_SEQ;
>         rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
> -       if (q->mq_ops) {
> +       if (blk_use_mq_path(q)) {
>                 rq->end_io = mq_flush_data_end_io;
>
>                 spin_lock_irq(&fq->mq_flush_lock);
> diff --git a/block/blk.h b/block/blk.h
> index 041185e5f129..094fc10429c3 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -36,10 +36,20 @@ extern struct kmem_cache *request_cachep;
>  extern struct kobj_type blk_queue_ktype;
>  extern struct ida blk_queue_ida;
>
> +/*
> + * Use the MQ path if we have mq_ops, but not if we are using an IO
> + * scheduler. For the scheduler, we should use the legacy path. Only
> + * for internal use in the block layer.
> + */
> +static inline bool blk_use_mq_path(struct request_queue *q)
> +{
> +       return q->mq_ops && !q->elevator;
> +}
> +
>  static inline struct blk_flush_queue *blk_get_flush_queue(
>                 struct request_queue *q, struct blk_mq_ctx *ctx)
>  {
> -       if (q->mq_ops)
> +       if (blk_use_mq_path(q))
>                 return blk_mq_map_queue(q, ctx->cpu)->fq;
>         return q->fq;
>  }
> --
> 2.7.4
>
Jens Axboe Dec. 5, 2016, 5:09 p.m. UTC | #6
On 12/05/2016 10:00 AM, Ming Lei wrote:
> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>> No functional changes with this patch, it's just in preparation for
>> supporting legacy schedulers on blk-mq.
>>
>> Signed-off-by: Jens Axboe <axboe@fb.com>
>> ---
>>  block/blk-core.c  |  2 +-
>>  block/blk-exec.c  |  2 +-
>>  block/blk-flush.c | 26 ++++++++++++++------------
>>  block/blk.h       | 12 +++++++++++-
>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 3f2eb8d80189..0e23589ab3bf 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>
>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>  {
>> -       if (q->mq_ops)
>> +       if (blk_use_mq_path(q))
>>                 return blk_mq_alloc_request(q, rw,
>>                         (gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>                                 0 : BLK_MQ_REQ_NOWAIT);
> 
> Another way might be to use the MQ allocator to allocate the rq in the
> mq_sched case: just replace mempool_alloc() in __get_request() with
> blk_mq_alloc_request(). That way it should be possible to avoid the
> extra rq allocation in blk_mq_sched_dispatch(), and to keep MQ's
> benefit of rq preallocation, which also avoids holding the queue_lock
> during the allocation.

One problem with the MQ rq allocation is that it's tied to the device
queue depth. This is a problem for scheduling, since we want to have a
larger pool of requests that the IO scheduler can use, so that we
actually have something that we can schedule with. This is a non-starter
on QD=1 devices, but it's also a problem for SATA with 31 effectively
usable tags.

That's why I split it in two, so we have the "old" requests that we hand
to the scheduler. I know the 'rq' field copy isn't super pretty, though.
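
Conceptually the copy is something like this (the field list here is
illustrative, not the branch's exact code):

	/* move the scheduler-visible state from the legacy request
	 * onto the freshly allocated MQ request at dispatch time
	 */
	static void sched_copy_request(struct request *rq,
				       struct request *old_rq)
	{
		rq->__sector = blk_rq_pos(old_rq);
		rq->__data_len = blk_rq_bytes(old_rq);
		rq->bio = old_rq->bio;
		rq->biotail = old_rq->biotail;
		rq->cmd_flags = old_rq->cmd_flags;
		rq->rq_flags = old_rq->rq_flags;
	}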
Ming Lei Dec. 5, 2016, 7:22 p.m. UTC | #7
On Tue, Dec 6, 2016 at 1:09 AM, Jens Axboe <axboe@fb.com> wrote:
> On 12/05/2016 10:00 AM, Ming Lei wrote:
>> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>>> No functional changes with this patch, it's just in preparation for
>>> supporting legacy schedulers on blk-mq.
>>>
>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>> ---
>>>  block/blk-core.c  |  2 +-
>>>  block/blk-exec.c  |  2 +-
>>>  block/blk-flush.c | 26 ++++++++++++++------------
>>>  block/blk.h       | 12 +++++++++++-
>>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 3f2eb8d80189..0e23589ab3bf 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>>
>>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>  {
>>> -       if (q->mq_ops)
>>> +       if (blk_use_mq_path(q))
>>>                 return blk_mq_alloc_request(q, rw,
>>>                         (gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>>                                 0 : BLK_MQ_REQ_NOWAIT);
>>
>> Another way might be to use the MQ allocator to allocate the rq in the
>> mq_sched case: just replace mempool_alloc() in __get_request() with
>> blk_mq_alloc_request(). That way it should be possible to avoid the
>> extra rq allocation in blk_mq_sched_dispatch(), and to keep MQ's
>> benefit of rq preallocation, which also avoids holding the queue_lock
>> during the allocation.
>
> One problem with the MQ rq allocation is that it's tied to the device
> queue depth. This is a problem for scheduling, since we want to have a
> larger pool of requests that the IO scheduler can use, so that we
> actually have something that we can schedule with. This is a non-starter
> on QD=1 devices, but it's also a problem for SATA with 31 effectively
> usable tags.
>
> That's why I split it in two, so we have the "old" requests that we hand
> to the scheduler. I know the 'rq' field copy isn't super pretty, though.

OK, got it, thanks for your explanation.

So could we fall back to mempool_alloc() for allocating an rq of MQ
size if the MQ rq allocator fails? That way the extra rq allocation
in blk_mq_alloc_request() could be avoided.
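
I mean something along these lines (a sketch; sched_rq_pool is a
hypothetical mempool of MQ-sized requests that would have to be added):

	/* prefer the MQ allocator, fall back to the mempool when
	 * tags run out
	 */
	rq = blk_mq_alloc_request(q, rw, BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(rq))
		rq = mempool_alloc(q->sched_rq_pool, gfp_mask);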


Thanks,
Ming
Jens Axboe Dec. 5, 2016, 7:35 p.m. UTC | #8
On 12/05/2016 12:22 PM, Ming Lei wrote:
> On Tue, Dec 6, 2016 at 1:09 AM, Jens Axboe <axboe@fb.com> wrote:
>> On 12/05/2016 10:00 AM, Ming Lei wrote:
>>> On Sat, Dec 3, 2016 at 11:15 AM, Jens Axboe <axboe@fb.com> wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@fb.com>
>>>> ---
>>>>  block/blk-core.c  |  2 +-
>>>>  block/blk-exec.c  |  2 +-
>>>>  block/blk-flush.c | 26 ++++++++++++++------------
>>>>  block/blk.h       | 12 +++++++++++-
>>>>  4 files changed, 27 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 3f2eb8d80189..0e23589ab3bf 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -1310,7 +1310,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
>>>>
>>>>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
>>>>  {
>>>> -       if (q->mq_ops)
>>>> +       if (blk_use_mq_path(q))
>>>>                 return blk_mq_alloc_request(q, rw,
>>>>                         (gfp_mask & __GFP_DIRECT_RECLAIM) ?
>>>>                                 0 : BLK_MQ_REQ_NOWAIT);
>>>
>>> Another way might be to use the MQ allocator to allocate the rq in the
>>> mq_sched case: just replace mempool_alloc() in __get_request() with
>>> blk_mq_alloc_request(). That way it should be possible to avoid the
>>> extra rq allocation in blk_mq_sched_dispatch(), and to keep MQ's
>>> benefit of rq preallocation, which also avoids holding the queue_lock
>>> during the allocation.
>>
>> One problem with the MQ rq allocation is that it's tied to the device
>> queue depth. This is a problem for scheduling, since we want to have a
>> larger pool of requests that the IO scheduler can use, so that we
>> actually have something that we can schedule with. This is a non-starter
>> on QD=1 devices, but it's also a problem for SATA with 31 effectively
>> usable tags.
>>
>> That's why I split it in two, so we have the "old" requests that we hand
>> to the scheduler. I know the 'rq' field copy isn't super pretty, though.
> 
> OK, got it, thanks for your explanation.
> 
> So could we fall back to mempool_alloc() for allocating an rq of MQ
> size if the MQ rq allocator fails? That way the extra rq allocation
> in blk_mq_alloc_request() could be avoided.

We could, yes, though I'm not sure it's worth special casing that. The
copy is pretty damn cheap compared to the high costs of going through
the legacy path. And given that, I'd probably prefer to keep it all the
same, regardless of the depth of the device. I don't think the change
would be noticeable.
Mike Snitzer Dec. 5, 2016, 10:40 p.m. UTC | #9
On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
>
> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> > On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >> No functional changes with this patch, it's just in preparation for
> >> supporting legacy schedulers on blk-mq.
> >
> > Ewww.  I think without refactoring to make clear what 'use_mq_path'
> > means here, and better naming, this is a total non-starter.  Even with
> > that we'll now have yet another code path to worry about.  Is there
> > any chance to instead consolidate into a single path?
>
> It's not pretty at all. I should have prefaced this patchset with saying
> that it's an experiment in seeing what it would take to simply use the
> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> it up a bit after posting:
>
> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>
> but I'm not going to claim this is anywhere near merge ready, nor clean.

Nice to see you've lowered your standards...

Maybe now we can revisit this line of work? ;)
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

To fix: https://bugzilla.kernel.org/show_bug.cgi?id=119841

> >>  struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
> >>  {
> >> -    if (q->mq_ops)
> >> +    if (blk_use_mq_path(q))
> >>              return blk_mq_alloc_request(q, rw,
> >>                      (gfp_mask & __GFP_DIRECT_RECLAIM) ?
> >>                              0 : BLK_MQ_REQ_NOWAIT);
> >
> > So now with blk-mq and an elevator set we go into blk_old_get_request,
> > which will simply allocate new requests.  How does this not break
> > every existing driver?
>
> Since Johannes found that confusion, maybe I should explain how it all
> works.
>
> Basically, if we are blk-mq (q->mq_ops) and now have an IO scheduler
> attached (q->elevator), then the path from bio to request is essentially
> the old one. We allocate a request through get_request() and friends,
> and insert it with the elevator interface. When the queue is later run,
> our blk_mq_sched_dispatch() asks the IO scheduler for a request, and if
> successful, we allocate a real MQ request and dispatch that. So all the
> driver ever sees is MQ requests, and it uses the MQ interface. Only the
> internal bits now look at blk_use_mq_path() to tell whether they should
> alloc+insert with that.
>
> The above will break if we have drivers manually allocating a request
> through blk_get_request(), and then using MQ functions to insert/run it.
> I didn't audit all of that, and I think most (all?) of them just use the
> MQ interfaces. That would also currently be broken, since the queue-run
> logic is either/or: we either run the software queues or go through the
> elevator.

I'm not seeing anything in elevator_switch() that would prevent an
elevator from attempting to be used on an mq device with > 1 hw queue.
But I could easily be missing it.

That aside, this patchset has all the makings of a _serious_ problem
for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
dm_mod.dm_mq_nr_hw_queues=1).  There is a bunch of code in
drivers/md/dm-rq.c that looks at q->mq_ops (or not) to determine
whether mq or the old .request_fn path is used.

I think we really need a way to force an allocated mq request_queue
(with a single hw_queue) to not support this technological terror
you've constructed. (*cough*
http://es.memegenerator.net/instance/58341711 )

In fact I'd prefer there be a mechanism to only allow this if some
opt-in interface is used... or the inverse: dm-mq can allocate a
blk-mq request_queue that will _never_ support this.

I could be super dense on this line of work.  But what is the point of
all this?  Seems like a really unfortunate distraction that makes the
block code all the more encumbered with fiddly old vs new logic.  So
now we're opening old .request_fn users up to blk-mq-with-scheduler vs
non-blk-mq bugs.

Color me skeptical.  In time, maybe I'll warm to all this but for now
we need a big "OFF!" switch (aside from DEFAULT_MQ_SCHED_NONE, in
request_queue allocation interface).

FWIW, dm-multipath has supported a top-level .request_fn
request_queue, legacy elevator and all, stacked on blk-mq queue(s) for
quite a while.  If people _really_ want this you could easily give it
to them by forcing the use of the DM "multipath" target with
multipath's "queue_mode" feature set to "rq".
Jens Axboe Dec. 5, 2016, 10:50 p.m. UTC | #10
On 12/05/2016 03:40 PM, Mike Snitzer wrote:
> On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
>>
>> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
>>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
>>>> No functional changes with this patch, it's just in preparation for
>>>> supporting legacy schedulers on blk-mq.
>>>
>>> Ewww.  I think without refactoring to make clear what 'use_mq_path'
>>> means here, and better naming, this is a total non-starter.  Even with
>>> that we'll now have yet another code path to worry about.  Is there
>>> any chance to instead consolidate into a single path?
>>
>> It's not pretty at all. I should have prefaced this patchset with saying
>> that it's an experiment in seeing what it would take to simply use the
>> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
>> it up a bit after posting:
>>
>> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
>>
>> but I'm not going to claim this is anywhere near merge ready, nor clean.
> 
> Nice to see you've lowered your standards...
> 
> Maybe now we can revisit this line of work? ;)
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip

I haven't lowered my standards. I thought this posting was pretty clear
- it's an experiment in what supporting legacy schedulers would look
like. As you quote me above, this is NOT proposed for merging, nor do I
consider it anywhere near clean.

> I'm not seeing anything in elevator_switch() that would prevent an
> elevator from attempting to be used on an mq device with > 1 hw queue.
> But I could easily be missing it.

You missed it, it's in blk_mq_sched_init(), called from
elv_iosched_store() when trying to switch (or set up a new) schedulers.
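
The shape of that check is roughly this (a sketch, not the exact branch
code; blk_mq_sched_setup() is a hypothetical stand-in for the rest):

	int blk_mq_sched_init(struct request_queue *q)
	{
		/* legacy schedulers only make sense on a single hw queue */
		if (q->nr_hw_queues > 1)
			return -EINVAL;

		return blk_mq_sched_setup(q);
	}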

> That aside, this patchset has all the makings of a _serious_ problem
> for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
> dm_mod.dm_mq_nr_hw_queues=1).  There is a bunch of code in
> > drivers/md/dm-rq.c that looks at q->mq_ops (or not) to determine
> > whether mq or the old .request_fn path is used.
> 
> I think we really need a way to force an allocated mq request_queue
> (with a single hw_queue) to not support this technological terror
> you've constructed. (*cough*

See BLK_MQ_F_NO_SCHED.
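
A driver that wants to guarantee this flags its tag set before
allocating it, along these lines (sketch; 'set' is the driver's
struct blk_mq_tag_set):

	/* mark the tag set so no scheduler can ever attach */
	set->flags |= BLK_MQ_F_NO_SCHED;
	ret = blk_mq_alloc_tag_set(set);
	if (ret)
		return ret;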

> I could be super dense on this line of work.  But what is the point of
> all this?  Seems like a really unfortunate distraction that makes the
> block code all the more encumbered with fiddly old vs new logic.  So
> now we're opening old .request_fn users up to blk-mq-with-scheduler vs
> non-blk-mq bugs.

See above, it's just an experiment in seeing what this would look like,
how transparent (or not) we could make that.

Don't overthink any of this, and don't start making plans or coming up
with problems on why X or Y would not work with whatever interface
variant of dm. That's jumping the gun.
Mike Snitzer Dec. 6, 2016, 7:50 p.m. UTC | #11
On Mon, Dec 05 2016 at  5:50pm -0500,
Jens Axboe <axboe@fb.com> wrote:

> On 12/05/2016 03:40 PM, Mike Snitzer wrote:
> > On Mon, Dec 5, 2016 at 10:07 AM, Jens Axboe <axboe@fb.com> wrote:
> >>
> >> On 12/05/2016 06:05 AM, Christoph Hellwig wrote:
> >>> On Fri, Dec 02, 2016 at 08:15:15PM -0700, Jens Axboe wrote:
> >>>> No functional changes with this patch, it's just in preparation for
> >>>> supporting legacy schedulers on blk-mq.
> >>>
> >>> Ewww.  I think without refactoring to make clear what 'use_mq_path'
> >>> means here, and better naming, this is a total non-starter.  Even with
> >>> that we'll now have yet another code path to worry about.  Is there
> >>> any chance to instead consolidate into a single path?
> >>
> >> It's not pretty at all. I should have prefaced this patchset with saying
> >> that it's an experiment in seeing what it would take to simply use the
> >> old IO schedulers, as a temporary measure, on blk/scsi-mq. I did clean
> >> it up a bit after posting:
> >>
> >> http://git.kernel.dk/cgit/linux-block/log/?h=blk-mq-legacy-sched
> >>
> >> but I'm not going to claim this is anywhere near merge ready, nor clean.
> > 
> > Nice to see you've lowered your standards...
> > 
> > Maybe now we can revisit this line of work? ;)
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
> 
> I haven't lowered my standards. I thought this posting was pretty clear
> - it's an experiment in what supporting legacy schedulers would look
> like. As you quote me above, this is NOT proposed for merging, nor do I
> consider it anywhere near clean.

Where'd your sense of humor go?

> > I'm not seeing anything in elevator_switch() that would prevent an
> > elevator from attempting to be used on an mq device with > 1 hw queue.
> > But I could easily be missing it.
> 
> You missed it, it's in blk_mq_sched_init(), called from
> elv_iosched_store() when trying to switch (or set up a new) schedulers.
> 
> > That aside, this patchset has all the makings of a _serious_ problem
> > for dm-mq multipath (e.g. if dm_mod.use_blk_mq=Y and
> > dm_mod.dm_mq_nr_hw_queues=1).  There is a bunch of code in
> > drivers/md/dm-rq.c that looks at q->mq_ops (or not) to determine
> > whether mq or the old .request_fn path is used.
> > 
> > I think we really need a way to force an allocated mq request_queue
> > (with a single hw_queue) to not support this technological terror
> > you've constructed. (*cough*
> 
> See BLK_MQ_F_NO_SCHED.

Yeap, missed it, thanks.  Reviewing patches via gmail _sucks_; I
should've just looked at your git branch(es) from the start.
 
> > I could be super dense on this line of work.  But what is the point of
> > all this?  Seems like a really unfortunate distraction that makes the
> > block code all the more encumbered with fiddly old vs new logic.  So
> > now we're opening old .request_fn users up to blk-mq-with-scheduler vs
> > non-blk-mq bugs.
> 
> See above, it's just an experiment in seeing what this would look like,
> how transparent (or not) we could make that.

OK, seems not very transparent so far.  But that aside, I'm more curious
about what the goal(s) and/or benefit(s) might be.  I know that before you
were hopeful to eventually eliminate the old .request_fn path in block
core (in favor of blk-mq, once it grew IO scheduling capabilities).

But by tying blk-mq through to the old request path (and associated IO
schedulers) it certainly complicates getting rid of all the legacy code.

Selfishly, I'm looking forward to eliminating the old .request_fn
request-based code in DM core.  This advance to supporting the old IO
schedulers makes that less likely.

> Don't overthink any of this, and don't start making plans or coming up
> with problems on why X or Y would not work with whatever interface
> variant of dm. That's jumping the gun.

Not overthinking... just thinking ;)  But if this does happen then maybe
I should look to invert the request-based DM core cleanup: remove all
the old .request_fn support and impose the same (namely IO scheduler
enabled DM multipath) via dm_mod.use_blk_mq=Y and
dm_mod.dm_mq_nr_hw_queues=1

Mike

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 3f2eb8d80189..0e23589ab3bf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1310,7 +1310,7 @@  static struct request *blk_old_get_request(struct request_queue *q, int rw,
 
 struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
 {
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		return blk_mq_alloc_request(q, rw,
 			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
 				0 : BLK_MQ_REQ_NOWAIT);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 3ecb00a6cf45..73b8a701ae6d 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -64,7 +64,7 @@  void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
 	 * don't check dying flag for MQ because the request won't
 	 * be reused after dying flag is set
 	 */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		blk_mq_insert_request(rq, at_head, true, false);
 		return;
 	}
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 1bdbb3d3e5f5..0b68a1258bdd 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -133,14 +133,16 @@  static void blk_flush_restore_request(struct request *rq)
 
 static bool blk_flush_queue_rq(struct request *rq, bool add_front)
 {
-	if (rq->q->mq_ops) {
+	struct request_queue *q = rq->q;
+
+	if (blk_use_mq_path(q)) {
 		blk_mq_add_to_requeue_list(rq, add_front, true);
 		return false;
 	} else {
 		if (add_front)
-			list_add(&rq->queuelist, &rq->q->queue_head);
+			list_add(&rq->queuelist, &q->queue_head);
 		else
-			list_add_tail(&rq->queuelist, &rq->q->queue_head);
+			list_add_tail(&rq->queuelist, &q->queue_head);
 		return true;
 	}
 }
@@ -201,7 +203,7 @@  static bool blk_flush_complete_seq(struct request *rq,
 		BUG_ON(!list_empty(&rq->queuelist));
 		list_del_init(&rq->flush.list);
 		blk_flush_restore_request(rq);
-		if (q->mq_ops)
+		if (blk_use_mq_path(q))
 			blk_mq_end_request(rq, error);
 		else
 			__blk_end_request_all(rq, error);
@@ -224,7 +226,7 @@  static void flush_end_io(struct request *flush_rq, int error)
 	unsigned long flags = 0;
 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
 
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		struct blk_mq_hw_ctx *hctx;
 
 		/* release the tag's ownership to the req cloned from */
@@ -240,7 +242,7 @@  static void flush_end_io(struct request *flush_rq, int error)
 	/* account completion of the flush request */
 	fq->flush_running_idx ^= 1;
 
-	if (!q->mq_ops)
+	if (!blk_use_mq_path(q))
 		elv_completed_request(q, flush_rq);
 
 	/* and push the waiting requests to the next stage */
@@ -267,7 +269,7 @@  static void flush_end_io(struct request *flush_rq, int error)
 		blk_run_queue_async(q);
 	}
 	fq->flush_queue_delayed = 0;
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
 }
 
@@ -315,7 +317,7 @@  static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
 	 * be in flight at the same time. And acquire the tag's
 	 * ownership for flush req.
 	 */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		struct blk_mq_hw_ctx *hctx;
 
 		flush_rq->mq_ctx = first_rq->mq_ctx;
@@ -409,7 +411,7 @@  void blk_insert_flush(struct request *rq)
 	 * complete the request.
 	 */
 	if (!policy) {
-		if (q->mq_ops)
+		if (blk_use_mq_path(q))
 			blk_mq_end_request(rq, 0);
 		else
 			__blk_end_bidi_request(rq, 0, 0, 0);
@@ -425,9 +427,9 @@  void blk_insert_flush(struct request *rq)
 	 */
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
-		if (q->mq_ops) {
+		if (blk_use_mq_path(q))
 			blk_mq_insert_request(rq, false, false, true);
-		} else
+		else
 			list_add_tail(&rq->queuelist, &q->queue_head);
 		return;
 	}
@@ -440,7 +442,7 @@  void blk_insert_flush(struct request *rq)
 	INIT_LIST_HEAD(&rq->flush.list);
 	rq->rq_flags |= RQF_FLUSH_SEQ;
 	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
-	if (q->mq_ops) {
+	if (blk_use_mq_path(q)) {
 		rq->end_io = mq_flush_data_end_io;
 
 		spin_lock_irq(&fq->mq_flush_lock);
diff --git a/block/blk.h b/block/blk.h
index 041185e5f129..094fc10429c3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -36,10 +36,20 @@  extern struct kmem_cache *request_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
+/*
+ * Use the MQ path if we have mq_ops, but not if we are using an IO
+ * scheduler. For the scheduler, we should use the legacy path. Only
+ * for internal use in the block layer.
+ */
+static inline bool blk_use_mq_path(struct request_queue *q)
+{
+	return q->mq_ops && !q->elevator;
+}
+
 static inline struct blk_flush_queue *blk_get_flush_queue(
 		struct request_queue *q, struct blk_mq_ctx *ctx)
 {
-	if (q->mq_ops)
+	if (blk_use_mq_path(q))
 		return blk_mq_map_queue(q, ctx->cpu)->fq;
 	return q->fq;
 }