Message ID: 1501790516-6924-4-git-send-email-axboe@kernel.dk (mailing list archive)
State: New, archived
On Thu, 2017-08-03 at 14:01 -0600, Jens Axboe wrote:
> We don't have to inc/dec some counter, since we can just
> iterate the tags. That makes inc/dec a noop, but means we
> have to iterate busy tags to get an in-flight count.
> [ ... ]
> +unsigned int blk_mq_in_flight(struct request_queue *q,
> +			      struct hd_struct *part)
> +{
> +	struct mq_inflight mi = { .part = part, .inflight = 0 };

Hello Jens,

A minor stylistic comment: since a C compiler is required to initialize
to zero all member variables that have not been initialized explicitly,
I think ".inflight = 0" can be left out.

> diff --git a/block/genhd.c b/block/genhd.c
> index f735af67a0c9..ad5dc567d57f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -45,6 +45,35 @@ static void disk_add_events(struct gendisk *disk);
>  static void disk_del_events(struct gendisk *disk);
>  static void disk_release_events(struct gendisk *disk);
>  
> +void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
> +{
> +	if (q->mq_ops)
> +		return;
> +
> +	atomic_inc(&part->in_flight[rw]);
> +	if (part->partno)
> +		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
> +}
> [ ... ]
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 7f7427e00f9c..f2c5096b3a7e 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -362,28 +362,9 @@ static inline void free_part_stats(struct hd_struct *part)
>  #define part_stat_sub(cpu, gendiskp, field, subnd)	\
>  	part_stat_add(cpu, gendiskp, field, -subnd)
>  
> -static inline void part_inc_in_flight(struct request_queue *q,
> -				      struct hd_struct *part, int rw)
> -{
> -	atomic_inc(&part->in_flight[rw]);
> -	if (part->partno)
> -		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
> -}
> [ ... ]

Sorry, but to me it seems like this part of the patch does not match
the patch description. The patch description mentions that inc and dec
become a noop, but it seems to me that these functions have been
uninlined instead of being made a noop?

Bart.
On 08/03/2017 02:41 PM, Bart Van Assche wrote:
> On Thu, 2017-08-03 at 14:01 -0600, Jens Axboe wrote:
>> We don't have to inc/dec some counter, since we can just
>> iterate the tags. That makes inc/dec a noop, but means we
>> have to iterate busy tags to get an in-flight count.
>> [ ... ]
>> +unsigned int blk_mq_in_flight(struct request_queue *q,
>> +			      struct hd_struct *part)
>> +{
>> +	struct mq_inflight mi = { .part = part, .inflight = 0 };
>
> Hello Jens,
>
> A minor stylistic comment: since a C compiler is required to
> initialize to zero all member variables that have not been initialized
> explicitly I think ".inflight = 0" can be left out.

It can, I'll kill it.

>> diff --git a/block/genhd.c b/block/genhd.c
>> index f735af67a0c9..ad5dc567d57f 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -45,6 +45,35 @@ static void disk_add_events(struct gendisk *disk);
>>  static void disk_del_events(struct gendisk *disk);
>>  static void disk_release_events(struct gendisk *disk);
>>  
>> +void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
>> +{
>> +	if (q->mq_ops)
>> +		return;
>> +
>> +	atomic_inc(&part->in_flight[rw]);
>> +	if (part->partno)
>> +		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
>> +}
>> [ ... ]
>> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
>> index 7f7427e00f9c..f2c5096b3a7e 100644
>> --- a/include/linux/genhd.h
>> +++ b/include/linux/genhd.h
>> @@ -362,28 +362,9 @@ static inline void free_part_stats(struct hd_struct *part)
>>  #define part_stat_sub(cpu, gendiskp, field, subnd)	\
>>  	part_stat_add(cpu, gendiskp, field, -subnd)
>>  
>> -static inline void part_inc_in_flight(struct request_queue *q,
>> -				      struct hd_struct *part, int rw)
>> -{
>> -	atomic_inc(&part->in_flight[rw]);
>> -	if (part->partno)
>> -		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
>> -}
>> [ ... ]
>
> Sorry, but to me it seems like this part of the patch does not match
> the patch description. The patch description mentions that inc and dec
> become a noop, but it seems to me that these functions have been
> uninlined instead of being made a noop?

The inc/dec goes away for mq, the non-mq path still has to use them. I
just move them as well. Could be a prep patch, but it's just moving the
code out of the header and into a normal C file instead.
On Thu, 2017-08-03 at 14:45 -0600, Jens Axboe wrote:
> The inc/dec goes away for mq, the non-mq path still has to use them. I just
> move them as well. Could be a prep patch, but it's just moving the code out
> of the header and into a normal C file instead.

Hello Jens,

I misread that part of the patch. Now that I have had another look,
these changes look fine to me. Hence:

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
On Thu, 2017-08-03 at 14:01 -0600, Jens Axboe wrote:
> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> +				  struct request *rq, void *priv,
> +				  bool reserved)
> +{
> +	struct mq_inflight *mi = priv;
> +
> +	if (rq->part == mi->part)
> +		mi->inflight++;
> +}
> [ ... ]
> -static inline void part_inc_in_flight(struct request_queue *q,
> -				      struct hd_struct *part, int rw)
> -{
> -	atomic_inc(&part->in_flight[rw]);
> -	if (part->partno)
> -		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
> -}

Hello Jens,

The existing part_inc_in_flight() code includes all requests in the
in_flight statistics for part0, but the new code in
blk_mq_check_inflight() does not. Is that on purpose? Should the
rq->part == mi->part check perhaps be skipped if mi->part represents
part0?

Thanks,

Bart.
On 08/03/2017 03:25 PM, Bart Van Assche wrote:
> On Thu, 2017-08-03 at 14:01 -0600, Jens Axboe wrote:
>> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>> +				  struct request *rq, void *priv,
>> +				  bool reserved)
>> +{
>> +	struct mq_inflight *mi = priv;
>> +
>> +	if (rq->part == mi->part)
>> +		mi->inflight++;
>> +}
>> [ ... ]
>> -static inline void part_inc_in_flight(struct request_queue *q,
>> -				      struct hd_struct *part, int rw)
>> -{
>> -	atomic_inc(&part->in_flight[rw]);
>> -	if (part->partno)
>> -		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
>> -}
>
> Hello Jens,
>
> The existing part_inc_in_flight() code includes all requests in the
> in_flight statistics for part0, but the new code in
> blk_mq_check_inflight() does not. Is that on purpose? Should the
> rq->part == mi->part check perhaps be skipped if mi->part represents
> part0?

The existing code always increments for the partition in question, and
for the root if it's a partition. I'll take a look at that logic, and
ensure it's all correct.
On Thu, Aug 03, 2017 at 02:01:55PM -0600, Jens Axboe wrote:
> We don't have to inc/dec some counter, since we can just
> iterate the tags. That makes inc/dec a noop, but means we
> have to iterate busy tags to get an in-flight count.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  block/blk-mq.c        | 24 ++++++++++++++++++++++++
>  block/blk-mq.h        |  2 ++
>  block/genhd.c         | 29 +++++++++++++++++++++++++++++
>  include/linux/genhd.h | 25 +++----------------------
>  4 files changed, 58 insertions(+), 22 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 05dfa3f270ae..37035891e120 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -86,6 +86,30 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
>  	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
>  }
>  
> +struct mq_inflight {
> +	struct hd_struct *part;
> +	unsigned int inflight;
> +};
> +
> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> +				  struct request *rq, void *priv,
> +				  bool reserved)
> +{
> +	struct mq_inflight *mi = priv;
> +
> +	if (rq->part == mi->part)
> +		mi->inflight++;
> +}
> +
> +unsigned int blk_mq_in_flight(struct request_queue *q,
> +			      struct hd_struct *part)
> +{
> +	struct mq_inflight mi = { .part = part, .inflight = 0 };
> +
> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
> +	return mi.inflight;
> +}

IMO it might not be as efficient as per-cpu variable.

For example, NVMe on one 128-core system, if we use percpu variable,
it is enough to read 128 local variable from each CPU for accounting
one in_flight.

But in this way of blk_mq_in_flight(), we need to do 128
sbitmap search, and one sbitmap search need to read at least
16 words of 'unsigned long', and total 128*16 read.

So maybe we need to compare the two approaches first.

Thanks,
Ming
On 08/04/2017 05:17 AM, Ming Lei wrote:
> On Thu, Aug 03, 2017 at 02:01:55PM -0600, Jens Axboe wrote:
>> We don't have to inc/dec some counter, since we can just
>> iterate the tags. That makes inc/dec a noop, but means we
>> have to iterate busy tags to get an in-flight count.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  block/blk-mq.c        | 24 ++++++++++++++++++++++++
>>  block/blk-mq.h        |  2 ++
>>  block/genhd.c         | 29 +++++++++++++++++++++++++++++
>>  include/linux/genhd.h | 25 +++----------------------
>>  4 files changed, 58 insertions(+), 22 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 05dfa3f270ae..37035891e120 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -86,6 +86,30 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
>>  	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
>>  }
>>  
>> +struct mq_inflight {
>> +	struct hd_struct *part;
>> +	unsigned int inflight;
>> +};
>> +
>> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
>> +				  struct request *rq, void *priv,
>> +				  bool reserved)
>> +{
>> +	struct mq_inflight *mi = priv;
>> +
>> +	if (rq->part == mi->part)
>> +		mi->inflight++;
>> +}
>> +
>> +unsigned int blk_mq_in_flight(struct request_queue *q,
>> +			      struct hd_struct *part)
>> +{
>> +	struct mq_inflight mi = { .part = part, .inflight = 0 };
>> +
>> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
>> +	return mi.inflight;
>> +}
>
> IMO it might not be as efficient as per-cpu variable.
>
> For example, NVMe on one 128-core system, if we use percpu variable,
> it is enough to read 128 local variable from each CPU for accounting
> one in_flight.

IFF the system is configured with NR_CPUS=128. Most distros go much
bigger. On the other hand, we know that nr_queues will never be bigger
than the number of online cpus, not the number of possible cpus.

> But in this way of blk_mq_in_flight(), we need to do 128
> sbitmap search, and one sbitmap search need to read at least
> 16 words of 'unsigned long', and total 128*16 read.

If that ends up being a problem (it hasn't in testing), then we could
always stuff an index in front of the full sbitmap.

> So maybe we need to compare the two approaches first.

We already did, back when this was originally posted. See the thread
from end May / start June and the results from Brian.
On Fri, Aug 04, 2017 at 07:55:41AM -0600, Jens Axboe wrote:
> On 08/04/2017 05:17 AM, Ming Lei wrote:
> > On Thu, Aug 03, 2017 at 02:01:55PM -0600, Jens Axboe wrote:
> >> We don't have to inc/dec some counter, since we can just
> >> iterate the tags. That makes inc/dec a noop, but means we
> >> have to iterate busy tags to get an in-flight count.
> >>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> ---
> >>  block/blk-mq.c        | 24 ++++++++++++++++++++++++
> >>  block/blk-mq.h        |  2 ++
> >>  block/genhd.c         | 29 +++++++++++++++++++++++++++++
> >>  include/linux/genhd.h | 25 +++----------------------
> >>  4 files changed, 58 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 05dfa3f270ae..37035891e120 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -86,6 +86,30 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
> >>  	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
> >>  }
> >>  
> >> +struct mq_inflight {
> >> +	struct hd_struct *part;
> >> +	unsigned int inflight;
> >> +};
> >> +
> >> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
> >> +				  struct request *rq, void *priv,
> >> +				  bool reserved)
> >> +{
> >> +	struct mq_inflight *mi = priv;
> >> +
> >> +	if (rq->part == mi->part)
> >> +		mi->inflight++;
> >> +}
> >> +
> >> +unsigned int blk_mq_in_flight(struct request_queue *q,
> >> +			      struct hd_struct *part)
> >> +{
> >> +	struct mq_inflight mi = { .part = part, .inflight = 0 };
> >> +
> >> +	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
> >> +	return mi.inflight;
> >> +}
> >
> > IMO it might not be as efficient as per-cpu variable.
> >
> > For example, NVMe on one 128-core system, if we use percpu variable,
> > it is enough to read 128 local variable from each CPU for accounting
> > one in_flight.
>
> IFF the system is configured with NR_CPUS=128. Most distros go
> much bigger.
> On the other hand, we know that nr_queues will
> never be bigger than the number of online cpus, not the number
> of possible cpus.

We usually use for_each_possible_cpu() for aggregating CPU local
counters, and num_possible_cpus() is the number of CPUs populatable
in the system, which is much less than NR_CPUS:

include/linux/cpumask.h:
 *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable

>
> > But in this way of blk_mq_in_flight(), we need to do 128
> > sbitmap search, and one sbitmap search need to read at least
> > 16 words of 'unsigned long', and total 128*16 read.
>
> If that ends up being a problem (it hasn't in testing), then we
> could always stuff an index in front of the full sbitmap.
>
> > So maybe we need to compare the two approaches first.
>
> We already did, back when this was originally posted. See the
> thread from end May / start June and the results from Brian.

Can't find the comparison data between percpu accounting vs.
mq-inflight in that thread.

Just saw Brian mentioned in the patch log that percpu may reach
11.4M (I guess the 'M' is missing) [1]:

	"When running this on a Power system, to a single null_blk
	device with 80 submission queues, irq mode 0, with 80 fio jobs,
	I saw IOPs go from 1.5M IO/s to 11.4 IO/s."

But in link [2], he said mq-inflight can reach 9.4M.

Could Brian explain it a bit? Maybe the two tests were run in
different settings, I don't know.

Even though mq-inflight is better, I guess we need to understand the
principle behind why it is better than percpu...

[1] http://marc.info/?l=linux-block&m=149868436905520&w=2
[2] http://marc.info/?l=linux-block&m=149920174301644&w=2

Thanks,
Ming
On 08/04/2017 05:19 PM, Ming Lei wrote: > On Fri, Aug 04, 2017 at 07:55:41AM -0600, Jens Axboe wrote: >> On 08/04/2017 05:17 AM, Ming Lei wrote: >>> On Thu, Aug 03, 2017 at 02:01:55PM -0600, Jens Axboe wrote: >>>> We don't have to inc/dec some counter, since we can just >>>> iterate the tags. That makes inc/dec a noop, but means we >>>> have to iterate busy tags to get an in-flight count. >>>> >>>> Signed-off-by: Jens Axboe <axboe@kernel.dk> >>>> --- >>>> block/blk-mq.c | 24 ++++++++++++++++++++++++ >>>> block/blk-mq.h | 2 ++ >>>> block/genhd.c | 29 +++++++++++++++++++++++++++++ >>>> include/linux/genhd.h | 25 +++---------------------- >>>> 4 files changed, 58 insertions(+), 22 deletions(-) >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 05dfa3f270ae..37035891e120 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -86,6 +86,30 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx, >>>> sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw); >>>> } >>>> >>>> +struct mq_inflight { >>>> + struct hd_struct *part; >>>> + unsigned int inflight; >>>> +}; >>>> + >>>> +static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx, >>>> + struct request *rq, void *priv, >>>> + bool reserved) >>>> +{ >>>> + struct mq_inflight *mi = priv; >>>> + >>>> + if (rq->part == mi->part) >>>> + mi->inflight++; >>>> +} >>>> + >>>> +unsigned int blk_mq_in_flight(struct request_queue *q, >>>> + struct hd_struct *part) >>>> +{ >>>> + struct mq_inflight mi = { .part = part, .inflight = 0 }; >>>> + >>>> + blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi); >>>> + return mi.inflight; >>>> +} >>> >>> IMO it might not be as efficient as per-cpu variable. >>> >>> For example, NVMe on one 128-core system, if we use percpu variable, >>> it is enough to read 128 local variable from each CPU for accounting >>> one in_flight. >> >> IFF the system is configured with NR_CPUS=128. Most distros go >> much bigger. 
>> On the other hand, we know that nr_queues will
>> never be bigger than the number of online cpus, not the number
>> of possible cpus.
>
> We usually use for_each_possible_cpu() for aggregating CPU local
> counters, and num_possible_cpus() is the number of CPUs populatable
> in the system, which is much less than NR_CPUS:
>
> include/linux/cpumask.h:
>  *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
>
>>
>>> But in this way of blk_mq_in_flight(), we need to do 128
>>> sbitmap search, and one sbitmap search need to read at least
>>> 16 words of 'unsigned long', and total 128*16 read.
>>
>> If that ends up being a problem (it hasn't in testing), then we
>> could always stuff an index in front of the full sbitmap.
>>
>>> So maybe we need to compare the two approaches first.
>>
>> We already did, back when this was originally posted. See the
>> thread from end May / start June and the results from Brian.
>
> Can't find the comparison data between percpu accounting vs.
> mq-inflight in that thread.
>
> Just saw Brian mentioned in the patch log that percpu may reach
> 11.4M (I guess the 'M' is missing) [1]:
>
> 	"When running this on a Power system, to a single null_blk
> 	device with 80 submission queues, irq mode 0, with 80 fio jobs,
> 	I saw IOPs go from 1.5M IO/s to 11.4 IO/s."
>
> But in link [2], he said mq-inflight can reach 9.4M.
>
> Could Brian explain it a bit? Maybe the two tests were run in
> different settings, I don't know.

The 11.4M IOPs vs 9.4M IOPs runs cannot be directly compared, as they
were run on different systems with different NVMe devices.

The key comparison I kept coming back to in my measurements was:

	N jobs run to 1 null_blk vs. N null_blk devices, each with 1 job

If the IOPs I get between the two are similar, that should show that we
don't have scaling issues in blk-mq.

There were three variations of patches I tried with:

* per-cpu - patch from me
* per-node-atomic - patch from Ming
* mq-inflight - patch from Jens

All of them provided a massive improvement in my environment. The
mq-inflight approach was only marginally better than the per-node one,
and the difference was most prominent in the "N null_blk devices, each
with 1 job" case. While the per-node atomic approach certainly reduced
cross-node contention on the atomics, they are still atomics, which have
a bit of overhead, particularly on the Power platform.

As for the difference between the percpu approach and the mq-inflight
approach, I didn't compare them directly in the same config, since I
didn't think the percpu approach would go anywhere after the initial
discussion we had on the list. I'll get some time on the test machine
again and do a direct comparison between mq-inflight and percpu to see
if there are any significant differences.

Thanks,
Brian
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 05dfa3f270ae..37035891e120 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -86,6 +86,30 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 	sbitmap_clear_bit(&hctx->ctx_map, ctx->index_hw);
 }
 
+struct mq_inflight {
+	struct hd_struct *part;
+	unsigned int inflight;
+};
+
+static void blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
+				  struct request *rq, void *priv,
+				  bool reserved)
+{
+	struct mq_inflight *mi = priv;
+
+	if (rq->part == mi->part)
+		mi->inflight++;
+}
+
+unsigned int blk_mq_in_flight(struct request_queue *q,
+			      struct hd_struct *part)
+{
+	struct mq_inflight mi = { .part = part, .inflight = 0 };
+
+	blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
+	return mi.inflight;
+}
+
 void blk_freeze_queue_start(struct request_queue *q)
 {
 	int freeze_depth;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1a06fdf9fd4d..cade1a512a01 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -138,4 +138,6 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
+unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
+
 #endif
diff --git a/block/genhd.c b/block/genhd.c
index f735af67a0c9..ad5dc567d57f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -45,6 +45,35 @@ static void disk_add_events(struct gendisk *disk);
 static void disk_del_events(struct gendisk *disk);
 static void disk_release_events(struct gendisk *disk);
 
+void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
+{
+	if (q->mq_ops)
+		return;
+
+	atomic_inc(&part->in_flight[rw]);
+	if (part->partno)
+		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
+}
+
+void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw)
+{
+	if (q->mq_ops)
+		return;
+
+	atomic_dec(&part->in_flight[rw]);
+	if (part->partno)
+		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
+}
+
+int part_in_flight(struct request_queue *q, struct hd_struct *part)
+{
+	if (q->mq_ops)
+		return blk_mq_in_flight(q, part);
+
+	return atomic_read(&part->in_flight[0]) +
+		atomic_read(&part->in_flight[1]);
+}
+
 /**
  * disk_get_part - get partition
  * @disk: disk to look partition from
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7f7427e00f9c..f2c5096b3a7e 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -362,28 +362,9 @@ static inline void free_part_stats(struct hd_struct *part)
 #define part_stat_sub(cpu, gendiskp, field, subnd)	\
 	part_stat_add(cpu, gendiskp, field, -subnd)
 
-static inline void part_inc_in_flight(struct request_queue *q,
-				      struct hd_struct *part, int rw)
-{
-	atomic_inc(&part->in_flight[rw]);
-	if (part->partno)
-		atomic_inc(&part_to_disk(part)->part0.in_flight[rw]);
-}
-
-static inline void part_dec_in_flight(struct request_queue *q,
-				      struct hd_struct *part, int rw)
-{
-	atomic_dec(&part->in_flight[rw]);
-	if (part->partno)
-		atomic_dec(&part_to_disk(part)->part0.in_flight[rw]);
-}
-
-static inline int part_in_flight(struct request_queue *q,
-				 struct hd_struct *part)
-{
-	return atomic_read(&part->in_flight[0]) +
-		atomic_read(&part->in_flight[1]);
-}
+int part_in_flight(struct request_queue *q, struct hd_struct *part);
+void part_dec_in_flight(struct request_queue *q, struct hd_struct *part, int rw);
+void part_inc_in_flight(struct request_queue *q, struct hd_struct *part, int rw);
 
 static inline struct partition_meta_info *alloc_part_info(struct gendisk *disk)
 {
We don't have to inc/dec some counter, since we can just
iterate the tags. That makes inc/dec a noop, but means we
have to iterate busy tags to get an in-flight count.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-mq.c        | 24 ++++++++++++++++++++++++
 block/blk-mq.h        |  2 ++
 block/genhd.c         | 29 +++++++++++++++++++++++++++++
 include/linux/genhd.h | 25 +++----------------------
 4 files changed, 58 insertions(+), 22 deletions(-)