Message ID | 20230522004328.760024-1-tilan7663@gmail.com (mailing list archive)
---|---
State | New, archived
Series | [1/1] blk-mq: fix race condition in active queue accounting
On Mon, May 22, 2023 at 8:45 AM Tian Lan <tilan7663@gmail.com> wrote:
>
> From: Tian Lan <tian.lan@twosigma.com>
>
> If multiple CPUs are sharing the same hardware queue, it can
> cause a leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
>
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>
On 5/22/23 09:43, Tian Lan wrote:
> From: Tian Lan <tian.lan@twosigma.com>
>
> If multiple CPUs are sharing the same hardware queue, it can
> cause a leak in the active queue counter tracking when __blk_mq_tag_busy()
> is executed simultaneously.
>
> Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value")
> Signed-off-by: Tian Lan <tian.lan@twosigma.com>
> ---
>  block/blk-mq-tag.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d6af9d431dc6..07372032238a 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  	if (blk_mq_is_shared_tags(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
>
> -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {

This is weird. test_and_set_bit() returns the old bit value, so shouldn't this be:

	if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
		return;

?

>  			return;
> -		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
> +		}
>  	} else {
> -		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
>  			return;
> -		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
> +		}

Same here. And given that this pattern is the same for the if and the else,
this entire hunk can likely be simplified.

>  	}
>
>  	users = atomic_inc_return(&hctx->tags->active_queues);
On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:
> On 5/22/23 09:43, Tian Lan wrote:

[...]

> > -		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> > +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
> > +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
>
> This is weird. test_and_set_bit() returns the old bit value, so shouldn't this be:
>
> 	if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> 		return;
>
> ?

It is a micro-optimization: test_and_set_bit() is much heavier than
test_bit(), so test_and_set_bit() is only needed the first time.

Thanks,
Ming
On 5/22/23 10:23, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:

[...]

>> This is weird. test_and_set_bit() returns the old bit value, so shouldn't this be:
>>
>> 	if (test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>> 		return;
>>
>> ?
>
> It is a micro-optimization: test_and_set_bit() is much heavier than
> test_bit(), so test_and_set_bit() is only needed the first time.

But having the two calls, test_bit() + test_and_set_bit(), creates a race,
doesn't it? What if another CPU clears the bit between these two calls?

At the very least, the patch should remove the curly braces around that if.
On 5/21/23 7:23 PM, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:15:22AM +0900, Damien Le Moal wrote:

[...]

> It is a micro-optimization: test_and_set_bit() is much heavier than
> test_bit(), so test_and_set_bit() is only needed the first time.

It's an optimization, but it's certainly not a micro one. If the common
case is always hitting that path, then test_and_set_bit() will repeatedly
dirty that cacheline. And obviously it's useless to do that if the bit is
already set. This makes it a pretty nice optimization, and definitely
outside the realm of "micro optimization", as it can have quite a large
impact. I used that in various spots in blk-mq, which I suspect is where
the inspiration for this one came from too.
On 5/21/23 7:29 PM, Damien Le Moal wrote:
> On 5/22/23 10:23, Ming Lei wrote:

[...]

> But having the two calls, test_bit() + test_and_set_bit(), creates a race,
> doesn't it? What if another CPU clears the bit between these two calls?

How so? If the bit is already set, or you're racing with it being set or
cleared, that race already exists before. This simply prevents unnecessary
dirtying of that cacheline.
On Mon, May 22, 2023 at 10:29:05AM +0900, Damien Le Moal wrote:
> On 5/22/23 10:23, Ming Lei wrote:

[...]

> But having the two calls, test_bit() + test_and_set_bit(), creates a race,
> doesn't it? What if another CPU clears the bit between these two calls?

If test_bit() returns 0, there isn't such a race, since both sides are
atomic ops.

If test_bit() returns 1:

1) __blk_mq_tag_busy() vs. __blk_mq_tag_busy()
   - both do nothing

2) __blk_mq_tag_busy() vs. __blk_mq_tag_idle()
   - hctx_may_queue() always follows __blk_mq_tag_busy()
   - hctx_may_queue() returns true in case this flag is cleared
   - the current __blk_mq_tag_busy() does nothing; the following allocation
     works fine given hctx_may_queue() returns true
   - a new __blk_mq_tag_busy() will set everything up fine

Thanks,
Ming
On 5/22/23 11:17, Ming Lei wrote:
> On Mon, May 22, 2023 at 10:29:05AM +0900, Damien Le Moal wrote:

[...]

> If test_bit() returns 0, there isn't such a race, since both sides are
> atomic ops.
>
> If test_bit() returns 1:
>
> 1) __blk_mq_tag_busy() vs. __blk_mq_tag_busy()
>    - both do nothing
>
> 2) __blk_mq_tag_busy() vs. __blk_mq_tag_idle()
>    - hctx_may_queue() always follows __blk_mq_tag_busy()
>    - hctx_may_queue() returns true in case this flag is cleared
>    - the current __blk_mq_tag_busy() does nothing; the following allocation
>      works fine given hctx_may_queue() returns true
>    - a new __blk_mq_tag_busy() will set everything up fine

OK. Thanks. It would be nice to update the function comment (which is not
very clear due to grammar issues) to document this unusual use of the bit
functions, including the fact that it is an optimization to avoid dirtying
cache lines.

Thanks,
Ming
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d6af9d431dc6..07372032238a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -42,13 +42,15 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 	if (blk_mq_is_shared_tags(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 
-		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
+		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
 			return;
-		set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags);
+		}
 	} else {
-		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
+		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
 			return;
-		set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state);
+		}
 	}
 
 	users = atomic_inc_return(&hctx->tags->active_queues);