diff mbox series

No protection on the hctx->dispatch_busy

Message ID 306399af-99d9-ed45-bf3b-75908ff9187c@oracle.com (mailing list archive)
State New, archived
Headers show
Series No protection on the hctx->dispatch_busy | expand

Commit Message

jianchao.wang Aug. 27, 2018, 5:56 a.m. UTC
Hi Ming

Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list
and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple
cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot
ensure the update on the dispatch_busy atomically.


Look at the test result after applied the debug patch below:

             fio-1761  [000] ....   227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
             fio-1766  [004] ....   227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
             fio-1755  [000] ....   227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0
             fio-1754  [003] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3
             fio-1763  [007] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
             fio-1761  [000] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
             fio-1766  [004] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
             fio-1760  [005] ....   227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1

Is it expected ?

Thanks
Jianchao

Comments

Ming Lei Aug. 27, 2018, 7 a.m. UTC | #1
On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote:
> Hi Ming
> 
> Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list
> and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple
> cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot
> ensure the update on the dispatch_busy atomically.

The update itself is atomic given type of this variable is 'unsigned int'.

> 
> 
> Look at the test result after applied the debug patch below:
> 
>              fio-1761  [000] ....   227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
>              fio-1766  [004] ....   227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
>              fio-1755  [000] ....   227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0
>              fio-1754  [003] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3
>              fio-1763  [007] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
>              fio-1761  [000] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
>              fio-1766  [004] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
>              fio-1760  [005] ....   227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
> 
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1088,11 +1088,12 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
>  static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
>  {
>         unsigned int ewma;
> +       unsigned int old;
>  
>         if (hctx->queue->elevator)
>                 return;
>  
> -       ewma = hctx->dispatch_busy;
> +       old = ewma = hctx->dispatch_busy;
>  
>         if (!ewma && !busy)
>                 return;
> @@ -1103,6 +1104,8 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
>         ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;
>  
>         hctx->dispatch_busy = ewma;
> +
> +       trace_printk("old %u ewma %u cur %u\n", old, ewma, READ_ONCE(hctx->dispatch_busy));
>  }
> 
> 
> Is it expected ?

Yes, it won't be a issue in reality given hctx->dispatch_busy is used as
a hint, and it often works as expected and hctx->dispatch_busy is convergent
finally because it is exponential weighted moving average.

Thanks,
Ming
jianchao.wang Aug. 27, 2018, 7:25 a.m. UTC | #2
On 08/27/2018 03:00 PM, Ming Lei wrote:
> On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote:
>> Hi Ming
>>
>> Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list
>> and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple
>> cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot
>> ensure the update on the dispatch_busy atomically.
> 
> The update itself is atomic given type of this variable is 'unsigned int'.

The blk_mq_update_dispatch_busy doesn't just write on a unsigned int variable,
but read, calculate and write. The whole operation is not atomic.

> 
>>
>>
>> Look at the test result after applied the debug patch below:
>>
>>              fio-1761  [000] ....   227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
>>              fio-1766  [004] ....   227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
>>              fio-1755  [000] ....   227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0
>>              fio-1754  [003] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3
>>              fio-1763  [007] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
>>              fio-1761  [000] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
>>              fio-1766  [004] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
>>              fio-1760  [005] ....   227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
>>
...
>>
>> Is it expected ?
> 
> Yes, it won't be a issue in reality given hctx->dispatch_busy is used as
> a hint, and it often works as expected and hctx->dispatch_busy is convergent
> finally because it is exponential weighted moving average.
I just concern the value of dispatch_busy will bounce up and down in small range
with high workload on 32 or higher core system due to the cache and non-atomic update


> 
> Thanks,
> Ming
>
Ming Lei Aug. 27, 2018, 7:34 a.m. UTC | #3
On Mon, Aug 27, 2018 at 03:25:50PM +0800, jianchao.wang wrote:
> 
> 
> On 08/27/2018 03:00 PM, Ming Lei wrote:
> > On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote:
> >> Hi Ming
> >>
> >> Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list
> >> and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple
> >> cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot
> >> ensure the update on the dispatch_busy atomically.
> > 
> > The update itself is atomic given type of this variable is 'unsigned int'.
> 
> The blk_mq_update_dispatch_busy doesn't just write on a unsigned int variable,
> but read, calculate and write. The whole operation is not atomic.

It won't be a big deal since the update is exponential weighted wrt. busy, and again
hctx->dispatch_busy is just an hint for improving performance.

> 
> > 
> >>
> >>
> >> Look at the test result after applied the debug patch below:
> >>
> >>              fio-1761  [000] ....   227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
> >>              fio-1766  [004] ....   227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
> >>              fio-1755  [000] ....   227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0
> >>              fio-1754  [003] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3
> >>              fio-1763  [007] ....   227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2
> >>              fio-1761  [000] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
> >>              fio-1766  [004] ....   227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2
> >>              fio-1760  [005] ....   227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1
> >>
> ...
> >>
> >> Is it expected ?
> > 
> > Yes, it won't be a issue in reality given hctx->dispatch_busy is used as
> > a hint, and it often works as expected and hctx->dispatch_busy is convergent
> > finally because it is exponential weighted moving average.
> I just concern the value of dispatch_busy will bounce up and down in small range
> with high workload on 32 or higher core system due to the cache and non-atomic update

If it is figured out as busy on one path, we take it as busy, that is fine in current
usage. If the IO dispatch finally becomes not busy, hctx->dispatch_busy will be updated
as expected sooner or later.

In short, I don't see actual issue by this simple way without any protection.

Thanks,
Ming
diff mbox series

Patch

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1088,11 +1088,12 @@  static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
 {
        unsigned int ewma;
+       unsigned int old;
 
        if (hctx->queue->elevator)
                return;
 
-       ewma = hctx->dispatch_busy;
+       old = ewma = hctx->dispatch_busy;
 
        if (!ewma && !busy)
                return;
@@ -1103,6 +1104,8 @@  static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
        ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT;
 
        hctx->dispatch_busy = ewma;
+
+       trace_printk("old %u ewma %u cur %u\n", old, ewma, READ_ONCE(hctx->dispatch_busy));
 }