Message ID | 306399af-99d9-ed45-bf3b-75908ff9187c@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | No protection on the hctx->dispatch_busy | expand |
On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote: > Hi Ming > > Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list > and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple > cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot > ensure the update on the dispatch_busy atomically. The update itself is atomic given type of this variable is 'unsigned int'. > > > Look at the test result after applied the debug patch below: > > fio-1761 [000] .... 227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 > fio-1766 [004] .... 227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 > fio-1755 [000] .... 227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0 > fio-1754 [003] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3 > fio-1763 [007] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 > fio-1761 [000] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 > fio-1766 [004] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 > fio-1760 [005] .... 227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 > > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -1088,11 +1088,12 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, > static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy) > { > unsigned int ewma; > + unsigned int old; > > if (hctx->queue->elevator) > return; > > - ewma = hctx->dispatch_busy; > + old = ewma = hctx->dispatch_busy; > > if (!ewma && !busy) > return; > @@ -1103,6 +1104,8 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy) > ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT; > > hctx->dispatch_busy = ewma; > + > + trace_printk("old %u ewma %u cur %u\n", old, ewma, READ_ONCE(hctx->dispatch_busy)); > } > > > Is it expected ? Yes, it won't be a issue in reality given hctx->dispatch_busy is used as a hint, and it often works as expected and hctx->dispatch_busy is convergent finally because it is exponential weighted moving average. Thanks, Ming
On 08/27/2018 03:00 PM, Ming Lei wrote: > On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote: >> Hi Ming >> >> Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list >> and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple >> cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot >> ensure the update on the dispatch_busy atomically. > > The update itself is atomic given type of this variable is 'unsigned int'. The blk_mq_update_dispatch_busy doesn't just write on a unsigned int variable, but read, calculate and write. The whole operation is not atomic. > >> >> >> Look at the test result after applied the debug patch below: >> >> fio-1761 [000] .... 227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 >> fio-1766 [004] .... 227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 >> fio-1755 [000] .... 227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0 >> fio-1754 [003] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3 >> fio-1763 [007] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 >> fio-1761 [000] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 >> fio-1766 [004] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 >> fio-1760 [005] .... 227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 >> ... >> >> Is it expected ? > > Yes, it won't be a issue in reality given hctx->dispatch_busy is used as > a hint, and it often works as expected and hctx->dispatch_busy is convergent > finally because it is exponential weighted moving average. I just concern the value of dispatch_busy will bounce up and down in small range with high workload on 32 or higher core system due to the cache and non-atomic update > > Thanks, > Ming >
On Mon, Aug 27, 2018 at 03:25:50PM +0800, jianchao.wang wrote: > > > On 08/27/2018 03:00 PM, Ming Lei wrote: > > On Mon, Aug 27, 2018 at 01:56:39PM +0800, jianchao.wang wrote: > >> Hi Ming > >> > >> Currently, blk_mq_update_dispatch_busy is hooked in blk_mq_dispatch_rq_list > >> and __blk_mq_issue_directly. blk_mq_update_dispatch_busy could be invoked on multiple > >> cpus concurrently. But there is not any protection on the hctx->dispatch_busy. We cannot > >> ensure the update on the dispatch_busy atomically. > > > > The update itself is atomic given type of this variable is 'unsigned int'. > > The blk_mq_update_dispatch_busy doesn't just write on a unsigned int variable, > but read, calculate and write. The whole operation is not atomic. It won't be a big deal since the update is exponential weighted wrt. busy, and again hctx->dispatch_busy is just an hint for improving performance. > > > > >> > >> > >> Look at the test result after applied the debug patch below: > >> > >> fio-1761 [000] .... 227.246251: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 > >> fio-1766 [004] .... 227.246252: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 > >> fio-1755 [000] .... 227.246366: blk_mq_update_dispatch_busy.part.50: old 1 ewma 0 cur 0 > >> fio-1754 [003] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 2 ewma 3 cur 3 > >> fio-1763 [007] .... 227.266050: blk_mq_update_dispatch_busy.part.50: old 0 ewma 2 cur 2 > >> fio-1761 [000] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 > >> fio-1766 [004] .... 227.266051: blk_mq_update_dispatch_busy.part.50: old 3 ewma 2 cur 2 > >> fio-1760 [005] .... 227.266165: blk_mq_update_dispatch_busy.part.50: old 2 ewma 1 cur 1 > >> > ... > >> > >> Is it expected ? > > > > Yes, it won't be a issue in reality given hctx->dispatch_busy is used as > > a hint, and it often works as expected and hctx->dispatch_busy is convergent > > finally because it is exponential weighted moving average. > I just concern the value of dispatch_busy will bounce up and down in small range > with high workload on 32 or higher core system due to the cache and non-atomic update If it is figured out as busy on one path, we take it as busy, that is fine in current usage. If the IO dispatch finally becomes not busy, hctx->dispatch_busy will be updated as expected sooner or later. In short, I don't see actual issue by this simple way without any protection. Thanks, Ming
--- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1088,11 +1088,12 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx, static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy) { unsigned int ewma; + unsigned int old; if (hctx->queue->elevator) return; - ewma = hctx->dispatch_busy; + old = ewma = hctx->dispatch_busy; if (!ewma && !busy) return; @@ -1103,6 +1104,8 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy) ewma /= BLK_MQ_DISPATCH_BUSY_EWMA_WEIGHT; hctx->dispatch_busy = ewma; + + trace_printk("old %u ewma %u cur %u\n", old, ewma, READ_ONCE(hctx->dispatch_busy)); }