Message ID | 20240903081653.65613-1-songmuchun@bytedance.com (mailing list archive) |
---|---|
Series | Fix some starvation problems in block layer |
> On Sep 3, 2024, at 16:16, Muchun Song <songmuchun@bytedance.com> wrote:
>
> We encountered a problem on our servers where hundreds of UNINTERRUPTIBLE
> processes are all waiting in the WBT wait queue. And the IO hung detector
> logged so many messages about "blocked for more than 122 seconds". The
> call trace is as follows:
>
>     Call Trace:
>         __schedule+0x959/0xee0
>         schedule+0x40/0xb0
>         io_schedule+0x12/0x40
>         rq_qos_wait+0xaf/0x140
>         wbt_wait+0x92/0xc0
>         __rq_qos_throttle+0x20/0x30
>         blk_mq_make_request+0x12a/0x5c0
>         generic_make_request_nocheck+0x172/0x3f0
>         submit_bio+0x42/0x1c0
>         ...
>
> The WBT module is used to throttle buffered writeback, which will block
> any buffered writeback IO request until the previous inflight IOs have
> been completed. So I checked the inflight IO counter. It was one, meaning
> one IO request was submitted to the downstream interface like block core
> layer or device driver (virtio_blk driver in our case). We need to figure
> out why the inflight IO is not completed in time. I confirmed that all
> the virtio ring buffers of virtio_blk are empty and the hardware dispatch
> list had one IO request, so the root cause is not related to the block
> device or the virtio_blk driver since the driver has never received that
> IO request.
>
> We know that block core layer could submit IO requests to the driver through
> kworker (the callback function is blk_mq_run_work_fn). I thought maybe the
> kworker was blocked by some other resources causing the callback to not be
> invoked in time. So I checked all the kworkers and workqueues and confirmed
> there was no pending work on any kworker or workqueue.
>
> Putting all the investigation together, the problem should be in the
> block core layer missing a chance to submit that IO request. After
> some investigation of code, I found some scenarios which could cause the
> problem.

Hi Jens Axboe,

May I ask if you have any suggestions for those fixes? Or if they could be merged?

Muchun,
Thanks.

>
> Changes in v2:
>   - Collect RB tag from Ming Lei.
>   - Use barrier-less approach to fix QUEUE_FLAG_QUIESCED ordering problem
>     suggested by Ming Lei.
>   - Apply new approach to fix BLK_MQ_S_STOPPED ordering for easier maintenance.
>   - Add Fixes tag to each patch.
>
> Muchun Song (3):
>   block: fix missing dispatching request when queue is started or
>     unquiesced
>   block: fix ordering between checking QUEUE_FLAG_QUIESCED and adding
>     requests
>   block: fix ordering between checking BLK_MQ_S_STOPPED and adding
>     requests
>
>  block/blk-mq.c | 55 ++++++++++++++++++++++++++++++++++++++------------
>  block/blk-mq.h | 13 ++++++++++++
>  2 files changed, 55 insertions(+), 13 deletions(-)
>
> --
> 2.20.1
>
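
For readers following the thread, the kind of "missed chance to submit" that the cover letter describes can be illustrated with a toy, single-threaded model. This is only a sketch under assumptions: the struct and every `toy_*` function below are hypothetical and are not kernel APIs, and the interleaving shown is one plausible example of a check-then-add race, not necessarily one of the exact scenarios fixed by the patches.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_queue {
    bool quiesced;   /* stands in for a flag such as QUEUE_FLAG_QUIESCED */
    int pending;     /* requests parked on the dispatch list */
    int dispatched;  /* requests handed to the driver */
};

/* Dispatch everything that is pending, unless the queue is quiesced. */
static void toy_run_queue(struct toy_queue *q)
{
    if (q->quiesced)
        return;
    q->dispatched += q->pending;
    q->pending = 0;
}

/*
 * One interleaving that strands a request: the submitter samples the
 * flag, the queue is unquiesced and run while nothing is pending, and
 * only then does the submitter add its request. Nobody runs the queue
 * afterwards. Returns the number of requests left pending.
 */
static int toy_lost_dispatch(void)
{
    struct toy_queue q = { .quiesced = true };

    bool saw_quiesced = q.quiesced;  /* submitter: check the flag      */
    q.quiesced = false;              /* unquiescer: clear the flag     */
    toy_run_queue(&q);               /* unquiescer: run, nothing there */
    q.pending++;                     /* submitter: add the request     */
    if (!saw_quiesced)
        toy_run_queue(&q);           /* skipped: the view was stale    */

    return q.pending;
}

/*
 * Same interleaving, but the submitter always re-runs the queue after
 * adding, so the flag is re-evaluated once the request is visible.
 */
static int toy_rerun_after_add(void)
{
    struct toy_queue q = { .quiesced = true };

    q.quiesced = false;              /* unquiescer: clear the flag     */
    toy_run_queue(&q);               /* unquiescer: run, nothing there */
    q.pending++;                     /* submitter: add the request     */
    toy_run_queue(&q);               /* submitter: recheck after add   */

    return q.pending;
}
```

In this model `toy_lost_dispatch()` leaves one request pending with nothing left to trigger a dispatch, which mirrors the symptom reported above (one request on the hardware dispatch list, no pending work), while `toy_rerun_after_add()` leaves none. Real code additionally has to deal with memory ordering between CPUs, which a single-threaded model cannot show.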