From patchwork Sun Sep 27 13:04:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: fangying
X-Patchwork-Id: 11802107
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
 by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2239112C
 for ; Sun, 27 Sep 2020 13:08:23 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mail.kernel.org (Postfix) with ESMTPS id A0DB52399C
 for ; Sun, 27 Sep 2020 13:08:23 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A0DB52399C
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:36056 helo=lists1p.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from )
 id 1kMWPe-0004NS-QV
 for patchwork-qemu-devel@patchwork.kernel.org; Sun, 27 Sep 2020 09:08:22 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:60868)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from )
 id 1kMWMS-0007qq-Ny
 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400
Received: from szxga04-in.huawei.com ([45.249.212.190]:5149 helo=huawei.com)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from )
 id 1kMWMQ-0003N5-Bk
 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:04 -0400
Received: from DGGEMS411-HUB.china.huawei.com (unknown [172.30.72.58])
 by Forcepoint Email with ESMTP id D389B1639CE2190F26BD;
 Sun, 27 Sep 2020 21:04:48 +0800 (CST)
Received: from localhost (10.174.185.104) by DGGEMS411-HUB.china.huawei.com (10.3.19.211)
 with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:38 +0800
From: Ying Fang
To:
Subject: [RFC PATCH 1/7] block-backend: introduce I/O rehandle info
Date: Sun, 27 Sep 2020 21:04:14 +0800
Message-ID: <20200927130420.1095-2-fangying1@huawei.com>
X-Mailer: git-send-email 2.28.0.windows.1
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.174.185.104]
X-CFilter-Loop: Reflected
Received-SPF: pass client-ip=45.249.212.190; envelope-from=fangying1@huawei.com; helo=huawei.com
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49
X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: -41
X-Spam_score: -4.2
X-Spam_bar: ----
X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

The I/O hang feature is built on a rehandle mechanism. Each block
backend has a list that stores hanging block AIOs and a timer that
periodically resends them. To reissue an AIO, each block AIO also needs
to store its coroutine entry.
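
As a rough illustration of the bookkeeping described above, the sketch
below keeps failed requests on a tail queue and stores each request's
entry function so it can be issued again. It is standalone C, not QEMU
code: Request, RetryQueue, retry_queue_push, retry_queue_resend_all and
demo_entry_fails_twice are hypothetical names, and a plain function
pointer stands in for the coroutine entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct Request Request;
typedef int (*RequestEntry)(Request *req);

struct Request {
    RequestEntry entry;   /* stored so the request can be issued again */
    int attempts;
    int ret;
    Request *next;
};

typedef struct RetryQueue {
    Request *head;
    Request **tail;       /* address of the terminating next pointer */
    unsigned in_flight;   /* number of requests waiting for a retry */
} RetryQueue;

static void retry_queue_init(RetryQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->in_flight = 0;
}

static void retry_queue_push(RetryQueue *q, Request *req)
{
    assert(req->entry != NULL);
    req->next = NULL;
    *q->tail = req;
    q->tail = &req->next;
    q->in_flight++;
}

/*
 * What a periodic timer would do: reissue every queued request through
 * its stored entry, unlink the ones that succeed, keep the rest queued.
 * Returns the number of requests that completed.
 */
static unsigned retry_queue_resend_all(RetryQueue *q)
{
    unsigned completed = 0;
    Request **link = &q->head;

    while (*link) {
        Request *req = *link;

        req->ret = req->entry(req);
        if (req->ret == 0) {
            *link = req->next;
            req->next = NULL;
            q->in_flight--;
            completed++;
        } else {
            link = &req->next;
        }
    }
    q->tail = link;
    return completed;
}

/* Hypothetical entry: fails with -EIO twice, then succeeds. */
static int demo_entry_fails_twice(Request *req)
{
    return (++req->attempts > 2) ? 0 : -EIO;
}
```

Storing the entry in the request itself is what lets the timer reissue
it without knowing whether it was a read, a write or a flush.
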

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 block/block-backend.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 24dd0670d1..bf104a7cf5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -35,6 +35,18 @@
 
 static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
+/* block backend rehandle timer interval 5s */
+#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000
+
+typedef struct BlockBackendRehandleInfo {
+    bool enable;
+    QEMUTimer *ts;
+    unsigned timer_interval_ms;
+
+    unsigned int in_flight;
+    QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+} BlockBackendRehandleInfo;
+
 typedef struct BlockBackendAioNotifier {
     void (*attached_aio_context)(AioContext *new_context, void *opaque);
     void (*detach_aio_context)(void *opaque);
@@ -95,6 +107,8 @@ struct BlockBackend {
      * Accessed with atomic ops.
      */
     unsigned int in_flight;
+
+    BlockBackendRehandleInfo reinfo;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
     qemu_co_queue_init(&blk->queued_requests);
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
+
     QLIST_INIT(&blk->aio_notifiers);
 
     QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB {
     BlkRwCo rwco;
     int bytes;
     bool has_returned;
+
+    /* for rehandle */
+    CoroutineEntry *co_entry;
+    QTAILQ_ENTRY(BlkAioEmAIOCB) list;
 } BlkAioEmAIOCB;
 
 static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)

From patchwork Sun Sep 27 13:04:15 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: fangying
X-Patchwork-Id: 11802111
From: Ying Fang
Subject: [RFC PATCH 2/7] block-backend: rehandle block aios when EIO
Date: Sun, 27 Sep 2020 21:04:15 +0800
Message-ID: <20200927130420.1095-3-fangying1@huawei.com>
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

When a backend device temporarily stops responding, for example a
network disk that goes down because of a network fault, any I/O to the
corresponding virtual block device in the VM returns an I/O error. If
the hypervisor returns the error to the VM, the filesystem on this block
device may stop working as usual. In many situations, the returned
error is EIO.

To avoid this unavailability, we can store the failed AIOs and resend
them later. If the error is temporary, the retries can succeed and the
AIOs can complete successfully.
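
The decision made at completion time can be sketched in isolation: when
rehandling is enabled and the result is -EIO, the request is parked for
a later retry instead of being reported to the guest. This is a
simplified standalone model, not the patch itself; Io, Backend,
io_complete and IO_NOT_DONE are hypothetical names chosen for the
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define IO_NOT_DONE 0x7fffffff   /* marker for "still running" */

typedef struct Io {
    int ret;          /* result of the last attempt */
    bool queued;      /* parked for a later retry */
    bool completed;   /* result was reported to the guest */
    int reported;     /* value the guest saw */
} Io;

typedef struct Backend {
    bool rehandle_enabled;
    unsigned queued;  /* requests waiting for the retry timer */
} Backend;

/*
 * Called once an attempt has finished. Returns true if the result was
 * reported to the guest, false if the request was parked for a retry.
 */
static bool io_complete(Backend *be, Io *io)
{
    assert(io->ret != IO_NOT_DONE);

    if (be->rehandle_enabled && io->ret == -EIO) {
        io->queued = true;
        be->queued++;
        return false;
    }

    io->completed = true;
    io->reported = io->ret;
    return true;
}
```

Only -EIO is held back in this model, matching the commit message; any
other error still reaches the guest immediately.
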

Signed-off-by: Ying Fang
Signed-off-by: Jiahui Cen
---
 block/block-backend.c | 89 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index bf104a7cf5..90f1ca5753 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm)
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
 
+    /* for rehandle */
+    blk->reinfo.enable = false;
+    blk->reinfo.ts = NULL;
+    atomic_set(&blk->reinfo.in_flight, 0);
+    QTAILQ_INIT(&blk->reinfo.re_aios);
+
     QLIST_INIT(&blk->aio_notifiers);
 
     QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
     .get_aio_context = blk_aio_em_aiocb_get_aio_context,
 };
 
+static void blk_rehandle_timer_cb(void *opaque);
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb);
+
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
+    if (acb->rwco.blk->reinfo.enable) {
+        blk_rehandle_aio_complete(acb);
+        return;
+    }
+
     if (acb->has_returned) {
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         blk_dec_in_flight(acb->rwco.blk);
@@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
         .ret = NOT_DONE,
     };
     acb->bytes = bytes;
+    acb->co_entry = co_entry;
     acb->has_returned = false;
 
     co = qemu_coroutine_create(co_entry, acb);
@@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context,
             throttle_group_attach_aio_context(tgm, new_context);
             bdrv_drained_end(bs);
         }
+
+        if (blk->reinfo.enable) {
+            if (blk->reinfo.ts) {
+                timer_del(blk->reinfo.ts);
+                timer_free(blk->reinfo.ts);
+            }
+            blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME,
+                                           SCALE_MS, blk_rehandle_timer_cb,
+                                           blk);
+            if (atomic_read(&blk->reinfo.in_flight)) {
+                timer_mod(blk->reinfo.ts,
+                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
+            }
+        }
     }
 
     blk->ctx = new_context;
@@ -2405,6 +2434,66 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
     }
 }
 
+static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+    assert(blk->reinfo.enable);
+
+    atomic_inc(&blk->reinfo.in_flight);
+    QTAILQ_INSERT_TAIL(&blk->reinfo.re_aios, acb, list);
+    timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+                              blk->reinfo.timer_interval_ms);
+}
+
+static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+    QTAILQ_REMOVE(&blk->reinfo.re_aios, acb, list);
+    atomic_dec(&blk->reinfo.in_flight);
+}
+
+static void blk_rehandle_timer_cb(void *opaque)
+{
+    BlockBackend *blk = opaque;
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+    BlkAioEmAIOCB *acb, *tmp;
+    Coroutine *co;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    QTAILQ_FOREACH_SAFE(acb, &reinfo->re_aios, list, tmp) {
+        if (acb->rwco.ret == NOT_DONE) {
+            continue;
+        }
+
+        blk_inc_in_flight(acb->rwco.blk);
+        acb->rwco.ret = NOT_DONE;
+        acb->has_returned = false;
+
+        co = qemu_coroutine_create(acb->co_entry, acb);
+        bdrv_coroutine_enter(blk_bs(blk), co);
+
+        acb->has_returned = true;
+        if (acb->rwco.ret != NOT_DONE) {
+            blk_rehandle_remove_aiocb(acb->rwco.blk, acb);
+            replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+                                             blk_aio_complete_bh, acb);
+        }
+    }
+    aio_context_release(blk_get_aio_context(blk));
+}
+
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
+{
+    if (acb->has_returned) {
+        blk_dec_in_flight(acb->rwco.blk);
+        if (acb->rwco.ret == -EIO) {
+            blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+            return;
+        }
+
+        acb->common.cb(acb->common.opaque, acb->rwco.ret);
+        qemu_aio_unref(acb);
+    }
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
     bdrv_register_buf(blk_bs(blk), host, size);

From patchwork Sun Sep 27 13:04:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: fangying
X-Patchwork-Id: 11802103
From: Ying Fang
Subject: [RFC PATCH 3/7] block-backend: add I/O hang timeout
Date: Sun, 27 Sep 2020 21:04:16 +0800
Message-ID: <20200927130420.1095-4-fangying1@huawei.com>
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

Not every error can be recovered by retrying, so add a rehandle timeout
for I/O hang.
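
The timeout can be viewed as a small two-state machine: the first EIO
records when the hang started, later EIOs keep retrying until the
timeout has elapsed, and a successful completion resets everything. The
sketch below is a standalone model with hypothetical names (HangState,
hang_update); the current time is passed in explicitly so the
transitions are easy to follow, whereas real code would read a clock.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HANG_NORMAL = 0, HANG_HANG } HangStatus;

typedef struct HangState {
    long timeout_s;      /* how long retries may continue, in seconds */
    long hang_start_s;   /* when the current hang began */
    bool timed_out;
    HangStatus status;
} HangState;

/* Returns true if the failed request should be queued for a retry. */
static bool hang_update(HangState *s, HangStatus new_status, long now_s)
{
    bool retry = false;

    assert(s->timeout_s > 0);

    if (new_status == HANG_NORMAL) {
        if (s->status == HANG_HANG) {
            /* recovered: forget the previous hang */
            s->timed_out = false;
            s->hang_start_s = 0;
        }
    } else if (s->status != HANG_HANG) {
        /* first failure: start the clock and retry */
        s->hang_start_s = now_s;
        retry = true;
    } else if (!s->timed_out) {
        if (now_s >= s->hang_start_s + s->timeout_s) {
            s->timed_out = true;   /* give up, report the error */
        } else {
            retry = true;          /* still inside the window */
        }
    }

    s->status = new_status;
    return retry;
}
```

With a 10 second timeout and a hang that starts at t=100, a failure at
t=105 is retried and a failure at t=110 is reported to the guest.
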

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 block/block-backend.c          | 99 +++++++++++++++++++++++++++++++++-
 include/sysemu/block-backend.h |  2 +
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 90f1ca5753..d0b2b59f55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000
 
+enum BlockIOHangStatus {
+    BLOCK_IO_HANG_STATUS_NORMAL = 0,
+    BLOCK_IO_HANG_STATUS_HANG,
+};
+
 typedef struct BlockBackendRehandleInfo {
     bool enable;
     QEMUTimer *ts;
@@ -109,6 +114,11 @@ struct BlockBackend {
     unsigned int in_flight;
 
     BlockBackendRehandleInfo reinfo;
+
+    int64_t iohang_timeout; /* The I/O hang timeout value in sec. */
+    int64_t iohang_time; /* The I/O hang start time */
+    bool is_iohang_timeout;
+    int iohang_status;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -2480,20 +2490,107 @@ static void blk_rehandle_timer_cb(void *opaque)
     aio_context_release(blk_get_aio_context(blk));
 }
 
+static bool blk_iohang_handle(BlockBackend *blk, int new_status)
+{
+    int64_t now;
+    int old_status = blk->iohang_status;
+    bool need_rehandle = false;
+
+    switch (new_status) {
+    case BLOCK_IO_HANG_STATUS_NORMAL:
+        if (old_status == BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O Hang is recovered */
+            blk->is_iohang_timeout = false;
+            blk->iohang_time = 0;
+        }
+        break;
+    case BLOCK_IO_HANG_STATUS_HANG:
+        if (old_status != BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O hang is first triggered */
+            blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+            need_rehandle = true;
+        } else {
+            if (!blk->is_iohang_timeout) {
+                now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+                if (now >= (blk->iohang_time + blk->iohang_timeout)) {
+                    /* Case when I/O hang is timeout */
+                    blk->is_iohang_timeout = true;
+                } else {
+                    /* Case when I/O hang is continued */
+                    need_rehandle = true;
+                }
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    blk->iohang_status = new_status;
+    return need_rehandle;
+}
+
+static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout)
+{
+    bool need_rehandle = false;
+
+    /* Rehandle aio which returns EIO before hang timeout */
+    if (acb->rwco.ret == -EIO) {
+        if (acb->rwco.blk->is_iohang_timeout) {
+            /* I/O hang has timeout and not recovered */
+            *has_timeout = true;
+        } else {
+            need_rehandle = blk_iohang_handle(acb->rwco.blk,
+                                              BLOCK_IO_HANG_STATUS_HANG);
+            /* I/O hang timeout first trigger */
+            if (acb->rwco.blk->is_iohang_timeout) {
+                *has_timeout = true;
+            }
+        }
+    }
+
+    return need_rehandle;
+}
+
 static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 {
+    bool has_timeout = false;
+    bool need_rehandle = false;
+
     if (acb->has_returned) {
         blk_dec_in_flight(acb->rwco.blk);
-        if (acb->rwco.ret == -EIO) {
+        need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+        if (need_rehandle) {
             blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
             return;
         }
 
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
+
+        /* I/O hang return to normal status */
+        if (!has_timeout) {
+            blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL);
+        }
+
         qemu_aio_unref(acb);
     }
 }
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
+{
+    if (!blk) {
+        return;
+    }
+
+    blk->is_iohang_timeout = false;
+    blk->iohang_time = 0;
+    blk->iohang_timeout = 0;
+    blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
+    if (iohang_timeout > 0) {
+        blk->iohang_timeout = iohang_timeout;
+    }
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
     bdrv_register_buf(blk_bs(blk), host, size);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..bfebe3a960 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
+
 #endif

From patchwork Sun Sep 27 13:04:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: fangying
X-Patchwork-Id: 11802105
From: Ying Fang
Subject: [RFC PATCH 4/7] block-backend: add I/O hang drain when disbale
Date: Sun, 27 Sep 2020 21:04:17 +0800
Message-ID: <20200927130420.1095-5-fangying1@huawei.com>
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
Cc: kwolf@redhat.com, Ying Fang, Jiahui Cen, zhang.zhanghailiang@huawei.com, mreitz@redhat.com

To disable I/O hang, all hanging AIOs need to be drained. A rehandle
status field is introduced to tell the rehandle mechanism not to
rehandle failed AIOs when I/O hang is disabled.
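
The role of the status field can be modeled with counters: once a drain
has been requested, a request that fails again is completed with its
error instead of being put back on the retry list, so the list is
guaranteed to empty. This is a deliberately simplified standalone
sketch with hypothetical names (Rehandle, rehandle_on_eio,
rehandle_pause); it assumes every reissued request fails again and does
not model polling or AioContext locking.

```c
#include <assert.h>
#include <stdbool.h>

enum RehandleStatus {
    REHANDLE_NORMAL = 1,
    REHANDLE_DRAIN_REQUESTED = 2,
    REHANDLE_DRAINED = 3,
};

typedef struct Rehandle {
    enum RehandleStatus status;
    unsigned queued;            /* requests waiting for a retry */
    unsigned failed_to_guest;   /* errors reported to the guest */
} Rehandle;

static bool rehandle_is_paused(const Rehandle *r)
{
    return r->status == REHANDLE_DRAIN_REQUESTED ||
           r->status == REHANDLE_DRAINED;
}

/* A request has just failed with EIO. */
static void rehandle_on_eio(Rehandle *r)
{
    if (rehandle_is_paused(r)) {
        r->failed_to_guest++;   /* do not retry while draining */
    } else {
        r->queued++;
    }
}

/* Stop retrying and flush everything that is still queued. */
static void rehandle_pause(Rehandle *r)
{
    if (r->status == REHANDLE_DRAINED) {
        return;
    }

    r->status = REHANDLE_DRAIN_REQUESTED;
    while (r->queued > 0) {
        r->queued--;            /* the timer reissues the request ... */
        rehandle_on_eio(r);     /* ... and in this model it fails again */
    }
    assert(r->queued == 0);
    r->status = REHANDLE_DRAINED;
}
```

Without the paused check the loop in rehandle_pause would never
terminate while the backend keeps failing, which is the situation the
status field is meant to avoid.
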
Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- block/block-backend.c | 85 ++++++++++++++++++++++++++++++++-- include/sysemu/block-backend.h | 3 ++ 2 files changed, 84 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index d0b2b59f55..95b2d6a679 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 +#define BLOCK_BACKEND_REHANDLE_NORMAL 1 +#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED 2 +#define BLOCK_BACKEND_REHANDLE_DRAINED 3 enum BlockIOHangStatus { BLOCK_IO_HANG_STATUS_NORMAL = 0, @@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo { unsigned int in_flight; QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; + + int status; } BlockBackendRehandleInfo; typedef struct BlockBackendAioNotifier { @@ -471,6 +476,8 @@ static void blk_delete(BlockBackend *blk) assert(!blk->refcnt); assert(!blk->name); assert(!blk->dev); + assert(atomic_read(&blk->reinfo.in_flight) == 0); + blk_rehandle_disable(blk); if (blk->public.throttle_group_member.throttle_state) { blk_io_limits_disable(blk); } @@ -2460,6 +2467,37 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) atomic_dec(&blk->reinfo.in_flight); } +static void blk_rehandle_drain(BlockBackend *blk) +{ + if (blk_bs(blk)) { + bdrv_drained_begin(blk_bs(blk)); + BDRV_POLL_WHILE(blk_bs(blk), atomic_read(&blk->reinfo.in_flight) > 0); + bdrv_drained_end(blk_bs(blk)); + } +} + +static bool blk_rehandle_is_paused(BlockBackend *blk) +{ + return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED || + blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED; +} + +static void blk_rehandle_pause(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (!reinfo->enable || reinfo->status == 
BLOCK_BACKEND_REHANDLE_DRAINED) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED; + blk_rehandle_drain(blk); + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED; + aio_context_release(blk_get_aio_context(blk)); +} + static void blk_rehandle_timer_cb(void *opaque) { BlockBackend *blk = opaque; @@ -2559,10 +2597,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) if (acb->has_returned) { blk_dec_in_flight(acb->rwco.blk); - need_rehandle = blk_rehandle_aio(acb, &has_timeout); - if (need_rehandle) { - blk_rehandle_insert_aiocb(acb->rwco.blk, acb); - return; + if (!blk_rehandle_is_paused(acb->rwco.blk)) { + need_rehandle = blk_rehandle_aio(acb, &has_timeout); + if (need_rehandle) { + blk_rehandle_insert_aiocb(acb->rwco.blk, acb); + return; + } } acb->common.cb(acb->common.opaque, acb->rwco.ret); @@ -2576,6 +2616,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) } } +void blk_rehandle_enable(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (reinfo->enable) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, blk); + reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL; + reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL; + reinfo->enable = true; + aio_context_release(blk_get_aio_context(blk)); +} + +void blk_rehandle_disable(BlockBackend *blk) +{ + if (!blk->reinfo.enable) { + return; + } + + blk_rehandle_pause(blk); + timer_del(blk->reinfo.ts); + timer_free(blk->reinfo.ts); + blk->reinfo.ts = NULL; + blk->reinfo.enable = false; +} + +bool blk_iohang_is_enabled(BlockBackend *blk) +{ + return blk->iohang_timeout != 0; +} + void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) { if (!blk) { @@ -2588,6 +2664,7 @@ void blk_iohang_init(BlockBackend *blk, 
int64_t iohang_timeout) blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL; if (iohang_timeout > 0) { blk->iohang_timeout = iohang_timeout; + blk_rehandle_enable(blk); } } diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index bfebe3a960..375ae13b0b 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -268,6 +268,9 @@ const BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); +void blk_rehandle_enable(BlockBackend *blk); +void blk_rehandle_disable(BlockBackend *blk); +bool blk_iohang_is_enabled(BlockBackend *blk); void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); #endif From patchwork Sun Sep 27 13:04:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: fangying X-Patchwork-Id: 11802109 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5732F112C for ; Sun, 27 Sep 2020 13:08:27 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 11E972389F for ; Sun, 27 Sep 2020 13:08:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 11E972389F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:36186 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWPi-0004QW-8a for patchwork-qemu-devel@patchwork.kernel.org; Sun, 27 Sep 2020 09:08:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60918) by lists.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMX-00081g-EL for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:09 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:60448 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMV-0003Nt-F2 for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:09 -0400 Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 03E51313DBA182BE1FC0; Sun, 27 Sep 2020 21:04:50 +0800 (CST) Received: from localhost (10.174.185.104) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:41 +0800 From: Ying Fang To: Subject: [RFC PATCH 5/7] virtio-blk: disable I/O hang when resetting Date: Sun, 27 Sep 2020 21:04:18 +0800 Message-ID: <20200927130420.1095-6-fangying1@huawei.com> X-Mailer: git-send-email 2.28.0.windows.1 In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com> References: <20200927130420.1095-1-fangying1@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.185.104] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: 
qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" All AIOs including the hanging AIOs need to be drained when resetting virtio-blk. So it is necessary to disable I/O hang before resetting and enable I/O hang again after resetting if I/O hang is enabled. Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- hw/block/virtio-blk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 2204ba149e..11837a54f5 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -892,6 +892,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) AioContext *ctx; VirtIOBlockReq *req; + if (blk_iohang_is_enabled(s->blk)) { + blk_rehandle_disable(s->blk); + } + ctx = blk_get_aio_context(s->blk); aio_context_acquire(ctx); blk_drain(s->blk); @@ -909,6 +913,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) assert(!s->dataplane_started); blk_set_enable_write_cache(s->blk, s->original_wce); + + if (blk_iohang_is_enabled(s->blk)) { + blk_rehandle_enable(s->blk); + } } /* coalesce internal state, copy to pci i/o region 0 From patchwork Sun Sep 27 13:04:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: fangying X-Patchwork-Id: 11802099 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE5DB1668 for ; Sun, 27 Sep 2020 13:06:07 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 897642389F for ; Sun, 27 Sep 2020 13:06:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 897642389F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; 
spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:56088 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWNS-00013b-2G for patchwork-qemu-devel@patchwork.kernel.org; Sun, 27 Sep 2020 09:06:06 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:60820) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMR-0007qX-DZ for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:03 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:60242 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMP-0003N3-5j for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:03 -0400
Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id A1FD2B9EE6899670B86C; Sun, 27 Sep 2020 21:04:48 +0800 (CST)
Received: from localhost (10.174.185.104) by DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:42 +0800
From: Ying Fang
To:
Subject: [RFC PATCH 6/7] qemu-option: add I/O hang timeout option
Date: Sun, 27 Sep 2020 21:04:19 +0800
Message-ID: <20200927130420.1095-7-fangying1@huawei.com>
X-Mailer: git-send-email 2.28.0.windows.1
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.174.185.104]
X-CFilter-Loop: Reflected
Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49
X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: -41
X-Spam_score: -4.2
X-Spam_bar: ----
X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001,
SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

The appropriate I/O hang timeout differs from one situation to
another, so provide an option that lets the user set the I/O hang
timeout for each block device.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 blockdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..ff8cdcd497 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,6 +500,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     BlockdevDetectZeroesOptions detect_zeroes =
         BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     const char *throttling_group = NULL;
+    int64_t iohang_timeout = 0;
 
     /* Check common options by copying from bs_opts to opts, all other options
      * stay in bs_opts for processing by bdrv_open(). */
@@ -622,6 +623,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 
         bs->detect_zeroes = detect_zeroes;
 
+        /* init timeout value for I/O Hang */
+        iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0);
+        if (iohang_timeout > 0) {
+            blk_iohang_init(blk, iohang_timeout);
+        }
+
         block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);
 
         if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
@@ -3786,6 +3793,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "whether to account for failed I/O operations "
                     "in the statistics",
+        },{
+            .name = "iohang-timeout",
+            .type = QEMU_OPT_NUMBER,
+            .help = "timeout value for I/O Hang",
         },
         { /* end of list */ }
     },

From patchwork Sun Sep 27 13:04:20 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: fangying
X-Patchwork-Id: 11802113
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E1A9618 for ; Sun, 27 Sep 2020 13:10:42 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3155E23899 for ; Sun, 27 Sep 2020 13:10:42 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3155E23899
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:40558 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kMWRt-0006Ig-1L for patchwork-qemu-devel@patchwork.kernel.org; Sun, 27 Sep 2020 09:10:41 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:60886) by
lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMU-0007so-Ac for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:06 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:60250 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kMWMQ-0003N4-NT for qemu-devel@nongnu.org; Sun, 27 Sep 2020 09:05:06 -0400
Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 9CBBE948378066848F43; Sun, 27 Sep 2020 21:04:48 +0800 (CST)
Received: from localhost (10.174.185.104) by DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id 14.3.487.0; Sun, 27 Sep 2020 21:04:42 +0800
From: Ying Fang
To:
Subject: [RFC PATCH 7/7] qapi: add I/O hang and I/O hang timeout qapi event
Date: Sun, 27 Sep 2020 21:04:20 +0800
Message-ID: <20200927130420.1095-8-fangying1@huawei.com>
X-Mailer: git-send-email 2.28.0.windows.1
In-Reply-To: <20200927130420.1095-1-fangying1@huawei.com>
References: <20200927130420.1095-1-fangying1@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.174.185.104]
X-CFilter-Loop: Reflected
Received-SPF: pass client-ip=45.249.212.32; envelope-from=fangying1@huawei.com; helo=huawei.com
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/27 09:04:49
X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: -41
X-Spam_score: -4.2
X-Spam_bar: ----
X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: kwolf@redhat.com, Ying Fang , Jiahui Cen , zhang.zhanghailiang@huawei.com, mreitz@redhat.com
Errors-To:
qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

Hypervisor management tools such as libvirt may need to monitor I/O
hang events. Report I/O hang and I/O hang timeout events via QAPI.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 block/block-backend.c |  3 +++
 qapi/block-core.json  | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 95b2d6a679..5dc5b11bcc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2540,6 +2540,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O Hang is recovered */
             blk->is_iohang_timeout = false;
             blk->iohang_time = 0;
+            qapi_event_send_block_io_hang(false);
         }
         break;
     case BLOCK_IO_HANG_STATUS_HANG:
@@ -2547,12 +2548,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status)
             /* Case when I/O hang is first triggered */
             blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
             need_rehandle = true;
+            qapi_event_send_block_io_hang(true);
         } else {
             if (!blk->is_iohang_timeout) {
                 now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
                 if (now >= (blk->iohang_time + blk->iohang_timeout)) {
                     /* Case when I/O hang is timeout */
                     blk->is_iohang_timeout = true;
+                    qapi_event_send_block_io_hang_timeout(true);
                 } else {
                     /* Case when I/O hang is continued */
                     need_rehandle = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e11d..7bdf75c6d7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5535,3 +5535,29 @@
 { 'command': 'blockdev-snapshot-delete-internal-sync',
   'data': { 'device': 'str', '*id': 'str', '*name': 'str'},
   'returns': 'SnapshotInfo' }
+
+##
+# @BLOCK_IO_HANG:
+#
+# Emitted when a device I/O hang begins or ends
+#
+# @set: true if the I/O hang begins; false if it ends.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG',
+  'data': { 'set': 'bool' }}
+
+##
+# @BLOCK_IO_HANG_TIMEOUT:
+#
+# Emitted when the device I/O hang timeout is set or cleared
+#
+# @set: true if set; false if cleared.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG_TIMEOUT',
+  'data': { 'set': 'bool' }}
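
For readers following the state transitions in the blk_iohang_handle()
hunks above, here is a minimal standalone sketch in C. It is not code
from this series: IoHangState, iohang_state_init() and iohang_step()
are hypothetical names, the event counters merely stand in for the
qapi_event_send_*() calls, and the caller passes in the clock in
seconds. It only mirrors the four cases named in the comments
(recovered, first triggered, timed out, continued).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the I/O hang fields kept per block backend. */
typedef struct IoHangState {
    bool hanging;           /* a hang episode is in progress */
    bool timed_out;         /* the episode has exceeded timeout_s */
    int64_t start_s;        /* when the hang was first seen, in seconds */
    int64_t timeout_s;      /* configured limit, in seconds */
    int hang_begin_events;  /* stands in for BLOCK_IO_HANG with set=true */
    int hang_end_events;    /* stands in for BLOCK_IO_HANG with set=false */
    int timeout_events;     /* stands in for BLOCK_IO_HANG_TIMEOUT set=true */
} IoHangState;

static void iohang_state_init(IoHangState *s, int64_t timeout_s)
{
    /* Assumption: as in blockdev_init(), only positive timeouts enable it. */
    assert(timeout_s > 0);
    *s = (IoHangState){ .timeout_s = timeout_s };
}

/*
 * Feed one request outcome into the state machine.
 * Returns true when the failed request should be queued for a retry.
 */
static bool iohang_step(IoHangState *s, bool io_failed, int64_t now_s)
{
    if (!io_failed) {
        if (s->hanging) {
            /* Case: hang recovered */
            s->hanging = false;
            s->timed_out = false;
            s->start_s = 0;
            s->hang_end_events++;
        }
        return false;
    }
    if (!s->hanging) {
        /* Case: hang first triggered */
        s->hanging = true;
        s->start_s = now_s;
        s->hang_begin_events++;
        return true;
    }
    if (s->timed_out) {
        /* Already reported; stop retrying and let the error through. */
        return false;
    }
    if (now_s >= s->start_s + s->timeout_s) {
        /* Case: hang timed out */
        s->timed_out = true;
        s->timeout_events++;
        return false;
    }
    /* Case: hang continues, keep retrying */
    return true;
}
```

With a 5-second timeout, a hang first seen at t=100 is still retried at
t=104, is reported as timed out at t=105, and is cleared by the next
successful request. Like the hunks shown, the sketch raises the timeout
event at most once per hang episode and only ever with set=true.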